The Plant Identification Battle: GPT-4 vs.

Ondřej Veselý
March 21, 2024
7 min read
Share this post
How we did it?

We randomly selected 50 photos submitted through the API that were later confidently identified by botanical experts.

We have submitted the photos through:

  1. 3.1.0. model from the API,
  2. multimodal GPT-4 Turbo model through the OpenAI API. 

In GPT-4, the following prompt was used to obtain the three best suggestions:

Can you tell me the name of this plant? Give me your three best ideas as scientific names.

Here is an example of an answer:

Based on the image provided, the plant appears to be a type of variegated foliage plant commonly used as an indoor ornamental. Here are three possible scientific names for plants with a similar appearance:

1. Dracaena fragrans 'Lemon Lime' – A member of the Dracaena genus with variegated leaves that can resemble the plant in the photo.

2. Chlorophytum comosum 'Variegatum' – Also known as the variegated spider plant, which has striped leaves that could be similar to what is shown in the image.

3. Aglaonema commutatum – Recognized for its variegated leaves, which might look like the plant in the picture.

Without a more precise view or additional information about the plant's characteristics, such as flowers or growth habit, it's difficult to determine the exact species just from the image.

We evaluated whether the correct identification was in the first suggestion (top1) or in one of the first three suggestions (top3). We evaluated matches at the species level (e.g. Chlorophytum comosum) and the genus level (e.g. Chlorophytum).

What have we found?

On the species level, had almost five times less misidentifications (12%) than GPT-4 (58%) for top1 suggestions. For top3 suggestions, the difference was even greater, with giving 4% wrong answers and GPT-4 giving 36% wrong answers. 

The overall error rate was about twice as low on the genus level but the difference in performance between the models was similar: for top1 suggestions, was wrong about 4x less often than GPT-4, with 8% and 36% of wrong answers, respectively; for top3 suggestions, it was 10x less often, with 2% and 20%, respectively.

Although GPT-4 Turbo can be an interesting solution for people who want to identify plants, the results show that narrowly focused models such as are still superior in terms of identification accuracy.

The results, including the photos and GPT responses, can be found in this spreadsheet.

Proportion of wrong answers (the smaller the column, the better the result).

Methods details

Plant images were collected from February 16 to February 25, 2024. Genus and species names were considered, plant varieties (if present) were ignored for the evaluation. If a synonymous name was given, it was considered a match (e.g. the genera Schefflera and Plerandra). If a plant was identified only at the genus level (such as Rosa sp. or Pinus sp.; 13 cases), the response from the compared models was considered a match if the species level was provided (such as Rosa chinensis or Pinus halepensis). In three cases, GPT-4 refused to give the requested answer; these cases were considered as wrong answers.

Most of the selected plants (60%) were labeled as houseplants by an expert, reflecting the typical traffic of plant identification apps during winter in the northern hemisphere.

Most of the selected plants (60%) were labeled as houseplants by an expert, reflecting the typical traffic of plant identification apps during winter in the northern hemisphere.

Share this post
By clicking “Accept All Cookies”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.