Sometimes recognition software is excellent at correctly categorizing certain types of images but totally fails with others.
Some image recognition engines prefer cats over dogs, and some are far more descriptive with their color knowledge.
But which is the best overall?
Perficient Digital's image recognition accuracy study looked at image recognition -- one of the hottest areas of machine learning.
It compared four engines -- Amazon AWS Rekognition, Google Vision, IBM Watson, and Microsoft Azure Computer Vision -- on how well each tags images.
Three users hand-tagged 2,000 images across four categories for comparison: Charts, Landscapes, People, and Products.
The research team used two different measures to evaluate each engine. The first, accuracy evaluation (500 images), measured the accuracy of each tag supplied by the image recognition engine. The second, matching human descriptions (2,000 images), determined how the tags supplied by each engine stacked up against how a human would describe the same image.
Across the 500 images in the accuracy evaluation part of the study, each tag from the image recognition engines was judged on whether it was accurate -- "yes," "no," or "I'm not sure." Only 1.2% of the tags were marked "not sure."
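The arithmetic behind those percentages can be sketched in a few lines. This is a hypothetical reconstruction, not the study's actual code, and it assumes "not sure" tags are excluded from the accuracy denominator:

```python
from collections import Counter

def accuracy_summary(judgments):
    """Summarize human yes/no/not-sure judgments on engine tags.

    Accuracy is computed over decided tags only ("yes" + "no");
    "not sure" tags are reported separately, as in the study.
    """
    counts = Counter(judgments)
    decided = counts["yes"] + counts["no"]
    return {
        "accuracy": counts["yes"] / decided if decided else 0.0,
        "not_sure_rate": counts["not sure"] / len(judgments) if judgments else 0.0,
    }

# Hypothetical judgments for a handful of tags from one engine
sample = ["yes", "yes", "no", "yes", "not sure", "yes"]
print(accuracy_summary(sample))  # accuracy 0.8, not_sure_rate ~0.167
```

In a real evaluation these judgments would be aggregated per engine and per category before comparison.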
A tag would be marked as accurate even if it contained a tag that a human was unlikely to use to describe the image.
On a pure accuracy basis, three out of the four engines -- Amazon, Google Vision, and Microsoft Azure Computer Vision -- scored higher than human tagging for tags with greater than 90% confidence.
In the analysis, Google is the clear winner across all categories, with Amazon AWS Rekognition in second place.
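The 90% confidence cutoff used in the accuracy comparison amounts to a simple filter on each engine's output. A minimal sketch, with hypothetical tag data (the real APIs return confidence scores in roughly this shape, though field names and scales vary by engine):

```python
def high_confidence_tags(tags, threshold=0.90):
    """Keep only the labels the engine reported at or above the threshold."""
    return [t["label"] for t in tags if t["confidence"] >= threshold]

# Hypothetical engine output for one landscape image
engine_tags = [
    {"label": "mountain", "confidence": 0.97},
    {"label": "glacier", "confidence": 0.88},
    {"label": "sky", "confidence": 0.95},
]
print(high_confidence_tags(engine_tags))  # ['mountain', 'sky']
```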
The study also measured how well the engine-generated descriptions matched the way users would describe each image. Here, none of the engines performed well.
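One simple way to score that kind of match is set overlap between engine tags and human tags. The study's actual matching method may have been looser than exact string comparison, and the tags below are hypothetical:

```python
def human_match_rate(engine_tags, human_tags):
    """Fraction of human-supplied tags that the engine also produced
    (case-insensitive exact match; a real study might use fuzzier matching)."""
    engine = {t.lower() for t in engine_tags}
    human = {t.lower() for t in human_tags}
    if not human:
        return 0.0
    return len(engine & human) / len(human)

# Hypothetical tags for one image of a person on a beach:
# only "person" matches, so the rate is 1/3
rate = human_match_rate(
    ["Person", "Sea", "Outdoor", "Sky"],
    ["person", "beach", "ocean"],
)
print(rate)
```

Exact-match scoring like this penalizes synonyms ("sea" vs. "ocean"), which is one reason engine tags can be accurate yet still diverge from human descriptions.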
Language analysis was performed on each engine's output to see whether any of them had a bias. Not surprisingly, Amazon was strongly biased toward products.
Human hand-tagged images scored far higher than any of the engines on this measure. There is a clear difference between a tag being accurate and a tag a human would actually use to describe an image.
Interestingly, IBM Watson loves colors, coming up with the most color descriptions compared to the others, using words like steel blue, blue, electric blue, and purplish-blue. Microsoft Azure Computer Vision could describe image quality, such as blur and blurry.
IBM Watson loves highly descriptive words such as oxbow (river), arabesque (ornament), and alpenstock (climbing equipment). Amazon AWS Rekognition loves clothing, recognizing shorts, pants, and shirts more than other APIs.
Google Vision loves cat breeds, and IBM Watson recognized more dog breeds than the other engines.
Try the Smart Images AI Evaluator and upload some of your images to see how you fare. This resource, developed by Perficient Digital, compares how the image recognition engines from Amazon, Google, IBM, and Microsoft tag each image. You might be surprised at how good the results are.
Previous and related coverage:
Virtual assistants are becoming ubiquitous in our work and home lives. Many of us own at least one personal assistant -- either Siri on iPhone or Cortana on Windows 10 -- but which one is better and more accurate at responding to our requests?
You may not be as frustrated with your digital assistant as you used to be. A new study shows that these assistants are improving year on year.
Stone Temple has been asking Google Home and Amazon Alexa a lot of questions to find out which voice-activated device gives the most correct answers.