The signature example of AI's progress in the last several years is how well computers can recognize objects in pictures. Yet even simple tests show how brittle that ability really is.
The latest trick to game the system comes courtesy of researchers at Auburn University in Auburn, Ala., and media titan Adobe Systems. In a paper released this week, they showed that top image-recognition neural networks easily fail if objects are moved or rotated even by slight amounts.
A fire truck, for example, seen head-on, could be correctly recognized. But once pointed up in the air and rotated a few times, the same fire truck is misclassified by the neural net as a school bus, a fireboat, or a bobsled.
While previous studies have shown that images can be modified in texture or lighting to fool a neural net, this is the first time the "pose," meaning the 3D orientation of an object, has been manipulated to generate image samples that trip up the network.
The upshot is that the state of the art in image recognition is "naive," and some greater understanding of three-dimensional structure seems to be needed to make such systems better.
State-of-the-art neural networks such as Google's Inception are good at "classifying" things in pictures, they conclude, but they are not really recognizing objects, in the true sense of that expression.
The paper, "Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects," is posted on the arXiv pre-print server, and is authored by Michael A. Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Wei-Shinn Ku, and Anh Nguyen of Auburn, and Long Mai of Adobe. (Adobe also contributed financial support to the effort.)
The authors have also posted the code they used to do it on GitHub.
The researchers purchased a data set of 100 three-dimensional, computer-rendered objects that are similar to things found in the ImageNet database used to train neural networks for image recognition: vehicles such as school buses and fire engines, plus stop signs, benches, and dogs.
They then modified those 3D objects by changing their pitch, yaw, and roll. They used a procedure called "random search" to find poses that could fool Google's state-of-the-art "Inception v3" network. Essentially, they were tuning a system to get good at generating "adversarial examples" of the pictures, in effect pitting one system against the neural network.
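The random-search idea can be sketched in a few lines. This is a toy illustration, not the authors' code: a stand-in function takes the place of rendering the 3D object and running Inception, and all names here are hypothetical.

```python
import random

def classify(pose):
    """Stand-in for "render the object at this pose, then run Inception."
    This toy classifier only recognizes the object near its canonical
    orientation, mimicking the narrowness the paper reports."""
    pitch, yaw, roll = pose
    return "fire_truck" if abs(pitch) < 30 and abs(roll) < 30 else "other"

def random_search(true_label, n_trials=1000, seed=0):
    """Randomly sample poses (pitch, yaw, roll in degrees) and collect
    those that make the classifier fail -- the adversarial poses."""
    rng = random.Random(seed)
    adversarial = []
    for _ in range(n_trials):
        pose = tuple(rng.uniform(-180.0, 180.0) for _ in range(3))
        if classify(pose) != true_label:
            adversarial.append(pose)
    return adversarial

poses = random_search("fire_truck")
print(len(poses), "adversarial poses found out of 1000 trials")
```

The real pipeline scores renders of the purchased 3D models rather than a toy function, but the loop structure — sample a pose, render, check the label, keep the failures — is the essence of the search.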
In their tests, Inception "was wrong for the vast majority of samples," they write. "The median percent of correct classifications for all 30 objects was only 3.09 percent."
To be clear, Google's Inception didn't mischaracterize every image of an object -- some of the generated images it got right. But it tended to be very narrow in what it got, becoming confused by poses outside the norm.
"The DNN's ability to recognize an object (e.g., a fire truck) in an image varies radically as the object is rotated in the world," they write.
The authors found that just a little modification could have a big impact: rotating an object by roughly 8 degrees of "pitch" and 9.17 degrees of "roll" was enough to confound Inception. The adversarial search sent Inception off into the weeds, forcing it to misclassify things across 797 different object classes.
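For a sense of how small that perturbation is, the two rotations can be composed into a single rotation matrix. This is a generic rotation-matrix sketch, not the paper's rendering code, and the axis conventions are an assumption:

```python
import math

def rot_x(deg):
    """Rotation about the x-axis (pitch), angle in degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(deg):
    """Rotation about the z-axis (roll), angle in degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def net_angle(r):
    """Net rotation angle (degrees) of a rotation matrix, via its trace."""
    trace = r[0][0] + r[1][1] + r[2][2]
    return math.degrees(math.acos((trace - 1) / 2))

# The perturbation the article cites: ~8 degrees pitch, 9.17 degrees roll.
r = matmul(rot_x(8.0), rot_z(9.17))
print(round(net_angle(r), 1))  # a net turn of only about 12 degrees
```

In other words, the combined change amounts to a single turn of roughly 12 degrees — well within the range of viewpoints an everyday photograph might capture.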
The result is that Inception and image-recognition systems like it aren't really recognizing objects, per se. "In sum, our work shows that state-of-the-art DNNs perform image classification well but are still far from true object recognition," they write. Even object detection can be fooled, they found. The authors used their adversarial system to take on the top-of-the-line "YOLO v3" object-detection system, and found that 75.5 percent of the images that beat Inception also fooled YOLO.
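That 75.5 percent figure is a transfer rate: the fraction of examples crafted against one model that also fool a second. A minimal sketch of the measurement, with a toy stand-in in place of YOLO (all names hypothetical):

```python
def transfer_rate(adversarial_images, second_model, true_label):
    """Fraction of images (already adversarial to a first model) that
    also fool second_model."""
    fooled = sum(1 for img in adversarial_images
                 if second_model(img) != true_label)
    return fooled / len(adversarial_images)

# Toy stand-in model: misclassifies any image tagged as a hard pose.
toy_model = lambda img: "other" if img["hard_pose"] else "fire_truck"

imgs = [{"hard_pose": True}, {"hard_pose": True},
        {"hard_pose": False}, {"hard_pose": True}]
print(transfer_rate(imgs, toy_model, "fire_truck"))  # → 0.75
```

High transfer rates like the one reported suggest the two models share blind spots, rather than Inception having idiosyncratic failures.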
It's clear neural networks will need substantial help to move forward. Even when the "AlexNet" neural network was retrained with the adversarial images added to the ImageNet training data, it was still fooled when presented with new adversarial images after the training.
The authors suggest that some of the problem may have to do with a certain aesthetic in the images found on the Internet that are used in training neural networks.
"Because ImageNet and MS COCO datasets are constructed from photographs taken by people, the datasets reflect the aesthetic tendencies of their captors," they write.
The authors suggest that one solution is to load up ImageNet with lots of adversarial examples. But they point out that "acquiring a large-scale, high-quality 3D object dataset is costly and labor intensive."
Another answer they propose is to use "geometric priors" to give neural nets greater sophistication.
The study, they offer, may be the beginning of creating entire "adversarial worlds" that could test deep learning systems. That kind of work could "serve as an interpretability tool for extracting useful insights about these black-box models' inner functions."