Hacking neural networks

Anything that can be hacked, will be hacked — including neural networks. Here's what researchers have learned about surprising artificial intelligence behavior.

Modern neural networks have achieved startling success in image and speech recognition — think Siri and Google Voice Search — using layers of simpler feature analyzers to break the problem down.

These loosely networked layers give the techniques their power, but also give rise to counter-intuitive behaviors.

In the recent paper Intriguing properties of neural networks, Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow and Rob Fergus found that artificial intelligence, like human intelligence, has some surprising behaviors. They looked at two areas, semantic analysis and image classification, but I'll focus on the latter.

Specifically, they found that in a state-of-the-art image recognition neural network, one that should robustly handle slightly altered images, by

"...applying an imperceptible non-random perturbation to a test image, it is possible to arbitrarily change the network’s prediction. These perturbations are found by optimizing the input to maximize the prediction error. We term the so perturbed examples 'adversarial examples'."
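To make the idea concrete, here's a toy sketch of "optimizing the input to maximize the prediction error." The paper itself uses a box-constrained L-BFGS optimization on a deep network; the version below is a much simpler sign-of-the-gradient step (the later "fast gradient sign" technique) applied to a tiny linear classifier, so every name and number here is illustrative, not the authors' actual procedure:

```python
import numpy as np

def predict(w, b, x):
    """Linear classifier score: positive means class 1, negative means class 0."""
    return float(w @ x + b)

def adversarial_example(w, b, x, eps):
    """Nudge each input feature by at most eps in the direction that
    pushes the score toward the wrong class. For a linear model the
    gradient of the score w.r.t. the input is just w, so the worst-case
    per-feature step is eps * sign(w), aimed against the current label."""
    direction = -np.sign(w) if predict(w, b, x) > 0 else np.sign(w)
    return x + eps * direction

# Hypothetical weights and input for illustration.
w = np.array([1.0, 1.0])
b = 0.0
x = np.array([0.3, 0.2])          # classified as class 1 (score 0.5)
x_adv = adversarial_example(w, b, x, eps=0.3)
# x_adv differs from x by at most 0.3 per feature, yet flips the prediction.
```

The point the paper makes is that for deep networks an `eps` small enough to be invisible to a human is still enough to flip the prediction, because the optimization concentrates the perturbation exactly where the network is most sensitive.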

These adversarial examples work across differently configured neural networks, even those trained on different images, suggesting that even AI systems have blind spots analogous to those of human intelligence.

The paper's adversarial image examples show what the authors mean by "imperceptible". Here's one that pairs a correctly classified image (left) with an adversarial, misclassified one (right), and between them (center) a 10x magnification of the difference:

Adversarial image example.

Imperceptible indeed!

The authors conclude:

"...if the network can generalize well, how can it be confused by these adversarial negatives, which are indistinguishable from the regular examples? The explanation is that the set of adversarial negatives is of extremely low probability, and thus is never (or rarely) observed in the test set, yet it is dense (much like the rational numbers), and so it is found near virtually every test case."

The Storage Bits take

Images, still and moving, are a massive part of humanity's stored heritage. AI systems that scan and "recognize" their salient features vastly increase the searchability of those archives.

These systems also have applications in self-driving vehicles, where adversarial images could have dangerous effects. While that is a remote possibility today, we have to consider how criminals, corporations, and national security agencies might exploit these counter-intuitive results to hack our digital world.