A new experiment: Does AI really know cats or dogs -- or anything?

New Google research suggests what AI does to identify images may not amount to "knowing" anything.
Written by Tiernan Ray, Senior Contributing Writer

Most humans can recognize a cat or dog at an early age. Asked to articulate how they know if an animal is a cat or a dog, an adult might fumble for an explanation drawn from experience, something like "cats appraise you in a distant fashion, but dogs try to jump up on you and lick your face."

We don't really articulate what we know, in other words.

The signature achievement of artificial intelligence in the past two decades is classifying pictures of cats and dogs, among other things, by assigning them to categories. But AI programs never explain how they "know" what they supposedly "know." They are the proverbial black box. 

A recent innovation from Google's AI researchers seeks to provide an answer as to how AI knows such things. It's a fascinating engineering project, and it also makes one wonder whether AI really knows anything at all. 

In the research paper, "Explaining in Style: Training a GAN to explain a classifier in StyleSpace," Google research scientist Oran Lang and colleagues set out to reverse engineer what's called a classifier: a machine learning program that develops a capability to automatically sort pictures into categories, such as "cat" and "dog." 

Their program, called "StylEx," is able to identify things that make a classifier assign a given photo to one category or another, and it does so in ways that a person could recognize -- semantically significant attributes, in other words. 

Their technique makes novel use of what's known as a generative adversarial network, or GAN. The GAN approach, originally introduced by AI scientist Ian Goodfellow and colleagues, including Yoshua Bengio, at the University of Montreal in 2014, is a widely popular kind of program for producing fake images that emulate a given style or genre.

The GAN program the scientists use is one called StyleGAN, introduced by Tero Karras and colleagues at Nvidia in 2019, and updated in 2020 to StyleGAN2. It can take a variety of real photos of people, places and things, and produce convincing fakes in the manner of the original image, as you can see on the website "this person does not exist," a collection of strikingly realistic fake headshots.


That's right, GANs spearhead the "deep fakes" phenomenon. The point is, GANs can produce any kind of image, including fake albeit realistic-looking cat and dog pictures.

The important thing is that StyleGAN2 can be shifted along any number of parameters, such as making a face have a lighter or darker complexion in a headshot of a person, or making a cat's ears floppy like a dog's or conversely a dog's eyes big and round like a cat's. StyleGAN2, in other words, can play with what we would think are expected visual attributes of something.

The authors took StyleGAN2 and they built the new version, StylEx, to play with pictures that had already been classified by a traditional neural net classifier program, in this case, the venerable MobileNet neural network. That classifier sorts real images of animals from a dataset called "AFHQ," introduced by Yunjey Choi and colleagues at Clova AI Research in 2020.

It is MobileNet that is the black box they are seeking to pry open using StylEx. 

To develop their program, Lang and team made StylEx essentially compress the images classified by MobileNet into a code, and then decompress that code into a reconstruction of the original. Then the new, fake versions of the originals were fed back into MobileNet to see if MobileNet would assign the same category to the fakes.
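The compress-reconstruct-reclassify loop described above can be sketched in a few lines. Everything here is a toy stand-in -- a hypothetical encoder, generator and classifier, not the actual StylEx or MobileNet networks -- meant only to show the shape of the check the researchers perform.

```python
# Toy sketch of the StylEx consistency check: compress an image to a
# latent code, reconstruct a fake from it, then verify the classifier
# assigns the fake the same category as the original. All three
# functions are hypothetical stand-ins for the real neural networks.

def encode(image):
    # Stand-in "encoder": summarize the image as two coarse statistics.
    return (sum(image) / len(image), max(image) - min(image))

def decode(latent):
    # Stand-in "generator": rebuild a flat image from the latent code.
    mean, spread = latent
    return [mean - spread / 2, mean, mean + spread / 2]

def classify(image):
    # Stand-in "classifier": thresholds on average brightness.
    return "cat" if sum(image) / len(image) > 0.5 else "dog"

original = [0.9, 0.8, 0.7]           # pretend pixel values
fake = decode(encode(original))      # compress, then reconstruct
assert classify(fake) == classify(original)  # fake keeps the category
```

In the real system the encoder, generator and classifier are deep networks and the "same category" check is part of the training loss, but the logic of the round trip is the same.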


The StylEx program produces fake images from a "generator" neural network. When they are fed to a "Classifier," the researchers measure whether the fakes can flip the classifier's categorization from the right category to the wrong one. Red outlines show fake images whose attributes have been selectively modified and the originals on which they were based, with percent score, inset, showing how much a given attribute influenced the categorization.

Google, 2022

Along the way, StylEx can adjust those parameters, those visual attributes that a GAN is able to manipulate, and see if doing so throws off MobileNet. StylEx tries to fool MobileNet, if you will.

The thinking is, if StylEx can create a convincing fake that MobileNet accepts, and then another fake that messes up MobileNet's careful categorization, then the interaction between the two brings an insight as to what it is that MobileNet is acting upon when it classifies images. 

As the authors put it, "We train our GAN to explicitly explain the classifier." ("Training" is the stage in which a neural net's functionality is first developed by exposure to sample data.) 

The compressed version of the picture, where StylEx is divining what it is that MobileNet will respond to, is part of StylEx's "StyleSpace," a "disentangled latent space" of a GAN. The latent space is the part of the GAN program that separates out the visual elements of a picture, such as the areas that make up the eyes of a cat or the tongue of a dog.

As StylEx is manipulated via sliders -- really, there are slider controls that can show things such as eyes being made larger or mouths being open or shut -- the MobileNet classifier responds with increasing or decreasing probability scores for cat or dog. In that way, StylEx is being used like a medical probe to see what happens to MobileNet's categorization. 

In other words, twiddle the knobs to show how the fakes differ from the real, and see what it does to classification.
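That knob-twiddling loop can be sketched as a sweep over one latent attribute while watching the classifier's probability respond. The "eye size" coordinate and the logistic classifier below are invented for illustration, not taken from the paper.

```python
import math

def classifier_prob_cat(latent):
    # Hypothetical classifier probability: a logistic function of one
    # "eye size" coordinate (larger eyes read as more cat-like).
    eye_size = latent["eye_size"]
    return 1.0 / (1.0 + math.exp(-4.0 * (eye_size - 0.5)))

latent = {"eye_size": 0.7}            # latent code of one fake image
baseline = classifier_prob_cat(latent)

# "Slider" sweep: shrink the eyes step by step and watch P(cat) fall.
for step in [0.6, 0.5, 0.4, 0.3]:
    latent["eye_size"] = step
    p = classifier_prob_cat(latent)
    print(f"eye_size={step:.1f}  P(cat)={p:.2f}")
```

The real StyleSpace has many such coordinates, and StylEx's job is to pick out the few that move a given image's score the most.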

Lang and colleagues call this "counterfactual" experimentation, which, as they write, is "a statement of the form 'Had the input x been x̃, then the classifier output would have been ỹ instead of y,' where the difference between x and x̃ is easy to explain."

The notion of "easy to explain" is the crux of it. The point of StylEx is to identify things in the StyleSpace that a human being could understand -- "big eyes" or "wagging tongue" -- as explanation. This is part of the movement in AI toward "explainable" AI, stuff that opens the black box and tells society how the machine arrives at decisions.

As the authors write, 

For instance, consider a classifier trained to distinguish between cat and dog images. A counterfactual explanation for an image classified as a cat could be "If the pupils were made larger, then the output of the classifier for the probability of cat would decrease by 10%." A key advantage of this approach is that it provides per-example explanations, pinpointing which parts of the input are salient towards the classification and also how they can be changed in order to obtain an alternative outcome. 
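The per-example counterfactual in that quote can be sketched as a search for the smallest single-attribute change that flips the classifier's decision for one specific image. The attributes and the scoring rule below are toy assumptions, not the paper's StyleSpace coordinates.

```python
# Hedged sketch of a per-example counterfactual search: among a few
# candidate attribute tweaks, find one that flips the classifier's
# decision for this particular image. Classifier and attributes are
# invented stand-ins for illustration.

def classify(attrs):
    # Toy rule: big pupils and pointed ears read as "cat".
    score = 0.6 * attrs["pupil_size"] + 0.4 * attrs["ear_pointiness"]
    return "cat" if score > 0.5 else "dog"

def counterfactual(attrs, step=0.5):
    # Try nudging each attribute up or down until the label flips.
    label = classify(attrs)
    for name in attrs:
        for delta in (+step, -step):
            tweaked = dict(attrs, **{name: attrs[name] + delta})
            if classify(tweaked) != label:
                return name, delta, classify(tweaked)
    return None

image_attrs = {"pupil_size": 0.8, "ear_pointiness": 0.6}
print(classify(image_attrs))        # "cat"
print(counterfactual(image_attrs))  # ("pupil_size", -0.5, "dog")
```

The answer is exactly the kind of statement the authors describe: "had the pupils been smaller, this image would have been called a dog."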

Lang and colleagues submitted their counterfactuals not just for cats and dogs but for a whole range of image classifications, including human age -- "old" versus "young" -- and human gender, to people working on the Amazon Mechanical Turk platform. They asked those people to describe the changes in the pictures from real to fake: "Users are then asked to describe in 1-4 words the single most prominent attribute they see changing in the image."

You can try it yourself by visiting the site Lang and team created, with examples of the manipulations.

Which brings us to the question: What does the MobileNet classifier know? Without getting into the epistemological question of what knowledge really is, the StylEx program shows that a MobileNet classifier will repeatedly respond to some elements of a picture that can be isolated, such as larger or smaller eyes of a cat or dog. 

In other words, a classifier neural net knows about the degree of some feature along a continuum of degrees, big or small, light or dark. 

But there's an extra wrinkle. As Lang and colleagues found, their StylEx program, when it creates fakes with altered features, is in each case affecting how the classifier handles specific images. That word specific is important, because it turns out that the MobileNet classifier is fairly specific in categorizing each image. 

As the authors write, "these are not the attributes with largest average effect across many images, but rather those that most affect this specific image." Every image, in other words, has a quirk, as far as the classifier is concerned. Or a set of quirks.

So, the classifier really seeks something particular that is emphasized in the case of each image, and the StylEx program is recreating that something. It's not clear that such image-specific aspects are knowledge. They can be thought of as artifacts that the classifier program has used to associate a given picture with a given category. It's also not clear there are commonalities that extend across images, the way you would expect if a program really knows what a cat or a dog looks like. 

If knowledge, in practical terms, is about naming things, StylEx raises interesting questions. The act of naming, or classifying, imposes a binary choice, cat or dog, to which the program responds by seizing on one element, or a group of elements, to complete the task. Whether that is actually part of knowledge about a picture, in the deeper sense of that term, remains an open question.
