Can AI put humans back in the loop?

Scientists at Germany's Technische Universität Darmstadt have developed a procedure for a human domain expert to look at the inner workings of an AI model as it is trained to solve a simple problem, and to correct where the machine goes wrong. Can it work for more complex problems?

Is it possible to make artificial intelligence more trustworthy by inserting a human being into the decision process of machine learning?

It may be, but you don't get something for nothing. That human being had better be an individual who knows a lot about what the neural network is trying to figure out. And that presents a conundrum, given that one of the main promises of AI is precisely to find out things humans don't know.

It's a conundrum that is sidestepped in a new bit of AI work by scientists at the Technische Universität Darmstadt in Germany. Lead author Patrick Schramowski and colleagues propose to have a human check on the explanations provided by a neural network. The idea is to extend what's been called "explainable AI" and "interpretable AI." They contend that it's not enough to have explanations of what a neural net is doing; the human should actually be intimately involved in fixing what goes wrong with a neural net.

In so doing, Schramowski and colleagues hope humans will gain greater trust in machine learning. 

"[I]t is necessary to learn and explain interactively, for the user to understand and appropriately build trust in the model's decisions," write Schramowski and colleagues in Right for the Wrong Scientific Reasons: Revising Deep Networks by Interacting with their Explanations.

Their solution is "XIL," standing for "explanatory interactive learning," with the emphasis not only on providing explanations of machine behavior but also the exchange between person and machine.

The work is heavily inspired by the recent work of Sebastian Lapuschkin of the Fraunhofer Heinrich Hertz Institute in Berlin, who has written that neural networks can sometimes be like "Clever Hans." Clever Hans was a famous horse who dazzled the public in the early 1900s by seeming to be able to do arithmetic. 

Upon closer examination, it turned out Hans was merely responding to human gestures such as nods of the head. Lapuschkin argues that, despite the impressive quality of AI, sometimes it's merely exploiting data set particularities rather than really learning relevant representations of a problem. People, therefore, need to have a bit of caution about all the "excitement about machine intelligence."

Philosophically, the authors of the present work take a page from algorithm genius Don Knuth. They agree with Knuth that "Instead of imagining that our main task is to instruct a computer what to do," the goal is to "let us concentrate rather on explaining to human beings what we want a computer to do."

Schramowski and the team's experimental setup for XIL is to have a convolutional neural network solve a straightforward problem: classifying the phenotype of a plant as healthy or diseased. They have the convolutional net examine images of leaves of the sugar beet plant, a staple crop around the world, for instances of disease. They then visualize which features the network was using and have an expert on plant biology correct where the neural network fell down. Proper learning should involve the net focusing only on dark patches on the leaves of the plant that indicate the disease "Cercospora Leaf Spot."

Schramowski and colleagues at Germany's Technische Universität Darmstadt propose putting a human being back in the loop in AI by having a domain expert correct where a neural net goes wrong. 

Schramowski et al.

As Schramowski and colleagues put it, "In each step, the learner [the neural net] explains its interactive query to the domain expert, and she responds by correcting the explanations, if necessary, to provide feedback […] we let an expert revise the learning of the machine by constraining the machine's explanations to match domain knowledge."

Feedback, in this case, is formalized as an additional loss function added to the two normal loss functions of "cross-entropy" and "L2 regularization" that are usually employed in a neural net training session. That third loss function acts as a new constraint added to the convolutional neural network. 

In the case of the convolutional net looking at beet leaves, the visualizations reveal that an uncorrected network sometimes looks at the wrong signals: It takes into account artifacts in the image that are not in the area of the leaf, such as the plate on which the leaf is lying. That's a mistake, an example of what you might call a naturally occurring adversarial example. Another name for it is a "confounder," a variable that shouldn't be in the calculation.

The researchers create a binary mask of the contours of the beet leaf in each picture. Then it becomes very simple to use the loss function to penalize the neural net if it looks anywhere other than the area in the picture of the leaf. They find that accuracy can improve in some cases, but, more important, it would appear the accuracy is now based on the right signals, so it's more trustworthy.
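To make the idea of a mask-based penalty concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code: the article does not give the formula, so the sketch assumes a penalty on input gradients in the style of earlier "right for the right reasons" work, and it swaps the convolutional net for a tiny linear softmax classifier so the gradient can be written by hand. The function name `xil_style_loss`, the weights `lam_expl` and `lam_l2`, and the four-pixel toy "images" are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def xil_style_loss(W, b, X, y, outside_leaf, lam_expl=10.0, lam_l2=1e-3):
    """Three-term loss: cross-entropy + explanation penalty + L2.

    X: (n, d) flattened images, y: (n,) integer labels,
    outside_leaf: (n, d) binary mask, 1 where the model should NOT look.
    W: (k, d) weights and b: (k,) biases of a linear softmax classifier
    (an assumption made for illustration; the paper uses a CNN).
    """
    probs = softmax(X @ W.T + b)                      # (n, k)
    n, k = probs.shape
    cross_entropy = -np.mean(np.log(probs[np.arange(n), y] + 1e-12))
    # Gradient of sum_k log p_k with respect to the input pixels.
    # For a linear softmax model this is sum_k W_k - k * sum_j p_j W_j.
    input_grad = W.sum(axis=0) - k * (probs @ W)      # (n, d)
    # Penalize any sensitivity to pixels outside the leaf.
    penalty = np.mean(np.sum((outside_leaf * input_grad) ** 2, axis=1))
    l2 = np.sum(W ** 2)
    total = cross_entropy + lam_expl * penalty + lam_l2 * l2
    return total, {"cross_entropy": cross_entropy, "penalty": penalty, "l2": l2}

# Toy data: pixels 0-1 are "leaf", pixels 2-3 are "plate" (background).
X = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
y = np.array([0, 1])
outside_leaf = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (2, 1))
b = np.zeros(2)

# Two models with identical predictions on X but different "reasons".
W_leaf = np.array([[2.0, -2.0, 0.0, 0.0], [-2.0, 2.0, 0.0, 0.0]])
W_plate = np.array([[0.0, 0.0, 2.0, -2.0], [0.0, 0.0, -2.0, 2.0]])

total_leaf, parts_leaf = xil_style_loss(W_leaf, b, X, y, outside_leaf)
total_plate, parts_plate = xil_style_loss(W_plate, b, X, y, outside_leaf)
```

In this toy setup both models classify the two images equally well, so the cross-entropy term cannot tell them apart. Only the penalty term separates the model that reads the leaf pixels from the one that reads the plate, which is the sense in which the added loss acts as a constraint.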

On the right, an example of heat maps created by the Grad-CAMs technique to expose what features a neural net is focusing on, and on the left, a clustering of the solution strategies the neural net pursues.

Schramowski et al.

Schramowski and colleagues are building on a lot of prior knowledge in explainable AI. From Lapuschkin and his colleagues, they adopt SpRAy, or "spectral relevance analysis," a program that creates heat maps of what convolutional nets are "seeing" based on activations of the neurons at different layers. SpRAy is itself developed from a visualization technique created by Bolei Zhou and colleagues at MIT in 2015 called "class activation maps," and refined in 2017 by Ramprasaath R. Selvaraju and colleagues at Georgia Institute of Technology in the form of what is called "Grad-CAMs." Grad-CAMs allow one to create a heat map of a particular feature as it flows through the network from beginning to end.
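For readers who want a feel for how such a heat map is computed, the following is a rough sketch of the Grad-CAM recipe as it is generally described: average the gradients of a class score over each feature map to get one weight per channel, take the weighted sum of the feature maps, and keep only the positive part. The function name `grad_cam` and the tiny hand-made arrays are hypothetical; in practice the activations and gradients would come from a deep learning framework rather than being typed in.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Heat map from one conv layer, in the style of Grad-CAM.

    activations: (C, H, W) feature maps of the chosen layer.
    gradients:   (C, H, W) gradient of the class score with respect
                 to those feature maps (assumed to be precomputed).
    """
    # One importance weight per channel: the spatial mean of its gradient.
    weights = gradients.mean(axis=(1, 2))               # (C,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)    # (H, W)
    # Keep only evidence that speaks for the class.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display, guarding against an all-zero map.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Hand-made example: channel 0 fires top-left and supports the class,
# channel 1 fires bottom-right and counts against it.
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 0.0], [0.0, 1.0]]])
gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])

heat_map = grad_cam(activations, gradients)
```

The resulting map highlights the top-left cell and suppresses the bottom-right one. It is this kind of map that a domain expert could compare against a leaf mask to decide whether the network is looking in the right place.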

The authors write in their conclusion that they hope to bring this interactive element to lots of other forms of explainable or interpretable AI, such as "Coactive Learning," developed in 2015 by scientists at LinkedIn and Cornell University, and "human-guided probabilistic learning," developed in 2018 by scientists at Georgia Tech and UT Dallas.

What's left open by the end of the paper is whether the approach is applicable beyond very simple supervised classifiers of the kind the authors have demonstrated. Given that feature discovery in deep learning is supposed to find things that might very well be unknown to a human, it's not clear how a human could step into the loop to correct the machine when it messes up if the human's own domain knowledge is presumably being surpassed in some sense.

But that's not to say it can't happen. At least, there is a framework in the work of Schramowski and colleagues for how a person and machine can interact, and there are "desiderata," things to strive for. Now it remains to be seen how broadly applicable it can be.