Inside the black box: Understanding AI decision-making

Artificial intelligence algorithms are increasingly influential in people's lives, but their inner workings are often opaque. We examine why, and explore what's being done about it.
Written by Charles McLellan, Senior Editor
Image: Getty Images/iStockphoto

Neural networks, machine-learning systems, predictive analytics, speech recognition, natural-language understanding and other components of what's broadly defined as 'artificial intelligence' (AI) are currently undergoing a boom: research is progressing apace, media attention is at an all-time high, and organisations are increasingly implementing AI solutions in pursuit of automation-driven efficiencies.

The first thing to establish is what we're not talking about, which is human-level AI -- often termed 'strong AI' or 'artificial general intelligence' (AGI). A survey conducted among four groups of experts in 2012/13 by AI researchers Vincent C. Müller and Nick Bostrom reported a 50 percent chance that AGI would be developed between 2040 and 2050, rising to 90 percent by 2075; so-called 'superintelligence' -- which Bostrom defines as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest" -- was expected some 30 years after the achievement of AGI (Fundamental Issues of Artificial Intelligence, Chapter 33). This stuff will happen, and it certainly needs careful consideration, but it's not happening right now.

What is happening right now, at an increasing pace, is the application of AI algorithms to all manner of processes that can significantly affect people's lives -- at work, at home and as they travel around. Although hype around these technologies is approaching the 'peak of expectation' (sensu Gartner), there's a potential fly in the AI ointment: the workings of many of these algorithms are not open to scrutiny -- either because they are the proprietary assets of an organisation or because they are opaque by their very nature.

If not properly addressed, such concerns could help to turn overhyped expectations for AI into a backlash (Gartner's 'trough of disillusionment').


Many AI-related technologies are approaching, or have already reached, the 'peak of inflated expectations' in Gartner's Hype Cycle, with the backlash-driven 'trough of disillusionment' lying in wait.

Image: Gartner / Annotations: ZDNet

Here's an example: in May this year, COMPAS, a proprietary risk assessment algorithm that's widely used to decide on the freedom or incarceration of defendants passing through the US criminal justice system, was alleged by online investigative journalism site ProPublica to be systematically biased against African Americans compared to whites. Although Northpointe (the for-profit company behind COMPAS) disputed ProPublica's statistical analysis, generating further controversy, the widespread use of closely-guarded proprietary algorithms in sensitive areas such as criminal justice is a cause for concern at the very least.

Sometimes, bias can be introduced via the data on which neural network-based algorithms are trained. In July this year, for example, Rachael Tatman, a National Science Foundation Graduate Research Fellow in the Linguistics Department at the University of Washington, found that Google's speech recognition system performed better for male voices than female ones when auto-captioning a sample of YouTube videos, a result she ascribed to 'unbalanced training sets' with a preponderance of male speakers. As Tatman noted, a few incorrect YouTube captions aren't going to cause any harm, but similar speech recognition biases in medical or connected-car applications, for example, would be another matter altogether.
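One simple way to surface this kind of skew is to break accuracy down by speaker group rather than reporting a single overall figure. The Python sketch below does that for a handful of records; the record layout, the function name and the numbers are all invented for illustration and are not Tatman's data or method.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return {group: fraction correct} for an iterable of (group, correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        if correct:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}

# Invented toy records: (speaker group, was the captioned word correct?)
records = [
    ("male", True), ("male", True), ("male", True), ("male", False),
    ("female", True), ("female", False), ("female", False), ("female", True),
]

per_group = accuracy_by_group(records)
print(per_group)
```

A gap between groups does not by itself prove that the training set was unbalanced, but it is the kind of signal that would prompt a closer look at how the training data was collected.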


Although AI is often equated with 'deep learning' neural networks, the artificial intelligence ecosystem encompasses many types of algorithm.

Image: Narrative Science

Neural networks as 'black boxes'

Neural networks are a particular concern not only because they are a key component of many AI applications -- including image recognition, speech recognition, natural language understanding and machine translation -- but also because they're something of a 'black box' when it comes to elucidating exactly how their results are generated.

Neural networks are so-called because they mimic, to a degree, the way the human brain is structured: they're built from layers of interconnected, neuron-like nodes and comprise an input layer, an output layer and a variable number of intermediate 'hidden' layers -- 'deep' neural nets merely have more than one hidden layer. The nodes themselves carry out relatively simple mathematical operations, but between them, after training, they can process previously unseen data and generate correct results based on what was learned from the training data.
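To make the 'relatively simple mathematical operations' concrete, here is a minimal Python sketch of a forward pass through a tiny network. It assumes each node computes a weighted sum of its inputs plus a bias and squashes the result with a sigmoid function, which is one common choice among several; the weights are hand-picked, hypothetical values rather than anything learned from data.

```python
import math

def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each node takes a weighted sum of its inputs, adds a bias, then squashes it."""
    return [
        sigmoid(sum(w * v for w, v in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the input through every layer in turn and return the final layer's output."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical hand-picked weights: 2 inputs -> 2 hidden nodes -> 1 output node.
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]

output = forward([1.0, 0.0], network)
print(output)
```

Nothing in the list of numbers that makes up `network` says why a given input produces a given output, which is the 'black box' problem in miniature.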


The structure and training of deep neural networks.

Image: Nuance

Key to the training is a process called 'back propagation', in which labelled examples are fed into the system and the connection weights in the intermediate layers are progressively adjusted until the results from the output layer provide an optimal match to the labels.
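The following Python sketch shows the idea on a deliberately tiny scale: a network with one hidden layer is shown four labelled examples over and over, and after each one the error at the output is passed backwards to nudge every weight. The task (logical OR), the layer size, the learning rate and all function names are assumptions chosen for illustration; production systems use specialised libraries and far larger networks.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_weights(n_inputs, n_hidden, seed=0):
    """Random starting weights; the last entry in each row is a bias weight."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return w_hidden, w_out

def forward(x, w_hidden, w_out):
    """Return the hidden-layer activations and the single output value."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, out

def loss(data, w_hidden, w_out):
    """Mean squared difference between the network's output and the labels."""
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, w_hidden, w_out, epochs=2000, lr=0.5):
    """Adjust the weights in place, one labelled example at a time."""
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error signal at the output node (squared-error gradient, up to a constant factor).
            delta_out = (out - y) * out * (1.0 - out)
            # Propagate that signal back to each hidden node via its outgoing weight.
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
            for j, row in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] -= lr * delta_hidden[j] * v

# Labelled examples for logical OR: (inputs, expected output).
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

w_hidden, w_out = init_weights(n_inputs=2, n_hidden=3)
loss_before = loss(data, w_hidden, w_out)
train(data, w_hidden, w_out)
loss_after = loss(data, w_hidden, w_out)
print(loss_before, loss_after)
```

The training loop never records a reason for any individual adjustment; what remains at the end is simply a set of numbers that happens to work.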

It's one thing to create a model that gives accurate results with previously unseen data, but -- as discussed earlier -- in many real-world applications it will be desirable to examine the internal decision-making process in detail.

Nils Lenke, Senior Director, Corporate Research at Nuance, acknowledges the problem: "It's a very interesting and relevant topic, because compared to, say, rule-based systems, neural networks or other machine-learning algorithms are not that transparent. It's not always clear what happens inside -- you let the network organise itself, but that really means it does organise itself: it doesn't necessarily tell you how it did it."

Peering inside the black box

This 'black box' problem was addressed in a recent paper from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), which examined neural networks trained on text-based data using a system comprising two modules -- a 'generator' and an 'encoder'. The generator extracts key segments of text from the training data, giving high scores to short, coherent strings; these are then passed to the encoder, which performs the classification task. The goal is to maximise both the generator scores and the accuracy of the encoder predictions.
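One plausible way to express a preference for 'short, coherent strings' is to penalise a selection both for the number of words it picks and for the number of times it switches between picking and skipping. The Python sketch below assumes that form; the function name, the equal default weights and the example masks are hypothetical, and the paper's exact formulation may differ.

```python
def selection_penalty(mask, weight_short=1.0, weight_coherent=1.0):
    """Penalty for a 0/1 mask over words: fewer selected words and fewer breaks cost less."""
    n_selected = sum(mask)
    n_breaks = sum(abs(a - b) for a, b in zip(mask, mask[1:]))
    return weight_short * n_selected + weight_coherent * n_breaks

# Both masks select three words out of six, but only the first is one unbroken run.
contiguous = [0, 1, 1, 1, 0, 0]
scattered = [1, 0, 1, 0, 1, 0]

print(selection_penalty(contiguous), selection_penalty(scattered))
```

Under this scoring, a lower penalty corresponds to a higher generator score, so the unbroken run is preferred over the scattered selection even though both pick the same number of words.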

To assess how well this system works, one of the training datasets the researchers used was a set of around 1,500 reviews from a website devoted to beer. A thousand or so of these reviews had been annotated by hand to indicate the correspondence between particular sentences and reviewer scores (from 1-5) for appearance, smell and palate. If the generator/encoder neural network managed to pinpoint the same sentences and correlate them with the same reviewer ratings, then it would be exercising human-like judgement.

The results were impressive, with the neural network showing high levels of agreement with the human annotators on appearance (96.3%) and smell (95.1%), although it was slightly less sure-footed on the tougher concept of palate (80.2%).
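The article's figures do not spell out how agreement was calculated. As a rough illustration only, the Python sketch below treats agreement as the fraction of machine-selected sentences that a human annotator also marked; the function name and the sentence indices are hypothetical.

```python
def agreement(selected, annotated):
    """Fraction of machine-selected sentence indices that a human also marked."""
    selected, annotated = set(selected), set(annotated)
    if not selected:
        return 0.0
    return len(selected & annotated) / len(selected)

# Hypothetical review: the system picked sentences 0, 2 and 3; the annotator marked 0, 2 and 5.
score = agreement([0, 2, 3], [0, 2, 5])
print(score)
```

A measure like this rewards the system for avoiding sentences the annotator ignored, but it does not penalise the system for missing sentences the annotator marked, so it is usually read alongside other figures.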

According to MIT, the researchers have applied their rationale-extraction method to medical data, both text-based (pathology reports on breast biopsies) and image-based (mammograms), although no published report on this work is available yet.

A helping human hand

These are encouraging developments, but what to do if a current AI system can't be trusted to make important decisions on its own?

Nuance's Nils Lenke outlines the options: "The first thing you need for more specific cases is a confidence measure, so not only do you get a result from the neural network, but you also get an understanding of how confident it is that it has the right result. That can help you make decisions -- do you need additional evidence, do you need a human being to look into the result, can you take it at face value?"
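A minimal Python sketch of that idea follows. It assumes the network emits one raw score per possible answer, converts the scores to probabilities with a softmax function, and treats the top probability as the confidence; the threshold of 0.9 and all names are hypothetical choices. Softmax probabilities are only a rough proxy for confidence and are often poorly calibrated in practice.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decide(scores, labels, threshold=0.9):
    """Return the top label, its confidence, and whether a human should check it."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    action = "accept" if confidence >= threshold else "refer to human"
    return labels[best], confidence, action

labels = ["approve", "decline"]
clear_case = decide([4.0, 0.0], labels)
close_call = decide([0.2, 0.0], labels)
print(clear_case)
print(close_call)
```

Where to set the threshold is a business decision as much as a technical one: the more costly a wrong answer, the more cases it makes sense to refer.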

"Then you need to look at the tasks at hand," Lenke continues. "For some, it's not really critical if you don't fully understand what happens, or even if the network is wrong. A system that suggests music, for example: all that can go wrong is, you listen to boring piece of music. But with applications like enterprise customer service, where transactions are involved, or computer-assisted clinical documentation improvement, what we typically do there is, we don't put the AI in isolation, but we have it co-work with a human being."


A human-assisted virtual assistant (HAVA) deployed in an enterprise customer service application.

Image: Nuance

"In the customer-care arena we call that HAVA, or the Human-Assisted Virtual Assistant," explains Lenke. "The interesting thing here is, we have something called 'passage retrieval': say the customer asks a question, either via speech recognition or typed input from a chat or web interface; then the virtual assistant goes through its facts and data -- which may be a collection of manuals and documents provided by the company -- and finds relevant passages, which it presents to the human agent, who makes the final call. It's more efficient, because the AI presents the relevant information to him or her."

"I think you can see from Microsoft's experience with its chat bot that putting the AI in a mode where it's not supervised may bear risks," Lenke adds. "That's why we believe this curated way, where a human looks at the material and has the final call, is the right way to do it for critical applications."

Ethics and AI

Many people -- including Stephen Hawking, Elon Musk and leading AI researchers -- have expressed concerns about how AI might develop, leading to the creation of organisations like OpenAI and Partnership on AI aimed at avoiding potential pitfalls.

The goal of OpenAI, founded in December 2015 and co-chaired by Elon Musk and Sam Altman, is "to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return."

Partnership on AI -- announced in September 2016 with founding members Amazon, Facebook, Google, IBM and Microsoft -- seeks to support research and recommend best practices, advance public understanding and awareness of AI, and create an open platform for discussion and engagement.

Most recently, Carnegie Mellon University announced a $10 million gift from a leading law firm (K&L Gates) to study ethical and policy issues surrounding artificial intelligence and other computing technologies.

A perfect example of why the ethics of AI need monitoring came in a recent paper entitled Automated Inference on Criminality using Face Images by two researchers from Shanghai Jiao Tong University. In a disturbing echo of long-discredited attempts to correlate physiognomy with the propensity for criminality, Xiaolin Wu and Xi Zhang built four classifiers -- including a convolutional neural network -- using "facial images of 1,856 real persons controlled for race, gender, age and facial expressions, nearly half of whom were convicted criminals". The authors claim that "All four classifiers perform consistently well and produce evidence for the validity of automated face-induced inference on criminality, despite the historical controversy surrounding the topic", adding that they found "some discriminating structural features for predicting criminality, such as lip curvature, eye inner corner distance, and the so-called nose-mouth angle."

This paper is on the arXiv pre-print server and has not been peer-reviewed, but, speaking to the BBC, Susan McVie, Professor of Quantitative Criminology at the University of Edinburgh, noted that "What this research may be picking up on is stereotypes that lead to people being picked up by the criminal justice system, rather than the likelihood of somebody offending...There is no theoretical reason that the way somebody looks should make them a criminal."

Any AI-driven resurgence of the idea that criminality can be inferred from facial images would be particularly unhelpful, given the current political climate on both sides of the Atlantic.

AI implementation in the enterprise

AI is clearly a developing field, but that hasn't stopped organisations forging ahead and implementing it -- even if they're often not fully aware they have done so. In July this year, Narrative Science, which develops advanced natural-language-generation (NLG) systems, presented the results of a survey of 235 business executives covering the deployment of AI-powered applications within their organisations. Headline findings from Outlook on Artificial Intelligence in the Enterprise 2016 were:

AI adoption is imminent, despite marketplace confusion: although only 38 percent of the survey group confirmed they were using AI, 88 percent of the remainder actually did use AI technologies such as predictive analytics, automated written reporting and communications, and voice recognition/response.

Predictive analytics is dominating the enterprise: 58 percent of respondents used data mining, statistics, modelling and machine learning to analyse current data and make predictions; in second place, at about 25 percent, was automated written reporting and/or communications and voice recognition/response.

The shortage of data science talent continues to affect organisations: 59 percent of respondents named 'shortage of data science talent' as the primary barrier to realising value from their big data technologies. Almost all of the respondents (95%) who indicated they were skilled at using big data to solve business problems or generate insights also used AI technologies.

Companies that generate the most value from their technology investments make innovation a priority: 61 percent of respondents who had an innovation strategy used AI to identify opportunities in data that would be otherwise missed, compared to only 22 percent of respondents without such a strategy.

There are certainly more companies involved in AI than ever before, and also an emerging 'technology stack', as this recent landscape infographic from Bloomberg Beta makes clear:

Image: Bloomberg Beta

In their analysis, Bloomberg Beta's Shivon Zilis and James Cham note that the version 3.0 landscape contains a third more companies than the first one two years ago, and that "it feels even more futile to try to be comprehensive, since this just scratches the surface of all of the activity out there." This is to be expected in a technology area that's racing to the peak of the hype cycle, and there will be plenty more startups and M&A activity as the market matures. But which AI startups will prosper? According to the Bloomberg Beta authors, "Companies we see successfully entering a long-term trajectory can package their technology as a new problem-specific application for enterprise or simply transform an industry themselves as a new entrant."


In the near term, how is AI likely to progress?

"There will be more variants of neural networks, and people will pay more attention to what actually happens during the processing," says Nuance's Nils Lenke. "You'll want to visualise what happens on the layers and how they engage with the data, and make it more transparent which piece of the evidence led to which decision, so that the network not only produces a result, but also points out the evidence and the reasoning process."

Lenke also emphasises that AI does not always mean neural networks: "We also do AI based on knowledge representation and rule-based systems, and for some critical things it may be better to go with rule-based systems where you're in full control of which rules are there and which are not there. You can have that in your toolbox for things where it makes sense, where rules can easily be codified by a human."
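The transparency Lenke describes is easy to see in code. In the Python sketch below, every decision comes back with the name of the rule that produced it; the rules, thresholds and field names are invented for illustration and do not describe any Nuance product.

```python
def classify(transaction, rules):
    """Return the decision and the name of the first rule that fired."""
    for name, condition, decision in rules:
        if condition(transaction):
            return decision, name
    return "approve", "default"

# Hypothetical rules, each one written and readable by a human.
rules = [
    ("amount over limit", lambda t: t["amount"] > 10_000, "review"),
    ("unknown country", lambda t: t["country"] not in {"GB", "US"}, "review"),
]

small = classify({"amount": 50, "country": "GB"}, rules)
large = classify({"amount": 20_000, "country": "GB"}, rules)
print(small)
print(large)
```

The trade-off is that someone has to write and maintain every rule, which is why this approach suits problems where, as Lenke puts it, the rules can easily be codified by a human.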

AI is becoming relatively straightforward to implement, with data, algorithms and computing resources all increasingly available. But there's always the human factor to consider: humans can ask the wrong questions, use flawed training data, and accept output from algorithms without inquiring into its provenance.

Should we fear superintelligent AI? Maybe, in due course. But more pressingly, we should pay attention to what people might do with today's AI technology. Or as Bloomberg Beta's Zilis and Cham put it: "In the next few years, the danger here isn't what we see in dystopian sci-fi movies. The real danger of machine intelligence is that executives will make bad decisions about what machine intelligence capabilities to build."
