What is ‘bias’ in AI really, and why can’t AI neutralize it?

Selection algorithms everywhere are exhibiting traits that appear to be racist, sexist, and otherwise discriminatory. Have neural networks already developed their own neuropathy? Or are people somehow the problem?
Written by Scott Fulton III, Contributing Editor on

Suppose a CCTV camera were to spot your face in a crowd outside a sports stadium. In a data center somewhere on the planet, an artificial neural network analyzes images from the CCTV footage frame by frame. How confident are you right now that this algorithm can exclude your face from a set of mug shots in an Interpol wanted list?

If the police were to call you aside for questioning, inform you exactly what the algorithm inferred from the image, and tell you they had reason to detain you, how would you defend yourself? Could you claim it's unfair for anyone or anything to presume to know what a terrorist typically looks like? Would you accuse some developer or operator of discrimination against people with your skin tone or gender? Or would you instead opt for a more technical assertion -- for instance, that these algorithms tend to be wrong? Which of these assertions would have the greatest likelihood of getting you home by the end of the day?

Bias cognition


The biggest problem with machine learning systems is that we ourselves don't quite understand everything they're supposedly learning, nor are we certain they're learning everything they should or could be. We've created systems that draw mostly, though never entirely, correct inferences from ordinary data, by way of logic that is by no means obvious.

One of the things that human beings tend to spend a lot of their time doing is determining whether a causal relationship or a correlation exists between two series of events. For instance, the position of the moon relative to Earth directly correlates with the level of the ocean tides. If the relationship between these two series over time were plotted on an x/y chart, the points would appear to fall on a sinusoidal curve. It's not too difficult for someone to write a function that describes such a curve.
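To make "a function that describes such a curve" concrete, here is a minimal Python sketch. It assumes the tide is dominated by a single sinusoid with a known period of about 12.42 hours (the principal lunar semidiurnal constituent) and uses synthetic readings rather than real gauge data; real tide prediction sums many such constituents. Because the period is fixed, the fit is ordinary linear least squares.

```python
import math

# Assumed period of the principal lunar semidiurnal tide, in hours.
PERIOD_H = 12.42

def solve(matrix, vector):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def fit_tide(times, heights, period=PERIOD_H):
    """Least-squares fit of h(t) = a*sin(wt) + b*cos(wt) + c with the period held fixed.

    With w known, the model is linear in (a, b, c), so we solve the
    normal equations (X^T X) p = X^T y directly.
    """
    w = 2 * math.pi / period
    rows = [(math.sin(w * t), math.cos(w * t), 1.0) for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, heights)) for i in range(3)]
    return solve(xtx, xty)

# Synthetic hourly readings over two days, generated from a = 1.2, b = -0.5, c = 3.0.
w = 2 * math.pi / PERIOD_H
times = [float(h) for h in range(48)]
heights = [1.2 * math.sin(w * t) - 0.5 * math.cos(w * t) + 3.0 for t in times]
params = fit_tide(times, heights)  # recovers approximately [1.2, -0.5, 3.0]
```

The point of the example is how little machinery is needed once the form of the relationship is known in advance; that is exactly the situation machine learning is not used for.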

The whole point of machine learning is to infer the relationships between objects when, unlike the tides, it isn't already clear to human beings what those relationships are. Machine learning is put to use when linear regression or best-fit curves are insufficient -- when math can't explain the relationship. But perhaps that should have been our first clue: If no mathematical correlation exists, then shouldn't any other kind of relationship we can extrapolate be naturally weaker? Does a relationship exist, for instance, between a certain tech journalist with a goatee and any recorded inferences from suspected, goatee-wearing watch-list terrorists? And if there does exist such a relationship, should it?

Bias, at least in everyday discussion, is exemplified by evidence of a relationship where there shouldn't be one -- for example, the annual divorce rate in Maine and the national consumption rate per capita of margarine. If bias is endemic, then by definition it must be a pattern. And neural networks are supposed to be good at detecting patterns.
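Why the divorce-rate-and-margarine pairing looks like a "relationship" is easy to reproduce. In the minimal sketch below, the two series are made up (they are not the real divorce or margarine figures); they share nothing except a downward trend over the same ten years, yet their Pearson correlation comes out close to 1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two hypothetical, causally unrelated series that both happen to decline.
years = list(range(10))
series_a = [5.0 - 0.2 * t + (0.05 if t % 2 else -0.05) for t in years]
series_b = [8.0 - 0.4 * t + (0.1 if t % 3 == 0 else 0.0) for t in years]

r = pearson(series_a, series_b)  # well above 0.9, purely because of the shared trend
```

A pattern detector fed these two columns has no way to know the trend is the only thing they have in common.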


Prof. Vijay Janapa Reddi

Harvard University

"I tend to think of bias very much as what the model has been taught," explained Vijay Janapa Reddi, Associate Professor at the John A. Paulson School of Engineering at Harvard University. "So if it only sees one set of demographics when it's getting taught, then naturally when you ask it for responses, it's going to base all of its responses on whatever it's been given in the past."

A convolutional neural network, which carries the delightful abbreviation CNN, is a type of learning system that builds something like a composite image in memory, one that incorporates aspects of all the data it's been given. So if a CNN is taught to recognize a printed or handwritten character of text, it's because it has seen many examples of every such character, and has built up a "learned" image of each one that captures its basic features.
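Concretely, what a CNN stores is not a picture but stacks of small weight grids, or filters, each slid across the input to produce a "feature map." The minimal sketch below applies one hand-written vertical-edge filter to a toy image; in a real CNN the filter weights are learned from training data, and there are many filters per layer. Like most deep learning libraries, it computes cross-correlation, which is what CNN layers conventionally call "convolution."

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a 2D list `image` with a 2D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 4x5 toy "image": dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(4)]

# Hand-written vertical-edge detector; a trained CNN would learn weights like these.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
```

The filter responds only where dark meets bright, which is the sense in which a CNN's learned "image" is really a collection of feature detectors.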

If a CNN model is trained with a variety of human faces, it will have built an amalgam of those faces in its model -- perhaps not necessarily a photograph, but a series of functions that represents the basic geometry of a face, such as the angle between the tip of the ear, the top of the cheekbone, and the tip of the nose. Train that model with a series of faces of known terrorists, and the model should build some basic construct that answers the question, "What does a terrorist look like?" There's no way a person could rationally respond to that question in a manner that some, if not most, listeners would consider unbiased. And if a process were capable of rendering that average terrorist's face in living color, someone, somewhere would be rightfully enraged.

But think of this problem from the perspective of a software developer: Isolating individuals' faces in a crowd from CCTV footage and comparing them point-by-point to individual terrorists' mug shots is a job that even a supercomputer might not be able to perform in anything approaching real time. How should the developer winnow the crowd faces to a more manageable subset? A CNN offers one option: an "average face" capable of excluding a wide range of improbable candidates. It's the visual equivalent of a hash function, simplifying long searches by making quick comparisons to some common element.
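The winnowing strategy described above can be sketched as a two-stage search. In this minimal, hypothetical Python version, short numeric vectors stand in for face embeddings, and the radius and match threshold are arbitrary; no real face-recognition API is implied. Stage one compares each candidate only to the gallery's average vector; stage two does the expensive one-by-one comparison on whoever survives.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length vectors (the 'average face')."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def two_stage_search(candidates, gallery, radius, match_threshold):
    """Stage 1: keep candidates within `radius` of the gallery centroid.
    Stage 2: compare the shortlist against every gallery entry."""
    center = centroid(gallery)
    shortlist = [c for c in candidates if math.dist(c, center) <= radius]
    matches = [c for c in shortlist
               if any(math.dist(c, g) <= match_threshold for g in gallery)]
    return shortlist, matches

# Hypothetical 2-D stand-ins for embeddings.
gallery = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2)]
candidates = [(1.1, 1.0), (5.0, 5.0), (0.0, 0.0), (1.2, 0.8)]

shortlist, matches = two_stage_search(candidates, gallery, radius=1.0, match_threshold=0.05)
# shortlist == [(1.1, 1.0), (1.2, 0.8)]; matches == [(1.2, 0.8)]
```

Note what stage one does: anyone far from the gallery's average is never examined again, so whatever skew went into that average decides who gets a second look.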

Here's the problem: Almost any argument that a terrorist-amalgamating CNN is biased at its core would be given some merit. The recording of people's faces in an international database of suspicious people may always be the last act in a process fraught with racism and discrimination. On the other hand, it's the only visual data we've got. If we were to correct that data somehow, to be more representative of the broader human race, then how much less effective would our target function become? And how much cognitive bias would the person tasked with this cleansing introduce to the process? "Whoever's training it," warns Prof. Janapa Reddi, "needs to be very careful."

Pondering these questions awakens us to how little we actually understand them.

Margin of error


People have a tendency to fear what they don't understand. Nothing amplifies those fears more profoundly than the web, whose contributors have recently speculated that bias may be imprinted upon machine learning algorithms by programmers with nefarious motives.  

"Whilst we are optimistic about the potential of AI, it raises many new ethical questions, that would have seemed like issues from science fiction only a few years ago," stated Rt. Hon. Jeremy Wright, the UK's Digital Secretary, during a speech to London Tech Week last June. "The algorithms and structures that govern AI will only be effective if they do not reflect the subconscious biases of the programmers who create them."

It's not an irrational fear: As algorithms are trusted more often to make the same types of deductions or inferences that humans would make, at some point, you might think they'd start making the same types of errors. If an artificial intelligence can mimic the reasoning capacity of human beings, perhaps it's inevitable it will adopt some of their mental foibles as well. At the extreme, it could appear that an AI has a subconscious motivation.

"What most people are worried about is, when they run the algorithm, some mysterious or unknown input or stimulus changes the output to be something that's outside of the margin of error, but you may or may not be aware of it," remarked John Roese, chairman of the Cloud Foundry Foundation and Global Chief Technology Officer of Dell EMC, speaking with ZDNet. Continued Roese:

At the root of what all machine intelligence is about is, you're trying to predict decisions better. If a decision gets distorted in some way, whatever process that decision is a part of, potentially can lead to an incorrect answer or a sub-optimal path in the decision tree. It's one of those nasty little things: People are aware that human beings have bias in their thinking — we've been talking about it for ages. And technologies that are fed by information that is created in the real world, fundamentally could have the same effect. So now the question is, how do you think about that when you're actually shifting a cognitive task completely into a machine, where you don't have the same kind of qualitative reaction that human beings will have? We either have to re-create those functions, or try to minimize bias, or choose areas where bias is less impactful.

If human biases truly are imprinted upon AI algorithms, either subconsciously or through a phenomenon we don't yet understand, then what's to stop that same phenomenon from interfering with humans tasked with correcting that bias? Granted, the question itself sounds like it belongs in a Twilight Zone episode. Yet at the highest levels of public discussion today, the source of error in neural network algorithms is being treated not as a mathematical factor but as a subliminal influence.

In June 2018, a team of researchers from the Prague University of Economics and Germany's Technische Universität Darmstadt in Hessen published a study [PDF] investigating this phenomenon: specifically, the links between the psychological well-being of AI developers and the inference abilities of the algorithms they create. Their theory is based on Martie Haselton and Daniel Nettle's research into cognitive bias and how it may have evolved along with the human species. This research suggested that the human mind evolved survival strategies over countless generations, all of which culminated in people's capability to make snap-judgment, rash, risk-averse decisions. Humans learned to leap to conclusions, in other words, when they didn't have time to think about it. The team cites this quote from Haselton and Nettle: "The human mind shows good design, although it is design for fitness maximization, not truth preservation."

Those cognitive leaps, the Prague/Hessen team concluded, correlate directly with people's tendency to misinterpret the logic in rule-based decision making systems. When the rules encoded within such a system are incontrovertible, the biases that emerge from any interpretation of that system, they demonstrate, must be the product of flawed human judgments.

The team identified 20 categories of such flawed judgment phenomena that lead to what they assume to be false analyses. For each category, they suggest what they call a debiasing technique or "antidote" -- essentially a best practice that human interpreters can undertake to cleanse themselves of the tendency to make such judgments. Here are a few examples:

  • The availability heuristic, as they describe it, is the tendency to ascribe an attribute to a class of data based mainly on how quickly that attribute comes to mind when talking about it. They cite a more esoteric example, but a common one might be to presume that rock music stars are generally old, mainly because the older stars' music has had longer to imprint itself on the mind and is thus more quickly recalled.
  • The reiteration effect is the implicit presumption that an oft-repeated statement (e.g., "Women never get anywhere in this company," "John Kerry never served in Vietnam") must be true. The implication here is that conversational assumptions may become de facto rules, even if the people sharing those statements don't think they believe them.
  • Confirmation bias is fairly simple to explain: It's a tendency to give credence to a rule that analysts or developers believed to be true to begin with.
  • Negativity bias is, oddly enough, quite the opposite: the tendency to give extra credence to a rule or conclusion that disproves a hypothesis, even if that hypothesis was weak to begin with.
  • The primacy effect is a phenomenon whereby analysts may conclude a rule to be true based on initial data, but continue to treat that rule as true when later data controverts it. It's the kind of phenomenon that, for instance, lends weight to the early front-runner in a cumulative tally such as a vote count, even though that front-runner status may have been attributable solely to which data was arbitrarily counted first.
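The vote-count example in the primacy-effect entry can be made concrete. In the minimal sketch below, the same hypothetical ballots are counted in two different orders: the final tally is identical, but the "early front-runner" depends entirely on which ballots happened to be counted first.

```python
def running_leaders(ballots):
    """Tally ballots in order; record who leads after each one (ties keep the current leader)."""
    tally, leaders, leader = {}, [], None
    for b in ballots:
        tally[b] = tally.get(b, 0) + 1
        if leader is None or tally[b] > tally[leader]:
            leader = b
        leaders.append(leader)
    return tally, leaders

ballots = ["A"] * 3 + ["B"] * 5            # hypothetical ballots: B wins 5-3
tally_1, leaders_1 = running_leaders(ballots)
tally_2, leaders_2 = running_leaders(list(reversed(ballots)))

# Same final result either way, but A "leads" for the first six ballots in one order
# and never leads at all in the other.
```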

For many of these cognitive biases, the prescribed debiasing antidote is mainly informational or educational in nature. If developers or data analysts are aware of the underlying logic, and the fallacy of disputing it, then they'll be more likely to catch their own biases before they act on them.

But this study itself makes one potentially weak hypothesis: that AI algorithms in practice are typically rule-based decision engines, not machine learning systems. They relied on analysts' ability to explain what the rules meant, in English. In explaining what they called the "conjunction fallacy," they even attempted to redefine, in logical and contextual detail, the meaning of "and." For this team, cognitive bias maps itself onto AI bias by means of language -- through misunderstanding of the rules and misinterpretation of their results.

Machine learning systems are, by design, not rule-based. Indeed, their entire objective is to determine what the rules are or might be, when we don't know them to begin with. If human cognitive biases actually can imprint themselves upon machine learning, their only way into the system is through the data.

The trade-off


In machine learning, bias is a calculable estimate of the degree to which inferences made about a set of data tend to be wrong. By "wrong" in this context, we don't mean improper or unseemly, like the topic of a political argument on Twitter, but rather inaccurate. In the mathematical sense, there may be any number of ways to calculate bias, but here is one methodology that has the broadest bearing in the context of AI software: Quantitatively, bias in a new algorithm is the difference between its determined rate of error and the error rate of an existing, trusted algorithm in the same category. Put another way, when we get down to 0s and 1s, all bias is relative.
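The relative definition above reduces to simple arithmetic. The sketch below follows the article's working definition (new model's error rate minus a trusted baseline's error rate) on a handful of made-up labels; note that statistics textbooks usually define bias differently, as the gap between a model's expected prediction and the true value, which is the sense Prof. Weinberger uses later.

```python
def error_rate(predictions, labels):
    """Fraction of predictions that disagree with the true labels."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

def relative_bias(new_preds, baseline_preds, labels):
    """Error rate of a new model minus that of a trusted baseline (the article's relative definition)."""
    return error_rate(new_preds, labels) - error_rate(baseline_preds, labels)

# Hypothetical labels and predictions.
labels   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
baseline = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # 1 mistake  -> error rate 0.1
new      = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]   # 3 mistakes -> error rate 0.3

delta = relative_bias(new, baseline, labels)  # about 0.2
```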

"ML models are opaque and inherently biased," stated Douglas Merrill, CEO of credit software firm ZestFinance, in testimony before a US House committee last June. "Thus, lenders put themselves, consumers, and the safety and soundness of our financial system at risk if they do not appropriately validate and monitor ML models."

Merrill painted a picture of machine learning systems as "black boxes" -- devices with clear inputs and outputs, but offering no insight into the connections between the two. Indeed, neural networks are, by design, opaque: Like human minds, though on a much more limited scale, they can make inferences, deductions, or predictions without revealing how. That's a problem for an institution whose algorithms determine whether to approve an applicant's request for credit. Laws in the U.S. and elsewhere require credit reporting agencies to be transparent about their processes. That becomes almost impossible if the financial institutions controlling the data on which they report can't explain what's going on for themselves.

So if an individual's credit application is turned down, it would seem the processes that led to that decision belong to a mechanism that's opaque by design.

In a lecture for his computer science students at Cornell University, Associate Professor Kilian Q. Weinberger offered this explanation: Suppose the purpose of a machine learning algorithm is to identify an object, based on training data that represents that same object. Formally speaking, such a determination is a hypothesis (h). The purpose of a machine learning algorithm is to describe that hypothesis as a mapping function -- a way to relate the input data to the labels that result from its analysis. The target function is the product of a neural network training process. Through that function, a machine learning system can evaluate test data and assign it a label (y) -- a name that identifies it. So if you process fresh input data through a target function that was itself produced by way of training data, then that function should be able to recognize that input data as belonging to one of the classes it has already learned.


Prof. Kilian Q. Weinberger

Cornell University

Every machine learning algorithm has an error rate (E), which is a measure of its reliability. Mathematically, the error rate represents the expected difference between the model's predictions and the true labels. For squared-error loss, that expected error breaks down ("decomposes") into three components:

  • Variance is the difference between the predicted target function for the training data, and the function the system renders for the test data. Both functions are estimates, and the estimate for the test data function is expected to match the training data function -- the extent to which it doesn't is the variance. Expectations in this instance may come from an average of all hypothesized results for the training set, as Prof. Weinberger demonstrated, or in other cases, from an extrapolation of results for the same algorithm applied to previous training sets. Variance can be a valuable tool, pointing the way toward where an error might be.
  • Noise (also called irreducible error) is the difference between the label or identity the function infers from the test data, and the label that function is expected to have inferred given the training data. It's the measure of inaccuracy on the part of the inputs, when the target function clearly gets it wrong.
  • Once variance and noise have been accounted for and factored out, what remains is the degree of error that typically has no account or explanation. This is the final measure of the function's inability to represent reality, with all other factors excluded. This is bias. "If I had infinitely many training data sets," explained Weinberger, "and get the expected classifier, and noise doesn't matter. . . how much error would I still get? What does that capture? That [remainder] basically captures how much my classifier is biased towards some other explanation that is not really in the data."
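For squared-error loss, the decomposition is commonly written as below (and, as far as we can tell, this is the form used in Weinberger's course notes). Here $h_D$ is the classifier learned from training set $D$, $\bar h(x) = \mathbb{E}_D[h_D(x)]$ is the "expected classifier" averaged over infinitely many training sets, and $\bar y(x) = \mathbb{E}_{y \mid x}[y]$ is the expected label for input $x$. Note that the bias term enters squared.

```latex
\underbrace{\mathbb{E}_{x,y,D}\!\left[(h_D(x) - y)^2\right]}_{\text{expected test error}}
= \underbrace{\mathbb{E}_{x,D}\!\left[(h_D(x) - \bar h(x))^2\right]}_{\text{variance}}
+ \underbrace{\mathbb{E}_{x,y}\!\left[(\bar y(x) - y)^2\right]}_{\text{noise}}
+ \underbrace{\mathbb{E}_{x}\!\left[(\bar h(x) - \bar y(x))^2\right]}_{\text{bias}^2}
```

Setting aside the variance term (infinitely many training sets) and the noise term (which no model can remove) leaves exactly the remainder Weinberger describes.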

From Prof. Weinberger's perspective, bias is the one component of error whose root causes are not revealed by the data, or through any mathematical analysis of that data. His lecture appears to support our original theory: Once you take away the patterns of error whose causes are rational, what remains should be those patterns that are irrational ("some other explanation that is not really in the data"). It also has the virtue of sounding like Sherlock Holmes ("Whatever remains, however improbable..."), assuming Holmes was looking for the opposite of the truth.


If a machine learning system were only capable of generating a linear regression model (depicted upper left), then it would produce a line that best fit the graph of its training data. Such a line might have a high bias (lower left) when compared against the function underlying its test data. Machine learning is actually capable of producing a function with relatively low bias that closely fits the training data (upper right). The problem is, when you eliminate as much bias as possible, the variance is huge: The fitted function follows every quirk of its particular training set, so its error on the test data -- measured, say, as the sum of the squares of all the distances between the predicted and actual values -- balloons.
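The trade-off the figure describes, between a rigid straight-line fit and a very flexible one, can be measured by simulation. This minimal sketch draws many synthetic training sets from an assumed true curve, sin(2πx), with Gaussian noise (standard deviation 0.3, 30 points per set), fits each one, and estimates bias squared and variance on a grid of test points. The straight line stands in for the rigid model and a 1-nearest-neighbor predictor for the flexible one; both are our choices for illustration, not the models in the figure.

```python
import math
import random

def true_f(x):
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise=0.3):
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]

def fit_line(data):
    """Rigid model: ordinary least-squares straight line."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_1nn(data):
    """Flexible model: return the label of the nearest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def bias_variance(fit, trials=200, seed=0):
    rng = random.Random(seed)
    test_xs = [i / 50 for i in range(51)]
    preds = []  # preds[t][i]: prediction of the model from trial t at test_xs[i]
    for _ in range(trials):
        model = fit(sample_training_set(rng))
        preds.append([model(x) for x in test_xs])
    bias_sq = variance = 0.0
    for i, x in enumerate(test_xs):
        column = [p[i] for p in preds]
        mean_pred = sum(column) / trials             # estimate of the expected classifier
        bias_sq += (mean_pred - true_f(x)) ** 2      # noise is zero-mean, so true_f(x) is the expected label
        variance += sum((p - mean_pred) ** 2 for p in column) / trials
    return bias_sq / len(test_xs), variance / len(test_xs)

line_bias, line_var = bias_variance(fit_line)
nn_bias, nn_var = bias_variance(fit_1nn)
```

Running it, the straight line should show much larger bias squared and much smaller variance than the nearest-neighbor model; pushing one term down pushes the other up.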

Explained Dell EMC CTO Roese:


John Roese - Global CTO

Dell EMC

There are many ways to describe bias. You can describe it as the drift of an algorithm over time based on its inputs. You can describe it as the delta between the predicted margin of error and what you actually achieve. The challenge with bias, though, in this context, tends to be around the fact that it isn't easily quantified in a particular real-time function, because it usually is a manifestation of previous activities or data sets, where you don't have complete transparency into their provenance and origin.

Ironically, the measure of machine learning's effectiveness in creating reliable target functions has been how much less bias they produce, compared to a purely mathematical function like best-fit curves or linear regression. Although it's impossible to render a target function as a pure formula, the tale it tells is in its results: It's capable of more closely fitting the test data than anything produced by, say, the method of least squares. Yet at the same time, it could go completely the other way: The theoretical extent to which a machine learning target function can more closely approximate the expected result is the same extent to which it can less closely approximate it. That's variance.

There's an inverse relationship between bias and variance, which AI practitioners call the bias/variance trade-off: As a model is made flexible enough to reduce bias on its training set, its variance tends to increase. This is the hardest fact to reconcile oneself to. While bias represents the degree to which the target function's results can be embarrassing, variance represents how much it can be wrong.

Black box

Here is where it's tempting to remove any connection or causality between human cognitive bias and AI computational bias -- to say that bias is not something that people imprint, unintentionally or otherwise, onto the machine learning model. But researchers are discovering, and even proving, that the two concepts are not coincidentally correlated. For the May 2019 meeting of the Web Conference in San Francisco, researchers from Penn State and Purdue Universities attempted to apply the principle of causality to bias and a quantity they called fairness [PDF]. In so doing, they discovered the bias/variance trade-off worked against them. "There is growing interest in algorithmic decision-making systems that are demonstrably fair," the team wrote. They continued:

Unfortunately, choosing the appropriate definition of fairness in a given context is extremely challenging due to a number of reasons. First, depending on the relationship between a protected attribute and data, enforcing certain definitions of fairness can actually increase discrimination. Second, different definitions of fairness can be impossible to satisfy simultaneously. Many of these difficulties can be attributed to the fact that fairness criteria are based solely on the joint probability distribution of the random variables of interest... Hence, it is tempting to approach the problem of fairness through the lens of causality. Answering questions of fairness through the lens of causality entails replacing the question "Is the decision discriminatory with respect to a protected attribute?" by: "Does the protected attribute have a causal effect on the decision?"

Put another way: If someone were to make a conscious effort to eliminate the appearance of racism from a machine learning model, they could negatively influence the accuracy of the results. This is a dangerous claim to be making in a public forum. Outside of the proper context, it might appear to say the only accurate models are those that are prone to make assertions that are, at the very least, embarrassing, and at worst a violation of people's rights.

But here, the Penn State/Purdue team introduces one of my favorite words: context. The problem with a model that is capable of displaying traits of cognitive bias is that it runs in isolation. Its universe is limited to the training data sets, and the test data being analyzed. There's nothing with which to make a relative judgment, such as that prison convicts may have a broader variety of skin tones and first languages, or that detained immigrants are less likely to have criminal records, or that applicants for a loan don't all look like the cast of an "Andy Hardy" movie, or that representatives in Congress don't all have to be men.

When the scope of a function is limited to a data set, its results, though probably more accurate, are equally limited. When protections are introduced to selected components of that function, its results may be less controversial, but they also may be less accurate. That's what this team discovered using three sets of data: one that's entirely synthetic representing a straight line; another that's a set of more than 46,000 individuals, along with their (binary) genders and salaries; and publicly available data from the New York Police Department's Stop-and-Frisk data set (NYCSF) -- data collected when individuals were stopped on the street without cause, supposedly at random. They trained a pair of machine learning models for each data set: a model representing a test of fairness for selected subsets, called Fair on Average Causal Effect (FACE), and a model that tested each subset as though its distinguishing element (e.g., race, gender) were not a factor, called Fair on Average Causal Effect on the Treated (FACT). For example, would stop-and-frisk tactics have detained certain people for lengthy periods if it were not obvious they were black or Hispanic, or would a woman earn a higher salary if it weren't clear she was female?
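The two acronyms rest on two standard causal quantities: the average causal effect of an attribute across a whole population, and the average effect on the "treated," meaning only those individuals who actually have the attribute. On synthetic data where both potential outcomes are known by construction, each is a plain average. The sketch below uses a made-up salary model in which the penalty differs by qualification level and the protected group is unevenly spread across levels, so the two averages disagree. With real observational data only one outcome per person is ever observed, and the paper's estimators must reconstruct the other; that part is not shown here.

```python
# Hypothetical population: (qualification, protected) pairs.
population = [
    (0, 1), (0, 1), (0, 1), (0, 0),
    (1, 1), (1, 0), (1, 0), (1, 0),
]

def salary(q, a):
    """Made-up structural model: base pay rises with q; the penalty for a=1 is larger when q=0."""
    return 40 + 20 * q - (8 if q == 0 else 2) * a

def effect(q):
    """Individual causal effect of the attribute: outcome with a=1 minus outcome with a=0."""
    return salary(q, 1) - salary(q, 0)

# Average causal effect over everyone.
ace = sum(effect(q) for q, _ in population) / len(population)

# Average causal effect on the treated: only those who actually have the attribute.
treated = [q for q, a in population if a == 1]
act = sum(effect(q) for q in treated) / len(treated)
# ace == -5.0, act == -6.5
```

The gap between the two numbers comes entirely from who holds the attribute, which is why a model can look fair by one criterion and not the other.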

At one level, the FACE/FACT experiment found significant evidence that women should be earning more if gender were not an issue, though surprisingly less evidence of the NYPD's stop-and-frisk tactics being racially targeted (a federal court ordered the Department to reform those tactics in 2013). That's the story that made the headlines, and as far as the press was concerned, it stopped there. But the whole point of their research was lost: The difference between the world as it is (FACE) and the world as we would like it to be (FACT) is calculable.

Cone of shame

So which is the more relevant data set? As The New York Times first reported in January, a study conducted by researchers from the University of Toronto and MIT [PDF] revealed the extent to which public awareness of the biased nature of cloud service providers' training data had a measurable impact on the level of bias their AI services rendered over time. For a problem that was supposed to be unsolvable, somebody seems to be making some progress at solving it.

The team both practices and advocates what they call public algorithmic audits -- the publication of analyses of facial recognition services from Amazon AWS, Google Cloud, Microsoft Azure, and IBM, and independent services such as Kairos and Face++. The MIT/Toronto team writes:

Through the simulation of a mock user population, these audits can uncover problematic patterns in models of interest. Targeted public algorithmic audits provide one mechanism to incentivize corporations to address the algorithmic bias present in data-centric technologies that continue to play an integral role in daily life, from governing access to information and economic opportunities to influencing personal freedoms.

The implication here is that an algorithm that does not normalize for the distributions of gender and racial groups in the training data will make inaccurate assumptions about members of those groups for the test data. They cite the prior work of the MIT Media Lab's Gender Shades study in making the public aware of the effects of algorithmic bias, which they define as "a software defect or bug that poses a threat to user dignity or access to opportunity." Then they conducted their own experiments to determine how much the effects of that bias appeared to have been reduced since The New York Times first made the public aware of Gender Shades in May 2017. Like the original study, MIT/Toronto used a benchmark data set in which the racial and gender subgroups were evenly proportioned and equally distributed, compared against the untreated "black box" publicly available data. The difference between the two results was treated as "Subgroup Classification Error."
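At its core, an audit of this kind computes the error rate separately for each subgroup of a balanced benchmark and compares them. The minimal sketch below does that for a handful of invented records; the subgroup labels follow the DF/LM abbreviations used in the study's results, while the records themselves and the "largest gap" summary are our own illustration, not the paper's data or metric.

```python
from collections import defaultdict

def subgroup_error_rates(records):
    """records: iterable of (subgroup, predicted_label, true_label) tuples."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        errors[group] += predicted != actual
    return {g: errors[g] / totals[g] for g in totals}

# Hypothetical gender-classification results on a balanced benchmark.
records = [
    ("DF", "M", "F"), ("DF", "F", "F"), ("DF", "M", "F"), ("DF", "F", "F"),
    ("LM", "M", "M"), ("LM", "M", "M"), ("LM", "M", "M"), ("LM", "M", "M"),
]

rates = subgroup_error_rates(records)          # {"DF": 0.5, "LM": 0.0}
largest_gap = max(rates.values()) - min(rates.values())
```

Re-running the same benchmark against a service months later and comparing per-subgroup rates is what lets an auditor say whether a gap has narrowed.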


As the team reported, the error rate for predictions concerning dark-skinned females (DF) had dropped by over 30 percent for Face++ by August 2018, over the intervening 15 months. Error rates for Azure dropped by over 19 percent in the same period, and for IBM by nearly 18 percent. Meanwhile, error rates for light-skinned males (LM) were flat for Azure over the period, dropped 0.3 percent for Face++, and fell almost undetectably for IBM. Though no one could establish any direct link between these services' apparent improvement and the Times story, the clear implication is that these services scrambled to find some way to reduce the appearance of bias.

What MIT/Toronto also does not speculate about is how these services achieved these results: whether their publicly available sample data was amended or changed, or their algorithms were adjusted to detect bias and compensate. The latter, according to some, is not supposed to be possible; the former raises questions about data sampling and data cleansing.

"Specifically, ML models have a black box problem," ZestFinance CEO Merrill testified before Congress. "Lenders know only that an ML algorithm made a decision, not why it made a decision. Without understanding why a model made a decision, bad outcomes will occur."

As part of his testimony, Merrill attached a white paper [PDF] explaining the guidance his firm would offer to financial institutions, especially in credit reporting, seeking to use machine learning in decision making. Machine learning models require, for reasons ranging from convenience to legal requirements, explainability. "To explain a model," ZestFinance writes, "is to relate the model's decisions to the input data on which its decisions are based." If models could simply log their own processes, even automatically -- rather than wait for a public audit to do it for them -- analysts could see, said Merrill, when a loan application was deemed higher risk due to the age of the car, as opposed to the skin tone of the driver.

Merrill's white paper goes on to become something of an advertisement for ZAML, his firm's own machine learning model development system. In comparing ZAML to other approaches that attempt to reveal the reasoning behind a "black box" ML model, ZestFinance explains one method for determining the impact of an input factor on the final result of the target function.

"A natural approach," ZestFinance writes, "is to perturb a given input and observe the effect on the output. If the model is highly sensitive to the perturbation, then the features involved were important to the prediction." In other words, the pertinence of an input to the validity of a result can be assured when it's changed or even removed, and the result is appreciably altered -- say, by about 30 percentage points. Similarly, the firm goes on, if genuine inputs were replaced by random noise -- that other factor Prof. Weinberger spoke of -- and the result is essentially unaltered, those inputs were of little value anyway. So when MIT/Toronto changes an input and the results are not only plainly obvious but strictly limited, the model may have been anesthetized, but the results may also have been rendered pointless.

FACE value

If we can make "FACE" data into "FACT" data -- if we "treat" the inputs to a cleansing of the factors we deem impertinent or irrelevant -- and the results of the functions emerging from them are unchanged, then perhaps we've done the world a service. But when the results are measurable enough to merit headlines in the Times, as was the case with MIT/Toronto, will the resulting functions continue to have any significant bearing upon reality? This is a difficult question to ask, because it appears to imply that a race- and gender-conscious viewpoint is the only one that's applicable to science.


We're getting results from machine learning algorithms that are embarrassing, and we want to know what to adjust and how, so we can avoid being embarrassed again. We'd like to believe that bias is something we can eliminate from our solutions the way we would eliminate anger or otherwise bad behavior: through social training. The trouble is, the more sensitive we become to the things that embarrass us about social media, the more such things there appear to be. As a result, there's always more bias in a system than we're capable of managing.

In the broader world of human beings, where intelligence is not artificial but at times still indefinite, the existence of cognitive bias is plainly obvious. Out here, in the universe that's immensely greater than a few mere data sets, we have the virtue of context -- of being able to compare the measurements we take with the world we think we see. This is the discovery that both the Penn State/Purdue team and the MIT/Toronto team are on the cusp of reaching: We can't change the facts by changing the input data. But we can change the scenario that gave rise to those facts, and we can do so more aggressively by rethinking "error" as a precise measurement of unmade progress.

When we compare the world we want to the world we've recorded, the difference between the two is no error, but a fathoming of the distance we have yet to go. In that experiment, bias is your friend.
