These experts are racing to protect AI from hackers. Time is running out

Artwork by Robert Rodriguez

AI is becoming a key part of our lives. Hacking it could cause chaos, so the race is on to build defenses.

Bruce Draper bought a new car recently. The car has all the latest technology, and those bells and whistles bring benefits -- but also, more worryingly, some risks.

"It has all kinds of AI going on in there: lane assist, sign recognition, and all the rest," Draper says, before adding: "You could imagine all that sort of thing being hacked -- the AI being attacked."

It's a growing fear for many -- could the often-mysterious AI algorithms, which are used to manage everything from driverless cars to critical infrastructure, healthcare, and more, be broken, fooled or manipulated?

What if a driverless car could be fooled into driving through stop signs, or an AI-powered medical scanner tricked into making the wrong diagnosis? What if an automated security system were manipulated into letting the wrong person in, or into not recognizing that anyone was there at all?

As we come to rely on automated systems to make decisions with huge potential consequences, we need to be sure that AI systems can't be fooled into making bad or even dangerous decisions. City-wide gridlock or interrupted essential services are just some of the most visible problems that could result if AI-powered systems failed; other, harder-to-spot failures could cause even more problems.

During the past few years, we've placed more and more trust in the decisions made by AI, even when we can't understand how those decisions are reached. Now the concern is that the AI technology we increasingly rely on could become the target of all-but-invisible attacks -- with very visible real-world consequences. And while these attacks are rare right now, experts expect many more as AI becomes more common.

"We're getting into things like smart cities and smart grids, which are going to be based on AI and have a ton of data here that people might want to access -- or they try to break the AI system," says Draper.

"The benefits are real, but we have to do it with our eyes open -- there are risks and we have to defend our AI systems."

Draper, a program manager at the Defense Advanced Research Projects Agency (DARPA), the research and development body of the US Department of Defense, is better placed than most to recognize the risk.

He's spearheading DARPA's Guaranteeing AI Robustness Against Deception (GARD) project, which aims to ensure that AI and algorithms are developed in a way that shields them from attempts at manipulation, tampering, deception, or any other form of attack.

"As AI becomes commonplace, it becomes used in all kinds of industries and settings; those all become potential parts of an attack surface. So, we want to give everyone the opportunity to defend themselves," he says.

Fooling AI even if you can't fool humans

Concerns about attacks on AI are far from new, but there is now a growing understanding of how deep-learning algorithms can be tricked by slight -- often imperceptible -- changes to their input, leading them to misclassify what they are examining.

"Think of the AI system as a box that makes an input and then outputs some decision or some information," says Desmond Higham, professor of numerical analysis at the University of Edinburgh's School of Mathematics. "The aim of the attack is to make a small change to the input, which causes a big change to the output."

For example, you might take an image that a human would recognize as a cat, make changes to the pixels that make up the image, and confuse the AI image-classification tool into thinking it's a dog.

This misclassification isn't a random error; it happens because someone has deliberately tampered with the image to fool the algorithm -- a tactic known as an adversarial attack.

"This isn't just a random perturbation; this imperceptible change wasn't chosen at random. It's been chosen incredibly carefully, in a way that causes the worst possible outcome," warns Higham. "There are lots of pixels there that you can play around with. So, if you think about it that way, it's not so surprising that these systems can't be stable in every possible direction."
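Higham's point about the sheer number of pixels can be made concrete with a small sketch. The Python below is illustrative only: it assumes a toy linear classifier rather than a real deep network, and the weights, image, and "cat"/"dog" labels are all made up. It applies a simplified version of the fast gradient sign method, nudging every pixel by the same tiny amount, epsilon, in whichever direction hurts the classifier most. For a linear score w·x + b, that shifts the score by up to epsilon times the sum of the absolute weights, so with thousands of pixels a change of 1% per pixel is enough to flip the answer.

```python
import random


def score(weights, bias, pixels):
    """Toy linear classifier: a positive score means "cat", negative means "dog"."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def sign(v):
    """Return +1, -1 or 0 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_linear(weights, bias, pixels, epsilon):
    """Simplified fast-gradient-sign step for a linear model.

    For score = w . x + b the gradient with respect to the input is just w,
    so moving each pixel by epsilon along -sign(w) (or +sign(w)) shifts the
    score by up to epsilon * sum(|w|) towards the other class.
    Pixels are clipped to the valid range [0, 1].
    """
    direction = -sign(score(weights, bias, pixels))  # push towards the other class
    return [
        min(1.0, max(0.0, p + direction * epsilon * sign(w)))
        for w, p in zip(weights, pixels)
    ]


rng = random.Random(0)
n_pixels = 10_000
weights = [rng.uniform(-1, 1) for _ in range(n_pixels)]
image = [rng.uniform(0, 1) for _ in range(n_pixels)]
# Pick the bias so the clean image scores +5, i.e. a confident "cat".
bias = 5.0 - sum(w * p for w, p in zip(weights, image))

adversarial = fgsm_linear(weights, bias, image, epsilon=0.01)
print(score(weights, bias, image))        # about +5: "cat"
print(score(weights, bias, adversarial))  # negative: now "dog"
print(max(abs(a - p) for a, p in zip(adversarial, image)))  # largest single-pixel change, roughly 0.01
```

No individual pixel moves by more than about 1% of its range, yet the many small changes add up in the one direction that matters. Real deep networks are not linear, but this accumulation effect is one common explanation for why they are so easy to push off course.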

AI identifying vehicles and people in a simulation. One of the people has been incorrectly identified as a vehicle.
Image: Two Six Technologies

Tricking an AI into thinking a cat is a dog or, as researchers have demonstrated, that a panda is a gibbon is a relatively small concern -- but it doesn't take much imagination to come up with contexts where small confusions could have dangerous consequences, such as a car mistaking a pedestrian for a vehicle.

If there's still a person involved, then errors will be noticed -- but as automation begins to take more control, there might not be anyone double-checking the work of the AI to make sure a panda really is a panda.

"You can do an adversarial attack that the human would immediately recognize as being a change. But if there is no human in the loop, then all that matters is whether the automated system is fooled," explains Higham.

An adversarial input, overlaid on a typical image, caused this classifier to miscategorize a panda as a gibbon.
Image: DARPA

Worse still, these aren't just theoretical examples: a few years back, some researchers showed how they could create 3D adversarial objects that could fool a neural network into thinking a turtle was a rifle.

Professor Dawn Song at the University of California, Berkeley also showed how stickers placed in certain locations on a stop sign could trick AI into reading it as a speed-limit sign instead. The research showed that the kind of image-classification algorithms a self-driving car relies on could be fooled.

There are some caveats here -- the stickers were designed specifically so that the image-classification algorithms would misinterpret them, and they had to be put in the right places. But even though the tests were carefully curated, the fact that AI can be fooled this way demonstrates a very real risk that algorithms can be tricked into responding in ways that make sense to them, but not to us.

How do we stop attacks on AI?

So, what to do about these disconcerting challenges? Help might come from DARPA's multi-million-dollar GARD project, which has three key goals. The first is to develop the algorithms that will protect machine learning from vulnerabilities and disruptions right now. The second is to develop theories around how to ensure AI algorithms will still be defended against attacks as the technology becomes more advanced and more freely available.

And third, GARD aims to develop tools that can protect against attacks on AI systems and assess whether AI is well defended, and then to share these tools broadly rather than stockpiling them within the agency.

There's already a gloomy precedent -- the development of the internet itself is a good example of what happens when security is an afterthought, as we're still trying to deal with the cyber criminals and malicious hackers that exploit vulnerabilities and loopholes in old and new technology.

Bruce Draper, program manager at DARPA.
Image: DARPA

With AI, the stakes are even higher. GARD's goal is to prevent abuse of -- and attacks against -- AI before it's too late.

"Many of us use AI now, but we often use it in ways that are not safety-critical. Netflix recommends what I should watch next -- if that got hacked, it wouldn't ruin my life. But if we think about things like self-driving cars, it becomes much more critical that our AI systems are safe and they're not being attacked," Draper explains.

Right now, there is very little adversarial AI in practice, but Draper doesn't expect it to stay that way. "We think, as AI gets more valuable and more pervasive, it's going to grow -- and that's why we're trying to do this work on GARD now," he warns.

DARPA is working with a number of tech companies, including IBM and Google, which provide platforms, libraries, datasets, and training materials to the GARD program for evaluating how robust AI models and their defenses are against adversarial attacks, both those they face today and those they'll face in the future.

The IBM Almaden Research Center campus outside San Jose, California. Here, AI researchers are aiding the GARD project.
Image: Getty

One key component of GARD is Armory, a virtual platform, available on GitHub, which serves as a test bed for researchers in need of repeatable, scalable, and robust evaluations of adversarial defenses created by others.

Another is Adversarial Robustness Toolbox (ART), a set of tools for developers and researchers to defend their machine-learning models and applications against adversarial threats, which is also available to download from GitHub.

ART was developed by IBM prior to the GARD scheme, but it has become a major part of the program.

"IBM has been interested in trusted AI for a long time. To have any machine-learning model, you need data -- but if you don't have trusted data, then it becomes tricky," says Nathalie Baracaldo, who leads the AI security and privacy solutions team at IBM's Almaden Research Center. "We saw the DARPA GARD project and we saw it was very much aligned to what we were doing," she adds.

Nathalie Baracaldo leads AI security and privacy at IBM.
Image: IBM

"It's split into two parts; the ART Blue Team where you try to defend, but you also need to assess what are the risks out there, and how good your model is. ART provides the tools for both -- for blue and red teams," Baracaldo explains.

Building platforms and tools to assess and protect AI systems against the threats of today is difficult enough. Trying to figure out what hackers will throw at these systems tomorrow is even harder.

"One of the main challenges in robustness research is that you can do everything as good as you can, think that you're right, publish your paper -- then someone else comes out with a better attack, then your claims can be wrong," explains Nicholas Carlini, a research scientist specializing in the intersection of machine learning and computer security at Google Brain -- Google's deep-learning AI research team.

"It's possible to simultaneously try as hard as possible to be correct and to be wrong -- and this happens all the time," he adds.

One of Carlini's roles within the GARD project is to ensure that the research on AI robustness is up to date and that the teams working on defensive solutions aren't developing something that will be obsolete before it's even finished -- while also providing guidance for others involved to help conduct their own research.

"The hope here is that by presenting people the list of things that were known to be broken along with the solutions for how to break them, people could study this," he explains.

"Because once they get good at breaking things that we know how to attack, hopefully they can then extend this to knowing how to break things that they then create themselves. And then by doing that, they'll be able to produce something that's more likely to be correct."

Why data poisoning could ruin AI

While much of the work being done by DARPA and others is designed to protect against future threats, there are already examples of AI algorithms being manipulated, be it by researchers looking to secure things or attackers trying to exploit them.

"The most common threat that has been in academic literature is direct modification to an image or video. The panda that looks like a panda, but it's classified as a school bus -- that sort of thing," says David Slater, senior principal research scientist at Two Six Technologies, a cybersecurity and technology company that works with national security agencies and is involved in the GARD project.

But direct modification is just one risk. Perhaps a bigger threat is data poisoning, where attackers tamper with the training data used to create the AI in order to change the decisions it makes.
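The mechanism can be shown with a deliberately tiny example. The Python sketch below is illustrative only; the data, the "safe"/"unsafe" labels, and the nearest-centroid classifier are all hypothetical stand-ins for a real model. An attacker who manages to slip a few unsafe-looking samples labelled "safe" into the training set drags the learned "safe" centroid towards them, so an input the clean model flagged is now waved through.

```python
from statistics import mean


def train_centroids(samples):
    """Nearest-centroid "training": average the feature value for each label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Return the label whose centroid is closest to the input value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


clean = [(1, "safe"), (2, "safe"), (3, "safe"),
         (8, "unsafe"), (9, "unsafe"), (10, "unsafe")]
# Hypothetical poisoned contribution: unsafe-looking samples deliberately labelled "safe".
poison = [(10, "safe")] * 3

print(predict(train_centroids(clean), 7))           # unsafe
print(predict(train_centroids(clean + poison), 7))  # safe
```

Nothing about the model itself was changed; it simply learned faithfully from what it was fed. This toy needs a large share of poisoned samples to work and shows only the mechanism, not how much poison a real model would need.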

David Slater, Two Six Technologies.
Image: Two Six

"Data poisoning can be one of the most powerful threats and something that we should care a lot more about. At present, it doesn't require a sophisticated adversary to pull it off. If you can poison these models, and then they're used widely downstream, you multiply the impact -- and poisoning is very hard to detect and deal with once it's in the model," says Slater.

If an algorithm is trained in a closed environment, it should -- in theory -- be reasonably well protected from poisoning unless hackers can break in.

But a bigger problem emerges when an AI is trained on data drawn from public sources, especially if people know this is the case, because there are people out there -- whether out of a desire to cause damage or just to cause trouble -- who will try to poison the algorithm.

"Now we live in a world where we collect data from everywhere -- models are trained from data from the entire internet and now you have to be worried about poisoning," says Carlini.

"Because when you're going to crawl the internet and train on whatever people give you, some fraction of people on the internet just want to watch the world burn and they're going to do malicious things," he adds.

One infamous example of this trend is Microsoft's artificial intelligence bot, Tay. Microsoft sent Tay out onto Twitter to interact and learn from humans, so it could pick up how to use natural language and speak like people do. But in just a matter of hours, people had corrupted Tay into saying offensive things and Microsoft took it down.

This is the sort of concern that needs to be considered when thinking about how to protect AI systems from data poisoning -- and that's one of the aims of GARD.

"One of the things we're thinking about is how do we evaluate what a defense looks like in the case of poisoning -- it's very challenging," says Carlini.

Training a chatbot to be offensive is bad enough, but if an algorithm were learning from important information, such as medical data, and that insight got corrupted, the impact on patients could be disastrous.

"Someone can look at the literature and see how it's really trivial to attack these things, so maybe we shouldn't give cancer predictions based on this single piece of information -- maybe we should still involve the human," suggests Carlini. He hopes that GARD's work will help make systems safer and more secure, even if that means delaying wider use of these technologies, because it will be for the greater good in the long run.

AI in the world today

We can already see some of the problems concerning AI security being played out visibly in the real world.

For example, there has been a sudden surge of interest in AI art generators. You can give them a few of your selfies and they'll create an array of arty profile pics that you can then use on social media. These AI systems are trained on millions of images found on the internet and can produce new images in many genres. The problem is that the AI also tends to reproduce the biases found in the original art, creating sexualized images of women and prioritizing Western styles over others. The AI is replicating -- and reinforcing -- the biases found in the data used to train it.

ChatGPT is another interesting case study of the challenges ahead for AI. The chatbot has been a sensation and has shown how AI can disrupt everything from programming to writing essays. But its rise has also shown us that AI is far from perfect, however much we might want it to be. Early users of the ChatGPT-powered Bing Chat, for example, found it relatively easy to use a so-called 'prompt injection' attack to get the chatbot to reveal the rules governing its behavior and its codename (Sydney).

And as early users continued their testing, they found themselves having arguments with the bot over facts, or they became involved in increasingly strange and unnerving conversations. No surprise, then, that Microsoft has now tweaked the bot to stop some of its weirder utterances.

AI's defender and the road ahead

An example of an adversarial T-shirt confusing an AI into identifying a person as a bird.
Image: Intel Corporation

All of these threats mean AI needs protecting from attacks sooner rather than later, so we're not left playing catch-up, as we have been with cybersecurity and the internet.

"If I'm a bad actor right now, cyberattacks are easier -- they're something I already know, and a lot of companies haven't sufficiently defended against yet. I could create a lot of havoc with a cyberattack. But as cyber defenses get better, we're starting to see more of the AI attacks," says DARPA's Draper.

One of the key goals of the GARD project is to get the tools out there into the hands of developers and companies deploying AI-based tools -- and on that basis, the scheme is already proving to be a success.

"We know the usage of ART is increasing rapidly," Draper explains. "If no one was starting to discuss it, we wouldn't have an audience for the tools. But I think now is the time -- there's an interest and there's an audience," he adds.

One of the main aims of the DARPA GARD project is to look to the future and create a legacy to protect AI going forward. That's why industry collaboration is playing such a key role.

"What we're trying to do is get all this out there. Because it's great if the government uses it, but we're all buying our systems from the commercial sector. The last thing we want is a nightmare scenario down the road where we're all using self-driving cars and someone figures out how to defeat them; it could bring a city to a stop," says Draper.

"It'll be a perpetual game of cat and mouse; someone will try to come up with better attacks, so this isn't the end. But that's part of trying to build an open-source community in the hope that the community becomes committed to this repository, and it can be an active learning resource," he adds.

IBM's Baracaldo sees this sense of community as essential to the whole project.

"What happens when a lot of people contribute is that the tool gets better. Because when a single person or a single entity has something, and they put it out, they don't know exactly what the other use cases are -- but others might," she says.

"And if something works for you and makes your research better, you're more inclined to make it better yourself and help the community. Because you want the community to use what you're doing in your research. So, I think it helps a lot," Baracaldo adds.

For Two Six's Slater, the open-source element of GARD is also something that is going to be critical for long-term success -- as is ensuring the systems remain robust and secure, based on the foundations laid out by DARPA.

"If we're making an impact on actual end users, I think that's important. Have we raised the alarm bells loud enough that people are like, 'okay, yes, this is a problem', and we need to meet it and so we're going to invest in it."

That continued investment is vital because malicious attackers aren't suddenly going to disappear once the GARD program ends. "It's important that this takes off because, in two years, the DARPA program goes away. But we still need the community to be working on this, because, unfortunately, bad actors are not going to go away," he says.

"As AI becomes more important to our lives, it becomes more valuable to our lives. We really need to learn how to defend it."