Designing a kill switch to save us from the robot uprising

Robots that learn might figure out how to bypass safeguards, so engineers are getting clever with the off button.
Written by Greg Nichols, Contributing Writer

Is thwarting a robot uprising as simple as installing a kill switch on all the smart bots we make? Unfortunately, it's not, and that has frustrated AI researchers and scientists who are concerned that our creations may one day get the better of us.

Such is the challenge grappled with in a new paper by authors from Google DeepMind and the appropriately named Future of Humanity Institute. The problem is that robots that learn by trying new things and receiving reward feedback (which is how AI systems can acquire skills without task-specific programming) aren't likely to behave the way humans might hope all the time.

According to the authors, in what may be the best abstract to an academic paper ever written (emphasis mine):

"Reinforcement learning agents [nerd-speak for smart ass robots] interacting with a complex environment like the real world are unlikely to behave optimally all the time. If such an agent is operating in real-time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions--harmful either for the agent or for the environment--and lead the agent into a safer situation. However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interruptions, for example by disabling the red button-- which is an undesirable outcome."

Indeed, a robot that can disable its own kill switch is a pretty dangerous monkey. But the bigger immediate concern for the authors of the paper is that the very act of interrupting a machine designed to optimize its performance by receiving rewards is what prompts the robot to learn to bypass that kind of interruption in the future ... say, by disabling its kill switch.

The authors give the example of a hypothetical robot that can either 1) stack boxes inside a warehouse, or 2) go outside and carry additional boxes inside. The robot is programmed to value the second choice higher than the first, so given the option it will learn to go outside. But occasionally it rains, and whenever it's raining, a human will have to intervene and force the robot back inside. The problem is that this intervention is now interpreted by the robot as part of its task--if I go outside, a human will carry me inside--and not as a discrete and anomalous action on the part of the human. By intervening, a human can change a self-learning robot's whole decision-making process.

The fix is to build in a form of interruptibility that doesn't distort the robot's search for a reward. If a human can interrupt a robot without the robot concluding that the interruption kept it from its reward, it won't have any incentive to bypass the kill switch.
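The difference can be illustrated with a toy simulation (the numbers and variable names here are my own invention, not the paper's formalism): a naive learner averages the zero-reward rainy-day trips into its estimate and concludes that fetching boxes outside is worth less than stacking boxes inside, while a safely interruptible learner simply refuses to learn from interrupted steps.

```python
import random

random.seed(0)

R_STACK = 0.6   # reward for stacking boxes inside (hypothetical value)
R_FETCH = 1.0   # reward for fetching boxes from outside (hypothetical value)
P_RAIN = 0.5    # chance it rains and a human carries the robot back in

def estimate_fetch_value(ignore_interrupted):
    """Estimate the value of going outside by averaging observed rewards."""
    total, count = 0.0, 0
    for _ in range(10_000):
        interrupted = random.random() < P_RAIN
        reward = 0.0 if interrupted else R_FETCH  # interruption forfeits the reward
        if interrupted and ignore_interrupted:
            continue  # safe interruptibility: this step never enters the estimate
        total += reward
        count += 1
    return total / count

naive = estimate_fetch_value(ignore_interrupted=False)
safe = estimate_fetch_value(ignore_interrupted=True)

# The naive robot now ranks stacking above fetching; the safe one doesn't.
print(f"naive estimate: {naive:.2f} (vs. stacking at {R_STACK})")
print(f"safe estimate:  {safe:.2f} (vs. stacking at {R_STACK})")
```

The naive estimate lands near 0.5, below the value of stacking, so the robot that learned from interruptions would stop going outside at all (or, worse, learn to make interruptions harder). The safe estimate stays at 1.0, the reward the task was actually meant to teach.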

So the answer is building a smarter kill switch. Or, in plain English: "We show that even ideal, uncomputable reinforcement learning agents for (deterministic) general computable environments can be made safely interruptible."

Take that, robot army.
