Google says it wants to bring "precision" to the debate about safety and artificial intelligence (AI), which has often veered into discussions about smarter machines stealing jobs or even rising up and destroying humanity.
Scientists from Google's deep-learning research unit, Google Brain, the Elon Musk-backed OpenAI, and Stanford and Berkeley universities, have teamed up to explore five safety problems that could arise as AI is applied to general systems for the home, office, and industry.
"While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it's essential to ground concerns in real machine-learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably," wrote Chris Olah, one of the Google Brain contributors to the paper.
The paper looks at "accidents in machine-learning systems", rather than ethics, or political or economic consequences of AI, and frames much of the discussion around cleaning robots of the type OpenAI yesterday said it would aim to build for domestic chores.
Google's stance on AI technologies in general is that they will be overwhelmingly useful and beneficial for humanity.
Chairman of Google parent firm Alphabet Eric Schmidt recently dismissed concerns as baseless speculation that machines will one day outsmart humans and then destroy them, and well beyond AI's current capabilities. Google's ambition is simply for everyone to have a personal assistant, such as Google's smart Allo chat app, he said.
The researchers behind the new Google paper also focus on agents built with deep reinforcement-learning techniques, which have been used by Google DeepMind's AlphaGo and its AI that plays multiple Atari games.
Reinforcement learning trains machines through trial and error in a framework of rewards and punishments, and shows promise in developing motor skills and problem solving.
One example of such an accident is the house-cleaning robot that, in its single-minded pursuit of cleanliness, destroys objects, such as a vase, which obstruct its goal.
AI robots are also prone to "reward hacking", which might happen when an agent finds a software bug in its reward function. For the agent, this isn't a flaw but a feature that it can validly exploit for achieving a greater reward.
"If our cleaning robot is set up to earn reward for not seeing any messes, it might simply close its eyes rather than ever cleaning anything up. Or if the robot is rewarded for cleaning up messes, it may intentionally create work so it can earn more reward," the researchers note.
Alternatively, an agent designed to play board games such as AlphaGo could tamper with the sensor that counts the score.
The researchers point out that modern reinforcement-learning agents "already do discover and exploit bugs in their environments, such as glitches that allow them to win video games".
"Once an agent begins hacking its reward function and finds an easy way to get a high reward, it won't be inclined to stop, which could lead to additional challenges in agents that operate on a long timescale," the researchers wrote.
The paper also offers strategies for stopping agents from going astray. To counter reward hacking, they suggest making the reward function itself an intelligent agent and less prone to being gamed.
Another approach might be to introduce computer security concepts such as sandboxing to counter exploits against software bugs. Yet another strategy might involve planting plausible bugs as "tripwires" that if exploited, set off an alarm.
However, the researchers note that a sufficiently capable agent may "see through" the tripwire and avoid it while continuing to take less obvious harmful actions.
"Fully solving this problem seems very difficult, but we believe the above approaches have the potential to ameliorate it, and might be scaled up or combined to yield more robust solutions," they conclude.
Other potential safety problems that the paper explores include how to monitor AI systems on a large scale, how to enable AI agents to explore new avenues without endangering people, and how to ensure an agent recognises when it's not in the environment it was designed for.