Google wants robots to write their own Python code

Robots can code their physical actions, in Python, when given instructions by humans.
Written by Liam Tung, Contributing Writer

Google has unveiled a new approach to using large language models (LLMs) that shows how robots can write their own code on the basis of instructions from humans. 

The latest work builds on Google's PaLM-SayCan model, which helps robots understand open-ended prompts from humans and respond reasonably and safely in a physical space. It also builds on OpenAI's GPT-3 LLM and related work in automated code completion, such as GitHub's Copilot feature.

"What if when given instructions from people, robots could autonomously write their own code to interact with the world?" said Google's researchers. The latest generation of language models, such as PaLM, are capable of complex reasoning and have also been trained on millions of lines of code, Google said. "Given natural language instructions, current language models are highly proficient at writing not only generic code but, as we've discovered, code that can control robot actions as well."


Google Research calls its new development "Code as Policies" and asserts that code-writing LLMs can be re-purposed to write robot policy code in response to natural language commands. 

"When provided as input several example language commands (formatted as comments) followed by corresponding policy code (via few-shot prompting), LLMs can take in new commands and autonomously re-compose API calls to generate new policy code respectively," Google researchers note in a new paper, Code as Policies: Language Model Programs for Embodied Control. 
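As a rough illustration of the prompt format the researchers describe, the sketch below assembles a few-shot prompt in which each command is a comment followed by its policy code, with the new command appended as a trailing comment for the model to complete. The robot API names (get_pos, pick_place) and the helper build_prompt are hypothetical, invented for this example, and no language model is actually called.

```python
# Hypothetical few-shot examples: each pairs a command with policy code.
# The robot API names (get_pos, pick_place) are invented for illustration;
# they are not taken from Google's paper or any real library.
EXAMPLES = [
    (
        "move the red block to the left of the blue bowl",
        "target = get_pos('blue bowl') + [-0.1, 0.0]\n"
        "pick_place('red block', target)",
    ),
    (
        "put the green block on the yellow block",
        "pick_place('green block', get_pos('yellow block'))",
    ),
]


def build_prompt(examples, new_command):
    """Format examples as comment + code, then append the new command."""
    parts = []
    for command, code in examples:
        # The command becomes a comment; the policy code follows it.
        parts.append(f"# {command}\n{code}\n")
    # The model would be asked to continue the text after this comment.
    parts.append(f"# {new_command}\n")
    return "\n".join(parts)


prompt = build_prompt(
    EXAMPLES, "put the blocks in a horizontal line near the top"
)
print(prompt)
```

In a real system, the resulting text would be sent to a code-writing LLM, and the completion it returns would be treated as the new policy code.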

In the examples given, a user would say "stack the blocks on the empty bowl" or "put the blocks in a horizontal line near the top" of a square 2D perimeter. Google's language model then generates Python programs that instruct the robot to carry out the spoken commands. The generated code relies on the structure of Python but also makes use of libraries like Shapely, in this case for spatial-geometric reasoning.
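To give a sense of what spatial-geometric reasoning with Shapely can look like, here is a minimal sketch that computes evenly spaced target points along a horizontal line near the top of a square workspace. The workspace size, the margin, and the helper line_targets are assumptions made for this example; this is not the code Google's model produces.

```python
from shapely.geometry import LineString, box

# Assumed 1 x 1 square workspace; the real dimensions are not given.
workspace = box(0.0, 0.0, 1.0, 1.0)
minx, miny, maxx, maxy = workspace.bounds

# Assumed margin that keeps the blocks away from the edges.
margin = 0.1
y = maxy - margin

# A horizontal line "near the top" of the square.
top_line = LineString([(minx + margin, y), (maxx - margin, y)])


def line_targets(line, n):
    """Return n evenly spaced points along the line (hypothetical helper)."""
    if n == 1:
        return [line.interpolate(0.5, normalized=True)]
    return [line.interpolate(i / (n - 1), normalized=True) for i in range(n)]


targets = line_targets(top_line, 3)
for point in targets:
    # Every target should fall inside the workspace.
    assert workspace.contains(point)
    print(round(point.x, 2), round(point.y, 2))
```

A robot policy could then move one block to each of these points, for example with a pick-and-place primitive supplied by the robot's control API.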

Google's claim is that having a language model write policy code works better for this task than directly learning robot tasks and outputting natural language actions.

"CaP extends our prior work, PaLM-SayCan, by enabling language models to complete even more complex robotic tasks with the full expression of general-purpose Python code. With CaP, we propose using language models to directly write robot code through few-shot prompting," Google Research notes.

Besides generalizing to new instructions, Google says the models can derive precise values, such as velocities, from ambiguous descriptions such as "faster" or "to the left". CaP also supports instructions in non-English languages and even emojis.

While the model can write code that instructs a robot to push different colored blocks to the top of a 2D square, it can't translate more complex instructions like "build a house with the blocks" because it has no 3D references, according to Google.  

It also warns that, while CaP gives robots additional flexibility, this also "raises potential risks since synthesized programs (unless manually checked per runtime) may result in unintended behaviors with physical hardware."
