Microsoft researchers are using ChatGPT to instruct robots and drones

Microsoft's next challenges for ChatGPT: serving as a tool for human-to-robot interaction and generating code for robotic actions.
Written by Liam Tung, Contributing Writer

OpenAI's ChatGPT isn't just good at generating coherent text responses to natural language prompts -- it can also play a role in human-to-robot interactions and use sensor feedback to write code for robot actions. 

Microsoft recently conducted research to "see if ChatGPT can think beyond text, and reason about the physical world to help with robotics tasks." The aim was to see if people can use ChatGPT to instruct robots without learning programming languages or understanding robotic systems. 

"The key challenge here is teaching ChatGPT how to solve problems considering the laws of physics, the context of the operating environment, and how the robot's physical actions can change the state of the world," a team from Microsoft Autonomous Systems and Robotics Research notes in a blog post.

The Microsoft researchers explored how well ChatGPT handles robotics scenarios such as zero-shot planning and code generation, mostly in Python, after it was given access to object-detection and object-distance data through application programming interfaces (APIs).
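To make that setup concrete, here is a minimal sketch of how object-detection and distance data might be exposed as a small set of named functions that generated code can call. Every name in it (`SimulatedScene`, `detect_objects`, `get_distance`, `nearest_object`) and all of the data are hypothetical illustrations, not the interfaces Microsoft actually used.

```python
import math
from dataclasses import dataclass


@dataclass
class DetectedObject:
    name: str
    x: float  # metres from the robot; hypothetical coordinate convention
    y: float


class SimulatedScene:
    """Stand-in for a real perception stack; the data here is made up."""

    def __init__(self, objects):
        self._objects = list(objects)

    def detect_objects(self):
        """Return the names of all objects currently visible."""
        return [obj.name for obj in self._objects]

    def get_distance(self, name):
        """Return the straight-line distance from the robot (at the origin)."""
        for obj in self._objects:
            if obj.name == name:
                return math.hypot(obj.x, obj.y)
        raise KeyError(f"unknown object: {name}")


def nearest_object(scene):
    """The kind of short routine a model might write against the two calls above."""
    names = scene.detect_objects()
    if not names:
        return None
    return min(names, key=scene.get_distance)


scene = SimulatedScene([
    DetectedObject("chair", 3.0, 4.0),
    DetectedObject("table", 1.0, 1.0),
])
print(nearest_object(scene))
```

Keeping the interface to a few descriptively named functions is a design choice made here for illustration: it gives the model a small vocabulary to work with and keeps the generated code easy for a person to review before it runs on hardware.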

ChatGPT can produce code because it was trained on large amounts of code and written text. The system has been shown to be capable of solving coding problems and debugging programs, with the added capability of responding to dialogue and seeking clarification. There's also Codex, OpenAI's GPT-3-based model that underpins GitHub's Copilot pair-programming service, which autocompletes code for developers in multiple languages.

With these dialogue and clarification capabilities in mind, Microsoft tested ChatGPT's ability to act as a language-based interface between a non-technical user and a drone. As the researchers note in a paper, while GPT-3, LaMDA and Codex showed promise in robotics-planning and code-generation tasks, ChatGPT specifically is "a potentially more versatile tool for the robotics domain, as it incorporates the strengths of natural language and code generation models along with the flexibility of dialogue."

The researchers note in their blog post: "ChatGPT asked clarification questions when the user's instructions were ambiguous, and wrote complex code structures for the drone such as a zig-zag pattern to visually inspect shelves."
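For readers curious what a zig-zag inspection pattern looks like in code, the following is a small sketch that generates waypoints sweeping back and forth across a shelf face. It is not the code ChatGPT produced in the study; the function name `zigzag_waypoints`, its parameters, and the shelf dimensions are assumptions made for illustration.

```python
def zigzag_waypoints(width, height, rows):
    """Return (x, z) waypoints that sweep a shelf face row by row.

    The sweep direction alternates on each row, so the drone never has
    to fly back across a row it has already inspected.
    """
    if rows < 1:
        raise ValueError("rows must be at least 1")
    waypoints = []
    for i in range(rows):
        # Spread the rows evenly from the bottom (0) to the top (height).
        z = height * i / (rows - 1) if rows > 1 else 0.0
        # Even rows go left to right, odd rows right to left.
        xs = (0.0, width) if i % 2 == 0 else (width, 0.0)
        waypoints.extend((x, z) for x in xs)
    return waypoints


# Hypothetical shelf: 4 m wide, 2 m tall, inspected in three passes.
for point in zigzag_waypoints(width=4.0, height=2.0, rows=3):
    print(point)
```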

Microsoft also had ChatGPT direct a robotic arm to move blocks around to form the Microsoft logo. The researchers tasked ChatGPT with writing an algorithm for a drone to reach a point without crashing into obstacles, and tested whether ChatGPT can decide where a robot should go based on sensor feedback in real time.
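The article does not say which obstacle-avoidance algorithm ChatGPT wrote. As one simple illustration of the task, the sketch below uses breadth-first search on a grid map, a standard textbook technique swapped in here purely as an example. The function name `plan_path` and the grid layout are hypothetical.

```python
from collections import deque


def plan_path(grid, start, goal):
    """Breadth-first search on a 2D grid; cells equal to 1 are obstacles.

    Returns the list of (row, col) cells from start to goal, or None if
    no obstacle-free route exists. Assumes the start cell itself is free.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the recorded predecessors to build the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Made-up map: a wall blocks the direct route from top-left to bottom-left.
grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(plan_path(grid, (0, 0), (2, 0)))
```

A real drone would plan in continuous 3D space and update the map from live sensor data, so this grid version only conveys the basic idea of routing around known obstacles.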

Researchers at Google Research and Alphabet-owned Everyday Robots have also worked on similar robotics challenges using a large language model called PaLM, or Pathways Language Model, which helped a robot to process open-ended prompts and respond in reasonable ways.
