MIT work raises a question: Can robots be teammates with humans rather than slaves?

Scientists at MIT's Computer Science and Artificial Intelligence Laboratory used machine learning approaches to train a robot arm to take action based on question-and-response dialogues with a human teammate. Could the work refocus humanity away from its obsession with robots as slaves?
Written by Tiernan Ray, Senior Contributing Writer

The image that most of society has of robots is that of slaves -- creations that can be forced to do what humans want. 

Researchers at the Massachusetts Institute of Technology have formed an interesting take on the robot question that is less about slavery, more about cooperation. They observed that language is a function of humans cooperating on tasks, and imagined how robots might use language when working with humans to achieve some result. 

The word "team" appears prominently near the top of the paper, "Decision-Making for Bidirectional Communication in Sequential Human-Robot Collaborative Tasks," written by scientists Vaibhav V. Unhelkar, Shen Li, and Julie A. Shah of the Computer Science and Artificial Intelligence Laboratory at MIT and posted on the MIT website on March 31.

The use of the word team is significant given the structure of the experiment the scientists designed.

In broad terms, the work shows it's possible to use language to help train a robotic arm to perform a task, such as helping to prepare meals in a kitchen. The approach is something called "reinforcement learning," which has been exploited spectacularly in recent years. Google's DeepMind unit used RL to train a computer program to beat humans at chess and the game of go.


The CommPlan program uses a reinforcement learning approach called a "Markov Decision Process" to know when to utter statements, such as a planned move, to inform a human partner so as to avoid conflicts. The experiment is a kitchen set-up where humans make sandwiches and robots pour cups of juice, and the two parties have to share the workspace in the most efficient manner possible.

Unhelkar et al. 2020

What's different here is the insertion of options for verbal utterances by the robot (verbal in the sense that the computer running the robot arm speaks the utterances through a speaker using a commercial text-to-speech program). The robot program can query a human, or make requests of a human. The experimental set-up designed by Unhelkar and colleagues is a kitchen, where a person is making sandwiches and the robot arm is pouring juice into cups. The robot and human have to share the space, which means they have to negotiate what each will do next, at each moment, so that they don't collide.


As a team, they get a reward whose value depends on how efficiently they complete the task. Put another way, from the point of view of the robot's programming, the robot has to maximize the efficiency of its actions in conjunction with what the human chooses to do, which means it has to take human intentions into account. The machine is programmed to cooperate, in other words.

"To our knowledge, our approach is the first to perform decision-making for multiple communication types while anticipating the human teammate's behavior and latent states," write Unhelkar and colleagues. 

The scientific question that Unhelkar and colleagues set out to answer is whether a robot's ability to cooperate with a person and to optimize a task as part of a team is enhanced via verbal communications. 

That simple question is an important twist for robotics. Previous research has explored human-robot communication, but not generally with respect to a task. 


The algorithm developed by Unhelkar and colleagues, called "CommPlan," uses machine learning to develop not only the actions of the machine, as DeepMind did with go, but also its communications.

"CommPlan jointly reasons about the robot's actions and communication to arrive at its policy." 

In the experiments they conducted, Unhelkar and colleagues used a "UR10 collaborative robot," a robotic arm that has several ways in which it can pivot, rotate, and bend. It's made by Universal Robots, based in the city of Odense in Denmark. A "three-finger gripper" is attached to it, made by startup Robotiq, based in Quebec. The authors compared the performance of CommPlan to "baseline" approaches where there are either no utterances between robot and person, or where the choice of utterances at each moment is rigidly planned by the programmer. 

In contrast, CommPlan solves the equations for which utterances to use in real time, which you might call "learning" to communicate. The hypothesis was that the learned approach would yield better results than either a rigid protocol or silence.

Indeed, it did. They report that CommPlan beat both baselines. It yielded "higher cumulative reward and lower task completion times as compared to the Silent policy." And compared to the rigidly programmed approach, "Despite making only one more communication (on average) than the Hand-crafted policy, CommPlan accrues substantially higher reward."

A video of the experiment is posted on the MIT website.

As inspiring as a robot-human "team" might be, the work raises another question: Who is calling the shots? In any collaboration, including human collaborations, there can be times when one party is telling collaborators what to do, taking the lead, acting as a sort of boss, even if it's ostensibly an equal partnership.

If the robot is not going to be a slave to the human, the reverse holds too: it's probably undesirable to build robots that enslave people. And so it's important to observe where dominance and subservience arise.

Unhelkar and colleagues sort this out by attaching a cost to the way the robot communicates with the human. That cost affects the ultimate reward, and it is something that can be learned. That gives the programmer an indirect way to affect how the robot acts with respect to human choices.

In an email exchange, ZDNet asked Unhelkar what happens if a human chooses not to follow requests made by the robot. Unhelkar told ZDNet that when the "cost function" (the variables that affect the final team reward) is tweaked, the robot will modify its communications to be more deferential to a human.


"If we want a more 'polite' robot, in the cost model, we can say a request is less costly than a command," wrote Unhelkar. "Our model also captures that the human may follow a command differently during different steps of the task," he added. "For example, the human may not pay attention to the robot's suggestion if it has committed to a decision, but may be open to suggestions while she is still deciding."

"If the human declines the request, the robot replans and adapts to the human behavior."
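To make the cost idea concrete, here is a minimal sketch, not the authors' code, of how a communication cost could tilt a robot toward requests rather than commands. It scores each option as the chance the human goes along with it, times the benefit to the team, minus the cost of speaking. The utterance names, probabilities, benefit, and costs are all assumptions invented for illustration, and the real CommPlan planner reasons over a whole sequence of steps rather than a single one.

```python
# Hypothetical one-step illustration. The utterance names, probabilities,
# benefit, and costs are assumptions for this sketch, not values from the paper.

def best_utterance(benefit, comply_prob, cost):
    """Pick the utterance with the highest expected net team reward."""
    scores = {u: comply_prob[u] * benefit - cost[u] for u in cost}
    return max(scores, key=scores.get), scores

BENEFIT = 10.0  # assumed reward gained when the human yields the shared space

# Assumed chance that a still-deciding human goes along with each utterance.
OPEN_HUMAN = {"silent": 0.0, "request": 0.7, "command": 0.9}
# Assumed chance once the human has already committed to a decision.
COMMITTED_HUMAN = {"silent": 0.0, "request": 0.05, "command": 0.05}

# Two hand-set cost models: one indifferent to tone, one that prices commands higher.
BLUNT_COSTS = {"silent": 0.0, "request": 2.0, "command": 2.0}
POLITE_COSTS = {"silent": 0.0, "request": 1.0, "command": 4.0}

if __name__ == "__main__":
    cases = [
        ("blunt costs, open human", OPEN_HUMAN, BLUNT_COSTS),
        ("polite costs, open human", OPEN_HUMAN, POLITE_COSTS),
        ("polite costs, committed human", COMMITTED_HUMAN, POLITE_COSTS),
    ]
    for label, probs, costs in cases:
        choice, _ = best_utterance(BENEFIT, probs, costs)
        print(f"{label}: {choice}")
```

With these made-up numbers, the robot commands under the blunt cost model, switches to a request once commands are priced higher, and stays silent when the human has already committed and is unlikely to listen.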

One can go a little further in this line of inquiry, however: Can such a partnership be humane, in the sense of ensuring the best interests of people are not trampled by a machine trying to optimize its activity to obtain some reward?

That question has been raised elegantly by Stuart Russell of UC Berkeley's Center for Human-Compatible AI. Russell argues that the goals of artificial intelligence should be those that are in accord with the primacy of human life. In his view, that means understanding what a human might want but isn't expressing explicitly, which comes back to the matter of communication. 

Russell has suggested modifying the typical specification of an intelligent machine's objective. Rather than saying, "Machines are intelligent to the extent that their actions can be expected to achieve their objectives," instead he proposes, "Machines are beneficial to the extent that their actions can be expected to achieve our objectives," where the emphasis is Russell's. 

Asked about Russell's view, Unhelkar told ZDNet he shares Russell's concern and said that specifying the correct objective is "both challenging and critical to design of AI systems." 


The challenge is that a machine system in some sense has to infer what a human's desire is. The CommPlan program can achieve that only in part. It infers what a human's "latent state of decision-making" is by posing questions and observing responses from a person. But more work is needed to know what humans' intentions are based on how they communicate, Unhelkar told ZDNet.

"The learning component of CommPlan can be extended to learn humans' latent preferences over communications," wrote Unhelkar in an email.

"In future work, we intend to explore this extension," added Unhelkar. The main challenge with inferring humans' intentions via communications is that utterances in a task setting tend to be sparse, observed Unhelkar. That means it's tricky to gather enough examples of human utterances to create a data set from which a program can learn.

"I posit that such a setting will be better suited for scenarios where the robot is interacting and learning over a longer period (i.e., long-term interactions)," said Unhelkar, "as it would allow collecting sufficient data of requests and utterances required for learning."

That raises one more interesting challenge, the problem of how to safely train robots when they're performing tasks around humans. They're learning by trial and error, and one doesn't want their errors to be dangerous. 

In general, "errors during the trial-and-error need to be well understood and safety guaranteed before letting the robot train with humans," Unhelkar told ZDNet. There's also the fact that substantial training with a person takes time on the part of the human, which was not the case for DeepMind's chess program, which was playing against itself. 


To speed things along, for the time being, CommPlan is not pure machine learning. Only part of the program is "learned." 

CommPlan is an example of what's called a "Markov Decision Process," whereby a state of affairs and possible actions are evaluated at each turn of the task, to calculate which actions lead to the states of affairs that maximize future returns. (This is similar to but different from the method DeepMind used, a "Monte Carlo Tree Search.") 
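For readers who want to see the idea of a Markov Decision Process in miniature, the sketch below runs textbook value iteration on a toy version of the shared workspace. It is a generic illustration, not CommPlan's solver: the three states, two actions, probabilities, rewards, and discount factor are all assumptions made up for this example.

```python
# Toy MDP solved with generic value iteration. States, actions, probabilities,
# rewards, and the discount factor are illustrative assumptions, not CommPlan's.

GAMMA = 0.9  # assumed discount on future reward

# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "clear": {  # shared space is free
        "pour": [(1.0, "done", 10.0)],
        "wait": [(1.0, "clear", -1.0)],
    },
    "occupied": {  # human is working in the shared space
        "pour": [(1.0, "occupied", -5.0)],  # near-collision, no progress
        "wait": [(0.5, "clear", -1.0), (0.5, "occupied", -1.0)],
    },
    "done": {},  # terminal state, no actions
}

def q_value(state, action, values):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(p * (r + GAMMA * values[nxt])
               for p, nxt, r in TRANSITIONS[state][action])

def value_iteration(tol=1e-6):
    """Repeat Bellman backups until no state's value changes by more than tol."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        delta = 0.0
        for state, actions in TRANSITIONS.items():
            if not actions:
                continue
            new_value = max(q_value(state, a, values) for a in actions)
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:
            return values

def greedy_policy(values):
    """Choose, in each non-terminal state, the action with the best expected return."""
    return {s: max(acts, key=lambda a: q_value(s, a, values))
            for s, acts in TRANSITIONS.items() if acts}

if __name__ == "__main__":
    v = value_iteration()
    print(greedy_policy(v))
```

Under these assumed numbers the resulting policy pours when the space is clear and waits when the human occupies it, which is the "evaluate each state, pick the action that maximizes future returns" logic described above.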

Only some of the parameters of the Markov process are learned from data; others are programmed by the developer "by hand." Replacing those hand-coded parameters with learned parameters is a complex task that will take time, Unhelkar told ZDNet. Unhelkar proposes leveraging "domain expertise when available, as it speeds up learning and gives us a better understanding of why a robot/agent is making a certain decision." 
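As a rough illustration of that split, the sketch below keeps one set of parameters hand-coded (the cost of each utterance) and estimates another from logged interactions (how often a human goes along with each utterance). The log format, the names, and the use of add-one smoothing are assumptions for this example and are not drawn from the paper.

```python
from collections import defaultdict

# Hand-coded by the developer (assumed values for illustration).
HAND_CODED_COST = {"request": 1.0, "command": 4.0}

def learn_compliance(log, smoothing=1.0):
    """Estimate P(human complies | utterance) from (utterance, complied) pairs.

    Add-one style smoothing keeps estimates away from 0 or 1 when the log
    holds only a handful of examples, which is the sparse case.
    """
    counts = defaultdict(lambda: [0, 0])  # utterance -> [complied, total]
    for utterance, complied in log:
        counts[utterance][0] += int(complied)
        counts[utterance][1] += 1
    return {u: (c + smoothing) / (n + 2 * smoothing)
            for u, (c, n) in counts.items()}

if __name__ == "__main__":
    # Hypothetical interaction log, not real experimental data.
    log = [("request", True), ("request", True),
           ("request", False), ("command", True)]
    learned = learn_compliance(log)
    for utterance, prob in learned.items():
        print(utterance, round(prob, 3), "cost:", HAND_CODED_COST[utterance])
```

With only one logged command, the smoothed estimate stays well short of certainty, a small-scale reminder of why sparse utterance data makes fully learned models hard to build.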

There is also "a huge potential for work in designing algorithms that can digest human's domain expertise more seamlessly (e.g., by learning from high-level instructions, instead of the low bandwidth labels in the supervised learning sense)," Unhelkar told ZDNet. He cited the example of work by colleagues at MIT in which robots learn from task descriptions.

It's way too soon to speak of robots as either master or slave. Robots today are automated mechanical structures with limited degrees of freedom, capable of only the simplest repetitive routines. But we as a society are obsessed with how we will eventually communicate with a truly sophisticated robot. 

The work on CommPlan suggests maybe we should think about teamwork and partnership in preparation for when that day arrives.
