Human meets AI: Intel Labs team pushes at the boundaries of human-machine interaction with deep learning
What is it like for a person to live partly inside of the objective function of an AI program? Intel scientist Lama Nachman shares insights from her team’s work with Peter Scott-Morgan, a person willing to transform his body and his life to interact intimately with a machine.
Lama Nachman spent years helping the late Stephen Hawking through various upgrades of the computer technology that helped him to work and communicate. Hawking passed away in 2018.
Her team at Intel Labs is now working with Peter Scott-Morgan, a roboticist who has undergone several operations to head off the incapacity that comes from ALS, the same affliction Hawking suffered. Working with a variety of technologies, including GPT-2, OpenAI's generative deep learning model for text, Nachman and team are pushing at the boundaries of how a person can exist in a give-and-take relationship with AI.
Part ethnographer, Nachman shows great sensitivity to the nuances of how humans encounter technology. She explained to ZDNet the differences in temperament between her two very different collaborators, first Hawking, now Scott-Morgan.
Hawking was "the best validation engineer ever," says Nachman. He endured tons and tons of trial and error with new technology, and seemed to derive great satisfaction in finding bugs in the software. It was almost like man versus machine, to hear Nachman tell it, John Henry versus the pneumatic drill.
Scott-Morgan, in contrast, sees himself as becoming one with the machine, both helping to train it, and at the same time learning a new mode of being from it in a symbiotic fashion.
"I think of myself as partly human and partly AI," is how Scott-Morgan views it, in Nachman's telling. "I am willing to be nudged by the AI system," Scott-Morgan has told her.
ZDNet: Lama, tell us a bit about the work that you do.
Lama Nachman: I lead a research lab called the Anticipatory Computing Lab at Intel Labs, and it is a multidisciplinary lab. Basically, it's really at the intersection of ethnography, end-user research, design, and AI. And our focus is really on assisting people with AI in different aspects of their life. We start with user-centric research. We look at areas like, for example, education or manufacturing, smart home and health, and assistive computing for people with disabilities. Basically, what we're looking for is how can we provide assistance in daily life using robust perception and prediction, with multimodal sensing. We've done a lot of work, for example, deploying in schools, trying to improve engagement of students at an early age, helping technicians in manufacturing facilities by watching over their backs what they're doing, suggesting how they can learn from other people.
ZDNet: And you did quite a bit of work with Dr. Stephen Hawking. Tell us about that.
LN: The work on assistive computing got started back in 2011 working with Professor Stephen Hawking to essentially revamp his software platform that enables him to talk to people but also perform all sorts of different tasks with his machine, surfing the Web and even giving lectures. We went in and tried to understand what were the issues with his existing system, what are the constraints. Initially, we were hoping that we could find something that we could utilize that was off the shelf. But then the more we tested different concepts and ideas with him, trying things like gaze tracking, and all sorts of different systems, nothing really worked. So we had to go back to the drawing board and think about how you build essentially a modular platform that allows you to essentially take different ideas, targets, different constraints, build it in a way where you can connect different components and so on. We wanted it to be modular from the get-go because we were shooting for clearly not just building it for Stephen but taking it open source. That's what we call ACAT, which is the Assistive Context-Aware Toolkit. We started in 2011, and then, a couple of years later, we were able to build something that we could deploy with Stephen. And then we continued to work with him throughout his life to continue to add capabilities, improve things for him, understand where there are gaps in the system, continue to evolve and work things out. And we put it out as open source, to continue to innovate on it and add features to it. [NOTE: ACAT can be downloaded from Intel's open-source code Web site.]
ZDNet: And your work with Dr. Scott-Morgan became a part of that?
LN: So then a couple of years ago, Peter, the companies that came together to assist him contacted me because of the work that I had done with Stephen, and the research we do at the intersection of humans and AI. At the time, our assumption was that Peter was going to use gaze tracking. With Stephen, when we built ACAT, we were trying to look for essentially where is the gap in this space in general. If you look out in the world, there are a lot of gaze tracking systems that exist out there and people can just use it. We wanted to actually address the needs of people who can't really be served with other solutions that are out there. So we decided to look at, essentially, different modalities of input. In the case of Stephen, he had the proximity sensor, the strap on his glasses, and every time he moved his cheek it would essentially trigger that sensor. You would go in and highlight different things on the screen, and there's certain patterns depending on what we predicted he might want to do next. And then once the thing that he's interested in, whether it's a letter or a function, or whatever, becomes highlighted, he would push the button with his cheek.
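[NOTE: The single-switch scanning interaction described above can be sketched in a few lines of Python. This is a minimal illustration, not ACAT's actual code: the function names, the prediction scores, and the idea of counting highlight steps are all assumptions made for the example.]

```python
from typing import Dict, List, Optional


def scan_order(scores: Dict[str, float]) -> List[str]:
    """Order items so the most likely ones are highlighted first.

    Ties are broken alphabetically so the order stays predictable.
    """
    return sorted(scores, key=lambda item: (-scores[item], item))


def scan_select(scores: Dict[str, float], trigger_step: int) -> Optional[str]:
    """Return the item that is highlighted when the switch fires.

    trigger_step is the zero-based highlight step at which the user
    triggers the sensor (hypothetical input; a real system would read
    the sensor while a timer advances the highlight). Returns None if
    scanning reaches the end of the list without a trigger.
    """
    order = scan_order(scores)
    for step, item in enumerate(order):
        if step == trigger_step:
            return item
    return None


# Made-up prediction scores for what the user might want next.
predicted = {"e": 0.40, "t": 0.25, "a": 0.20, "space": 0.15}
selected = scan_select(predicted, trigger_step=1)
```

Because the items are ordered by predicted likelihood, a good predictor means the user waits through fewer highlight steps before triggering.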
ZDNet: And you ended up with a different approach for Dr. Scott-Morgan?
LN: When we had the discussions with Peter, since he was going to use gaze tracking and was able to do it, one of the things we had been thinking of with ACAT was, knowing everything that we know with Stephen from that experience, how do we improve the performance of that overall system that can allow him to communicate with people? We started with ACAT, adding gaze capability, building the interaction and the UI in a way that made sense given the mode of interaction, which is different than what you have with the trigger functions. Here you just gaze at the thing that you are interested in. Then you have to kind of refine the interaction, and then understand how do you tailor it. We went down that path, focusing on what's urgent, given the surgery that was coming, and how important that was to complete for him to work. But at the same time, given what we've learned from Stephen, there was Peter's specific interest, which was reducing what we call the silent gap. Imagine when someone is speaking to him, we will say whatever it is that we want to say, he would listen to that, and then he would start formulating his sentence through words, through letters, and then we added word prediction. But then even with very fast gaze control, even at 400 milliseconds or so, what we can do today with a gaze tracker, you still have quite a bit of a silent gap. And when you are trying to actually have a conversation with somebody, that is really quite problematic. And when you are having a conversation, you have much more leeway in what it is that you can use rather than having to necessarily dictate every letter, unlike, for example, if you are trying to write a book or a document.
So the idea there, in my discussions with Peter was, Okay, can we actually train a system that can listen to the conversation and rather than allowing the interface to be at a lower level, meaning letters or word prediction, can we actually recommend certain responses, and he could just quickly gaze at a response of interest? So, in addition to the here and now, just getting ACAT to work with gaze tracking, the research that we started, which we haven't deployed with him yet, because it's quite complex to actually make happen, is developing a response-generation system that bridges into the conversation, that he can basically interface on top of, that allows him to nudge that in a way that he can control it. So if it happens that a response is good enough for him, he chooses it; if it's not, then he would enter a keyword or a theme, and then that would help the response generator to generate a different set of options. If all else fails, he would just dictate the response.
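[NOTE: The three-step fallback described above (pick a suggestion, nudge with a keyword, or dictate) can be sketched as follows. This is an illustrative outline only: `stub_generator` is a hypothetical stand-in that returns canned text rather than calling a language model, and `respond` and its parameters are names invented for the example.]

```python
from typing import Callable, List, Optional

# A generator takes what was heard plus an optional keyword and
# returns candidate responses. (Hypothetical interface.)
Generator = Callable[[str, Optional[str]], List[str]]
Chooser = Callable[[List[str]], Optional[str]]


def stub_generator(heard: str, keyword: Optional[str]) -> List[str]:
    """Canned stand-in for a response-generation model."""
    if keyword is None:
        return ["Sounds good.", "Tell me more."]
    return [f"Let's talk about {keyword}.", f"I have thoughts on {keyword}."]


def respond(heard: str, generate: Generator, choose: Chooser,
            keyword: Optional[str] = None, dictated: str = "") -> str:
    """Return the response the user ends up sending."""
    # Step 1: offer generic suggestions; the user may accept one.
    picked = choose(generate(heard, None))
    if picked is not None:
        return picked
    # Step 2: the user nudges the generator with a keyword or theme.
    if keyword is not None:
        picked = choose(generate(heard, keyword))
        if picked is not None:
            return picked
    # Step 3: nothing fit, so fall back to letter-by-letter dictation.
    return dictated


# The user accepts only suggestions that mention robots.
wants_robots = lambda options: next((o for o in options if "robots" in o), None)
reply = respond("What shall we discuss?", stub_generator, wants_robots,
                keyword="robots", dictated="Typed by hand.")
```

Each step costs the user more effort than the one before it, which is the trade-off between expediency and control that the interview returns to later.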
ZDNet: Do you have a sense of what the time frame is in which you'll be able to implement this for him?
LN: We're essentially building on top of GPT-2, and there are some key problems that are quite different with this than if you had a chat bot. One is, you would want him to be able to control the system. This notion of nudging is important. And that means the way that he would need to train and fine-tune the model would look very different. Now you have to essentially train it with his speech, not just, here's a sentence, here's a prompt, but how does he control it. We've been experimenting with all sorts of methods to do that. We have reasonable levels. The problem that I see in terms of getting something like this out is having the level of reliability that would actually make it work and be fine. Clearly, you will get, every once in a while, these really strange answers. So we have been trying to essentially improve that, and figure out better ways of controlling with keywords or themes or topics and things like that. That was one. The second one is, actually being able to bring in some of his content to train, to fine-tune it. We're talking about a system that has tons of data, and trained on tons of conversations, being able to actually nudge this with his own data, and learning from that limited data set. My assumption right now, given where we are in this process, is that it's probably nine months to a year before we can at least get a first system out to him. However, one of the things that we've been thinking a lot about is, how does that system continue to learn and improve over time? One of the issues, not a very obvious issue but an issue that I see there, is enabling him to continue to train the system in a way that isn't dictated by the need to quickly respond. One of the things that I've been quite concerned about is that if you generate responses for him, clearly he has this huge reason, a big reason, to go and select something that was there, because if he has to go and write something, it would take some time.
Over time, if he continues the practice of picking for expediency, it would probably make the system move further and further away, and make him feel just more boxed in. So, one of the things that we're designing into the system is a way for him to essentially flag, as he selects something, that he's doing it out of expediency, but that it's not what he would have selected. He can just flag that as he selects. And then when he has more time, he's not in the middle of the conversation, he could set some time to work with the system to train. He could ask for all of these cases where these things were not ideal, and it would play back the thing for him, and then he has the time there to go dictate what it is that he would have said, had he had the time. So we're trying to think about how the system will be able to evolve with him.
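[NOTE: The flag-and-review idea described above can be sketched as a small log of selections. This is a minimal illustration under stated assumptions, not the team's design: the `Selection` record, its field names, and the two helper functions are hypothetical.]

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Selection:
    heard: str                       # what the other person said
    chosen: str                      # the suggestion that was selected
    expedient: bool = False          # flagged: picked for speed only
    corrected: Optional[str] = None  # preferred wording, added later


def review_queue(log: List[Selection]) -> List[Selection]:
    """Flagged selections still waiting for a preferred response."""
    return [s for s in log if s.expedient and s.corrected is None]


def training_pairs(log: List[Selection]) -> List[Tuple[str, str]]:
    """Build (context, preferred response) pairs for later training.

    Expedient picks are used only once they have been corrected, so
    the model is not reinforced toward answers the user did not want.
    """
    pairs = []
    for s in log:
        if not s.expedient:
            pairs.append((s.heard, s.chosen))
        elif s.corrected is not None:
            pairs.append((s.heard, s.corrected))
    return pairs


log = [
    Selection("How are you?", "Fine."),
    Selection("Dinner?", "Sure.", expedient=True),
]
pending = review_queue(log)          # one item to revisit later
log[1].corrected = "Only if it's curry."
pairs = training_pairs(log)
```

Keeping uncorrected expedient picks out of the training pairs is one way to address the concern that choosing for speed would gradually pull the system away from the user's own voice.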
ZDNet: How does one annotate, in other words, is one way of looking at it?
LN: Exactly. And you know for him, the interesting part is, if you think about the type of function that you would want to do, because he also writes books, and so at that point, one really wants much more control, right? You essentially want to be able to dial in and dial out control, and at what level are you interacting with the system, depending on whether you're really trying to go for expediency and low latency versus being able to really have much, much finer control on what you're trying to express. And thinking of the interaction at all different levels. At a minimum, you could enter letters and get your word predictor to work really well. How does it work in the context of use? Another thing we've been working on improving is just the word predictor to actually be much more cognizant of the context of use. A lot of our testing right now has been with text. In other things that we do, not necessarily in this specific project, we work a lot with systems that are based on voice. So we have to think about the ASR [automatic speech recognition] component of this. We have much more information from ASR than just the text. We're thinking about what are the resources of that ASR system and how can we bring that in.
ZDNet: Thinking about GPT-2, you have all these different things to balance, you have the prompt, and then you have fine tuning as you talked about… There's a feedback loop where he's giving signals… There's all these different places I guess you can push or pull on to make this kind of system better…
LN: Yeah, exactly. And then, how do you actually incorporate some of these themes and keywords? The interesting part is, how do you think of a system that continues to bring different options. If it's not going to do what you want with zero effort, can it do it with minimal effort? Part of the struggle that we see is that the system might do quite well on things that are much more generic in nature, but getting it to the specificity becomes much, much harder.
ZDNet: And if you want to get to the long tail, the parts of the probability distribution that are part of human freedom, and expression, for this individual, you want to find ways to nudge the system away from simply being in its sweet spot all the time. Because he wants to express himself along the fat tail of this curve.
LN: Exactly. And then also, the context of this specific conversation, so that you don't sound repetitive. At the same time, you have this humongous model [GPT-2] with billions of parameters. And then you're trying to bring what we think is his style to that process with very, very limited data. You want to be able to not just reinforce the most common things, you want to be able to give much more weight to his own data and style. And even within that, understand how that's not a uniform case. One of the things we've been thinking about is, Peter is someone who is very sharp and quick, and he uses sarcasm. How do you bring some of that back in? When you really intend to be sarcastic, do you just, kind of, select the sarcasm mode? Luckily, I was able to meet him before he had his surgery. He's brilliant, he's funny, his character is something you want to be able to retain. How do we express that? In some sense, you could have him explicitly express that. You could have a combination of trying to get out of what he's saying what is the sentiment of the content that he's given, and then enable that tagging to happen more easily. One thing we've been studying is how do you enable him to nudge the system but not end up with this humongous interface.
ZDNet: I guess from an engineering standpoint, it's kind of like recovering the signal from the noise?
LN: Exactly. But you know, the interesting thing about Peter, we've had a lot of conversations about, How possible is something like this? And his point, which he kept pushing — and he's 180 degrees from where Stephen's approach and attitude was about this — is that he's like, Well, I think of myself as partly human, partly AI when I get to that phase. I am willing to be nudged by the AI system, right? I'm willing to give up a degree of control. I'm willing to spend the time and work with it, and learn from it, and it can learn from me.
ZDNet: And Hawking did not think of it in that same way?
LN: No, it's totally the opposite. Stephen, one of the things that was frustrating to him is he wanted to predict his word predictor. And the more intelligence we added to the word predictor, he was getting really pissed off! Because Stephen learned over time, over years and years of using it, what to expect and where. And so when we actually brought in a more advanced word predictor, he kept complaining. And so I took a lot of measurements, a lot of data, and I showed him exactly the performance—
ZDNet: To prove to him!
LN: To show him how much better it was. And he kept saying, but I'm looking for the answer in a specific place!
ZDNet: Which is a highly intelligent response to an upgrade, actually!
LN: Exactly. And Stephen was a very unique case. How many people with ALS used a system for more than a decade? With Stephen, it was, okay, I've been using it for 20 years. When there's a word prediction going on in his head, anything that changes the way that the interface operates throws him off.
ZDNet: And in hacking the system that way, Hawking was finding a way that was in accord with himself as a person outside of strictly the objective function of the device.
ZDNet: And in contrast, in the system now you're working on with Dr. Scott-Morgan, he's saying he will put himself in accord with the loss function, aggressively, to be in the stream of how it functions because he's actually intrigued by it.
LN: Exactly. That's a fantastic way of putting it.
ZDNet: You're making the work with ACAT open-source, correct?
LN: Yes, this is important. In fact, this goes back to early on, when we first started working with Stephen. I always joke, Stephen was both a designer and a validation engineer on that. Stephen was the best validation engineer ever. He would spend weeks and weeks debugging, validating a system. It became almost a challenge for him. He would have this smile on his face when he found a bug. But he was adamant about it being open source. He was someone that so many people reached out to, to find solutions. So it was obvious to him there was a gap in this space, and there needed to be something open source. When we started to think about it being open source, we had to think about how could it support people with all different kinds of abilities to control the system. For example, this whole trigger-function notion, you could enable it to support people with different abilities by essentially decoupling that trigger function from the rest of the system. Part of the thinking is, how do you enable people to innovate on top of that system as well, so that anyone who has a capability they want to bring in doesn't have to go and spend another three years trying to build a system like that. With the trigger, specifically, we've used different types of sensors. There are people, for example, who may not be able to move any muscles in their faces, but maybe they can move a finger. They can't move it with enough dexterity to push a button, but they have some motion in their fingers. So one of the things we have tried is to build a ring with an accelerometer in it, so they can just put it on and they can move that finger. Another one was control with eye movements; even if you can't get good gaze control, if you're able to just move your eyes right or left, you can use that as a trigger. It's this idea of just expanding the kinds of signals that people can use. Over the last couple of years, we've been really trying to address the people who can't move any muscle.
So we've been working on BCI [brain computer interface], specifically. One of the issues that we've seen is that there are BCI systems out there with tons of electrodes, really high fidelity systems. One thing we've been working on is can we actually use, kind of, an open BCI system, one for a couple hundred dollars, not a lot of electrodes, just worn as a cap, and that makes it more usable, and more accessible. And, essentially, compensate for that with a lot of signal processing and machine learning to get reasonable accuracy. What slowed down BCI was how many iterations you have to do before you have enough confidence that you know what someone is trying to do. That is something we're trying to get out, actually, into open source soon. I'm very excited about getting that into open source because I think it would really unlock access for a lot of patients. That's imminent, actually, we have the whole thing working in our lab, so I'm hoping by the end of the year.
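[NOTE: The point above about needing many iterations before a BCI system is confident can be illustrated with a simple evidence-accumulation sketch. This is a generic textbook-style Bayesian update, not Intel's method: the per-repetition likelihoods are made-up numbers standing in for a classifier's output, and the function name and threshold are assumptions.]

```python
from typing import Dict, List, Optional, Tuple


def accumulate(trials: List[Dict[str, float]],
               threshold: float = 0.9) -> Tuple[Optional[str], int]:
    """Combine noisy repetitions until one option is confident enough.

    Each trial maps every option to a positive likelihood produced by
    a (hypothetical) classifier for that repetition. Returns the
    decision and the number of repetitions used; the decision is None
    if the threshold is never reached.
    """
    options = list(trials[0]) if trials else []
    # Start from a uniform prior over the options.
    posterior = {o: 1.0 / len(options) for o in options}
    for used, trial in enumerate(trials, start=1):
        # Bayes update: multiply by the likelihood, then renormalize.
        unnorm = {o: posterior[o] * trial[o] for o in options}
        total = sum(unnorm.values())
        posterior = {o: v / total for o, v in unnorm.items()}
        best = max(posterior, key=posterior.get)
        if posterior[best] >= threshold:
            return best, used
    return None, len(trials)


# Three weak repetitions, each only 70/30 in favor of "yes".
weak = [{"yes": 0.7, "no": 0.3}] * 3
decision, repetitions = accumulate(weak, threshold=0.9)
```

With a noisier signal, each repetition shifts the posterior less, so more repetitions are needed to cross the same threshold. That is the cost that better signal processing and machine learning aim to reduce.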
ZDNet: Thank you, Lama, for giving us this wonderful window into your team's work.