Amazon's intelligent, voice-enabled assistant, Alexa, has become an integral part of everyday experiences. Alexa gets more than one billion requests per week, Amazon said Wednesday, while customers have access to more than 100,000 Alexa skills.
Now, the technology giant is developing a new capability for Alexa: the ability to speak in other people's voices, so it can help users remember loved ones who have passed away. On Wednesday at the re:MARS conference (Amazon's event for machine learning, automation, robotics, and space), Amazon's Rohit Prasad briefly described the capability.
He showed a short video of a boy speaking to an Amazon Echo speaker. "Alexa," the boy asks, "Can grandma finish reading me 'The Wizard of Oz'?" A woman's voice begins speaking, and Prasad confirmed the voice is supposed to be that of the boy's deceased grandmother.
"One thing that surprised me the most about Alexa is the companionship relationship we have with it," said Prasad, Alexa AI SVP and head scientist. "Human attributes of empathy and affect are key for building trust. They have become even more important in these times of the ongoing pandemic, when so many of us have lost someone we love. While AI can't eliminate that pain of loss, it can definitely make their memories last."
Prasad didn't say when the capability will be available, only that Amazon is "working on" it. An Amazon representative told ZDNet that the company has nothing further to share about timing.
Generating a voice like this presents a technical challenge, Prasad explained in his remarks, because it requires producing a high-quality voice from less than a minute of recorded audio, rather than from hours of studio recordings. Prasad's team approached the problem as a voice conversion task rather than a speech generation task.
"We are unquestionably living in the golden era of AI, where our dreams and science fiction are becoming a reality," Prasad said.
To make Alexa even more human-like, Amazon is building generalizable intelligence into the assistant, Prasad said. Generalizable intelligence comprises three key attributes: learning across many different tasks, continually adapting to user environments, and learning new concepts through self-supervision.
Amazon is also working on approaches like think-before-you-speak, in which Alexa effectively uses "implicit commonsense knowledge" (built with a large language model and a commonsense knowledge graph) to generate responses to a user.
For instance, if a customer on Valentine's Day says, "Alexa, I want to buy flowers for my wife," Alexa could leverage world knowledge and temporal context to respond with, "Perhaps you should get her red roses."
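The flower example can be pictured as a two-step process: first retrieve the implicit knowledge, then use it to phrase the reply. The Python sketch below is a toy illustration only, not Amazon's implementation. It swaps the large language model and commonsense knowledge graph for a small dictionary lookup, and every name in it (`KNOWLEDGE`, `occasion_for`, `think`, `speak`) is hypothetical.

```python
from datetime import date
from typing import Optional

# Toy stand-in for a commonsense knowledge graph (hypothetical data):
# (occasion, relation) -> value.
KNOWLEDGE = {
    ("valentines_day", "associated_flower"): "red roses",
}


def occasion_for(today: date) -> Optional[str]:
    """Temporal context: map a calendar date to an occasion, if any."""
    if (today.month, today.day) == (2, 14):
        return "valentines_day"
    return None


def think(request: str, today: date) -> Optional[str]:
    """Step 1 ("think"): look up implicit knowledge relevant to the request."""
    if "flowers" not in request.lower():
        return None
    occasion = occasion_for(today)
    if occasion is None:
        return None
    return KNOWLEDGE.get((occasion, "associated_flower"))


def speak(request: str, today: date) -> str:
    """Step 2 ("speak"): phrase a reply using the retrieved knowledge."""
    thought = think(request, today)
    if thought is None:
        # No usable knowledge found, so fall back to a clarifying question.
        return "What kind of flowers would you like?"
    # The pronoun is hardcoded to match the article's example request.
    return f"Perhaps you should get her {thought}."


if __name__ == "__main__":
    request = "Alexa, I want to buy flowers for my wife"
    print(speak(request, date(2022, 2, 14)))
```

On any date other than February 14, the lookup finds nothing and the sketch falls back to a generic clarifying question, which shows how the temporal context changes the answer.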