Amazon Echo: The four hard problems Amazon had to solve to make it work

Talking to Alexa might be easy but there's a huge amount of complexity inside Amazon's smart home assistant.

Written by Steve Ranger, Global News Director Sept. 14, 2016 at 8:56 a.m. PT

Amazon's aim is to have Alexa indistinguishable from a human voice.

Dave Limp, SVP of Amazon's devices and services business, is standing in front of an image of the bridge of the starship Enterprise, explaining the inspiration for Amazon's surprise hit Echo device.

"A lot of people wondered what was the inspiration for this vision. It really was this, this cultural icon. It started here with the tap on the lapel to talk to the computer. And later in the Star Trek series you could be anywhere on the starship Enterprise and you could talk to the computer and she would respond quickly with an answer," he says.

The Amazon Echo, a cylindrical, voice-controlled speaker, has been something of a sleeper hit in the US, selling around three million units since it was launched last year and winning some rave reviews along the way. "We wanted to build a computer in the cloud that was completely controlled by your voice," Limp says.

The Echo is activated by someone saying "Alexa" (or "Amazon", or "Echo"), at which point it begins streaming spoken requests to the cloud, where they are analysed using neural network technology to generate the right response to questions such as "Alexa, will it rain tomorrow?" or "Alexa, how is traffic?"
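The gating idea, where nothing is sent onward until a wake word is heard, can be sketched in a few lines of Python. This is only an illustration and not Amazon's implementation: it works on text rather than audio, and the function name `extract_request` and the constant `WAKE_WORDS` are made up for this example.

```python
# Hypothetical wake-word gate. The real device listens for the wake
# word in audio; this sketch assumes the speech is already text.
WAKE_WORDS = ("alexa", "amazon", "echo")


def extract_request(transcript, wake_words=WAKE_WORDS):
    """Return the request following a wake word, or None if there is none."""
    words = transcript.strip().split()
    if not words:
        return None
    # Ignore case and trailing punctuation on the first word.
    first = words[0].strip(",.!?").lower()
    if first not in wake_words:
        return None  # No wake word: nothing is passed on.
    return " ".join(words[1:])


print(extract_request("Alexa, will it rain tomorrow?"))
print(extract_request("Will it rain tomorrow?"))
```

Only the first call yields a request to pass on; the second returns `None` because no wake word was spoken.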

It can play music from streaming services such as Amazon Music and Spotify, or play audio books from Audible. The Echo has evolved into a digital home hub and allows users to control - by voice - things like lights, switches, and thermostats, while other companies can offer 'skills', which work like apps, to connect to their own services. As such, the Echo and Alexa have become Amazon's counter to Siri and Google Now.

But while using your voice is a simple way to control a device, building the hardware and software to make that possible involved solving some major problems, says Limp.

"When we started developing the product it turns out that you discover a large number of hard problems. It's often true that when you have a very simple interface ... underneath the covers are a large number of hard problems needing to be solved."

Limp identified four hard problems that the team solved before they could deliver Echo:

1. Far-field voice recognition

Voice recognition has been around for decades, but it has mostly been based on near-field recognition, where the microphone is close to the user's mouth, which means a clear signal and less ambient noise. Amazon wanted to design a device that could function in an everyday family kitchen - a much noisier scenario.

Core to solving this is the seven-microphone array in the body of the device, which uses beamforming to identify the microphone closest to the voice and amplify that one while suppressing the others. And when music is playing - a common use of the Echo - the device uses a machine-learning-driven 'echo canceller' to make it easier for the device to hear human voices.
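To give a rough feel for why an array of microphones helps, here is a minimal Python sketch of delay-and-sum beamforming, a common textbook technique. It is not Amazon's algorithm, and it differs from the "pick the closest microphone" description above: it time-aligns all channels and averages them so that the voice adds up while uncorrelated noise partly cancels. The signal, the per-microphone delays, and the noise level are all invented for the simulation.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Advance each channel by its integer sample delay, then average."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]


def simulate(num_mics=7, length=400, noise=0.5, seed=0):
    """Build a toy 'voice' and noisy, delayed copies of it, one per mic."""
    rng = random.Random(seed)
    voice = [math.sin(2 * math.pi * i / 40) for i in range(length)]
    delays = list(range(num_mics))  # assumed arrival delays, in samples
    channels = [
        [(voice[i - d] if i >= d else 0.0) + rng.gauss(0, noise)
         for i in range(length)]
        for d in delays
    ]
    return voice, channels, delays


def rms_error(estimate, reference):
    """Root-mean-square difference over the overlapping samples."""
    n = min(len(estimate), len(reference))
    return math.sqrt(
        sum((estimate[i] - reference[i]) ** 2 for i in range(n)) / n
    )


voice, channels, delays = simulate()
beam = delay_and_sum(channels, delays)
single_error = rms_error(channels[0], voice)  # one microphone alone
beam_error = rms_error(beam, voice)           # all seven combined
print(f"single mic error: {single_error:.3f}, beamformed: {beam_error:.3f}")
```

In this toy setup the combined signal tracks the voice more closely than any single microphone does. A real device also has to estimate the delays, which are simply assumed to be known here.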

2. Natural language understanding

"The first stage of any voice recognition system is taking that sound file we send up and turning it into text. That's a reasonably solved problem. The hard problem that's been vexing computer scientists for decades is understanding the context of what you say, parsing the words," said Limp.

The service needs to understand what is being said and disambiguate it so that users get the right answer as quickly as possible. Limp pointed to recent advances in machine learning and deep neural networks as providing the breakthrough. "We still see dramatic improvements month on month on accuracy and the amount [Alexa] can learn," he said.

3. Privacy