Microsoft's newest milestone? World's lowest error rate in speech recognition

Microsoft has leapfrogged IBM to claim a significant test result in the quest for machines to understand speech better than humans.
Written by Liam Tung, Contributing Writer

The techniques Microsoft Research used to achieve a new world-best error rate will eventually enhance the Cortana Windows 10 personal assistant.

Image: Microsoft

Microsoft claims to have achieved the world's lowest error rate for speech recognition, as the company jostles with Amazon, Apple, Google, and IBM to develop products that understand speech as well as humans can.

According to Microsoft, its speech scientists at Microsoft Research have achieved a word error rate (WER) of just 6.3 percent under an industry-standard evaluation, using techniques that will eventually enhance Cortana.
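For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only; the function name `word_error_rate` is hypothetical, and the official evaluation's scoring tools and text normalization are not described in this article and are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

On this definition, a WER of 6.3 percent means roughly 63 word errors for every 1,000 words in the reference transcripts.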

The previous lowest error rate was 6.9 percent, achieved by IBM's Watson team, which beat its own record of eight percent set last year.

Both Microsoft and IBM presented papers detailing their work on speech recognition at the Interspeech conference in San Francisco this week, where papers were also presented by Google's speech researchers.

As Microsoft notes, the lowest error rate in speech recognition 20 years ago was 43 percent, achieved by IBM in 1995. By 2004, IBM had cut its error rate to 15.2 percent.

These days, with more research funding being funnelled into deep neural networks, tech giants are boasting error rates well below 10 percent. That is still short of human-level performance, which IBM estimates at an error rate of about four percent.

Google CEO Sundar Pichai boasted last year that the company's deep neural networks had helped it achieve an error rate of eight percent in the speech recognition systems that power voice Search and Android.

More recently, Apple's senior director of Siri, Alex Acero, a former Microsoft Research member, said error rates for speech recognition have been "cut by a factor of two in all languages", with greater gains in some languages, again thanks to Apple's work on deep neural networks.

Acero's statement was more cautious than a claim by Apple vice president of software engineering Craig Federighi, which suggested that Siri had an error rate of just five percent under industry-standard tests.

Microsoft's speech recognition systems were assessed against the NIST 2000 Switchboard task, an evaluation that started in 2000 to test conversational speech recognition over the telephone.

Back then, it evaluated technology from SRI, the research institute whose spin-off Siri was acquired by Apple in 2010; Dragon software; IBM; and BBN Technologies, which was acquired by Raytheon in 2009.

Like its rivals, Microsoft has made artificial intelligence a key plank in its strategy for human-computer interaction, with voice-based platforms such as Cortana set to play a key role in enabling computing in wearables, mobile, the home, vehicles, and the enterprise.

