AI-powered speech recognition is entering a new phase: total global comprehension

A speech recognition startup has taken an ambitious path to leapfrog Apple, Google and other tech giants. Can AI understand all 7,000 languages in the world?
Written by Greg Nichols, Contributing Writer

A speech recognition startup just landed $62 million in Series B funding. How will the money be used? In a quest to enable a computer to understand every voice in the world.

If that doesn't strike you as hugely ambitious, you haven't spent enough time trying to get Siri to compose a text message. Speech recognition has long been a hard problem for developers, and it's a puzzle that's being closely watched across a variety of industries. The technology has implications for human-machine interfaces in fields like robotics, autonomous vehicles, and personal computing, all of which will benefit from computers that can accurately interpret natural speech.

Speech recognition, then, is a kind of technological entry point, a market need that can help spur the development of technologies that will have broad resonance and incalculable implications for how we interact with machines. 

It's also an equity issue. Not surprisingly, speech recognition currently works well for a small part of the global population.

A big part of the challenge is the training model. Most training data needs to be manually classified, which means that accuracy is only achievable across a very narrow set of speakers (not surprisingly, a set that corresponds precisely to the most valuable consumers). Speechmatics is taking a different approach in its bid for more representative speech recognition.

Based on datasets used in Stanford's 'Racial Disparities in Speech Recognition' study, Speechmatics recorded an overall accuracy of 82.8% for African American voices, compared with Google (68.6%) and Amazon (68.6%). That level of accuracy equates to a 45% reduction in speech recognition errors, the equivalent of three words in an average sentence.
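Those figures are consistent with one another. A back-of-the-envelope check (assuming "accuracy" here means one minus the word error rate) reproduces the reported 45%:

```python
# Back-of-the-envelope check of the reported error reduction.
# Assumption: "accuracy" means 1 - word error rate.
speechmatics_err = 1 - 0.828   # 17.2% of words misrecognised
google_err = 1 - 0.686         # 31.4% of words misrecognised

# Relative reduction in errors when moving from Google's rate
# to Speechmatics' rate
reduction = (google_err - speechmatics_err) / google_err
print(f"{reduction:.0%}")  # → 45%
```

And in a typical English sentence of roughly 20 words, that 14-point drop in error rate does indeed work out to about three fewer misrecognised words.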

Its engine is trained on hundreds of thousands of individual voices, using unlabelled, more representative voice data that doesn't require human intervention. That's helped drive coverage beyond English-language speakers.
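The article doesn't describe Speechmatics' training recipe, but the core idea behind learning from unlabelled audio can be sketched with a toy self-supervised objective: the data supplies its own targets (here, predicting the next feature frame from the current one), so no human transcription is needed. Everything below (the synthetic "audio features", the linear predictor) is a hypothetical illustration, not the company's method:

```python
import numpy as np

# Toy self-supervised pretraining: learn structure from unlabelled
# sequences by predicting the next frame. The "label" for each frame
# is simply the frame that follows it, so no human annotation is used.
rng = np.random.default_rng(0)

# Synthetic unlabelled "audio features": 1,000 frames of 8-dim vectors
# generated by a fixed linear dynamic plus a little noise.
true_W = rng.normal(size=(8, 8)) * 0.2
frames = [rng.normal(size=8)]
for _ in range(999):
    frames.append(true_W @ frames[-1] + 0.01 * rng.normal(size=8))

X = np.stack(frames[:-1])   # current frames (inputs)
Y = np.stack(frames[1:])    # next frames (self-generated targets)

# Fit a linear next-frame predictor by least squares, no labels needed.
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

mse = float(np.mean((X @ W_hat - Y) ** 2))
print(f"next-frame reconstruction MSE: {mse:.4f}")
```

Real systems replace the linear predictor with a deep network and the next-frame objective with masked or contrastive prediction, but the principle is the same: useful representations can be learned from raw, unlabelled speech at a scale manual transcription could never reach.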

"Our progress in the last few years left us inundated with interest from investors for our Series B fundraise," says Katy Wigdahl, CEO. "The Speechmatics team is hugely ambitious. We have a real heritage in speech technology combined with some of the world's most talented speech and machine learning experts."

At present, the engine understands 34 languages, a small drop in a very large linguistic bucket (there are over 7,000 languages spoken worldwide). But the platform has made impressive strides in punctuation, numbers, currencies, and addresses, which traditionally stymie speech recognition engines.

All of this has attracted major interest in the UK-based company. Companies like 3Play Media, Veritone, Deloitte UK, and Vonage, as well as government departments across the world, are using the platform.

In line with its global goals, Speechmatics is headquartered in the UK but has offices in Boston (US), Chennai (India), and Brno (Czech Republic). The company will use the investment to support global expansion across the United States and Asia-Pacific.
