Mozilla releases dataset and model to lower voice-recognition barriers

The browser maker has collected nearly 500 hours of speech to help voice-recognition projects get off the ground.
Written by Chris Duckett, Contributor

Mozilla has released its Common Voice collection, which contains almost 400,000 recordings from 20,000 people and which Mozilla claims is the second-largest publicly available voice dataset.

The voice samples were gathered through Mozilla's Common Voice project, which let users donate recordings of their voices through an iOS app or a website. Mozilla hopes that a large public dataset will lead to better voice-enabled applications.

"One reason so few services are commercially available is a lack of data," Mozilla senior vice president of emerging technologies Sean White said in a blog post.

"Startups, researchers, or anyone else who wants to build voice-enabled technologies need high-quality, transcribed voice data on which to train machine-learning algorithms. Right now, they can only access fairly limited data sets."

At the moment, the collection is focused on English, but there are plans to extend it to other languages in the first half of 2018.

Alongside the dataset, Mozilla also released Project DeepSpeech, its open-source voice-recognition model based on research by Chinese internet giant Baidu. Mozilla claims that DeepSpeech, with a 6.5 percent error rate on the LibriSpeech dataset, is approaching human-level recognition.

In August, Microsoft said it had reached a voice-recognition error rate of 5.1 percent on the Switchboard corpus, the same level as professional human transcribers.

Despite the new milestone, Microsoft acknowledges that machines still find it tough to recognise different accents and speaking styles, and don't perform well in noisy conditions.

Earlier in the year, Google said its speech-recognition software had reached a 4.9 percent error rate.

Samsung has said it aims to use voice recognition throughout its home appliance line-up by 2020, and it recently partnered with Kakao on AI and voice recognition.
