Mozilla has released its Common Voice collection, which contains almost 400,000 recordings from 20,000 people, and is claimed to be the second-largest voice dataset publicly available.
The voice samples were gathered through Mozilla's Common Voice project, which let users donate recordings of their speech through an iOS app or the project website. The hope is that a large public dataset will lead to better voice-enabled applications.
"One reason so few services are commercially available is a lack of data," Mozilla senior vice president of emerging technologies Sean White said in a blog post.
"Startups, researchers, or anyone else who wants to build voice-enabled technologies need high-quality, transcribed voice data on which to train machine-learning algorithms. Right now, they can only access fairly limited data sets."
At the moment, the collection is focused on English, but there are plans to extend it to other languages in the first half of 2018.
Alongside the dataset, Mozilla released its open-source Project DeepSpeech voice-recognition model, which is based on work by Chinese internet giant Baidu. Mozilla claims that DeepSpeech, with a 6.5 percent word error rate on the LibriSpeech dataset, is approaching human levels of recognition.
In August, Microsoft said it had reached a voice-recognition error rate of 5.1 percent on the Switchboard corpus, the same level as professional human transcribers.
Despite the milestone, Microsoft acknowledged that machines still struggle to recognise different accents and speaking styles, and perform poorly in noisy conditions.
Earlier in the year, Google said it had a 4.9 percent error rate in its speech-recognition software.
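For readers unfamiliar with the metric, figures such as these are usually word error rates: the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that calculation under simple assumptions (whitespace tokenisation, exact word matching, no text normalisation). The function name `word_error_rate` is made up for illustration, and this is not the scoring code used by Mozilla, Microsoft, or Google.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,                      # deletion
                    current_row[j - 1] + 1,                   # insertion
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row

    return previous_row[-1] / len(ref_words)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the result can exceed 1.0 when the hypothesis is much longer than the reference. Published benchmarks also typically normalise case and punctuation before scoring, which this sketch leaves out.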