Devices and tools activated through speaking will soon be the primary way people interact with technology, yet none of the main voice assistants, including Amazon's Alexa, Apple's Siri, and Google Assistant, support a single native African language.
Mozilla has sought to address this problem through the Common Voice project, which is now working to expand voice technology to the 100 million people who speak Kiswahili across Kenya, Uganda, Tanzania, Rwanda, Burundi, and South Sudan.
The open-source project makes it easy for anyone to donate their voice to a publicly available database that can then be used to train voice-enabled devices. Over the past two years, more than 840 Rwandans have donated over 1,700 hours of voice data in Kinyarwanda, a language with over 12 million speakers.
That voice data is now being used to help train voice chatbots, equipped with speech-to-text and text-to-speech functionality, that provide important information about COVID-19, according to Chenai Chair, special advisor for Africa Innovation at the Mozilla Foundation.
A handful of major tech companies control the voice data that is currently used to train machine learning algorithms, posing a challenge for companies seeking to develop high-quality speech recognition technologies while also exacerbating the voice recognition divide between English speakers and the rest of the world.
Thanks to the success of the Kinyarwanda project, Mozilla is teaming up with the German Corporation for International Cooperation, the UK's Foreign, Commonwealth & Development Office, and the Gates Foundation to expand the project to Kiswahili.
Balthas Seibold, an official with the German Corporation for International Cooperation, said voice-enabled products have "the unique opportunity to better reach millions of people who are traditionally excluded from digital services."
"But this requires the technology to understand the people and vice versa. Most importantly, for a true democratization of the foundations of AI it needs the perspectives of those voices who are not heard yet. Together with our partners on the ground, we want to help increase access to technology, unlock local expertise and innovation, and help drive adoption at scale by the population that would benefit most from support," Seibold said.
The organizations have invested $3.4 million in the effort. Chair explained that Mozilla originally developed the project as a way to level the playing field while also democratizing and diversifying voice technology.
"Amazon's Alexa, Apple's Siri, and Google Home didn't actually support a single native African language, so that's a set of people who've been excluded," Chair said, adding that there has been significant interest in using the technology for agricultural and economic questions.
"One of the barriers to access is around language, as most of the information that's available is probably available in English. People may not have the literacy skills to read this information in English but may be able to understand it in their own language."
Chair noted that Mozilla has been invested in thinking about the internet as a global community because Africa continues to be one of the most underserved continents when it comes to technology.
The new funding will allow the Common Voice team to expand their staff and bring on even more machine learning experts and community liaisons, Chair explained, adding that the money will also go toward addressing issues of bias in the voice samples collected.
"We are designing this model, thinking about our community engagement and taking into account issues around bias for age, gender, and regional accents. We are excited to have new fellows who are part of these communities and speak Kiswahili," Chair said.
"That connection to the community is also going to allow for us to think of the different types of users that we have, knowing fully that not everybody has a smartphone. Not everyone has constant internet access. How do we make sure that what we're building is something that can be used along a spectrum of the diversity of Internet users?"
Since its global launch in 2017, Common Voice has become the world's largest multi-language public domain voice data set, with more than 9,000 hours of voice data in 60 different languages, including Welsh, Kabyle, and many others.
Common Voice will now partner with African companies, start-ups, and universities to develop locally suitable, voice-enabled technology solutions that can help underserved communities.
"Language is a powerful part of who we are, and people, not profit-making companies, are the right guardians of how language appears in our digital lives," Chair said.
"By making it easy to donate voice data in Kiswahili, Common Voice will empower East Africans to play a direct role in creating technology that helps rather than harms their communities. We are thrilled to join with partners who share Mozilla's vision for helping more people in more places to access voice technology."