"Language is fundamental to how people communicate and make sense of the world," said Jeff Dean, Google senior fellow. "But more than 7,000 languages are spoken around the world, and only a few are well represented online today."
Because the undertaking is so ambitious, the project will likely take many years to come to fruition. However, Google is already working toward its goal.
The tech giant developed a Universal Speech Model (USM) that is trained on over 400 languages, providing the most coverage in a speech model to date, according to a blog post. Google is also partnering with communities around the world to source speech data.
Google's attention to expanding its language capabilities is nothing new. Recently, the tech company added 24 more languages to its Google Translate platform and enabled voice typing for nine more African languages on Gboard.
Google is also working with local governments, NGOs and academic institutions in South Asia to collect audio samples of different dialects throughout the region.
Other major tech companies are also building large language models. In July, Meta announced an AI model called No Language Left Behind, which can translate across 200 languages.
Meta's effort is likewise intended to bring content to communities that are otherwise not represented on the web. Meta's AI model includes translations for 55 African languages, a significant advancement, since fewer than 25 African languages are supported by widely used translation tools.