Learning the lingo: Here's how Google Translate copes with even the rarest languages

Handling a billion translations a day, Google Translate has refined its methods so that its services extend to even the most obscure languages.

The flag of Friesland, Netherlands, where community input is helping Google Translate keep the local language alive. Image: iStock

Google Translate usually gathers its linguistic intelligence automatically from across the internet, where the world's most dominant languages have the most representation.

But to master translation involving dialects and less widely used languages, Google needs input from users and native speakers. Without this community input, Google Translate cannot accommodate those languages.

As part of that process, late last month residents of Friesland, a northern province of the Netherlands, carried out an effort to improve Google Translate's ability to handle the local language, West Frisian.

The Friese community contributed over 200,000 translations through Google's Translate Community tool.

A 2007 book, published by the province of Friesland, reported that in 2005, 74 percent of Friesland's residents could speak West Frisian, a West Germanic language closely related to English and Dutch. Furthermore, 75 percent of Friese people could read it, and 27 percent could write it.

Yet a growing share of Friesland's young people has been moving to more economically attractive parts of the Netherlands where Frisian is not spoken.

"Quality translations help bring cultures and languages online, preserving them for their own people through the web, and promoting them to the world. But our algorithms can only go so far. Since translations are generated by machines, they won't always be perfect," Google communications manager Meghan Casserly said.

The Alliance for Linguistic Diversity estimates there are about 7,000 spoken and signed languages in the world, of which 40 percent are at risk of becoming extinct. But while foreign languages are compulsory subjects in most secondary schools and some primary schools, the internet may be the only tool capable of preserving under-represented languages and educating the public about them.

Without sufficient educational tools, these languages are at risk of vanishing. Google says about 100 other communities have contributed bulk translations to Google Translate, adding more than 10 million words to the tool.

In addition, 500 million people use Google Translate every month and perform one billion translations every day.

The Alliance for Linguistic Diversity maintains a list of 3,227 endangered languages on a dedicated website, assigning each to a category ranging from 'At risk', the lowest vulnerability level, to 'Severely endangered', the highest. Some languages are listed as endangered but not assigned a vulnerability level. Frisian is categorized as 'Endangered' and 'Vulnerable', in the middle of the spectrum.

Google led the launch of the Alliance for Linguistic Diversity's Endangered Languages project and continues to provide technical resources to it. Currently, the Institute for Language Information and Technology at Eastern Michigan University and the First Peoples' Cultural Council manage the project, which aims to document, preserve, and revive endangered languages around the world.

Connected networks and collaborative efforts to gather and share information, such as these, may help keep fading languages and the cultural identities that depend on them alive.

"With the help of the Frisian people, as more Frisian is added to Google Translate, we hope to be able to translate Friese passages - including [those found] on websites and even street signs - into dozens of other languages for people from around the world to understand and appreciate," Casserly wrote on the Google Policy Blog.
