Oh, homonyms. Automated translation services can provide the gist of a passage of text, but computers still stumble over words that are spelled the same but have different meanings.
Machines learn to translate by searching for correlations in texts that humans have already translated. Now one project wants to put humans back in the loop, New Scientist reports.
Launched last month, Kamusi is a collaborative, multilingual dictionary project that could, given a few million dollars in funding, contain all the world's languages. Unlike other online dictionaries, this one is built around concepts as well as words.
Take the word "spring." It’s linked to several concepts, among them the season that comes after winter and a sudden upward or forward motion. When asked to translate "spring in her step" into French, Google Translate chooses "printemps" (the season). Kamusi avoids this problem by recognizing that "spring" is associated with multiple concepts and prompting the user to say which one is relevant.
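To make the idea of a concept-based dictionary concrete, here is a minimal sketch of how such a lookup could work: each spelling maps to a list of concepts, each concept carries its own translations, and when a word has more than one concept the program asks the user to pick. This is an illustration under stated assumptions, not Kamusi's actual design; the `Concept` class, the `LEXICON` table, the `translate` function and the `choose` callback are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Concept:
    # Hypothetical record: a short English gloss describing one sense of a word
    gloss: str
    # Maps a language code to the term for this concept in that language
    translations: Dict[str, str] = field(default_factory=dict)


# Hypothetical sample data: one English spelling linked to two distinct concepts.
LEXICON: Dict[str, List[Concept]] = {
    "spring": [
        Concept("the season that comes after winter", {"fr": "printemps"}),
        Concept("a sudden upward or forward motion", {"fr": "bond"}),
    ],
}


def translate(word: str, target: str, choose: Callable[[List[str]], int]) -> str:
    """Translate `word` into `target`, asking `choose` to pick a sense when ambiguous."""
    senses = LEXICON.get(word.lower())
    if not senses:
        raise KeyError(f"no entry for {word!r}")
    if len(senses) == 1:
        concept = senses[0]
    else:
        # Ambiguous spelling: show the glosses and let the user say which is meant
        index = choose([c.gloss for c in senses])
        concept = senses[index]
    return concept.translations[target]


if __name__ == "__main__":
    # Simulate a user who picks the "motion" sense for "spring in her step"
    print(translate("spring", "fr", lambda glosses: 1))
```

The key design choice is that translations hang off concepts rather than off spellings, so the program never has to guess between "printemps" and a word for a leap; a human resolves the ambiguity once, at lookup time.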
An algorithmic approach like Google Translate's is cheap and fast once it's up and running. Kamusi, on the other hand, relies on bilingual speakers to add words, and by comparison humans are slow and expensive, says Kamusi creator Martin Benjamin.
The demo version now contains 100 words from 15 languages, including Swahili and Mandarin. But adding up wages and other expenses, it’ll cost around $5 to add each new concept, so adding 10,000 concepts in 100 languages would cost $5 million.
So far, the project has relied on volunteers and a grant from the U.S. National Endowment for the Humanities. Benjamin thinks that speakers of minority languages will be motivated to add terms for free. He’s also betting on some top-down support: companies that do business in Africa, for instance, might be motivated to pay for large numbers of local words to be added.
[Via New Scientist]
This post was originally published on Smartplanet.com