DARPA, the Defense Advanced Research Projects Agency, has decided that the time is now for the Universal Translation Device that came in so handy on Star Trek. And so it is ready to spend gobs of money to achieve that holy grail - a machine that translates important non-Western languages like Arabic and Mandarin into English, and vice versa, says Ars Technica.
When bid solicitations went out last year, DARPA told interested parties that it wanted three separate modules built. The first handles the transcription of spoken languages into text. The second is a translation module that can convert foreign text into English, and the third is a "distillation" engine that can answer questions and summarize information provided by the other two modules. While this technology would certainly be put to use by military personnel in the field, it is really designed for deployment in the US, where analysts are easily overwhelmed by the electronic information gathered by the intelligence community.
So much information in these languages flows through the intelligence networks that only a small percentage can be translated. The GALE program (Global Autonomous Language Exploitation) would automate the translation, thus dramatically increasing the amount of communication available for analysis.
If GALE is a success, the US government would have access to transcriptions of foreign broadcast news, talk shows, newspaper articles, blogs, e-mails, and telephone conversations. Even with the translation work done, though, this information would be overwhelming, which is why the distillation engine is such an important component of the product.
There are three major teams working on the project - IBM, SRI and BBN. We looked at IBM's efforts last week. AP profiled BBN's efforts on Monday.
"Arabic has this property: 'He gave it to her' would be one word. Little pieces in the one word capture lots of meaning," said Salim Roukos, IBM's GALE chief. Meanwhile, tense and gender are absent in Chinese.
To wring improvements from their translation software, the GALE teams fed their computers huge pools of sample broadcasts and texts in Arabic and Chinese. As the machines were exposed to more and more foreign sentences, they analyzed the content and structure, compiling an ever-deeper library of how words are spoken and the rules governing the languages.
Or so the researchers hoped. The name of the game is to fine-tune the computer process, known as an algorithm, that does the language analysis. Programming missteps can cause a computer to gain minimal insight from the new language data it is fed. The computer could even get worse at its translation task.
"It's sort of trial and error guided by intuitions and some knowledge," BBN's Schwartz said.