Interpreting search

The newscaster at Qatar's Aljazeera network read off the day's headlines in Arabic in a Web video clip. A few moments later, a transcript was available in English.
Written by Michael Kanellos, Contributor

The demonstration at PC Forum was a reminder of how the boundaries of what can be done with online information are being stretched. Commercial search services such as Yahoo and Google have revolutionized the ability to find vast amounts of facts on disparate topics in a short period of time. Now, start-ups and established companies are devising products that they say will extend how people work with information found on the Web or inside company databases.

University of Southern California spinoff Language Weaver, for instance, has come up with technology that performs functional translations of Internet articles or video clips on the fly. As it demonstrated with the Aljazeera clip, people can submit a Web page in French, Arabic, Chinese, Hindi or the ever-popular Somali, and a functional English version pops out in about a minute.

"In a couple of years, we will be at the level where people will not be able to distinguish between a first draft of a machine translation and a first draft of a human (translation)," Bryce Benjamin, chief executive of Language Weaver, said this week at PC Forum in Scottsdale, Ariz. (CNET Networks, publisher of News.com, bought conference sponsor EDventures Holdings last week.)

At the same time, Cambridge, Mass.-based MetaCarta has come up with software designed to enable intelligence agencies, oil exploration teams and marketing execs to search for documents in their own data files and then plot them geographically.

Say a car manufacturer wanted to figure out where to launch a new sport utility vehicle. A search on data tagged by MetaCarta's software would pull up documents relating to previous buyers and overlay them on a map of the United States, so that the manufacturer could decide whether to launch the vehicle in Minnesota or Texas. A U.S. intelligence agency search on Mohammed Atta, the ringleader of the Sept. 11 hijackers, sketched out a paper trail of Atta's whereabouts in Germany before the attack, according to MetaCarta.

This kind of searching isn't easy, according to John Frank, MetaCarta's president. There are 44 cities and towns called Paris and 69 called Al-Hamra around the world. Most places on the globe also have more than one name, which further complicates searches, he added. Filtering out irrelevant results remains a huge task.
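One common heuristic for choosing among same-named places scores each candidate by how many of its hint words appear in the document. When no hint words appear, it falls back to a prior. The following is a minimal sketch of that heuristic. The article does not say how MetaCarta actually does this. The gazetteer, hint words, priors and the `resolve_place` function are all hypothetical.

```python
import re

# Hypothetical mini-gazetteer: three real places named Paris, each with a few
# hint words that point to it and a made-up prior used when no hint appears.
GAZETTEER = {
    "paris": [
        {"name": "Paris, France", "hints": {"france", "french", "seine"}, "prior": 0.9},
        {"name": "Paris, Texas", "hints": {"texas", "tx"}, "prior": 0.05},
        {"name": "Paris, Ontario", "hints": {"ontario", "canada"}, "prior": 0.05},
    ],
}


def resolve_place(place, document):
    """Pick the most plausible referent of an ambiguous place name."""
    candidates = GAZETTEER.get(place.lower())
    if not candidates:
        return None  # name not in the gazetteer: nothing to resolve
    words = set(re.findall(r"[a-z]+", document.lower()))

    def score(candidate):
        # Each matching hint word adds 1 and every prior is below 1, so a
        # candidate with more hints always wins; priors only break ties.
        return len(candidate["hints"] & words) + candidate["prior"]

    return max(candidates, key=score)["name"]


print(resolve_place("Paris", "The rodeo in Paris drew fans from across Texas."))  # → Paris, Texas
print(resolve_place("Paris", "We flew to Paris for the weekend."))  # → Paris, France
```

The made-up priors matter only when the hints tie, for example when the document names no region at all.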

Worse, many documents don't describe locations with much specificity. Instead, a text might say "land mines 22 miles north of Um" or "Indian Plate." The MetaCarta software essentially has to translate such phrases into geographic coordinates to return the right results.
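The following rough illustration of that translation step handles only the simplest pattern, "<distance> miles <direction> of <place>". It looks up the anchor place in a gazetteer and projects a point along a great-circle bearing. This is not MetaCarta's method, which the article does not describe. The place name "Exampleville", its coordinates and the function names are hypothetical placeholders. Phrases such as "Indian Plate" would need region outlines rather than a single point.

```python
import math
import re

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth

# Compass words mapped to bearings in degrees clockwise from north.
BEARINGS = {
    "north": 0, "northeast": 45, "east": 90, "southeast": 135,
    "south": 180, "southwest": 225, "west": 270, "northwest": 315,
}

# Hypothetical gazetteer: "Exampleville" and its coordinates are placeholders.
GAZETTEER = {"exampleville": (30.0, 47.0)}

# Try the longest direction words first so "northeast" is not cut short at "north".
_DIRECTIONS = "|".join(sorted(BEARINGS, key=len, reverse=True))
PATTERN = re.compile(
    rf"(\d+(?:\.\d+)?)\s+miles?\s+({_DIRECTIONS})\s+of\s+([A-Za-z-]+)",
    re.IGNORECASE,
)


def destination(lat, lon, distance_miles, bearing_deg):
    """Great-circle destination point from a start point, distance and bearing."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_miles / EARTH_RADIUS_MILES  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)


def resolve_relative(text):
    """Turn '<N> miles <direction> of <place>' into (lat, lon), or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    miles, direction, place = match.groups()
    start = GAZETTEER.get(place.lower())
    if start is None:
        return None  # anchor place is not in the gazetteer
    return destination(*start, float(miles), BEARINGS[direction.lower()])


print(resolve_relative("land mines 22 miles north of Exampleville"))
```

Due north, the formula reduces to adding the angular distance (22 / 3958.8 radians, about 0.32 degrees) to the latitude and leaving the longitude unchanged.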

"A lot of these places aren't even on the map," Frank said. The difficulty also explains why the company's software currently sells in the six-figure range.

The MetaCarta and Language Weaver efforts essentially address the central paradox of search: The more you know, the less you know. The amount of information out there and the ways people want to use it are so wide-ranging that there are plenty of technology opportunities.

"It is much larger than I ever thought," Google CEO Eric Schmidt said. "There is no single platform strategy that will win out at the expense of others."

For its part, Google plans to integrate its Orkut social network with its main search services so that experts can answer questions that standard searches can't readily answer. Intel and others, meanwhile, are fostering research projects that will allow people to conduct searches using images or audio clips instead of keywords.

Like established search technologies, both Language Weaver and MetaCarta rely on probability to generate results. In a Spanish-to-English translation, Language Weaver first converts a Spanish phrase ("¡Qué hambre tengo!") into a likely equivalent in broken English ("Have I that hunger!") by comparing how sequences of words have been used in Spanish and English documents in its database. It then runs a second probability analysis to convert that sentence into standard usage ("I am so hungry!").
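The two passes can be illustrated with a toy model. First, a word-translation table picks the likeliest English for each Spanish word. Then a bigram language model picks the most fluent sentence among candidate rewrites. Language Weaver's real models are not described here. Every probability, the `REWRITES` candidate list and the function names are invented. Because the sketch does no word reordering, its rough pass comes out as "that hunger i have" rather than the article's broken-English example.

```python
import math
import re

# Stage 1 data: toy word-translation probabilities P(english | spanish).
# All numbers here are invented for illustration.
TRANSLATION_TABLE = {
    "qué": {"that": 0.5, "what": 0.4, "how": 0.1},
    "hambre": {"hunger": 0.9, "famine": 0.1},
    "tengo": {"i have": 0.7, "have i": 0.3},
}

# Stage 2 data: toy bigram language model P(word | previous word) for English.
BIGRAMS = {
    ("<s>", "i"): 0.2, ("i", "am"): 0.3, ("am", "so"): 0.1,
    ("so", "hungry"): 0.05, ("hungry", "</s>"): 0.3,
    ("<s>", "that"): 0.05, ("that", "hunger"): 0.001,
    ("hunger", "i"): 0.001, ("i", "have"): 0.2, ("have", "</s>"): 0.05,
}
UNSEEN = 1e-6  # floor probability for bigrams the model never saw

# Hypothetical candidate rewrites; a real system would generate these itself.
REWRITES = {"that hunger i have": ["i have that hunger", "i am so hungry"]}


def gloss(spanish):
    """Stage 1: pick the most probable English for each word (no reordering)."""
    words = re.findall(r"\w+", spanish.lower())
    return " ".join(max(TRANSLATION_TABLE[w], key=TRANSLATION_TABLE[w].get) for w in words)


def lm_log_prob(sentence):
    """Log probability of a sentence under the toy bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(BIGRAMS.get(pair, UNSEEN)) for pair in zip(tokens, tokens[1:]))


def translate(spanish):
    """Stage 2: choose the most fluent sentence among the gloss and its rewrites."""
    rough = gloss(spanish)
    candidates = [rough] + REWRITES.get(rough, [])
    return max(candidates, key=lm_log_prob)


print(gloss("¡Qué hambre tengo!"))      # → that hunger i have
print(translate("¡Qué hambre tengo!"))  # → i am so hungry
```

A production system would generate its own candidates and would typically weigh translation probabilities alongside fluency, rather than relying on fluency alone as this toy does.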

Language Weaver's database of documents for probability-based translation of European languages and Chinese is fairly extensive. For Somali, the sources are more limited: the company had to use the Bible.

Similarly, MetaCarta's geographic search software will discard irrelevant results. For example, it excludes results in which the word "London" is followed by "Broil" or preceded by "Julie," if the search is about the course of the River Thames.
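The following minimal sketch shows that kind of context filter. The article says MetaCarta relies on probability, so its real system presumably weighs such cues statistically. The fixed word lists and the `place_mentions` function here are hypothetical and only show the shape of the rule.

```python
import re

# Hypothetical exclusion rules for "London" when the query concerns the Thames:
# neighboring words that signal a person or a dish rather than the city.
EXCLUDE_IF_PRECEDED_BY = {"julie"}
EXCLUDE_IF_FOLLOWED_BY = {"broil"}


def place_mentions(text, place):
    """Return token positions of `place` that survive the exclusion rules."""
    tokens = re.findall(r"[A-Za-z]+", text)
    kept = []
    for i, token in enumerate(tokens):
        if token.lower() != place.lower():
            continue
        before = tokens[i - 1].lower() if i > 0 else ""
        after = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if before in EXCLUDE_IF_PRECEDED_BY or after in EXCLUDE_IF_FOLLOWED_BY:
            continue  # the singer Julie London, or the beef dish London broil
        kept.append(i)
    return kept


text = "Julie London sang. London Broil is a dish. The Thames flows through London."
print(place_mentions(text, "London"))  # → [12]
```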

So far, the technology has been used mostly by the U.S. government. Both Language Weaver and MetaCarta, in fact, received venture funding from In-Q-Tel, the venture fund the CIA established, the companies have said.

The potential for commercial use of their software is huge, however, and both companies have started to ship their products to nongovernment customers. Only 8 percent of the world's population speaks English as a primary language, but about 80 percent of Web sites are in English, according to Language Weaver's Benjamin. The company's technology is pricey now (a translation engine for French and English costs about $25,000), but the price will decline as the translation database expands.

"We're not going and trying to reach out to people in their native languages and cultures," Benjamin said.
