Obviously, they can't, but computers can scan through text and separate human opinions from factual information. This branch of natural-language processing (NLP) is called 'information extraction,' and it is being applied to sorting facts from opinions for Homeland Security. Right now, a consortium of three universities is working on this for the U.S. Department of Homeland Security (DHS), which doesn't have enough in-house expertise in NLP.
A new research program by a Cornell computer scientist, in collaboration with colleagues at the University of Pittsburgh and University of Utah, aims to teach computers to scan through text and sort opinion from fact.
These three researchers are Claire Cardie, Cornell professor of computer science; Janyce Wiebe, associate professor of computer science at the University of Pittsburgh; and Ellen Riloff, associate professor of computer science at the University of Utah.
"In 'information extraction,' computers scan text for words and phrases that identify subjects, objects and specific types of information in order to understand the highly variable ways in which human beings express themselves." (Credits: Claire Cardie (diagram) and Bill Steele (caption) for the Cornell Chronicle Online).
Here is an example of how this technology works.
Computer programmers and science fiction fans know that computers are usually very literal and demand that information be presented according to rigid rules. Humans, on the other hand, are capable of understanding that "Please pass the salt," "May I have the salt," "Hey, is there any salt down there?" and "Yuk, this really needs salt" all mean much the same thing. Cardie's computer programs try to bridge the gap by identifying subjects, objects and other key parts of sentences to determine meaning.
The new research will use machine-learning algorithms to give computers examples of text expressing both fact and opinion and teach them to tell the difference. A simplified example might be to look for phrases like "according to" or "it is believed." Ironically, Cardie said, one of the phrases most likely to indicate opinion is "It is a fact that ..."
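To make the simplified example above concrete, here is a minimal sketch of a cue-phrase labeler. It is a hand-written rule, not the machine-learning approach the researchers describe: the cue list, the function names and the naive sentence splitter are all assumptions made for illustration.

```python
import re

# Hypothetical cue list: the first two phrases come from the simplified
# example in the article, the third is the one Cardie singles out.
# A real system would learn such cues from annotated text.
OPINION_CUES = ("according to", "it is believed", "it is a fact that")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def label_sentence(sentence):
    """Return 'opinion' if any cue phrase occurs in the sentence, else 'fact'."""
    lowered = sentence.lower()
    if any(cue in lowered for cue in OPINION_CUES):
        return "opinion"
    return "fact"


def label_text(text):
    """Label every sentence of a text; returns (sentence, label) pairs."""
    return [(s, label_sentence(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    sample = (
        "It is a fact that the new policy is a disaster. "
        "The policy was announced on Friday."
    )
    for sentence, label in label_text(sample):
        print(label, "|", sentence)
```

A fixed list like this will mislabel plenty of sentences, which is presumably why the researchers prefer to learn the distinction from examples of text rather than enumerate phrases by hand.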
For more information about this subject, you can look at the long list of publications by the three researchers mentioned above. But for your reading pleasure, I have selected an article published in Language Resources and Evaluation under the title "Annotating Expressions of Opinions and Emotions in Language" (Volume 39, Numbers 2-3, May 2005, but published online in February 2006). Both the abstract and the full paper (PDF format, 41 pages) are available online. Here is an excerpt from the conclusions.
This paper described a detailed annotation scheme that identifies key components and properties of opinions and emotions in language. The scheme pulls together into one linguistic annotation scheme both the concept of private states and the concept of nested sources, and applies the scheme comprehensively to a large corpus, with the goal of annotating expressions in context, below the level of the sentence.
With this program, the DHS also expects to prioritize documents. "We're making sure that any information is tagged with a confidence. If it's low confidence, it's not useful information," Cardie added.
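One way to picture prioritizing by confidence is the small sketch below. Everything in it is assumed for illustration: the article does not say how confidence is scored, so the TaggedItem class, the 0-to-1 scale and the 0.5 cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaggedItem:
    """A piece of extracted information with a confidence tag (hypothetical)."""
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def prioritize(items, min_confidence=0.5):
    """Drop low-confidence items, then list the most confident first.

    The 0.5 default is an arbitrary illustrative cutoff.
    """
    kept = [item for item in items if item.confidence >= min_confidence]
    return sorted(kept, key=lambda item: item.confidence, reverse=True)


if __name__ == "__main__":
    items = [
        TaggedItem("report A", 0.62),
        TaggedItem("report B", 0.18),
        TaggedItem("report C", 0.91),
    ]
    for item in prioritize(items):
        print(f"{item.confidence:.2f} {item.text}")
```

In this sketch, low-confidence items are simply discarded, mirroring Cardie's remark that low-confidence information is not useful; a real system might instead keep them for later review.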
Sources: Cornell University News Service, via EurekAlert!, September 22, 2006; and other websites