Using computers to analyze huge quantities of unstructured data is notoriously difficult, so this analysis is still often done by humans. But now the EU-funded Parmenides project has developed new software for sifting through unstructured data, and it has already been used in several real-world applications. For example, the Greek Ministry of Defense (MoD) has used the system to identify terrorist activity by automatically combining its own files with newspaper reports. And Unilever has used the same tools to analyze journal articles, newspaper reports and even anecdotal data to build up a picture of the relationship between weight, health and food. What is even more interesting is that the system can monitor changes over time to identify new trends.
Here is the introduction of the IST Results article.
A new software system that enables faster and more comprehensive analysis of vast quantities of information is so effective that it not only creates order out of chaos and allows computers to perform tasks that before only people could perform, it is also creating new information from old data.
And here are some quotes from the coordinator of the Parmenides project.
"Our greatest contribution was to create a framework for integrating structured and unstructured information," says Dr Babis Theodoulidis, [from] the University of Manchester. Currently, the vast majority of information is unstructured text, like reports, newspaper articles, letters, memos, essentially any information that is not part of a database. "Analysing text requires human intervention and, when you are trying to analyse perhaps thousands of documents in many different languages, really large scale text analyses becomes very expensive, or even impossible," says Theodoulidis.
Here are some details about how the Greek Ministry of Defence (MoD) used the system.
The Greek MoD used the PARMENIDES system to analyse large quantities of unstructured data, like newspaper reports about terrorist attacks, and then combine that with military intelligence. This type of analysis could reveal that one group is changing its methods from car bombs to suicide bombs or chemical attacks. Or that one group is beginning to work with another.
"We got our greatest result with the MoD. Before PARMENIDES, they analysed all their unstructured data manually, essentially people reading articles. Now that's almost entirely automatic," says Theodoulidis.
Below is a picture showing an example of a cluster analysis, where the software found the following frequent terms: ransom, kidnappers, sayyaf, abu, hostages. "This indicates a new instance of a concept consisting of two terms: 'abu' and 'sayyaf'" (Credit: Parmenides Project Consortium).
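To give a rough idea of what "frequent terms" and a two-term concept might mean in practice, here is a minimal Python sketch. It is not the Parmenides implementation, whose internals are not described here. The three sample sentences are invented toy data built from the terms listed above, and the function names, the stopword list and the `min_docs` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the project).
STOPWORDS = {"the", "a", "of", "by", "for", "in", "and", "to", "after"}

# Invented toy sentences standing in for one cluster of documents.
docs = [
    "Abu Sayyaf kidnappers demand ransom for hostages",
    "Hostages freed after ransom paid to Abu Sayyaf",
    "Kidnappers linked to Abu Sayyaf release hostages",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def frequent_terms(documents, top_n=5):
    """Return the top_n terms ranked by how many documents contain them."""
    counts = Counter()
    for doc in documents:
        counts.update({w for w in tokenize(doc) if w not in STOPWORDS})
    return counts.most_common(top_n)

def frequent_pairs(documents, min_docs=2):
    """Return adjacent word pairs that occur in at least min_docs documents."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        pairs = set(zip(tokens, tokens[1:]))
        # Ignore pairs in which either word is a stopword.
        counts.update(p for p in pairs if not (set(p) & STOPWORDS))
    return [pair for pair, n in counts.items() if n >= min_docs]

if __name__ == "__main__":
    print(frequent_terms(docs))
    print(frequent_pairs(docs))
```

On this toy data, the only adjacent pair shared by at least two sentences is 'abu' followed by 'sayyaf', which is the kind of signal that could suggest the two terms form a single concept.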
For more information, you should read this description of the MoD Case Study in terms of information extraction and data mining (Microsoft Word format, 46 pages, 975 KB). The above screenshot can be found on page 45 along with more explanations. Here is why the Greek MoD was interested in the Parmenides technology.
After Greece secured the bid to host the Olympic Games to be held in 2004, the MoD had to step up its efforts against terrorism, not only in terms of the hard aspects, but also the intelligence aspects. It was expected that the Parmenides project could hopefully give the Greek MoD a helping hand in its intelligence efforts by trying to extract and compile information about terrorist groups and their activities. Events such as kidnappings, shooting, bombing and hijackings would be identified and detailed in order to facilitate those intelligence efforts already being undertaken.
You should also take a look at the architecture of the project, which you'll find in this overview and which "consists of a number of components that work together in order to enhance the users' ability to extract knowledge, mostly automatically, from web-based and also from conventional sources."
Finally, one of the most interesting features of this system is its adaptation to new situations.
Parmenides' framework does not just provide a snapshot analysis, it can analyse data over time, too, enabling the system to spot new trends or developments that would remain hidden otherwise. Healthcare consultant BioVista, for example, combined recruitment and business information to track the shifting research priorities in biotech companies over time.
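As a rough illustration of what spotting a trend over time can involve, here is a minimal Python sketch. It is only an assumption about one simple approach, not the method used by Parmenides or BioVista: it measures each term's share of all mentions in a period and flags terms whose share grows between two periods. The period labels, topic names, function names and the `min_gain` threshold are hypothetical placeholders.

```python
from collections import Counter

def term_share_by_period(records):
    """records: iterable of (period, list_of_terms) tuples.
    Returns {period: {term: share of all term mentions in that period}}."""
    counts = {}
    for period, terms in records:
        counts.setdefault(period, Counter()).update(terms)
    shares = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        shares[period] = {term: n / total for term, n in counter.items()}
    return shares

def rising_terms(shares, earlier, later, min_gain=0.2):
    """Return terms whose share grew by at least min_gain, largest gain first."""
    gains = {}
    for term in set(shares[earlier]) | set(shares[later]):
        gain = shares[later].get(term, 0.0) - shares[earlier].get(term, 0.0)
        if gain >= min_gain:
            gains[term] = gain
    return sorted(gains, key=gains.get, reverse=True)

# Hypothetical placeholder data: topics mentioned in documents from two periods.
records = [
    ("period_1", ["topic_a", "topic_a", "topic_b"]),
    ("period_2", ["topic_b", "topic_b", "topic_b", "topic_a"]),
]

if __name__ == "__main__":
    shares = term_share_by_period(records)
    print(rising_terms(shares, "period_1", "period_2"))
```

A single snapshot of either period would only show which topic dominates at that moment. Comparing the two periods is what reveals that the second topic is gaining ground.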
Now, the computer scientists involved in this project have an even more ambitious goal: to put these tools in the hands of Internet users like you and me to improve our search experience. But no time frame is mentioned for when this might happen.
Sources: IST Results, March 1, 2006; and various web sites