These days, the Defense Advanced Research Projects Agency (DARPA) is making the headlines for its Grand Challenge robotic vehicle race through the Nevada desert (check the official site for the latest results -- and congratulations to all the competitors). But as usual, DARPA is also busy funding many other projects. For example, it just awarded a $500,000 grant to two professors of the University of Texas at Arlington (UTA) for their research into data mining and artificial intelligence. Their project, SUBDUE, is a computer program that can find patterns in data represented as graphs. But what's more important is that this program can learn from its own discoveries. Potential applications for such a program include counter-terrorism, but also bioinformatics, web structure mining, social network analysis or seismic events analysis.
Here are some short quotes from an article from the Shorthorn, the student newspaper of UTA.
The agency has sent Diane Cook and Larry Holder of UTA’s Computer Science Department a grant for their efforts and research into data mining and artificial intelligence.
The project is called SUBDUE, a computer program that finds interesting patterns in data represented as a graph. It can examine large chunks of data easily and examine data points and relationships between them.
The Shorthorn adds that the two professors have been working on this program for 10 years, but doesn't say they're also a married couple with two kids.
But let's go back to Subdue and how it can learn from its own discoveries.
The program isn’t just some number-crunching program. It has the ability to learn from its information. This isn't learning as nonresearchers would learn. SUBDUE takes its output and makes a graph about the positive phenomenon. Then it takes that information and compares it with more graphs. This gives the researchers a chance to compound the information and develop a "predictive model to identify emerging criminal networks."
Below is a diagram showing how Subdue discovers common substructures within relational data by evaluating their ability to compress the graph using the Minimum Description Length (MDL) principle (Credit: UTA).
For more information about the SUBDUE project, please visit the project home page and start by reading the introduction.
SUBDUE is a graph-based knowledge discovery system that finds structural, relational patterns in data representing entities and relationships. SUBDUE represents data using a labeled, directed graph in which entities are represented by labeled vertices or subgraphs, and relationships are represented by labeled edges between the entities. SUBDUE uses the minimum description length (MDL) principle to identify patterns that minimize the number of bits needed to describe the input graph after being compressed by the pattern. SUBDUE can perform several learning tasks, including unsupervised learning, supervised learning, clustering and graph grammar learning.
You'll also find there links to many presentations and publications.
I've selected two articles published in 2005 for your reading pleasure: "Iterative Structure Discovery in Graph-Based Data," published by International Journal of Artificial Intelligence Techniques (Link, PDF format, 24 pages, 363 KB) (the diagram above comes from this paper) and "Graph-based Relational Learning with Application to Security," published by Fundamenta Informaticae (Link, PDF format, 16 pages, 149 KB)
Sources: Cole Dowden, The Shorthorn, Student Newspaper of the University of Texas at Arlington, October 4, 2005; and various web sites
You'll find related stories by following the links below.