Can computers sort data like humans?

According to U.S. researchers, computers can be trained to discover trends and order in large datasets, as humans have done since childhood. The new algorithm, developed at MIT, may impact the field of artificial intelligence by helping computers recognize patterns the way we do. 'Instead of looking for a particular kind of structure, we came up with a broader algorithm that is able to look for all of these structures and weigh them against each other,' said one of the scientists. Besides helping scientists analyze large amounts of data, this algorithm could also be used to discover how the human brain finds patterns.
Written by Roland Piquepaille


Structures discovered by computers at MIT

The illustration shows two examples of data structures automatically discovered by computers. The top one shows structures learned from biological features, while the bottom one shows structures learned from Euclidean distances between faces represented as pixel vectors. (Credit: Kemp and Tenenbaum)
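
To make "Euclidean distances between faces represented as pixel vectors" concrete, here is a minimal sketch. The tiny four-pixel "faces" and the function names are made up for illustration and do not come from the paper. Each image is flattened into a list of pixel intensities, and the distance between two images is the square root of the summed squared pixel differences.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length pixel vectors."""
    if len(a) != len(b):
        raise ValueError("pixel vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(vectors):
    """Symmetric matrix of pairwise distances between all vectors."""
    n = len(vectors)
    return [[euclidean_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical 2x2 grayscale images, flattened to 4 pixels each.
faces = [
    [0, 0, 0, 0],
    [3, 4, 0, 0],
    [3, 4, 12, 0],
]

D = distance_matrix(faces)
# D[i][j] is the dissimilarity between face i and face j; a matrix like
# this is the kind of input a structure-finding algorithm could work from.
```

Real face images would have thousands of pixels each, but the computation is the same.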

The algorithm was developed at MIT by Charles Kemp, now an assistant professor of psychology at Carnegie Mellon University, and Joshua Tenenbaum, an associate professor of brain and cognitive sciences at MIT, where he leads the Computational Cognitive Science Group.

But how does this model work? "The model considers a range of possible data structures, such as trees, linear orders, rings, dominance hierarchies, clusters, etc. It finds the best-fitting structure of each type for a given data set and then picks the type of structure that best represents the data."
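
The two-step idea in that quote (fit the best structure of each type, then pick the best type) can be sketched in a few lines. This is only a toy under loud assumptions, not the researchers' method: it considers just two forms (a linear order, called "chain" here, and a ring), scores a fit by plain squared error rather than by probabilistic inference, and finds the best arrangement by brute force, which only works for a handful of items. All names are hypothetical.

```python
from itertools import permutations

def chain_distance(p, q, n):
    """Hops between positions p and q on a linear order of n items."""
    return abs(p - q)

def ring_distance(p, q, n):
    """Hops between positions p and q on a ring of n items."""
    d = abs(p - q)
    return min(d, n - d)

# Only two candidate forms here; the quote lists several more.
FORMS = {"chain": chain_distance, "ring": ring_distance}

def fit_form(dissim, form_distance):
    """Step 1: find the best-fitting structure of one form.

    Tries every assignment of items to positions and keeps the one whose
    structural distances differ least (squared error) from the data.
    """
    n = len(dissim)
    best_error, best_order = float("inf"), None
    for order in permutations(range(n)):
        # order[p] is the item placed at position p of the structure.
        error = sum(
            (dissim[order[p]][order[q]] - form_distance(p, q, n)) ** 2
            for p in range(n)
            for q in range(p + 1, n)
        )
        if error < best_error:
            best_error, best_order = error, order
    return best_error, best_order

def select_form(dissim):
    """Step 2: pick the form whose best structure fits the data best."""
    fits = {name: fit_form(dissim, dist) for name, dist in FORMS.items()}
    best_name = min(fits, key=lambda name: fits[name][0])
    return best_name, fits

# Synthetic dissimilarities generated from a known ring and a known chain.
n = 5
ring_data = [[ring_distance(i, j, n) for j in range(n)] for i in range(n)]
chain_data = [[chain_distance(i, j, n) for j in range(n)] for i in range(n)]

ring_choice, ring_fits = select_form(ring_data)
chain_choice, chain_fits = select_form(chain_data)
```

On this synthetic data, `ring_choice` comes out as `"ring"` and `chain_choice` as `"chain"`, because each dataset is reproduced exactly by its own form and only approximately by the other. With noisy real data, raw error alone would tend to favor the more flexible form, so a serious version would also need some penalty for complexity.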

The MIT news release adds that this is something we do every day, often unconsciously. "Several scientific milestones have resulted from the human skill of finding patterns in data -- for example, the development of the periodic table of the chemical elements or the organization of biological species into a tree-structured system of classification. Children exhibit this data organization skill at a young age, when they learn that social networks can be organized into cliques, and that words can fit into overlapping categories (for example, dog, mammal, animal)."

This research was published in the Proceedings of the National Academy of Sciences (PNAS) under the title "The discovery of structural form" (Volume 105, Issue 31, Pages 10687-10692, August 5, 2008).

Here is the beginning of the abstract. "Algorithms for finding structure in data have become increasingly important both as tools for scientific data analysis and as models of human learning, yet they suffer from a critical limitation. Scientists discover qualitatively new forms of structure in observed data: For instance, Linnaeus recognized the hierarchical organization of biological species, and Mendeleev recognized the periodic structure of the chemical elements. Analogous insights play a pivotal role in cognitive development: Children discover that object category labels can be organized into hierarchies, friendship networks are organized into cliques, and comparative relations (e.g., 'bigger than' or 'better than') respect a transitive order."

Here is an additional quote. "Standard algorithms, however, can only learn structures of a single form that must be specified in advance: For instance, algorithms for hierarchical clustering create tree structures, whereas algorithms for dimensionality-reduction create low-dimensional spaces. Here, we present a computational model that learns structures of many different forms and that discovers which form is best for a given dataset. The model makes probabilistic inferences over a space of graph grammars representing trees, linear orders, multidimensional spaces, rings, dominance hierarchies, cliques, and other forms and successfully discovers the underlying structure of a variety of physical, biological, and social domains. Our approach brings structure learning methods closer to human abilities and may lead to a deeper computational understanding of cognitive development."
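
The phrase "probabilistic inferences" can be illustrated with a small sketch of how competing forms might be weighed against each other. It assumes each candidate form has already been given a log-likelihood for the data. The numbers below are made up for illustration, and the uniform prior is an assumption, not something taken from the paper. The posterior probability of each form is proportional to its prior times its likelihood.

```python
import math

def posterior_over_forms(log_likelihoods, log_priors):
    """Normalize prior * likelihood into a posterior over candidate forms.

    Works in log space and subtracts the maximum before exponentiating
    (the log-sum-exp trick) so very negative values do not underflow.
    """
    names = list(log_likelihoods)
    log_joint = [log_likelihoods[f] + log_priors[f] for f in names]
    m = max(log_joint)
    log_evidence = m + math.log(sum(math.exp(v - m) for v in log_joint))
    return {f: math.exp(v - log_evidence) for f, v in zip(names, log_joint)}

forms = ["tree", "ring", "linear order"]

# Assumption: every form is equally plausible before seeing the data.
log_priors = {f: math.log(1 / len(forms)) for f in forms}

# Made-up log-likelihoods, standing in for how well each form's
# best-fitting structure explains some dataset.
log_likelihoods = {"tree": -120.0, "ring": -131.5, "linear order": -126.0}

posterior = posterior_over_forms(log_likelihoods, log_priors)
best_form = max(posterior, key=posterior.get)
```

With these illustrative numbers, nearly all of the probability mass lands on the tree, since a gap of a few log units between forms already translates into a large ratio of probabilities.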

This technical paper was published as an "open access article." You can read it here or there (PDF format, 6 pages, 477 KB). The illustrations above were extracted from this article.

In the same issue of PNAS, you'll find an article by Keith Holyoak, a professor in the Department of Psychology at UCLA who is responsible for the Reasoning Lab. His paper, "Induction as model selection," is a commentary on the Kemp and Tenenbaum article.

Here is an excerpt. "All intelligent systems, whether children, scientists, or futuristic robots, require the capacity for induction, broadly defined to encompass all inferential processes that expand knowledge in the face of uncertainty. Any finite set of data is consistent with an infinite number of inductive hypotheses. The apparent accuracy of many everyday inferences therefore suggests that humans have, as the philosopher Charles Peirce put it, 'special aptitudes for guessing right.' How can people, often restricted to sparse and noisy data, achieve some significant degree of success in discerning the underlying regularities in the world? The answer seems to require specifying inductive constraints. The report by Kemp and Tenenbaum in this issue of PNAS represents an important advance in understanding the constraints that guide successful induction across a broad set of domains."

If you want to learn more, here is a link to the full commentary (PDF format, 2 pages, 246 KB).

Sources: Anne Trafton, MIT News Office, August 25, 2008; and various websites
