This is the goal of many computer scientists around the world and I've already mentioned several research efforts about this (check here or there). Now, several projects under way at the MIT are leading to improved visual search done by computers. The MIT researchers have developed a new way to train computers to recognize people or objects in still images and in videos with great accuracy. For example, one of their systems "can detect people and cars in a street scene about 95 to 98 percent of the time." This research could soon be used in surveillance cameras, but also to train computers to perform preliminary medical diagnoses. Read more...
As you can guess, training a computer to make a distinction between different objects is quite difficult.
That challenge is being tackled by researchers at MIT's Center for Biological & Computational Learning (CBCL), led by Tomaso Poggio [...] Some students at the center are proposing software that could work, say, with surveillance cameras in an office building or military base, eliminating the need for a human to watch monitors or review videotapes. Other applications might automate computer editing of home movies, or sort and retrieve photos from a vast database of images.
But the work to make such exciting applications possible is daunting. "The fact that it seems so easy to do for a human is part of our greatest illusion," says Stanley Bileschi [...] Processing visual data is computationally complex, he says, noting that people use about 40 percent of their brains just on that task. There are many variables to take into account when identifying an object: color, lighting, spatial orientation, distance, and texture.
As an example, below is a diagram showing the major components of a face detection system (Credit: Stanley Bileschi).
Instead of using statistical learning systems to teach computers to recognize objects, the CBCL researchers used another approach: they looked at how our neurons are acting.
The programmers make a mathematical model of those patterns, tracking which neurons fire (and how strongly) and which don't. They tell the computer to reproduce the right pattern when it sees a particular pixel, and then they train the system with positive and negative examples of objects. This is a tree, and this is not.
But instead of learning about the objects themselves, the computer learns the neuron stimulation pattern for each type of object. (Essentially, it's learning patterns of patterns: the patterns of neural reactions not just to pixels but to groupings of pixels.) Later, when it sees a new image of a tree, it will see how closely the resulting neuron pattern matches the ones produced by other tree images. Poggio says this is similar to the way a baby's brain gets imprinted with visual information and learns about the world around it.
For more information about these projects, you can read a previous MIT news release from November 2005, "Neuroscientists break code on sight" or a long Bileschi's paper, "Advances in Component-based Face Detection" (PDF format, 53 pages, 689 KB), from which the above diagram has been extracted.
It will take some time before real products come from this lab, but as says Poggio, "evolution has spent four billion years developing vision."
Sources: Neil Savage, Technology Review, May 25, 2006; and various web sites
You'll find related stories by following the links below.