When you search for images on the Web, you use a search engine that relies on the text associated with the pictures -- and not on the images themselves -- so the results are sometimes unsatisfactory. But this might soon change, because engineers from UC San Diego (UCSD) have developed new algorithms to improve automated image labeling. Their approach could be integrated into the next generation of image search engines. It's interesting to note that one of the researchers spent six months at Google using a cluster of 3,000 state-of-the-art Linux machines to refine the algorithms, which are based on what the team calls Supervised Multiclass Labeling (SML). The results obtained by this trained, supervised system are pretty good, so it would not be surprising to see Google integrate this method soon.
The figure below illustrates the retrieval results obtained with one-word queries for some visual concepts. In this case, the Corel database of images was queried for "blooms," "mountain," "pool," "smoke," and "woman." As the diversity of the returned images attests, this SML system seems to generalize well (Credit: UCSD).
To obtain these results, the SML system was first trained by using a Corel image set, containing 60,000 images with 442 annotations. "The image set was split into 600 image categories consisting of 100 images each, which were then annotated with a general description that reflected the image category as a whole. For performance evaluation, 40 percent of the images were reserved for training (23,878 images), and the remainder (35,817 images) were used for testing."
So how does this Supervised Multiclass Labeling (SML) system really work?
To understand SML, you need to start with the training process, which involves showing the system many different pictures of the same visual concept or “class,” such as a mountain. When training the system to recognize mountains, the location of the mountains within the photos does not need to be specified. This makes it relatively easy to collect the training examples. After exposure to enough different pictures that include mountains, the system can identify images in which there is a high probability that mountains are present. During training, the system splits each image into 8-by-8 pixel squares and extracts some information from them. The information extracted from each of these squares is called a "localized feature." The localized features for an image are collectively known as a "bag of features."
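To make the "bag of features" idea concrete, here is a minimal Python sketch. It rests on several assumptions that are not from the UCSD work: the function name bag_of_features is hypothetical, the squares do not overlap, the image is grayscale, and the "information" taken from each square is simply its mean and standard deviation, a placeholder for whatever the real system extracts.

```python
import numpy as np

def bag_of_features(image, block=8):
    """Split a 2-D grayscale image into non-overlapping block x block squares
    and return one small feature vector per square (the "bag of features")."""
    h, w = image.shape
    # Crop so both dimensions are exact multiples of the block size.
    h, w = h - h % block, w - w % block
    squares = (
        image[:h, :w]
        .reshape(h // block, block, w // block, block)
        .swapaxes(1, 2)                # group the pixels of each square together
        .reshape(-1, block * block)    # one row of 64 pixels per square
    )
    # Placeholder "localized feature" (an assumption): mean and std of each square.
    return np.column_stack([squares.mean(axis=1), squares.std(axis=1)])

# Stand-in for a real photo: a 64 x 48 array of random intensities.
rng = np.random.default_rng(0)
image = rng.random((64, 48))
bag = bag_of_features(image)
print(bag.shape)  # (48, 2): 8 x 6 squares, two numbers per square
```

Note that nothing in the bag records where in the photo a square came from, which fits the point above that the location of the mountains does not need to be specified.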
Next, the researchers pool together each “bag of features” for a particular visual concept. This pooled information summarizes – in a computationally efficient way – the important information about each of the individual mountains. Pooling yields a density estimate that retains the critical details of all the different mountains without having to keep track of every 8-by-8 pixel square from each of the mountain training images.
After the system is trained, it is ready to annotate pictures it has never encountered. The visual concepts that are most likely to be in a photo are labeled as such. In one example, a photo of a tiger, the SML system processed the image and concluded that "cat, tiger, plants, leaf and grass" were the most likely items in the photograph.
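The pooling and annotation steps can be sketched in the same spirit. This is a toy illustration under stated assumptions, not the researchers' method: each concept's density is a single diagonal Gaussian (a deliberately crude stand-in for the density estimate described above), the feature bags are synthetic, and the names fit_class_density, log_likelihood, annotate and synthetic_bags are all hypothetical.

```python
import numpy as np

def fit_class_density(bags):
    """Pool every bag for one concept and summarize it as a diagonal Gaussian."""
    pooled = np.vstack(bags)  # individual squares are not kept after this step
    return pooled.mean(axis=0), pooled.var(axis=0) + 1e-6

def log_likelihood(bag, density):
    """Log-likelihood of all squares in a bag under one concept's density."""
    mean, var = density
    per_value = -0.5 * (np.log(2 * np.pi * var) + (bag - mean) ** 2 / var)
    return per_value.sum()  # summed over squares and feature dimensions

def annotate(bag, densities, top_k=2):
    """Return the top_k concepts ranked by likelihood for an unseen image."""
    scores = {label: log_likelihood(bag, d) for label, d in densities.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

rng = np.random.default_rng(0)

def synthetic_bags(center, n_images=5, n_squares=48):
    """Made-up training data: bags of 2-D features scattered around a center."""
    return [rng.normal(center, 0.05, size=(n_squares, 2)) for _ in range(n_images)]

training = {
    "mountain": synthetic_bags([0.2, 0.1]),
    "pool": synthetic_bags([0.6, 0.3]),
    "smoke": synthetic_bags([0.9, 0.05]),
}
densities = {label: fit_class_density(bags) for label, bags in training.items()}

# An unseen image whose features resemble the "pool" training data.
new_bag = rng.normal([0.6, 0.3], 0.05, size=(48, 2))
labels = annotate(new_bag, densities, top_k=2)
print(labels)  # "pool" should be ranked first
```

The design point carried over from the description is that each concept is reduced to a compact summary, so annotating a new picture means comparing its bag against one density per concept and not against every training square.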
These algorithms have been under development at UCSD since 2004 by Nuno Vasconcelos, professor of electrical engineering; Gustavo Carneiro, a UCSD postdoctoral researcher now at Siemens Corporate Research; and UCSD doctoral candidate Antoni Chan. All three were working at the Statistical Visual Computing Lab, and they were helped by Google researcher Pedro Moreno.
For more information, this research work was published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) under the title "Supervised Learning of Semantic Classes for Image Annotation and Retrieval" (Volume 29, Number 3, Pages 394-410, March 2007). Here are two links to the abstract and to the full paper (PDF format, 17 pages, 3.63 MB). The above illustration was extracted from this paper, as was some of the information about how the system was trained and tested.
Finally, you might want to read another version of the news release, from UCSD's Jacobs School of Engineering. And keep in mind that this system "can only label images with visual concepts that it has been trained to recognize." Still, it should be a basis for better image searching tools.
Sources: University of California - San Diego news release, via EurekAlert!, March 29, 2007; and various websites