Accenture's "Pocket Supercomputer" is in fact a phone behaving like a thin client. It can be used to send images and video of objects in real time to a server where they can be identified and linked to relevant information, which can then be sent back to the user.
The camera on the phone is used to take a video of an object — such as a book. According to Accenture, the server software is smart enough to recognise the cover of the book — it's not yet able to read text — and can then, for example, return the price and history of the book, and details of where it can be bought.
By offloading the processing from the mobile device onto a server, there are few practical limits on the storage and processing power available for storing and searching images.
"It started out as a robotics project," said Accenture's Fredrik Linaker, who has led the research on the project. "We added one, then two laptops to the robot. It became too heavy so we ripped the brain out of the body and put it in a different place, with a wireless link to the body."
The next step was accessing the central "brain" using a mobile device.
For the demonstration in Accenture's labs in Nice on Tuesday, Linaker used a laptop running Windows XP.
The video search can be set up to access any type of image that is in the database. For example, the technology can recognise an image of a painting; Linaker demonstrated using an image of Vermeer's Girl with a Pearl Earring. When the camera on the phone took video images of the painting, a search of the database returned results giving information about the painting, and linked the phone to the recent film of the same title.
Foodstuffs can be identified by their packets, even if the name of the foodstuff is written in non-Latin characters, such as the Chinese pack of soup seasoning pictured above.
Businesses can use the application for inventory purposes, or to train staff to recognise different electrical components, says Accenture.
A "three-dimensional" image of an object can also be uploaded onto the phone, so the user can look at the virtual object from different angles. The motion-tracking technology Accenture uses for this is Open Computer Vision (OpenCV), a free library of computer-vision algorithms originally developed by Intel. This could be used to train employees about certain pieces of stock in a warehouse, for example.
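The core operation behind viewing a stored 3D model from different angles is rotating its points around an axis. The minimal sketch below is illustrative only — it is not Accenture's implementation and does not use OpenCV — and shows the rotation step; a real viewer would also project the rotated points to 2D and render them.

```python
import math

def rotate_y(points, angle_deg):
    """Rotate a list of (x, y, z) points around the vertical (y) axis.

    Rotating the model and re-rendering it is what lets a user inspect
    a virtual object from different angles.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(x * cos_a + z * sin_a, y, -x * sin_a + z * cos_a)
            for x, y, z in points]

# A point on the front face of a model, rotated 90 degrees:
# its x component moves into the (depth) z axis.
print(rotate_y([(1.0, 0.0, 0.0)], 90))
```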
Foreign languages and characters can be translated into the user's language, so a user can find out what an object is. Search results can be personalised, so the user can be alerted if a foodstuff contains a certain allergen, for example.
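The allergen alert described above amounts to intersecting a recognised product's ingredient list with a stored user profile. A minimal sketch of that personalisation step, with hypothetical names throughout:

```python
def allergen_alerts(ingredients, user_allergens):
    """Return the allergens from a user's profile that appear in a
    product's ingredient list (case-insensitive). Function and data
    names are hypothetical, not from Accenture's system.
    """
    found = set(i.lower() for i in ingredients) & \
            set(a.lower() for a in user_allergens)
    return sorted(found)

# A recognised soup-seasoning packet checked against a user profile:
print(allergen_alerts(["Wheat", "soy", "salt"], ["peanut", "soy", "wheat"]))
```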
The phone takes a video of the object at 10 frames per second, and the images are sent to a database in real time using "video calling", a low-latency communications medium.
The database that is used for video search can be built automatically; Accenture has written spiders to crawl the web and download images on a specific theme such as Asian food.
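Building such a database starts with harvesting image links from crawled pages. The sketch below is an assumption about how the first step of such a spider might look — Accenture has not published its crawler — and uses only Python's standard-library HTML parser; a real spider would then fetch each URL and index the downloaded image.

```python
from html.parser import HTMLParser

class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

def extract_image_urls(html):
    """Return the image URLs found in a crawled page's HTML."""
    parser = ImageLinkParser()
    parser.feed(html)
    return parser.images

# A crawled page on a theme such as Asian food (hypothetical markup):
page = '<html><body><img src="soup1.jpg"><img src="soup2.jpg" alt="seasoning"></body></html>'
print(extract_image_urls(page))
```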
Linaker, pictured above, explained that to pinpoint the features necessary to identify an object, the image is run through an algorithm called Scale-Invariant Feature Transform (SIFT), developed by the computer scientist David Lowe.
The software extracts feature points from a JPEG, according to Linaker, and matches them against images in the database at the full frame rate of the camera, which is 10 frames per second. If a match is found, the software on the server retrieves information and sends it back to the user's phone.
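SIFT turns each feature point into a descriptor vector, and matching then becomes a nearest-neighbour search: Lowe's method accepts the closest database descriptor only if it is clearly closer than the second-closest (the "ratio test"). The sketch below illustrates that matching step with toy two-dimensional descriptors — real SIFT descriptors are 128-dimensional, and this is not Accenture's code:

```python
import math

def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptor(query, database, ratio=0.8):
    """Match a query descriptor against a database of named descriptors
    using Lowe's ratio test: accept the nearest neighbour only when it
    is clearly closer than the second-nearest, else report no match.
    """
    ranked = sorted(database.items(), key=lambda kv: distance(query, kv[1]))
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    best, second = ranked[0], ranked[1]
    if distance(query, best[1]) < ratio * distance(query, second[1]):
        return best[0]
    return None  # ambiguous: feature matches several images about equally

# Hypothetical database with one toy descriptor per stored image:
db = {"girl_with_pearl_earring": (0.9, 0.1), "soup_seasoning": (0.1, 0.8)}
print(match_descriptor((0.85, 0.15), db))  # close, unambiguous match
print(match_descriptor((0.5, 0.5), db))    # ambiguous, rejected
```

In practice one descriptor is never enough; the server would run this test for every feature point in a frame and identify the image that accumulates the most accepted matches.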
The advantage of video is that if one frame captures a "bad angle", another image for comparison arrives "a few milliseconds later", according to Linaker.
The researchers are currently in the process of adding features to the Pocket Supercomputer. Linaker said he would like to add optical character recognition, because once the system can recognise text, it can be linked to the internet.
Character searches online can be combined with video searches of sites such as YouTube, Linaker added.
Accenture is also studying how to add speech-recognition software to the system. This would enable the user to speak a word or phrase into the phone in one language, and have the phone "speak" back that word or sentence in a different language. Video and audio can also be synchronised, so a user could speak the name of a subject or object into the phone, and receive a video containing relevant information back.
The researchers also plan to study how a user could do a video search that returns images of objects with similar features, rather than the current search, which returns only objects with exactly the same features. For example, a user could video a certain type of CPU, and in turn receive information not only about that particular CPU, but also about CPUs with similar specifications.
Researchers hope to have a product ready for the market within three years.