Computer vision: Cheat Sheet

Computer see, computer do...
Written by Natasha Lomas, Contributor


Computer vision? Are we talking about computers with eyes? Or do you mean seeing double as a result of staring at a computer screen all day?
You're on the right track with your first guess - computer vision has nothing to do with 'computer vision syndrome', so hold off on buying a new pair of glasses.

What exactly is computer vision then?
Computer vision is a research field working to equip computers with the ability to process and understand visual data, as sighted humans can.

Human brains process the gigabytes of data passing through our eyes every second and translate that data into sight - that is, into discrete objects and entities we can recognise or understand. Similarly, computer vision aims to give computers the ability to understand what they are seeing, and act intelligently on that knowledge.

Imagine a car equipped with a camera pointing at the road ahead, and an onboard computer perceiving a 2D array of variously shaded pixels. It's one thing for that computer to 'see' an image of the environment around the car; it's quite another for it to identify where the edges of the road are, or to understand that a person has just run out in front of the car.

Such visual-motor processing tasks are trivial for the average sighted human but, looked at from a computer's point of view, they are highly complex problems, requiring sophisticated image processing software that analyses the data to recognise key objects and features.

Doing this processing in real-time, as the human brain and eyes do, or recognising objects from various angles and perspectives or in different lighting conditions adds to the task's complexity - requiring complex algorithms to crunch and catalogue the data.

In short, the problem tackled by computer vision research is translating the flat 2D data a computer 'sees' into a 3D real-world reality that it can recognise and understand.
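To make that concrete, here is a toy sketch (not any production system) of the very first step such software takes: scanning a 2D grid of brightness values for places where neighbouring pixels change sharply, which is one crude way a program might begin to locate something like the edge of a road. The function name and threshold are illustrative inventions:

```python
# Minimal edge-detection sketch: mark pixels where brightness changes
# sharply between horizontal neighbours - a crude version of the first
# step real computer vision pipelines use to find boundaries.

def find_edges(image, threshold=50):
    """image: 2D list of brightness values (0-255).
    Returns a same-sized grid of True where a strong edge passes."""
    edges = []
    for row in image:
        edge_row = []
        for x in range(len(row)):
            left = row[x - 1] if x > 0 else row[x]
            right = row[x + 1] if x < len(row) - 1 else row[x]
            # A large horizontal brightness jump suggests a boundary.
            edge_row.append(abs(right - left) > threshold)
        edges.append(edge_row)
    return edges

# A dark road surface (20) meeting a bright verge (200):
frame = [[20, 20, 20, 200, 200, 200]] * 3
print(find_edges(frame)[0])  # -> [False, False, True, True, False, False]
```

Real systems use far more sophisticated operators and work in two dimensions at once, but the principle - turning raw pixel values into candidate object boundaries - is the same.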

What do computer vision systems consist of then?
Basically a brain - that is, CPU hardware plus software - and at least one eye - that is, a camera.

A camera is essential for real-time computer vision systems, giving the computer its dynamic 'eye' on the environment, but computer vision systems can also get their visual data from static images such as photographs or medical scans.

What sort of applications make use of computer vision?
Applications for computer vision are near limitless since having the ability to perceive environments and derive data from real-world objects helps with countless tasks. Just as sight is essential for all manner of human tasks, computer vision can be applied to scores of scenarios where a computer is used to extract information from data.

One basic task is object recognition: identifying specific items within an environment. That basic ability has myriad uses but one use case would be as an aid for a visually-impaired person, who could use the system to identify objects that might otherwise feel the same - banknotes, say.

Object recognition also has applications in the security industry - scanning CCTV imagery for suspicious vehicles or behaviour, for example - and in the military for use in unmanned aerial vehicles, giving them the ability to identify targets or even individual people. Facial recognition technology, gesture-based computing and motion-tracking all fall under the computer vision umbrella - as this video of one PhD computer vision project demonstrates.

Robotics is a fertile field for deploying computer vision, since robots obviously benefit from being able to see their environment. Driverless cars are an obvious example, with computer vision utilised to give a robot the ability to drive itself along roads and navigate traffic - thanks to its cameras, onboard computer and high-speed image-processing software.

Where else can I find computer vision?
A popular use of computer vision tech is in Microsoft's gesture-based Kinect peripheral for the Xbox. Here, the technology enables gamers to move their body to control gameplay, instead of having to hammer away at a physical controller.

Kinect's 'eyes' are quite sophisticated. It contains a standard RGB camera and a depth sensor in the form of an infrared camera, which projects an invisible pattern of infrared dots over the room where the gamer is playing in order to gauge depth and help the software identify and map where its human controllers are in 3D space. Kinect's 'brain' is a custom processor running proprietary computer vision software.
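The depth-sensing step boils down to triangulation: the further a projected dot appears shifted between where the projector placed it and where the infrared camera sees it (its 'disparity'), the closer the surface it landed on. A minimal sketch of that relationship - using made-up calibration numbers for illustration, not Kinect's actual values:

```python
def depth_from_disparity(disparity_px, baseline_m=0.075, focal_px=580.0):
    """Classic structured-light/stereo triangulation:
    depth = baseline * focal_length / disparity.
    The baseline and focal length here are illustrative stand-ins,
    not a real device's calibration."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * focal_px / disparity_px

# A dot shifted by more pixels means a closer surface:
print(depth_from_disparity(29))  # -> 1.5 (metres: a far surface)
print(depth_from_disparity(87))  # -> 0.5 (metres: a near surface)
```

Repeat that calculation for thousands of dots per frame and the software gets a rough 3D map of the room - which is what lets it separate a waving gamer from the sofa behind them.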

In addition, computer vision has had a particular boost in recent years with the rise of powerful camera-equipped smartphones - essentially pocket computers handily fitted with both eye (camera) and brain (CPU). Here, the technology can bring new functionality to augmented reality apps such as Layar. Earlier this year Layar launched Layar Vision, a platform enabling its software to recognise real-world objects and then map augmented reality content on top of them, thanks to computer vision.

That all sounds pretty cool and yet, well, I can't help thinking of Cylons. Or the Terminator. Isn't there something inherently scary about machines that can recognise who we are without us telling them?
There are certainly a lot of privacy and even ethical considerations arising from computers automatically knowing where we are and even who we are. Add facial recognition to an unmanned drone, for instance, and you've created an autonomous assassin. Or, in the case of a games console in someone's living room, an advertiser's dream.

Last year The Wall Street Journal's Digits blog reported Dennis Durkin, chief operating officer and chief financial officer for Microsoft's Xbox video game business, telling investors the company saw future "business opportunities" in targeting ads using Kinect's living room view into people's homes.

"We can cater which content we present to you based on who you are," he is reported as saying. "How many people are in the room when an ad is shown? How many people are in the room when a game is being played? When you add this sort of device to a living room, there's a bunch of business opportunities that come with that." However, Microsoft has since issued a statement saying it does not "use any information captured by Kinect for advertising targeting purposes". Still, as the various uses of computer vision technology become more widespread, the issues it raises are likely to be widely debated.

Anything else?
What about a video search engine that doesn't have to rely on tags to catalogue contents but is able to scan each video frame? US-based start-up VideoSurf uses computer vision algorithms to do just that. The company was acquired by Microsoft last month - with plans to integrate its technology into Redmond's Xbox 360 ecosystem to improve content discovery.
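VideoSurf's actual algorithms are proprietary, but the idea of searching video by its pixels rather than its tags can be illustrated with a toy 'fingerprint': reduce each frame to a pattern of bits, then compare patterns to judge visual similarity. Everything below (function names, the tiny 2x2 'frames') is invented for illustration:

```python
def frame_fingerprint(frame):
    """Toy perceptual hash: one bit per pixel, set where the pixel is
    brighter than the frame's mean brightness. Visually similar frames
    produce similar bit patterns."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)

def hamming(a, b):
    """Count of differing bits - a small count means similar frames."""
    return sum(x != y for x, y in zip(a, b))

bright_sky = [[200, 210], [205, 90]]
same_shot  = [[198, 212], [207, 85]]   # nearly identical frame
night_shot = [[10, 20], [15, 230]]     # completely different scene

fp = frame_fingerprint(bright_sky)
print(hamming(fp, frame_fingerprint(same_shot)))   # -> 0, likely a match
print(hamming(fp, frame_fingerprint(night_shot)))  # -> 4, no match
```

Index a fingerprint for every frame and a search engine can find a clip even when nobody has tagged it - the content itself becomes the metadata.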

Computer vision has also been used to digitise museum collections, and as an aid to design processes. Sculptor Antony Gormley has used computer vision techniques to digitise photographs of casts of his body to create 3D models which are then scaled up into metres-high geometric giants.

As Roberto Cipolla, MD of Toshiba's Cambridge Research Lab and professor of information engineering at Cambridge University, told silicon.com last year: "This is going to be the decade of computer vision."
