
On seeing but especially on being seen

Written by Ed Gottsman, Contributor

According to New Scientist, a computer vision system developed at the University of Texas (Austin) can tell the difference between friendly behavior (shaking someone's hand) and aggressive behavior (punching someone's face). If true, the technology could render moot the most vexing question in mass video surveillance: How do we get enough people to sit in the dark watching screens?

So what?

The problem is acute. Let's take London as an example. As of mid-2005 (according to the Wall Street Journal), the number of CCTV cameras in London was roughly 500,000, and the average Londoner could expect to be observed some 300 times each day. The cameras survey street corners, ticket offices, parking lots, shopping centers, tube stations, and trains. (There is a nod to civil liberties: Under the law, you're required to post a warning sign if you have an area under surveillance. But given the growing density of camera coverage, the sheer number of signs will eventually become ridiculous. Maybe they could just give you your own little sign, which you could take out and read occasionally as you walked around.) Surely, you ask uncertainly, this means that London has no crime at all? Well... no, not entirely.

The problem seems to be twofold. First, most of the cameras aren't (of course) monitored in real time; rather, their tapes are consulted only after a crime has been reported. So the world's most effective deterrent, the certainty of instant capture, is not available. Second, the videos aren't usually very good. Sidewalk cameras are often mounted fairly high in the air, which means that a perpetrator wearing a baseball cap (or whatever they call it on that side of the pond) is effectively wearing a mask.

Enter UT Austin's system. If (a big if: this is a hard problem) it could watch all video feeds, detect crimes in progress, and automatically dispatch squad cars, it would address the instant-capture problem and render the face-identification problem moot (after all, the bobbies could look at the criminal's face all they wanted after they caught him). But that's not the only benefit. Its ability to detect certain kinds of characteristic gestures might even (dare I hope?) help eradicate that most pernicious of social ills: the (sorry, I normally keep my prejudices separate from my work, but just this once) street mime.
