High speed motion capture using a single event camera

Machine vision has been a hot research area for decades, and tremendous progress has been made. Yet new advances in artificial intelligence, and especially machine learning, have turbocharged the entire field. Here's one cutting-edge example.
Written by Robin Harris, Contributor

Advances in machine vision are making autonomous driving possible, but there are still new worlds to conquer, such as human motion capture without multiple high-speed cameras. Soon, your phone may be all the camera you need for sophisticated, high-speed motion capture and analysis.

"EventCap: Monocular 3D Capture of High-Speed Human Motions using an Event Camera" is a case in point. Researchers at the Tsinghua-Berkeley Shenzhen Institute, the Max Planck Institute for Informatics, and Hong Kong's Robotics Institute show how to capture fast motions at millisecond resolution without the downsides of high-frame-rate video.

Event camera?

Event cameras capture motion in a scene. Pointed at a scene where nothing changes, they see nothing. But as soon as there is a pixel-level light change, they record that motion at millisecond resolution, even in very low or very bright light, with no motion blur and low power consumption.
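A toy model makes the idea concrete. The sketch below (my illustration, not how any real sensor is implemented; real event cameras do this asynchronously per pixel in hardware) compares two frames and emits an event wherever the relative brightness change crosses a threshold:

```python
import numpy as np

def frame_to_events(prev, curr, threshold=0.2, t=0.0):
    """Toy event-sensor model: emit an event wherever the log-intensity
    change between two frames exceeds a threshold. Hypothetical
    illustration only -- real sensors fire per pixel, asynchronously."""
    # Work in log space, since event sensors respond to relative change.
    diff = np.log1p(curr.astype(float)) - np.log1p(prev.astype(float))
    ys, xs = np.nonzero(np.abs(diff) > threshold)
    polarity = np.sign(diff[ys, xs]).astype(int)  # +1 brighter, -1 darker
    return [(t, int(x), int(y), int(p)) for x, y, p in zip(xs, ys, polarity)]

# A static scene produces no events; two changed pixels produce two events.
static = np.full((4, 4), 100, dtype=np.uint8)
moved = static.copy()
moved[1, 1] = 10    # this pixel darkens
moved[2, 2] = 255   # this pixel brightens

print(frame_to_events(static, static))  # []
print(frame_to_events(static, moved))
```

The unchanged pixels generate no data at all, which is exactly why event cameras are so frugal with bandwidth and power.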

Event cameras are relatively new, but show great promise for applications involving motion in wildly varying light levels, such as autonomous vehicles or motion capture.

The trouble with high frame rates

So why not use standard high-frame-rate RGB cameras, such as Vision Research's Phantom line, which offers 2k video at 6,600 frames per second? While these certainly have a place, they have two major issues.

First, since each exposure is so short, subjects have to be very well lit. That's why many of their videos are shot in bright sunlight. Second, they generate massive amounts of data, which is a pain to store and analyze.

How massive? The Phantom 2640, configured with 288GB of internal memory, can record 2k video for only 7.8 seconds at maximum frame rate.
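A quick back-of-the-envelope calculation from those two quoted figures shows the implied data rate:

```python
# Data rate implied by the Phantom figures quoted above.
memory_gb = 288        # internal memory
record_seconds = 7.8   # record time at maximum frame rate

rate_gb_per_s = memory_gb / record_seconds
print(f"{rate_gb_per_s:.1f} GB/s")  # 36.9 GB/s into memory

# At that rate, a hypothetical one-hour session would need roughly:
print(f"{rate_gb_per_s * 3600 / 1024:.0f} TB per hour")  # 130 TB per hour
```

Nearly 37 gigabytes every second. No wonder these cameras record to internal RAM rather than to storage.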

EventCap records at a far slower 1,000 FPS, but the problem remains. Standard RGB cameras generate a lot of data, very quickly. Just ask YouTube.

Motion capture

You've probably seen pictures of actors in marker suits: black, skin-tight suits with white markers at each joint. For capture in 3D space, though, these systems usually require multiple cameras that must be synchronized and calibrated. At high frame rates, the data collected becomes truly massive in a short time. And that's before the z-axis computation.

The secret sauce

The EventCap technique begins with a pre-processing step that creates a skeletal mesh of the actor. First, the system generates sparse motion trajectories between images. Then, in a batch mode powered by a convolutional neural network, it optimizes the mesh motion at 1,000 FPS using the captured trajectories. Finally, the skeletal motion is refined against the moving edges of the actor's body as captured by the event camera.
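The data flow through those three stages can be sketched as below. This is purely a structural outline of the pipeline as described above; every function name is a hypothetical placeholder, not the authors' code:

```python
# Hypothetical placeholder stages -- a sketch of the data flow
# described above, not the authors' implementation.
def track_features(frames, events):
    """Stage 1: sparse motion trajectories between the images."""
    return ["trajectory"] * len(frames)

def optimize_batch(trajectories, fps):
    """Stage 2: batch-optimize the skeletal mesh motion at high rate."""
    return {"fps": fps, "poses": trajectories}

def refine_with_event_edges(skeleton, events):
    """Stage 3: refine the pose against the moving body edges."""
    skeleton["refined"] = True
    return skeleton

def eventcap_sketch(events, frames):
    trajectories = track_features(frames, events)
    skeleton = optimize_batch(trajectories, fps=1000)
    return refine_with_event_edges(skeleton, events)

result = eventcap_sketch(events=[], frames=["f0", "f1"])
print(result["fps"], result["refined"])  # 1000 True
```

The point of the structure: the sparse events carry the fast motion, so the heavy optimization only has to interpolate between a handful of full images.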

The net result is that EventCap achieves better spatial resolution than existing systems while requiring less than 5 percent of the bandwidth and storage capacity. 

The Storage Bits take

Event cameras are bio-inspired: a frog's eyes detect only motion, the better to catch flies. Now that the sensors are being developed, the applications are just beginning to be explored. Even Intel is supporting interfaces to them for research.

Yet despite the hoopla over Big Data and exabyte-scale storage, problems such as high-speed motion capture stress our data storage and data management capabilities. The more data we can store, the more uses we find for it.

Courteous comments welcome, of course.
