Turning 2-D photos into 3-D models

When you take a picture, you know that the person or the landscape in front of you has three dimensions. But when you look at the result, it's definitely a flat picture. Extracting 3-D information from photos is still in its infancy. Now, Stanford University computer scientists have developed the Make3d algorithm, which can take any 2-D image as input and create a 3-D 'fly around' model of its content, giving viewers access to the scene's depth and a range of points of view. You can test this software for free by uploading your photos to the website the researchers have built.

An output of the Make3d software

On the left, you can see two pictures. The one at the top is the original single still image, while the one at the bottom is a 3-D reconstruction. (Credit: Ashutosh Saxena, Stanford University) The software used to build the 3-D model was developed by Ashutosh Saxena, a PhD candidate working under the supervision of Andrew Ng, an assistant professor of computer science.

But how can you reconstruct a 3-D model from a single 2-D picture? "In the past, some researchers have synthesized 3-D models by analyzing multiple images of a scene. Others, including Ng and Saxena in 2005, have developed algorithms that infer depth from single images by combining assumptions about what must be ground or sky with simple cues such as vertical lines in the image that represent walls or trees. But Make3d creates accurate and smooth models about twice as often as competing approaches, Ng said, by abandoning limiting assumptions in favor of a new, deeper analysis of each image and the powerful artificial intelligence technique 'machine learning'."

And how can a computer 'learn' anything from a flat photo? "The algorithm breaks the image up into tiny planes called 'superpixels,' which are within the image and have very uniform color, brightness and other attributes. By looking at a superpixel in concert with its neighbors, analyzing changes such as gradations of texture, the algorithm makes a judgment about how far it is from the viewer and what its orientation in space is. Unlike some previous algorithms, the Stanford one can account for planes at any angle, not just horizontal or vertical. This allows it to create models for scenes that have planes at many orientations, such as the curved branches of trees or the slopes of mountains."
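To make the 'superpixel' idea more concrete, here is a minimal sketch in Python. It is not the Make3d segmentation: it simply groups touching pixels of similar brightness in a tiny grayscale grid with a flood fill, then records which regions border each other, which is the neighbor relationship the quote describes. The function names (`label_regions`, `region_neighbors`) and the `tolerance` parameter are hypothetical, chosen only for illustration.

```python
from collections import deque


def label_regions(image, tolerance=10):
    """Group 4-connected pixels of similar brightness into labeled regions.

    Toy stand-in for superpixel segmentation (an assumption, not the
    Make3d method): a pixel joins a region when its brightness is within
    `tolerance` of the region's seed pixel.
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # pixel already belongs to a region
            seed = image[r][c]
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - seed) <= tolerance):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


def region_neighbors(labels):
    """Map each region label to the set of regions sharing an edge with it."""
    rows, cols = len(labels), len(labels[0])
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            here = labels[r][c]
            neighbors.setdefault(here, set())
            # Checking only down and right visits every shared edge once.
            for ny, nx in ((r + 1, c), (r, c + 1)):
                if ny < rows and nx < cols and labels[ny][nx] != here:
                    there = labels[ny][nx]
                    neighbors[here].add(there)
                    neighbors.setdefault(there, set()).add(here)
    return neighbors


# A 4x4 grayscale "photo": bright sky on top, dark ground below,
# and a mid-gray object in the bottom-right corner.
photo = [
    [200, 202, 201, 199],
    [198, 200, 203, 201],
    [50, 52, 51, 120],
    [49, 50, 118, 121],
]
labels = label_regions(photo)
neighbors = region_neighbors(labels)
```

In this toy photo the sky, the ground and the object each become one region, and every region borders the other two. A real system would compare much richer cues than brightness (color, texture gradients) across such neighbors before judging depth and orientation.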

If you want to test the software, you can visit the Make3d website and upload your pictures -- or give a Flickr address. You'll receive an e-mail when the model has been rendered. Here is an example of a fly-around movie from a still photo (11 seconds, 12.6 MB).

For more information, you should read an article written by Ng, Saxena and another student, Min Sun, which won the best paper award at the 3-D recognition and reconstruction workshop at the International Conference on Computer Vision in Rio de Janeiro in October 2007. You can read the abstract on Saxena's page, Single Image 3D Reconstruction, from which the above pictures have been extracted.

Here is the introduction of this paper. "We consider the problem of estimating detailed 3-d structure from a single still image of an unstructured environment. Our goal is to create 3-d models which are both quantitatively accurate as well as visually pleasing. For each small homogeneous patch in the image, we use a Markov Random Field (MRF) to infer a set of 'plane parameters' that capture both the 3-d location and 3-d orientation of the patch. The MRF, trained via supervised learning, models both image depth cues as well as the relationships between different parts of the image. Other than assuming that the environment is made up of a number of small planes, our model makes no explicit assumptions about the structure of the scene; this enables the algorithm to capture much more detailed 3-d structure than does prior art."
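How can a few numbers capture both where a patch is and which way it faces? Here is a minimal sketch in Python, under a stated assumption: the plane is encoded as a single 3-vector `alpha` such that every point q on the plane satisfies alpha · q = 1, with the camera at the origin. Under that encoding, the direction of `alpha` is the plane's normal (its orientation), 1/|alpha| is the plane's distance from the camera (its location), and the depth along a viewing ray with unit direction r is 1 / (r · alpha). The helper names are hypothetical, and the sketch covers only the geometry, not the MRF that infers the parameters.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def plane_distance(alpha):
    """Distance from the camera (origin) to the closest point on the plane."""
    return 1.0 / math.sqrt(dot(alpha, alpha))


def plane_normal(alpha):
    """Unit normal of the plane, i.e. its orientation in space."""
    return normalize(alpha)


def depth_along_ray(alpha, ray):
    """Distance from the camera to the plane along the given viewing ray."""
    r = normalize(ray)
    denom = dot(r, alpha)
    if denom <= 0:
        raise ValueError("ray does not hit the plane in front of the camera")
    return 1.0 / denom


# Assumed axes for this example: x right, y up, z forward from the camera.
wall = (0.0, 0.0, 0.25)    # vertical wall facing the camera, 4 units ahead
ground = (0.0, -0.5, 0.0)  # horizontal ground, 2 units below the camera

straight_ahead = depth_along_ray(wall, (0.0, 0.0, 1.0))
slanted = depth_along_ray(wall, (3.0, 0.0, 4.0))
looking_down = depth_along_ray(ground, (0.0, -1.0, 1.0))
```

With these values the wall is hit at a depth of 4 units straight ahead and 5 units along the slanted ray (up to floating-point rounding), while a ray pointing 45 degrees downward reaches the ground after about 2.83 units. Because `alpha` can point in any direction, the same three numbers describe planes at any angle, not just horizontal or vertical ones.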

Here is a link to this paper (PDF format, 8 pages, 5.61 MB). Jump to page 7 to discover how this algorithm is better than previous ones. Finally, here is a link to a gallery of 2,136 pictures processed by the Make3d algorithm.

Sources: David Orenstein, Stanford Report, January 23, 2008; and various websites
