AI can now describe pictures: A look at the potential business impact

Researchers have combined image and natural language processing so computer vision can see a picture and then write a caption for it. Here's a stab at how this artificial intelligence advance could be applied to business.

Computer vision can now be combined with natural language processing to see an image and then describe it in text, according to Google and Stanford researchers in separate reports.

That task, which is second nature for humans, has been elusive for artificial intelligence and image recognition software. With Google and Stanford making progress, the effort could have implications across a bevy of functions. Both research teams, which were highlighted in a New York Times article, said they used neural networks to enable the description of images. Here's a look at Stanford's efforts:


And Google's research results:

[Image: Google image description]

Although it's too early to say where the latest computer vision research will wind up, there are some areas where an impact could arrive sooner rather than later.


  • Big data: Video is part of the unstructured data equation, but could become more structured if computer vision can describe what's happening in plain English. Video would be easier to log.
  • Security: It's not a huge leap for computer vision to go from recognizing things (pizza, tables and chairs) to identifying people. For instance, a company could tell who, beyond the usual folks, entered the building at 1 a.m.
  • Robotics: With computer vision and better natural language processing, robots would be able to identify items and know better how to interact with them.
  • Media: Today you have shows on demand. What if computer image recognition could get you every car chase on a TV network in the last five years? How about every touchdown by Calvin Johnson of the Detroit Lions?
  • Healthcare: Computer image recognition could be extended to X-ray and MRI analysis.
  • Retail: If cameras with computer vision can describe behavior, they could in theory outline what shoppers are doing in real time. That monitoring could enable retailers to change floor displays on the fly.

The downside of computers being able to describe what they see is that the technology could up the ante on surveillance. Two pizzas sitting on a stove is one thing. Tracking workers and citizens is another matter altogether.