Google's latest free gift? Millions of captioned images to help train AI systems

Google has published two huge datasets that it hopes will accelerate advances in computer vision and video understanding.
Written by Liam Tung, Contributing Writer

Google's Open Images dataset consists of nine million links to web images and descriptions of the objects they contain.

Image: Google

Google's latest gift to the public is a dataset of about nine million links to labeled images to help train computer-vision systems.

The dataset, called Open Images, provides links to images on the web that have been annotated with descriptions of the objects they contain. It consists of 'machine-populated' annotations, as well as annotations validated by humans to weed out false positives.

According to the Google Research team, the dataset is large enough for researchers to train a deep neural network "from scratch".

As to why Google is releasing the dataset, its researchers note that recent advances in computer vision probably would not have happened so rapidly without other large, publicly available datasets for training machine-learning networks.

Examples of such datasets include ImageNet, which consists of 14 million images, and Microsoft's COCO image-recognition, segmentation, and captioning dataset.

Without these public resources, automated image-captioning and features such as Allo's automated replies to shared snapshots wouldn't be available yet.

The new dataset is the product of a collaboration between Google, Carnegie Mellon University, and Cornell University.

It was for the same reason that Google recently released YouTube-8M, a dataset of eight million YouTube videos and video-level labels that could help accelerate research into video understanding and deliver advances in video search and discovery.

The dataset covers 500,000 hours of video. To prepare it for researchers, Google used a deep-learning model to extract 1.9 billion 'frame features', which it then compressed to less than 1.5TB.

The idea is to give researchers the ability to use the data for video understanding even if they lack big data and high-powered computers to process video.

Normally, if a researcher wanted to analyze data on the scale of YouTube-8M, they would need to have a petabyte of storage available and "dozens of CPU-years" of processing power, according to Google Research.
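To put the YouTube-8M figures quoted above in proportion, the short Python sketch below works through the arithmetic. It uses only the numbers reported in this article. The decimal unit sizes (1TB = 10^12 bytes, 1PB = 10^15 bytes) and the function name `dataset_ratios` are assumptions made for illustration, and the results are rough estimates rather than figures published by Google.

```python
# Figures as quoted in the article.
NUM_VIDEOS = 8_000_000
TOTAL_HOURS = 500_000
NUM_FRAME_FEATURES = 1_900_000_000

# Assumption: decimal units. The article says "less than 1.5TB",
# so this value is an upper bound on the compressed size.
COMPRESSED_BYTES = 1.5e12
# Assumption: the "petabyte of storage" is taken as 10**15 bytes.
RAW_STORAGE_BYTES = 1e15


def dataset_ratios():
    """Derive rough per-video and per-feature ratios from the quoted figures."""
    total_seconds = TOTAL_HOURS * 3600
    return {
        "avg_video_seconds": total_seconds / NUM_VIDEOS,
        "features_per_video_second": NUM_FRAME_FEATURES / total_seconds,
        "max_bytes_per_feature": COMPRESSED_BYTES / NUM_FRAME_FEATURES,
        "storage_reduction_factor": RAW_STORAGE_BYTES / COMPRESSED_BYTES,
    }


if __name__ == "__main__":
    for name, value in dataset_ratios().items():
        print(f"{name}: {value:.2f}")
```

On these assumptions, the average video runs just under four minutes, there is roughly one frame feature per second of video, and the precomputed features take up several hundred times less space than the petabyte Google cites for working at this scale. The article does not state how the frames were sampled, so the per-second ratio is an inference from the totals only.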
