Facebook's researchers have unveiled a new AI model that can learn from any random collection of unlabeled images on the internet. The breakthrough is still at an early stage, but the team expects it to spark a "revolution" in computer vision.
Dubbed SEER (SElf-SupERvised), the model was fed one billion publicly available Instagram images that had not been manually curated. Even without the labels and annotations that typically go into algorithm training, SEER was able to work its way through the dataset autonomously, learning as it went, and eventually achieved state-of-the-art accuracy on tasks such as object detection.
The method, known as self-supervised learning, is already well established in the field of AI: it consists of building systems that learn directly from the information they are given, without relying on carefully labeled datasets to teach them how to perform a task such as recognizing an object in a photo or translating a block of text.
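To make the idea concrete, here is a minimal sketch of one common self-supervised recipe from natural language processing, masked-word prediction. It is an illustration under our own assumptions, not how SEER works: the function name `make_masked_examples` and the `[MASK]` token are hypothetical choices. The point is that the training label is carved out of the raw, unlabeled data itself rather than supplied by a human annotator.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (input, target) training pairs.

    Each pair hides a single word; the hidden word itself becomes the label,
    so no human annotation is needed. (Hypothetical helper for illustration.)
    """
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        # Replace the i-th word with the mask token; the original word is the target.
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), word))
    return examples


pairs = make_masked_examples("the cat sat on the mat")
print(pairs[2])  # ('the cat [MASK] on the mat', 'sat')
```

A model trained to fill in the blanks across billions of such pairs learns useful representations of language without anyone labeling a single example.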
Self-supervised learning has gathered a lot of scientific attention lately, because it means that far less data needs to be labeled by humans – a painstaking, time-consuming task that most researchers would rather do without. At the same time, because it does not need a curated dataset, a self-supervised model can work with larger and more diverse datasets.
In some fields, especially natural language processing, the method has already led to breakthroughs; algorithms trained on ever-larger amounts of unlabeled text have enabled advances in applications like question answering, machine translation, natural language inference, and more.
In contrast, computer vision has yet to fully embrace self-supervised learning. As Priya Goyal, a software engineer at Facebook AI Research, explains, SEER marks a first in the field. "SEER is the first fully self-supervised computer vision model that is trained on random internet images, as compared to existing self-supervised works in computer vision which have been trained on the highly-curated ImageNet dataset," she tells ZDNet.
ImageNet is a large-scale database of millions of hand-labeled pictures that has been opened up to the wider computer vision community to advance developments in AI.
Facebook's researchers used ImageNet as a benchmark to evaluate SEER's performance, and found that the self-supervised model outperformed state-of-the-art supervised AI systems on tasks such as low-shot learning, object detection, segmentation and image classification.
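"Low-shot" means classifying images when only a handful of labeled examples per class are available. A minimal sketch of one simple low-shot protocol, a nearest-centroid classifier, is shown below, assuming a pretrained model has already turned each image into a feature vector. This is a swapped-in illustration, not the evaluation protocol Facebook's researchers used; the function name and the toy two-dimensional "features" are hypothetical.

```python
import math


def nearest_centroid_classifier(support):
    """Build a low-shot classifier from a few labeled feature vectors per class.

    support: dict mapping class name -> list of feature vectors (hypothetical toy data).
    """
    centroids = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        # Average the few examples of each class into one prototype vector.
        centroids[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    def predict(x):
        # Pick the class whose centroid is closest in Euclidean distance.
        return min(centroids, key=lambda label: math.dist(x, centroids[label]))

    return predict


support = {
    "cat": [[0.9, 0.1], [1.1, -0.1]],
    "dog": [[-1.0, 0.0], [-0.8, 0.2]],
}
predict = nearest_centroid_classifier(support)
print(predict([1.0, 0.05]))  # cat
```

The better the pretrained features separate concepts, the fewer labeled examples such a simple classifier needs, which is why low-shot results are a common yardstick for self-supervised models.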
"SEER outperforms the existing self-supervised models by training on just random images," says Goyal. "This result essentially indicates that we don't need such highly curated datasets like ImageNet in computer vision and self-supervision on random images produces very high-quality models."
Given the sophistication that self-supervised learning requires, the researchers' work was not without challenges. With text, AI models are tasked with assigning meaning to words; with images, the algorithm must decide how each pixel corresponds to a concept, while accounting for the various angles, views and shapes that a single concept can take in different pictures.
In other words, the researchers needed a lot of data, and a model capable of deriving as many visual concepts as possible from this complex pool of information.
To carry out the task, Goyal and her team adapted SwAV, a recent algorithm from Facebook AI's existing work in self-supervised learning, which clusters images that show similar concepts into separate groups. The scientists also designed a convolutional network – a type of deep-learning model loosely inspired by the connectivity patterns of neurons in the visual cortex, which learns to weigh the importance of different features in an image.
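SwAV ("Swapping Assignments between Views") compares the cluster assignments of two augmented views of the same image. The pure-Python sketch below illustrates that idea under simplifying assumptions: features for each view are hand-written toy vectors rather than network outputs, the prototype vectors are fixed rather than learned, and the function names, `epsilon` and `temperature` values are illustrative choices. Each view's features are scored against the prototypes, the Sinkhorn-Knopp algorithm turns those scores into balanced soft cluster "codes", and each view is then asked to predict the other view's code.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def scores(features, prototypes):
    # Cosine similarity between each feature vector (row) and each prototype (column).
    return [[sum(a * b for a, b in zip(normalize(f), normalize(p))) for p in prototypes]
            for f in features]


def sinkhorn(score_matrix, epsilon=0.05, iters=3):
    """Soft, balanced assignment of B samples to K prototypes (Sinkhorn-Knopp)."""
    B, K = len(score_matrix), len(score_matrix[0])
    Q = [[math.exp(s / epsilon) for s in row] for row in score_matrix]
    for _ in range(iters):
        # Each prototype (column) should receive an equal share of the batch.
        for k in range(K):
            col = sum(Q[b][k] for b in range(B))
            for b in range(B):
                Q[b][k] /= col * K
        # Each sample (row) should distribute an equal share of mass.
        for b in range(B):
            row = sum(Q[b])
            Q[b] = [q / (row * B) for q in Q[b]]
    # Rescale so every sample's code sums to 1.
    return [[q * B for q in row] for row in Q]


def softmax(row, temperature=0.1):
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def swapped_prediction_loss(view1, view2, prototypes):
    s1, s2 = scores(view1, prototypes), scores(view2, prototypes)
    q1, q2 = sinkhorn(s1), sinkhorn(s2)
    loss = 0.0
    for b in range(len(view1)):
        p1, p2 = softmax(s1[b]), softmax(s2[b])
        # Swapped prediction: view 1 predicts view 2's code, and vice versa.
        loss -= sum(q * math.log(p) for q, p in zip(q2[b], p1))
        loss -= sum(q * math.log(p) for q, p in zip(q1[b], p2))
    return loss / (2 * len(view1))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
view1 = [[1.0, 0.1], [0.1, 1.0]]
consistent_view2 = [[0.9, 0.2], [0.2, 0.9]]  # same images, slightly different "crops"
mismatched_view2 = [[0.2, 0.9], [0.9, 0.2]]  # views that disagree
print(swapped_prediction_loss(view1, consistent_view2, prototypes) <
      swapped_prediction_loss(view1, mismatched_view2, prototypes))  # True
```

The loss is low when two views of the same image land in the same cluster and high when they do not, so minimizing it pushes the network to map different views of one concept to the same place, with no labels involved. The balancing step matters: without it, a trivial solution would put every image into a single cluster.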
With a dataset of one billion Instagram images, the scale of the system was large, to say the least. Facebook's team used Nvidia V100 GPUs with 32GB of memory, and as the model grew, had to make sure it still fit within that memory. Goyal says further research will be needed to adapt compute capabilities to systems of this size.
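The arithmetic behind fitting a model within GPU memory can be sketched roughly as below. The assumptions are ours, not Facebook's: values are stored as 32-bit floats (4 bytes), and training keeps three copies of the parameters (the weights, their gradients and one optimizer buffer such as SGD momentum). Activations and framework overhead are ignored, so real usage is higher. The parameter counts are hypothetical examples, not SEER's actual size.

```python
def training_memory_gb(num_params, bytes_per_value=4, copies=3):
    """Rough memory for weights, gradients and one optimizer buffer.

    Activations, framework overhead and fragmentation are ignored, so the
    real footprint is higher than this estimate.
    """
    return num_params * bytes_per_value * copies / 1e9


GPU_MEMORY_GB = 32  # a 32GB Nvidia V100

for num_params in (100e6, 1e9, 3e9):  # hypothetical model sizes
    need = training_memory_gb(num_params)
    verdict = "fits" if need < GPU_MEMORY_GB else "does not fit"
    print(f"{num_params / 1e9:.1f}B params -> {need:.1f} GB ({verdict})")
```

Under these assumptions, a one-billion-parameter model already needs about 12GB before a single image's activations are stored, which is why memory becomes a hard constraint as models grow.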
"As we train the model on more and more GPUs, the communication between those GPUs needs to be fast for faster training. Such a challenge could be addressed by developing clear software and research techniques that are efficient for the given memory and runtime budget," she says.
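Goyal's point about communication can be illustrated with data-parallel training, a common way to spread training across GPUs (we assume it here; the article does not detail SEER's setup). Each GPU computes gradients on its own slice of the data, and every step the GPUs average those gradients with an "all-reduce" operation. The sketch simulates that averaging in one process and estimates the traffic of a ring all-reduce, in which each GPU sends about 2 × (N − 1) / N times the model's size per step. The function names and sizes are hypothetical.

```python
def allreduce_average(worker_grads):
    """Average gradients across workers, as data-parallel training does each step.

    Simulated in one process; on real hardware this is a collective operation
    (for example an NCCL all-reduce) whose speed depends on inter-GPU bandwidth.
    """
    n = len(worker_grads)
    summed = [sum(values) for values in zip(*worker_grads)]
    return [s / n for s in summed]


def ring_allreduce_bytes_per_gpu(model_bytes, num_gpus):
    """Data each GPU sends in a ring all-reduce: 2 * (N - 1) / N * model size."""
    return 2 * (num_gpus - 1) / num_gpus * model_bytes


print(allreduce_average([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
for gpus in (8, 64, 512):
    gb = ring_allreduce_bytes_per_gpu(4e9, gpus) / 1e9  # hypothetical 4GB of gradients
    print(f"{gpus} GPUs: {gb:.2f} GB sent per GPU per step")
```

Per-GPU traffic does not shrink as GPUs are added; it approaches twice the model's size. Meanwhile the compute per GPU falls, so on large clusters fast interconnects become essential.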
There is therefore still some work to be done before SEER can be leveraged for real-world use cases, but Goyal argues that the technology's impact should not be underestimated. "With SEER, we can now make further advances in computer vision by training large models on large abundance of random internet images," she says.
"This breakthrough could enable a self-supervised learning revolution in computer vision similar to what we've seen in natural language processing with text."
Within Facebook, SEER could be used for a broad range of computer-vision tasks, from automatically generating image descriptions to helping identify policy-violating content. Outside the company, the technology could also be useful in fields that have limited images and metadata, such as medical imaging.
Facebook's team has called for more work to be done to push SEER into its next stage of development. As part of the research, the team developed an all-purpose PyTorch-based library for self-supervised learning called VISSL, which it is open-sourcing to encourage the broader AI community to experiment with the technology.