Facebook is using artificial intelligence to describe photos for blind users

Facebook has built a computer vision system to help blind people interpret images shared on the social network.
Written by Liam Tung, Contributing Writer

Facebook is using AI to help describe images posted by users.

Image: Facebook

Facebook's iOS app is using object recognition technology to give blind people an audio breakdown of what's going on in photos posted on the social network.

The new accessibility feature, rolling out today, could be a major improvement on existing screen readers, which largely focus on text.

Until now, when blind users checking their Facebook news feed came across an image, they would hear only the word "photo" and the name of the person who shared it, leaving them dependent on friends and family to interpret the image.

To improve the experience for blind people, Facebook has used its vast trove of user images to train a deep neural network that drives a computer vision system built to recognise objects in images.

Facebook then translates that into "alt text", a W3C standard for providing text alternatives for images that can be interpreted by screen readers. This should mean that any screen reader can pick up Facebook's alt text output and read out images to the user.
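Because the output is standard alt text, any screen reader that walks the page markup can surface it. A minimal sketch of how that extraction works (the markup and class name here are illustrative, not Facebook's actual HTML):

```python
from html.parser import HTMLParser

class AltTextExtractor(HTMLParser):
    """Collects the alt attribute of every <img> tag it encounters,
    roughly what a screen reader does when it announces an image."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alt_texts.append(alt)

# Hypothetical markup carrying machine-generated alt text
markup = '<img src="photo.jpg" alt="Image may contain: three people, smiling, outdoors">'

parser = AltTextExtractor()
parser.feed(markup)
print(parser.alt_texts[0])
# -> Image may contain: three people, smiling, outdoors
```

The point of using the alt attribute rather than a proprietary field is exactly this interoperability: assistive software needs no Facebook-specific support to read the description aloud.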

From today, blind users of Facebook's iOS app in English-speaking markets will begin to hear things like "image may contain: three people, smiling, outdoors".

Facebook thinks the system could go a long way toward including the world's 39 million blind people and 246 million people with severe visual impairment in conversations about photos on Facebook.

One blind participant in a March study Facebook conducted with Cornell University commented that "when it comes down to doing things people do on Facebook, it's more about the photos".

The researchers found that most blind people posted photos and responded to images shared by their friends. While technical strategies were used to overcome some accessibility issues, many still relied on friends for tasks like composing a photo, particularly if it was to be shared publicly on Facebook rather than privately among friends on WhatsApp.

A common problem in responding to photos on a social network was that images often lacked useful contextual information; supplying it depends on the person who posted the photo writing a description, something few people do.

For now the feature is only available in English for blind users in the US, UK, Canada, Australia, and New Zealand. However, Facebook intends to roll it out to more platforms, languages and markets soon.

According to Facebook, it took 10 months to get the system to its current state. Facebook has been cautious about how it explains concepts in photos. For now, these are limited to 100 concepts, covering aspects of people's appearance, such as "baby", "beard", and "smiling", as well as concepts common to images of nature, transport, sports, and food.

When describing what's in an image, the system currently builds sentences around people first, followed by objects, then scenes. It also always prefixes descriptions with "image may contain" to convey uncertainty about what it has detected.
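The ordering rule described above can be sketched in a few lines. This is a hypothetical reconstruction, not Facebook's code; the function name and tag lists are assumptions:

```python
def build_alt_text(people, objects, scenes):
    """Assemble an alt-text string in the order the article describes:
    people first, then objects, then scenes, with an 'Image may contain'
    prefix to convey the classifier's uncertainty."""
    parts = people + objects + scenes
    if not parts:
        # Fall back to the old behaviour when nothing was detected
        return "Photo"
    return "Image may contain: " + ", ".join(parts)

print(build_alt_text(["three people", "smiling"], [], ["outdoors"]))
# -> Image may contain: three people, smiling, outdoors
```

Leading with the uncertainty phrase is a deliberate design choice: since the classifier can be wrong, the sentence never asserts that the objects are definitely present.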
