Facebook enhances AI used to describe photos for visually impaired users

The latest iteration can detect and identify more concepts, as well as provide more detailed descriptions.


Facebook has announced improvements to the artificial intelligence (AI) technology it uses to generate descriptions of photos posted on the social network for visually impaired users.

The technology, called automatic alternative text (AAT), was first introduced by Facebook in 2016 to improve the experience of visually impaired users. Up until then, visually impaired users who checked their Facebook newsfeed and came across an image would only hear the word "photo" and the name of the person who shared it.

With AAT, visually impaired users have been able to hear things like "image may contain: three people, smiling, outdoors".

Facebook said the latest iteration of AAT expands the number of concepts the AI technology can detect and identify in a photo, and provides more detailed descriptions that include activities, landmarks, food types, and types of animals. For example, it can produce "a selfie of two people, outdoors, the Leaning Tower of Pisa" instead of "an image of two people".

The company explained that the increase in the number of concepts the technology can recognise, from 100 to more than 1,200, was made possible by training the model with weakly supervised learning on samples that it claimed are "both more accurate, and culturally and demographically inclusive".


Facebook added that in order to provide more information about position and count, the company trained its two-stage object detector using an open-source platform developed by Facebook AI Research.

"We trained the models to predict locations and semantic labels of the objects within an image. Multilabel/multi–data set training techniques helped make our model more reliable with the larger label space," the company said.
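To illustrate how predicted locations and labels could feed into a description with position and count, here is a minimal Python sketch. It is not Facebook's implementation: the `Detection` class, the `describe` function, the confidence threshold, and the left/centre/right split are all hypothetical, chosen only to show the general idea of turning detector output into alt text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One hypothetical detector output: a label, a location, a confidence."""
    label: str       # semantic label, e.g. "dog"
    x_center: float  # horizontal centre of the box, 0.0 (left) to 1.0 (right)
    score: float     # model confidence between 0.0 and 1.0


def position(x_center):
    """Map a horizontal coordinate to a coarse position word (assumed thirds)."""
    if x_center < 1 / 3:
        return "left"
    if x_center > 2 / 3:
        return "right"
    return "centre"


def describe(detections, min_score=0.5):
    """Build an alt-text string from detections above an assumed threshold."""
    kept = [d for d in detections if d.score >= min_score]
    if not kept:
        return "Photo"  # fall back to the generic description

    counts = Counter(d.label for d in kept)  # keeps first-seen order
    parts = []
    for label, n in counts.items():
        if n == 1:
            # A single object: report where it sits in the frame.
            only = next(d for d in kept if d.label == label)
            parts.append(f"1 {label} ({position(only.x_center)})")
        else:
            # Several objects: report the count (naive plural, adds "s").
            parts.append(f"{n} {label}s")
    return "Image may contain: " + ", ".join(parts)


if __name__ == "__main__":
    sample = [
        Detection("dog", 0.2, 0.9),
        Detection("dog", 0.8, 0.8),
        Detection("tree", 0.5, 0.95),
        Detection("car", 0.5, 0.3),  # dropped: below the threshold
    ]
    print(describe(sample))
```

Dropping low-confidence detections reflects the cautious "may contain" wording quoted earlier in the article; a production system would presumably tune such a threshold per concept.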

Similar efforts have been made in the past by other tech companies to improve the user experience for visually impaired users.

Last year, Google released its TalkBack braille keyboard to help users type directly on their Android devices without the need to connect a physical braille keyboard. This followed the search engine giant's launch of its Lookout app, which uses AI to give users verbal feedback about objects they point their phone at.

Prior to that, Amazon introduced a Show and Tell feature to Echo Show so it could recognise household pantry items. Users simply hold the item up to the display screen and ask, "Alexa, what am I holding?"
