For anyone who has marveled at the breathtaking array of challenges that Facebook has faced this year regarding its place in society, the latest bit of research from the company's artificial intelligence team offers a fascinating goal: To be "more engaging to humans."
Researchers at the Facebook AI unit found a way to train machine learning models to spit out not merely factual representations of images, but rather captions to photos that could take on a number of styles of comment that might be more interesting to a person, and that, crucially, are meant to represent the attitude, or personality traits, of the disembodied entity that is doing the commenting.
Traditional machine learning tasks that successfully place a description automatically on an image "are useful to verify that a machine understands the content of an image," they write, but "they are not engaging to humans as captions."
Personality, in this case, could range from sweet to arrogant to anxious, and various arrangements in between. A picture of a sandwich, for example, could be affectionately labeled, "That is a lovely sandwich," or, more derisively, "I make better food than this."
The work is a mash-up of several state-of-the-art techniques, such as how to determine the content of an image, and then how to generate novel sentences.
The paper, "Engaging image captioning via personality," posted on the arXiv pre-print service, was authored by Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, and Jason Weston of Facebook AI Research.
The neural network model the authors created, which they dub "TransResNet," relies upon several state-of-the-art programs built to "encode" image data, including the "ResNet152" encoder, a 152-layer residual network architecture introduced by Kaiming He and colleagues at Microsoft Research in 2015 and available through a software package called "Torchvision."
The output of that encoder is then given to a "multi-layer perceptron with ReLU [rectified linear unit] activation units." To that, the authors add an "embedding" of a personality trait. Next, the authors train two encoders on what they call a "next-utterance retrieval task," which leverages a database that holds dialog consisting of "1.7 billion pairs of utterances, where one encodes the context and another the candidates for the next utterance."
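To make that pipeline a little more concrete, here is a minimal Python sketch of the retrieval idea, not the authors' code. Everything in it is an assumption made for illustration: the toy vector sizes, the random untrained weights, the helper names (`encode_context`, `retrieve_caption`), the choice to combine the image vector and the personality embedding by element-wise addition, and the use of a dot product to score candidate captions.

```python
import random


def relu(v):
    """Rectified linear unit, applied element-wise."""
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
IMAGE_DIM, HIDDEN_DIM, EMBED_DIM = 8, 6, 4  # toy sizes, not the paper's

# Stand-in for the output of a pretrained image encoder such as ResNet152.
image_features = [rng.uniform(0, 1) for _ in range(IMAGE_DIM)]

# Multi-layer perceptron with one ReLU hidden layer (random, untrained).
w1, b1 = random_matrix(rng, HIDDEN_DIM, IMAGE_DIM), [0.0] * HIDDEN_DIM
w2, b2 = random_matrix(rng, EMBED_DIM, HIDDEN_DIM), [0.0] * EMBED_DIM

# One vector per personality trait; a trained model would learn these.
personality_embeddings = {
    name: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for name in ("sweet", "arrogant", "anxious")
}

# Stand-ins for caption vectors produced by a trained text encoder.
candidates = {
    caption: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for caption in ("That is a lovely sandwich",
                    "I make better food than this")
}


def encode_context(features, personality):
    """Image features -> MLP with ReLU -> plus personality embedding."""
    hidden = relu(linear(w1, b1, features))
    image_vec = linear(w2, b2, hidden)
    trait_vec = personality_embeddings[personality]
    # Assumption: the two vectors are combined by element-wise addition.
    return [a + b for a, b in zip(image_vec, trait_vec)]


def retrieve_caption(features, personality):
    """Return the candidate caption that scores highest against the context."""
    context = encode_context(features, personality)
    return max(candidates, key=lambda c: dot(context, candidates[c]))


print(retrieve_caption(image_features, "sweet"))
```

Because the weights here are random, the caption that comes back is arbitrary. The point is only the flow of data: image features pass through the perceptron, the personality vector shifts the result, and the best-scoring candidate is retrieved.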
The authors then show that the TransResNet is competitive or even superior on a bunch of standard benchmark tests for applying a caption to an image. But in order to show that the personality of a caption can have an impact, they had groups of people look at human-authored captions and the automatically generated captions and say which they found "more engaging."
Report the authors: "Captions conditioned on a personality were found to be significantly more engaging than those that were neutral captions of the image, with a win rate of 64.5 percent, which is statistically significant using a binomial two-tailed test."
And when comparing their work to "engaging" captions authored by people, the researchers found "our best TransResNet model [...] almost matched human authors, with a win rate of 49.5 percent (difference not significant, p > 0.6)."
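For readers curious what a binomial two-tailed test involves, the following Python sketch computes an exact p-value against the assumption that either caption is equally likely to be preferred. The article does not give the number of comparisons the researchers collected, so the count of 200 trials used here is purely hypothetical, as is the function name `binomial_two_tailed_p`; the resulting p-values are illustrative and are not the paper's figures.

```python
from math import comb


def binomial_two_tailed_p(wins, trials):
    """Exact two-tailed p-value against the null of a 50/50 preference."""
    total = 2 ** trials
    # Number of outcomes at least as high, and at least as low, as observed.
    upper = sum(comb(trials, k) for k in range(wins, trials + 1))
    lower = sum(comb(trials, k) for k in range(0, wins + 1))
    # The null distribution is symmetric, so double the smaller tail.
    return min(1.0, 2 * min(upper, lower) / total)


# Hypothetical sample size; the article does not state the real one.
TRIALS = 200

# 129 wins out of 200 corresponds to a 64.5 percent win rate.
p_personality = binomial_two_tailed_p(129, TRIALS)

# 99 wins out of 200 corresponds to a 49.5 percent win rate.
p_vs_human = binomial_two_tailed_p(99, TRIALS)

print(p_personality, p_vs_human)
```

With these made-up counts, the first p-value is very small, meaning a 64.5 percent win rate would be hard to explain by chance, while the second is close to 1, meaning a 49.5 percent win rate is indistinguishable from a coin flip.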
The authors note this is a benchmark from which to pursue further development of their model, "leaving the possibility of superhuman performance coming soon in this domain."
Interestingly, the authors left by the wayside some personality traits they could not model, such as "allocentric, insouciant, flexible, earthy and invisible," all of which, they write, are difficult to interpret.
There may be a broader lesson in all this about the mood in the world. In the study groups where humans were asked to evaluate how engaging a caption is, the authors write that when people were presented with both a caption that expressed no particular personality, one that's just factual, on the one hand, and a caption that expressed a positive point of view - "nice kitty!" or some such - on the other hand, they tended to find the positive caption more engaging. But when presented with negative captions, people found them less engaging than those that were just factual. Enough with the negativity, might be the takeaway.