Instagram predicts the flu. Who knew? AI knew, that's who.

AI researchers in Finland tested how well hashtags like "flu" in Instagram posts, and images people posted of flu medication, anticipated historical public health data on flu outbreaks. It's the latest in a number of attempts to use social media as a way to gauge population and health trends.
Written by Tiernan Ray, Senior Contributing Writer

One can debate whether social media is good for you or not, but it could at least help take the temperature of society's general health.

AI researchers in Finland, using the public health data gathered rigorously for that nation of five million inhabitants, found posts on Instagram had a significant statistical correlation with recorded influenza outbreaks.

By searching for hashtags with words such as "flu," and analyzing the image content of posts showing boxes and bottles of flu drugs, the researchers constructed a mash-up of a convolutional neural network and a version of something called "tree learning" to predict historical outbreaks.

The paper, Predicting the flu from Instagram, was posted Tuesday on the Cornell University arXiv pre-print server, and is authored by Oguzhan Gencoglu and Miikka Ermes, who are, respectively, affiliated with the medical faculty of Finland's Tampere University, and software and services firm Tieto, Ltd.

Researchers in Finland studied over 22,000 Instagram posts for hashtags and images of pill boxes that anticipated historical outbreaks of influenza in the country.

The authors say theirs is "the first study to employ images in social media for forecasting the influenza epidemics," but they also list several prior studies that mined social media for health signals, such as studies of Instagram posts for indicators of depression and of tobacco use among young people.

The investigators gathered six years' worth of weekly postings on Instagram, from April 2012 to May 2018, totaling more than 22,000 posts. They collected hashtags in Finnish pertaining to illness, such as the Finnish word "flunssa," meaning flu, or "lihaskipu," meaning muscle ache. It was important to the study, write Gencoglu and Ermes, that they were able to confine their search to "a single language and country" in order to compare the posts to a single country's health data.

The data, gathered using a Python-based web crawler, came only from public posts. The crawler recorded only post dates, hashtags, and the individual image URLs. It didn't record user names, and it didn't store any of the images.
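
As a rough illustration of that kind of minimal record keeping, the sketch below keeps only a post's date, hashtags, and image URL, then counts illness-related hashtags per week. It is not the authors' crawler, which the paper's summary here does not describe in detail. The field names (`taken_at`, `caption`, `image_url`), the helper functions, and the two-word tag list are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical tag list: only the two Finnish terms named in the article.
FLU_TAGS = {"flunssa", "lihaskipu"}

# \w matches Unicode letters in Python 3, so tags with "ä" or "ö" are captured.
HASHTAG_RE = re.compile(r"#(\w+)")


def strip_post(raw):
    """Keep only the post date, hashtags, and image URL; drop everything else."""
    return {
        "date": raw["taken_at"],
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(raw["caption"])],
        "image_url": raw["image_url"],
    }


def weekly_hashtag_counts(posts):
    """Count illness-related hashtags per (ISO year, ISO week number)."""
    counts = Counter()
    for post in posts:
        iso = post["date"].isocalendar()
        week = (iso[0], iso[1])
        matches = sum(1 for tag in post["hashtags"] if tag in FLU_TAGS)
        if matches:
            counts[week] += matches
    return counts
```

Passing each raw record through `strip_post` before storage means a user name never reaches the dataset, even if the raw record contains one.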

The authors then trained nine different neural network models by correlating the number of hashtag references in posts with the official incidence of flu as recorded by Finland's National Institute for Health and Welfare. They trained on five years' worth of data, then tested the models on the sixth year of Instagram data and health data.
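
The train-then-test setup can be sketched in a few lines. The example below is only an assumption-laden illustration: it holds out one year chronologically and measures how closely two weekly series move together using a Pearson correlation coefficient. The `year` field and both helper names are hypothetical, and the exact split boundaries and evaluation metrics used in the paper are not given in this article.

```python
from math import sqrt


def split_by_year(rows, test_year):
    """Chronological split: earlier years train, test_year is held out."""
    train = [row for row in rows if row["year"] < test_year]
    test = [row for row in rows if row["year"] == test_year]
    return train, test


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either series is constant.
    return cov / sqrt(var_x * var_y)
```

Splitting by date rather than at random matters here, because a model that has seen later weeks during training would have an unfair view of the season it is asked to predict.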

The image part of the work was done by combining two different convolutional neural networks, "Inception," and "ResNet," a cocktail that was first developed in 2016 by researchers at Google. They trained it to look at images of people showing pill bottles and similar health remedy packages, based on four sample images of drugs.

The final winning approach, the authors write, was the combination of Inception and ResNet with something called "XGBoost," a form of gradient-boosted decision trees developed at the University of Washington in 2016.

They concluded that the combo of image nets and XGBoost best fit the curve of health data, and was also "statistically significant" for predicting the flu outbreak in the final year's data.

A big question for this kind of social media search, left open for further work, is how the statistics are warped by the medium itself. The authors note the failure of Google's "Google Flu Trends" in 2013, because "heightened media attention" to the Google effort warped the search activity.

They therefore conclude that in future work, "normalizing" the weekly post counts compared to "the total number of weekly Instagram posts in the population ... may enhance the predictive performance by taking the popularity aspect of the platform into account."
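
That proposed normalization amounts to dividing each week's flu-related post count by the total number of posts that week. A minimal sketch follows, assuming both series are available as dictionaries keyed by week; the function name and data layout are hypothetical, and the authors describe this only as future work.

```python
def normalize_weekly_counts(flu_posts, total_posts):
    """Return the share of each week's posts that are flu-related."""
    shares = {}
    for week, flu_count in flu_posts.items():
        total = total_posts.get(week, 0)
        # Skip weeks with no recorded total rather than divide by zero.
        if total > 0:
            shares[week] = flu_count / total
    return shares
```

With this adjustment, 30 flu posts in a week of 1,000 total posts (3 percent) counts as a stronger signal than 30 flu posts in a week of 3,000 (1 percent), so growth in the platform's overall popularity is less likely to be mistaken for an outbreak.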
