Can the EU's Sensei project predict Brexit by data-mining social media chatter?

What is the web saying? Sensei is developing a natural language processing and analytics system to find out from millions of social media comments. The results could provide support for political or commercial decisions...
Written by Jack Schofield, Contributor
[Image: Map of Europe. Source: Sensei]

The Sensei project, backed by the European Commission, is analyzing millions of online conversations to predict events, such as the result of the Brexit referendum: the UK vote on whether to retain its EU membership. Barcelona-based Websays, one of the project's members, has already claimed a "stunning success" in predicting the Spanish general election.

Websays' founder, Dr Hugo Zaragoza, said in a statement (PDF): "There are others out there listening to social media, but we are the only ones with this deep combination of technology and human analysis through machine learning. We think this can add real value to business and help us to predict outcomes better than anyone else."

The hope is that "sentiment analysis" will prove more accurate than traditional polling systems. As one of the project members - Professor Massimo Poesio, from the University of Essex - says: "The recent UK general election illustrated just how wrong traditional pollsters can be."

Sensei's current prediction is that STAY will beat LEAVE by 53 to 47 percent, whereas a Guardian/ICM telephone poll, published today, shows a 52-48 split in favor of leaving the EU.

Sensei's graphical "radar chart" shows that the dominant UK sentiment, based on 1.5 million posts, is indignation, followed by amusement. The indignation is presumably caused by the misleading claims and incorrect statistics being used to bolster the arguments.
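To make the idea of a "dominant sentiment" and a percentage split concrete, here is a minimal sketch of how labelled posts could be tallied. It is not Sensei's or Websays' method, which the article does not describe: the `posts` sample, the label names and the `emotion_ranking` and `stance_split` helpers are all hypothetical, and a real system would assign the labels with trained classifiers rather than by hand.

```python
from collections import Counter

# Hypothetical, hand-labelled sample. In a real pipeline these labels would
# come from language-processing models run over millions of posts.
posts = [
    {"stance": "stay", "emotion": "indignation"},
    {"stance": "leave", "emotion": "indignation"},
    {"stance": "stay", "emotion": "amusement"},
    {"stance": "stay", "emotion": "indignation"},
    {"stance": "leave", "emotion": "amusement"},
    {"stance": None, "emotion": "sadness"},  # post that takes no side
]


def emotion_ranking(posts):
    """Return (emotion, count) pairs, most frequent first."""
    return Counter(p["emotion"] for p in posts).most_common()


def stance_split(posts):
    """Return each stance's percentage share among posts that take a side."""
    counts = Counter(p["stance"] for p in posts if p["stance"])
    total = sum(counts.values())
    return {stance: round(100 * n / total, 1) for stance, n in counts.items()}


print(emotion_ranking(posts))
print(stance_split(posts))
```

Posts that take no side are left out of the split but still count towards the emotion ranking, which is one plausible design choice among several.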

Sensei monitors hundreds of sources, including Twitter, Reddit, Google Plus, and Instagram, and dozens of news sources including the BBC, The Guardian, The Daily Mail and other newspapers. Coverage extends beyond the UK and Europe to include websites in the USA, Israel, Japan, Taiwan, Australia, New Zealand and other countries. This involves natural language processing in many different languages.

Sensei is also collecting data from one-to-one spoken conversations at call centers.


The project is co-ordinated by the University of Trento (which developed the mood prediction algorithms) and includes Université d'Aix Marseille, Sheffield University and Essex University. The commercial partners are Teleperformance, a global call-center giant, and Websays, which will offer analytics services based on the technology.

Websays' system includes both AI and human analysts, and uses some open-source software, including Elasticsearch and Kibana. The chart below shows the process.

The Sensei project's objectives include:

* Parse human conversations for content, affect and other behavioral traits.

* Create adaptive technology to address the diversity and velocity of the media sources.

* Automatically generate human-readable multimedia, graphical and tabular summaries of dialogues and/or multiparty conversations.

The ultimate aim is to "fundamentally change the way we understand social media chatter and how it can be applied commercially to help commentators to understand what is being said."

Sensei is a Japanese honorific given to masters of particular arts, and is typically used to address teachers, professors, doctors, lawyers etc, as well as writers and musicians. This Sensei aims to master the web.

[Image: Websays flowchart]