Among all the empty trivia on social networks, researchers now think important scientific data is just waiting to be found.
Scientists in Spain believe messages posted on social media can be a valuable source of information about adverse drug effects that have not been mentioned in clinical trials or registered in reporting systems by patients or doctors.
A research team at Carlos III University of Madrid (UC3M) is currently working on the development of a toolkit for processing large volumes of health data.
As part of the project, the team is developing ways to take social-media comments about drugs, diseases, and adverse drug reactions, and transform them into structured information for better decision-making in the healthcare sector.
The use of social-media big data could also help pharmaceutical companies identify potential new drug candidates and lower research costs. This type of data may also assist in monitoring the effectiveness of drugs by collecting data in unrestricted and mixed environments.
So far, the team of scientists at UC3M has published a prototype, developed under the European research project TrendMiner, that stores five million tweets and 40,000 comments written in Spanish and collected over a year.
The prototype includes a linguistic processor based on MeaningCloud commercial software, which recognizes mentions of drugs, adverse effects, and diseases, and then displays them in clouds of tags and different timelines.
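For readers curious about what recognizing mentions involves, the following is a minimal Python sketch of dictionary-based tagging. It is not the UC3M prototype and does not use MeaningCloud; the tiny lexicon, the category names, and the functions `tag_mentions` and `tag_counts` are all hypothetical and serve only as illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (an assumption for illustration only):
# Spanish surface form -> (category, normalized English term).
LEXICON = {
    "ibuprofeno": ("drug", "ibuprofen"),
    "paracetamol": ("drug", "paracetamol"),
    "dolor de cabeza": ("adverse_effect", "headache"),
    "mareo": ("adverse_effect", "dizziness"),
    "gripe": ("disease", "influenza"),
}

# Try longer phrases first so multi-word entries win over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def tag_mentions(text):
    """Return a (category, normalized term) pair for each lexicon hit."""
    return [LEXICON[m.group(1).lower()] for m in _PATTERN.finditer(text)]


def tag_counts(messages):
    """Count mentions across messages: the raw numbers behind a tag cloud."""
    counts = Counter()
    for msg in messages:
        counts.update(tag_mentions(msg))
    return counts


messages = [
    "Tomé ibuprofeno y ahora tengo mareo",
    "El Ibuprofeno me da dolor de cabeza",
    "Con la gripe solo tomo paracetamol",
]
counts = tag_counts(messages)
for (category, term), n in counts.most_common():
    print(category, term, n)
```

A lookup like this only counts words. It cannot tell whether a symptom was caused by a drug or treated by it, which is one reason a real system needs a fuller linguistic processor.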
Listening to the right channels
Using these techniques, patients' colloquial descriptions on social networks are translated into structured data that can be used in comparative studies to identify patterns and trends.
The messages can also be combined with data from other sources, such as electronic medical records, explained Paloma Martinez, a researcher and professor at UC3M's advanced databases laboratory.
Aggregated and standardized information is anonymized to comply with the highest level of security required by the current Spanish Organic Law on Data Protection.
For Martinez, whether the aim is for medical agencies to consider changes to a drug's technical specifications, or for pharmaceutical companies to gather feedback on their products and position themselves against competitors, it is important "to be listening to the right channels".
"Patients do not explain everything to doctors but are comfortable with sharing information in social networks and forums," she said.
To improve the system, the UC3M team is now working on algorithms to see whether the information obtained has statistical significance. Scientists are also trying to improve the analysis of patient-oriented language.
To do this analysis, they are applying deep-learning algorithms, based on the use of neural networks, capable of working with massive volumes of data, and trained to analyze and understand the language.
"In this domain not only do we work analyzing social media but also other documents such as clinical notes and scientific publications," said Martinez.
Marc Torrent, head of the Big Data Center of Excellence in Barcelona, said this kind of research may be a major breakthrough for the pharmaceutical industry.
However, Felip Miralles Barrachina, director of the eHealth unit at Barcelona Digital Technology Centre (BDigital), a member of the Catalan government's technology hub known as Eurecat, is not so sure.
"The results may be qualitative, but are unlikely to have clinical value in the study of adverse drug reactions," he said. "[But] we must not be dogmatic," he added.
According to Farmaindustria, an association representing the pharmaceutical industry in Spain, the sector spent €950m on research and development in 2014, returning to growth after three consecutive years of falling investment.