We often hear that only "anonymized" data is collected and used, but is it truly possible to stay invisible?
Our browsing habits, mobile phone use, and even our shopping data can all be collected, stored and shared. To ease privacy worries, firms often say that such information is stripped of anything that could identify an individual user, but a new formula sheds light on how difficult it can be to protect users' privacy when large data sets are collected.
A paper to be published this week in Scientific Reports examines this issue. Researchers at MIT and the Université Catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in Europe over 15 months and found that just four points of reference were enough to identify 95 percent of users. In the most difficult cases, 11 points of reference were required.
The researchers identified people by their proximity to cellphone transmitters, needing as few as four such observations in a year; identification is easier still if a user has also shared location data on a social network. According to their formula, even data blurred to a 15-hour window, or to a group of 15 adjacent cellphone towers, was still enough to identify at least half of the "anonymous" people in a data set.
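The identification test described above can be sketched as a uniqueness check: take a few points from one person's trace and count how many traces in the data set contain all of them; if only one does, that person is singled out. The Python below is a minimal sketch under stated assumptions, not the researchers' code: the data layout, the function names (`coarsen`, `is_unique`, `unicity`) and the toy traces are hypothetical, and `coarsen` assumes, purely for illustration, that neighbouring towers have consecutive ids.

```python
import random

# A trace is the set of (tower_id, hour) observations recorded for one user.
Trace = set[tuple[int, int]]


def coarsen(trace: Trace, towers_per_cell: int, hours_per_slot: int) -> Trace:
    """Blur a trace by merging towers and hours into larger buckets.

    Hypothetical simplification: neighbouring towers are assumed to have
    consecutive ids, so integer division groups adjacent towers together.
    """
    return {(t // towers_per_cell, h // hours_per_slot) for t, h in trace}


def is_unique(user: str, points: Trace, traces: dict[str, Trace]) -> bool:
    """True if `user` is the only person whose trace contains every point."""
    matches = [u for u, trace in traces.items() if points <= trace]
    return matches == [user]


def unicity(traces: dict[str, Trace], p: int, seed: int = 0) -> float:
    """Fraction of users singled out by p points drawn at random from their own trace."""
    rng = random.Random(seed)
    eligible = unique = 0
    for user, trace in traces.items():
        if len(trace) < p:
            continue  # too few observations to draw p points
        eligible += 1
        points = set(rng.sample(sorted(trace), p))
        if is_unique(user, points, traces):
            unique += 1
    return unique / eligible if eligible else 0.0


if __name__ == "__main__":
    # Invented toy data: three users, tower ids and hours of the day.
    traces = {
        "alice": {(1, 8), (2, 9), (5, 18), (7, 20)},
        "bob": {(1, 8), (3, 9), (5, 18), (6, 21)},
        "carol": {(1, 8), (2, 9), (4, 17), (7, 20)},
    }
    print(unicity(traces, p=4))  # 1.0: every full trace is distinct
    blurred = {u: coarsen(t, towers_per_cell=100, hours_per_slot=24) for u, t in traces.items()}
    print(unicity(blurred, p=1))  # 0.0: after heavy blurring everyone looks the same
```

Coarsening makes more traces overlap, so fewer people are singled out; the point of the study is that real data, even heavily blurred, still identified at least half of users.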
The researchers believe that similar relationships could hold for other types of data, and that the findings may help policymakers develop more rigorous privacy safeguards.
"I would not be surprised if a similar result -- maybe requiring more points -- would, for example, extend to web browsing," says César Hidalgo, an MIT researcher and one of the paper's authors. "The space of potential combinations is really large. When a person is, in some sense, being expressed in a space in which the total number of combinations is huge, the probability that two people would have the same exact trajectory -- whether it's walking or browsing -- is almost nil."
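Hidalgo's combinatorial argument can be made concrete with a back-of-the-envelope calculation. The Python sketch below is illustrative only: it assumes, unrealistically, that every point of everyone's trajectory is chosen uniformly and independently at random, and the function names, the 1,000-tower count and the 24 hourly time slots are hypothetical rather than figures from the study (only the 1.5 million users comes from the article). Under that assumption the number of possible trajectories grows exponentially with the number of points, so the expected number of other people sharing one exact path falls quickly toward zero.

```python
def trajectory_count(towers: int, time_slots: int, points: int) -> int:
    """Distinct ordered sequences of `points` (tower, time-slot) observations."""
    return (towers * time_slots) ** points


def expected_matches(population: int, towers: int, time_slots: int, points: int) -> float:
    """Expected number of *other* people sharing one person's exact trajectory,
    under the crude assumption that everyone picks points uniformly at random."""
    return (population - 1) / trajectory_count(towers, time_slots, points)


if __name__ == "__main__":
    # 1.5 million users, as in the study; 1,000 towers and 24 hourly slots are hypothetical.
    print(trajectory_count(1_000, 24, 4))  # 331776000000000000
    print(expected_matches(1_500_000, 1_000, 24, 4))  # roughly 4.5e-12
```

Real movement is far from uniform, so this estimate overstates how quickly people become unique; it only illustrates why the space of combinations is so large.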
This post was originally published on Smartplanet.com