Facebook 'Likes' can reveal your sexuality, ethnicity, politics, and your parent's divorce
Summary: Big data is not your friend because it can easily be used to reveal highly personal information.
Facebook users who click "like" on a variety of cultural subjects reveal a surprisingly large amount of information about themselves even if they've taken steps to tighten up their privacy settings.
A recently published study by researchers at Cambridge University in the UK and Microsoft Research used an automated analysis of 58,000 volunteers' Facebook "likes" to make highly accurate predictions about a person's private and very sensitive personal attributes.
The authors of the study, "Private traits and attributes are predictable from digital records of human behavior," claimed that they were able to use "easily accessible digital records of behavior, Facebook Likes" to accurately predict a wide range of attributes that included:
Sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender.
The researchers developed a model that could correctly predict whether a man was homosexual 88 percent of the time, and 75 percent of the time for women. It also predicted ethnic origin (95 percent), gender (93 percent), religion (82 percent), political affiliation (85 percent), use of addictive substances (75 percent), and relationship status (67 percent).
Margaret Weigel, writing on Journalist's Resource, noted that clicking "Like" on popular subjects such as "Britney Spears" or "Desperate Housewives" was among the signs of a homosexual orientation.
The model was less accurate when attempting to predict the length of the parents' marriage (60 percent). "Individuals with parents who separated have a higher probability of liking statements preoccupied with relationships, such as 'If I'm with you then I'm with you. I don't want anyone else.'"
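To make the idea of predicting a trait from "Likes" concrete, here is a minimal sketch. It assumes a plain logistic regression trained on a tiny, invented table of likes; the article does not describe the study's actual model, so this should not be read as the researchers' method. The page names, the labels, and the training settings are all hypothetical.

```python
import math

# Hypothetical toy data: one row per user, one column per page
# (1 = the user clicked "Like"). The label is an invented yes/no trait.
PAGES = ["page_a", "page_b", "page_c", "page_d"]
LIKES = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
LABELS = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, likes_row):
    """Estimated probability that a user with these likes has the trait."""
    score = bias + sum(w * v for w, v in zip(weights, likes_row))
    return sigmoid(score)


def train(rows, labels, epochs=500, learning_rate=0.5):
    """Fit one weight per page with simple stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = predict_proba(weights, bias, row) - label
            weights = [w - learning_rate * error * v
                       for w, v in zip(weights, row)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(LIKES, LABELS)

# A user who liked only page_a versus a user who liked only page_c.
print(round(predict_proba(weights, bias, [1, 0, 0, 0]), 2))
print(round(predict_proba(weights, bias, [0, 0, 1, 0]), 2))
```

In this toy table, liking page_a pushes the estimate toward "has the trait" and liking page_c pushes it away. The output is a probability, not a certainty, and with six invented users it says nothing about how accurate a real model would be.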
Foremski's take
Predictive models such as the ones used by the researchers in this study become even more accurate as more data is collected.
Easy access to such highly sensitive information could be used by employers, landlords, government agencies, educational institutes, and private organizations in ways that discriminate and punish individuals. And there's no way to fight it.
Big data is growing into a massive threat to individual well-being in society. There is no difference between big data and Big Brother when it comes to commercial interests.
Big data is the Stasi of our online worlds
There are many "silent listeners" in social networks that collect people's "Likes" and other online behaviors so that the information can be sold discreetly to third parties. Facebook, Google, and all other social networks also collect such behavioral information.
While the companies say that their behavioral big data is stripped of users' names, it is possible to use other databases, such as electoral records, demographic information, and location data, to identify individuals by name.
It's essentially a secret dossier on more than 1 billion social network users.
While this dossier is fragmented at the moment, sophisticated new technologies will soon make it trivial to pull together a massive amount of sensitive private data on every individual who interacts with the internet in any way.
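The re-identification step, matching name-stripped behavioral records against a second database that does carry names, can be sketched as a simple join on shared fields. Everything below is invented for illustration: the records, the field names, and the choice of ZIP code, birth year, and gender as the matching fields are assumptions, not details of any real dataset.

```python
# Hypothetical "anonymized" behavioral records: no names, but some
# demographic fields remain attached to each user's likes.
ANONYMIZED_BEHAVIOR = [
    {"user_id": "u1", "zip": "94110", "birth_year": 1980, "gender": "F",
     "likes": ["page_a", "page_b"]},
    {"user_id": "u2", "zip": "94110", "birth_year": 1975, "gender": "M",
     "likes": ["page_c"]},
    {"user_id": "u3", "zip": "10001", "birth_year": 1990, "gender": "F",
     "likes": ["page_d"]},
]

# Hypothetical public roll (an electoral-style list) with names attached.
PUBLIC_ROLL = [
    {"name": "Alice Example", "zip": "94110", "birth_year": 1980, "gender": "F"},
    {"name": "Bob Example", "zip": "94110", "birth_year": 1975, "gender": "M"},
    {"name": "Carl Example", "zip": "94110", "birth_year": 1975, "gender": "M"},
]

# Fields that appear in both datasets and are used for matching.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def reidentify(behavior, roll, keys=QUASI_IDENTIFIERS):
    """Map user_id to a name wherever exactly one person on the roll matches."""
    names_by_key = {}
    for person in roll:
        key = tuple(person[k] for k in keys)
        names_by_key.setdefault(key, []).append(person["name"])

    matches = {}
    for record in behavior:
        key = tuple(record[k] for k in keys)
        candidates = names_by_key.get(key, [])
        if len(candidates) == 1:  # unique match: the record gets a name
            matches[record["user_id"]] = candidates[0]
    return matches


print(reidentify(ANONYMIZED_BEHAVIOR, PUBLIC_ROLL))
```

Here only u1 is named: u2 matches two people on the roll and u3 matches nobody. The more fields two datasets share, the more often a match is unique.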
Your phone records, or information about events and parties you attend, could implicate you in the future if they show a connection with people that are later identified as drug dealers, criminals, terrorists, or maybe even paedophiles.
Big data gradually accumulates a cloud of suspicion around you simply by association.
Deleting bad links
Google, for example, already assesses the quality of a website by who links to it. If you have lots of what Google considers to be low-quality, spammy site back links, it will downgrade your website's all-important PageRank and bury it deep within its index.
This is why website owners are desperately sending out letters to other websites asking them to delete their back links.
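The underlying idea, that a site is scored by who links to it, can be sketched with the textbook PageRank iteration. This is a simplified illustration, not Google's production ranking, and it does not model the spam penalties described above; the link graph, the damping value, and the iteration count are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: site_x and site_y each have exactly one back link,
# but site_x's comes from a well-linked hub and site_y's from an unlinked page.
LINKS = {
    "blog1": ["hub"],
    "blog2": ["hub"],
    "blog3": ["hub"],
    "hub": ["site_x"],
    "lonely": ["site_y"],
    "site_x": [],
    "site_y": [],
}

ranks = pagerank(LINKS)
for page in ("site_x", "site_y"):
    print(page, round(ranks[page], 3))
```

Both sites have a single back link, yet site_x ends up with the higher score because its link comes from a page that is itself well linked. Applied to people instead of pages, the same arithmetic would score an individual by the standing of their associates.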
Think about how such methodology could be applied to determine the "TrustRank" of an individual in Google's world. People can't erase past links to friends and associates now considered "low quality" or possibly even criminal.
Welcome to the future obligations of your present life. Will people start purging their online social circles of unsavory characters? Or people they think might turn to criminal activities?
Big data knows you better than you know you
Big data technologies currently under development will be able to make highly accurate predictions about nearly every important aspect of your existence: Your health, your lifespan, even your sanity.
Big businesses love big data because it helps them manage their risks; it's what corporations do best.
I was at a recent Cisco event featuring their futurists. One of them talked about people's FICA scores, which determine their ability to get a mortgage, and developing a type of healthcare "FICA" score for each person as a way of determining their ability to get healthcare insurance.
I pointed out that it was a brilliant idea for cutting healthcare costs, since it would likely result in mass hospital closures as companies only insured people with a low risk of needing their services.
The future benefits of big data are stacked firmly against the individual.
Talkback
I wouldn't say it's an absolutely positive ID.
To some degree, I think you're blowing this a bit out of proportion. Statistics are by nature not black and white. To paint these things as absolute certainties is to grossly misunderstand the nature of these things.
Adding to that
"Big data knows you better than you" sums it all up; no it doesn't. Big data in this context is what you liked... Or chose to tell others you liked. Not what you actually liked...
Put it this way: how many posts make you chuckle that you never "like"? How many sentiments that you agree with? Now how many times have you "liked" something because of who posted it? Be honest. And pushing it a bit further... Though you'd never admit it, have you ever liked something because you wanted others to see you like it? Don't worry, everyone has.
So yes, a computer can crunch numbers to generate REAL basics such as gender, sexuality, politics... But the real stuff that makes you you? Well, that'd be as honest as you were to start with...
Very true
Context of your like is something the big data itself can't discern.
It's like when I "like" a post by my niece or nephew stating that they had a great time at the movies - I'm not saying I liked the movie they saw (I probably wouldn't), I'm saying I like the fact that they had a great time.
How would big data understand I'm not liking the movie?
I hate Britney Spears!
The assumption you are making is that
re: The assumption you are making is that
No kidding
Anyway, what do you think data mining is about?
Advertising, typically...
* I'm a youngish guy
* I have three degrees
* am married to two women and one man
* and my cup size is DDD
that is why
It is FICO not FICA
FICA is the acronym for Social Security tax in the United States.
Facebook Likes
Oh duh - tell us something we don't already know...
Unless and Until
Unless "Likers" start manipulating the system. Until "Facebookers" figure out that they are being gamed, and then they start gaming back.
Throw enough chaff into the system and analysis gets harder, takes more time and effort, with results being extremely unreliable. Oh, the analysts will say that their formulas can compensate, or perhaps their intuition, their ability to derive the ultimate truth, will enable them to provide reliable figures.
But ... liars figure and figures lie.
Say what?
Proofreading the Internet, one page at a time
Illegal characters, darn it...
Just one parent's divorce? Mom's, then, do you suppose, or Dad's?