How the new field of data science is grappling with ethics

Academic programs are popping up to meet the demand by companies like Google and Facebook, and even the government, to analyze data. What are they teaching about ethics?
Written by Laura Shin, Contributor

It’s hard to imagine a world without personalized recommendations -- one in which Amazon and Netflix don't know what books or movies you might like, or Facebook and LinkedIn don't invite you to connect with friends and acquaintances. Analysis of our online activities is so advanced that, in one famous anecdote, retailer Target “knew” a teenage girl was pregnant before her father did: he found out after the store sent her coupons for baby clothes and cribs.

As we become more aware of how our actions are being studied, new academic programs in data science are cropping up with more frequency. This year, New York University and Columbia University launched new courses of study, joining similar programs at Stanford University, Northwestern University, Syracuse University and other educational institutions. The services of these graduates will be in demand: the McKinsey Global Institute reports that, for example, big data -- the analysis of large data sets -- could help retailers increase operating margins by 60 percent and could reduce U.S. healthcare expenditures by 8 percent.

But even as we embrace the potential benefits, this summer’s revelations about surveillance activities by the National Security Agency show that progress often comes at a cost -- in this case, to privacy. A Pew Internet and American Life study released last week showed that 86 percent of Internet users have taken steps to remove or mask their identities online. Meanwhile, some companies are even trying to be open about their activities: Acxiom Corp., which collects and sells data about individuals to companies, just launched Aboutthedata.com, a site where Internet users can see and manage what Acxiom knows about them.

As new academic programs catch up with the use of big data in the real world, many also are grappling with how to teach ethics.

“Generally speaking, fields such as statistics, computer science and the hard sciences don’t teach ethics,” says Dr. Rachel Schutt, an adjunct professor at Columbia’s Institute for Data Sciences and Engineering. “There are privacy concerns, such as how much corporations and the government should know about individuals…. But software engineers [are taught] about the elegance or the mathematical beauty of the thing that they’re building, not how it will affect people’s lives.”

That doesn’t mean these university programs aren’t trying. Karrie Karahalios, a computer science professor at the University of Illinois at Urbana-Champaign, says that she teaches her students how to sample data ethically and protect subjects in academic studies. For example, in a Facebook study, the researcher should replace all the participants’ names, all their friends’ names and all their friends of friends’ names with numbers.

“If you do these large social network studies, you don’t have what they call participant-informed consent. Let’s say I have you in one of my Facebook studies, and you’re coming to my lab and we are analyzing the strength of the connections between you and your friends. I’m getting information about your friends and their friends without their consent. It’s a very, very ethically sensitive area.”
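The name-replacement step Karahalios describes can be sketched in a few lines of Python. This is only an illustrative sketch, not her lab's actual procedure: the function name `pseudonymize`, the sample names, and the representation of friendships as (person, friend) pairs are all assumptions made for the example.

```python
from itertools import count


def pseudonymize(edges):
    """Replace every name in a list of (person, friend) pairs with a number.

    Hypothetical helper for illustration: each distinct name gets the next
    unused integer, so the same person always maps to the same number.
    """
    ids = {}
    counter = count(1)

    def to_id(name):
        # Assign a new number the first time a name is seen.
        if name not in ids:
            ids[name] = next(counter)
        return ids[name]

    anonymized = [(to_id(a), to_id(b)) for a, b in edges]
    return anonymized, ids


# Made-up friendship data: participants, friends, and friends of friends.
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol"), ("Carol", "Dave")]
anon_edges, mapping = pseudonymize(edges)
print(anon_edges)
```

In a sketch like this, the analysis would work only with `anon_edges`, and the name-to-number `mapping` would need to be stored separately under restricted access or discarded. Replacing names with numbers does not by itself address the consent problem raised above, since the structure of the friendships is still recorded.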

Many ethics guidelines come from the Belmont Report, created in 1978 to protect human research subjects. It requires universities that receive funding from the government to have what’s called an Institutional Review Board perform an ethics review of proposed studies involving human subjects.

“This is where it gets interesting with big data, because there are a lot of things you can do with big data that don’t involve talking with people,” Karahalios says. “If your study involves just scraping the Web, and not talking to a human, you don’t have to talk to the board.”

If academics find that big data allows them to obtain more information than they would be able to gather when dealing with subjects in person, imagine what companies like Google and Facebook know. They are forming their own policies, which tend to be that you “pay” for a service, particularly a free service, by giving up some privacy. The fact that people are so used to this may be why, after the initial shock over the NSA news, many people effectively shrugged. According to a Washington Post-ABC poll in late July, 58 percent said they support this intelligence gathering in the effort to identify potential terrorists, compared to 39 percent opposed.

“There are academic workshops on the governance of algorithms, and I know that high-level executives from major corporations go to Washington and have meetings, but I don’t think there’s a uniform policy or standardization for what should be done with user-level data. We’ve been looking to companies like Google or Facebook to do the right thing and to set the standard but to the extent these are enforced or that other companies have to follow, a lot of this stuff isn’t in place,” Dr. Schutt says.

While so far there hasn’t been a big outcry, she says, “It’s the sort of thing where … people don’t object and it doesn’t seem that bad, but it really opens the door up to worse things.”


This post was originally published on Smartplanet.com
