SAN FRANCISCO -- Big data holds great promise for improving society -- especially in business and education -- but there are plenty of hurdles to clear first.
A few of those pitfalls and perils were discussed in detail during a panel discussion at The Economist's two-day summit about information, which kicked off on Tuesday afternoon.
Economist data editor Kenneth Cukier got the ball rolling with a series of rolling interviews exploring the promise and perils of the data deluge.
Jeff Hammerbacher, chief scientist of Cloudera, posited that the most interesting thing going on in big data today is making data preparation more granular.
"The more you zoom in, the more pathologies you find," Hammerbacher said.
When Cukier pressed if this meant that big data can't always be trusted for one reason or another, Hammerbacher countered by asking, "If you can't trust the data, what can you trust?"
"It's hard for me to think of a case where measuring more makes you less certain," Hammerbacher asserted.
One of the barriers to big data at the moment, Hammerbacher hypothesized, is that we're in a "brief pocket of time" in which people trained in statistics and computer science have an advantage over most of the population, allowing them to draw conclusions from data that most people can't.
"It's fun to exploit this inefficiency," Hammerbacher joked, but he predicted that our tools will automate big data analytics for everyone within 10 to 15 years.
Hammerbacher continued the rolling interview style by following up with Geoffrey Nunberg, an adjunct professor at the School of Information at the University of California, Berkeley.
Nunberg commented that there has been progress in using big data sources in linguistics and how language is used as a tool in different communities, but that change has been more incremental rather than a "quantum leap."
Looking at Google Translate, Nunberg admitted that the online translation tool is "so much better than systems we had a long time ago." But its output is still easy to tell apart from human translation, he said, because Google Translate still doesn't usually produce results that even a "first year" language student would offer.
As an observer of language in the cultural scene, Nunberg said that he uses data analytics all the time to see how words are changing in meaning. For example, he noted that the word "elite" used to be modified mostly by financial terms, but is now more often associated with the media.
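The article doesn't describe Nunberg's actual method, but the kind of shift he describes can be sketched with simple collocate counting: tally which words immediately precede "elite" in corpora from different eras. The tiny corpora below are invented purely for illustration.

```python
from collections import Counter

def collocates_before(word, corpus):
    """Count which words immediately precede `word` in a corpus --
    a crude proxy for tracking how a word's typical modifiers shift."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens)):
            if tokens[i] == word:
                counts[tokens[i - 1]] += 1
    return counts

# Toy corpora standing in for two eras of text (illustrative only).
corpus_then = ["the financial elite met quietly",
               "a financial elite controls capital",
               "the wealthy elite invested heavily"]
corpus_now = ["the media elite shapes coverage",
              "a media elite sets the agenda"]

print(collocates_before("elite", corpus_then).most_common(1))  # top modifier: 'financial'
print(collocates_before("elite", corpus_now).most_common(1))   # top modifier: 'media'
```

A real analysis would use a dated corpus such as Google Books n-grams and part-of-speech tagging rather than raw whitespace tokens, but the counting idea is the same.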
Turning toward social networks, Juliette Powell, author of the book 33 Million People in the Room, described social media as a medium "to co-create the future together" using connected technology.
When it comes to understanding data sets, Powell outlined four core elements that any Internet user would recognize that are vital for understanding big data analytics going forward: digital literacy, digital trust, platform transparency, and access.
But what is most interesting to Powell are the people themselves, adding that we shouldn't get caught up in data sets and forget the relationships.
"It's all about trust," Powell asserted, explaining that we choose the information we share and that fact needs to be kept in mind when analyzing big data.
One topic that always seems to come up when discussing either social media or big data is privacy. Circling back, Cukier argued that privacy is not the central problem with big data, but rather "propensity" -- the possibility that we're going to have algorithms predicting our behaviors.