Big data analysis needs human context

Understanding the intent and meaning behind the unstructured data being analyzed is the next step in the evolution of big data, and will help companies avoid making spurious decisions.
Written by Jamie Yap, Contributor

SINGAPORE--Big data analysis should mature from the "low-hanging fruit" and easy correlations to understanding and verifying the context behind unstructured data points. This way, findings will be more accurate and grounded, and will enable better business decisions.

When the context behind the data points is not taken into consideration and analysis is based solely on algorithms, the risk is that the end results will be porous, according to Jude Yew, assistant professor at the Department of Communications and New Media at National University of Singapore (NUS). He was speaking as a panelist at the ZDNet Asia Big Debate on Big Data held here Wednesday.

"Take for instance the 'Like' button on Facebook. What does it mean when a user likes it? [We need to know] the meaning and intent behind the actions being captured as data, or we end up making inaccurate decisions. As the saying goes, 'rubbish in, rubbish out'," Yew said.

Another panelist, James Woo, CIO at Farrer Park Company, similarly argued that a single action, such as a person liking another's Facebook post, can have a multitude of reasons behind it.

The user may like the post because its content appeals to his own thinking, or the like may have nothing to do with the post at all and instead reflect a biased relationship with the person who shared it, Woo explained. These differences in context in turn affect how the results are gathered and used.

Yew emphasized that knowing the context means going beyond the abstract to understand what is actually happening on the ground, which then informs the right questions to ask. Only then can truly meaningful results be sifted out.

"It's easy to bring large data sets and analyze them, but whether it's meaningful on the ground is another matter," he said.

Janet Ang, managing director at IBM Singapore, acknowledged the importance of context awareness, but disagreed that having a clear hypothesis is always possible or necessary. "We've got to be prepared that we don't always know the questions we're supposed to ask. Because if we do [know the questions], we end up finding only data to prove the theory right."

Although data can be cleansed with software tools, the uncertainty over the veracity of big data must be tolerated, Ang noted.

Yew explained that the human context was critical because, ultimately, companies perform data analytics in order to use the results to understand customers and create better-targeted and more relevant projects. The real value of big data analysis, therefore, lies not in knowing the context itself but in the unforeseen insights it yields and in harnessing the data in unexpected ways.

For instance, Yew cited a study he previously conducted on the social behavior of chatroom users who were sharing video links. The experiment yielded insights into the users' relationships and made it possible to predict the content of a video, such as whether it was a music video or a viral comedy clip. From such findings, companies could derive insights into people's cultural tastes and identify the best time of day to recommend viral content such as videos to targeted audiences.

Speaking on the event sidelines, Yew acknowledged that in the "trenches" of the business world, it is not always feasible or appropriate for companies to devote resources to understanding the human context of the data they are analyzing.

"But that does not absolve you from the need to understand what happens on the ground after you achieve your result. For instance, you can allocate one day to talk to and survey a representative sample of users to verify the analyses," he said.

Privacy matters in big data
The need to know more about the human factor behind the data also led the panel discussion to the issue of user privacy within the big data phenomenon.

Woo noted that from the CIO perspective, the real power of big data comes from being able to combine an organization's own data with data outside the company's firewall. Noting the passing of the data protection bill in Singapore last month, he said that while having more data is "definitely very good", the question remains what, and how much, a company can justifiably do with the data.

Tan Eng Pheng, senior director of the industry cluster group at the Infocomm Development Authority (IDA), who was also on the panel, said privacy was a real concern in several countries including Singapore, and that the Act was meant to protect consumers while ensuring businesses safeguarded the customer information in their custody.

Asked whether the data protection law stifled the needs of businesses, Tan noted that the law was just a baseline that applied to all service providers, and that some sector-specific regulations were even stricter.