SINGAPORE--There has been a lot of hype about big data, but are companies benefiting from it yet? What are the key challenges Singapore businesses face in capturing relevant data and extracting intelligence from it?
How can technology help resolve some of these challenges, and what is the government's role in helping Singapore companies tap data for innovation? These are among the big data questions panelists will discuss at ZDNet Asia's Big Debate, to be held this week on Nov. 28 at the Pan Pacific Hotel.
Among them is Jude Yew, assistant professor at the Department of Communications and New Media at the National University of Singapore. He joined the department in 2012 after completing a PhD at the University of Michigan, where his research focused on studying and designing social computing systems that encourage pro-social behavior.
Specifically, Yew is interested in understanding, modeling, predicting, and designing for pro-social human behavior within socio-technical systems. He has studied and designed environments for large-scale scientific collaboration, as well as the use of social tagging in the sharing and reuse of user-generated content in online communities.
As a lead-up to the panel discussion, we profile Yew in a Q&A here to get some of his initial thoughts on big data. Also catch our Q&A profiles of other panelists in the Big Debate:
- Tan Eng Pheng, Infocomm Development Authority of Singapore;
- James Woo, Farrer Park Company; and
- Janet Ang, IBM Singapore.
Q: How would you define big data? And why should the general Singapore population care about it?
Yew: Big data for me represents an opportunity to better understand and gain insights into large-scale human behavior, through the large amounts of data captured by technologies.
From the use of loyalty cards at supermarkets to Web searches on Google, traces of our interactions are logged and aggregated into massive databases which can be mined in real time or at a later date. The insights gained from these large datasets can be used to make predictions, such as statistician Nate Silver's forecasts of the 2012 U.S. presidential election, and to develop technologies that inform us about our habits and make our lives better, such as Netflix's movie recommendations.
While big data represents a unique opportunity, it also brings with it a unique set of challenges. Most notable is the question of what constitutes an invasion of privacy amid the various data collection efforts.
As more and more companies realize the value of big data, Singaporeans should also be aware that not every data point about their lives is up for grabs. For instance, I've noticed many legitimate forms in Singapore which require me to provide my IC number and other personal data. My suspicions are also raised when data collection efforts start asking me for unrelated information such as my parents' education level.
It is obvious that the public's appetite for data openness, and for the privacy associated with it, is changing, as are efforts such as the Singapore government's Personal Data Protection Act. A balance must be struck between the utility that can be gained from big data and ensuring the consumer's right to privacy is not abused.
Q: In your view, what is the most fascinating potential application of big data technologies in Singapore? What social problems can it resolve and how can it best improve our daily life?
The most fascinating big data application in Singapore I have heard about is the use of data mining and machine-learning approaches to solve the problem of catching a cab on rainy days and at midnight. Data from real-time monitoring of taxi locations and machine predictions of where crowds are likely to congregate, depending on the time of day or event, can be used to suggest ideal locations to catch a cab in less-than-optimal conditions.
Personally, I feel big data offers us the opportunity to explore areas that have been impossible to investigate, such as the evolution of cultural trends and tastes or the dynamics involved in large-scale creative collaborations. My personal research has investigated the social behavior that surrounds the sharing of YouTube videos.
The social data around the sharing of these videos gives us insights into the types of content being shared. By extension, we are able to make predictions about which videos are likely to go viral and become memes.
Q: What do you see as the primary challenge hindering the adoption of such technology?
One of the biggest challenges around big data is how we manage the balance between the collection of data and making that data available for reuse. As I mentioned earlier, the Singapore public's appetite for data collection is evolving and should not be abused. Ensuring there are appropriate steps toward data privacy and ethical reuse is a conversation that needs to be had amongst and within companies.
Another challenge is the temptation to make spurious correlations and find significance in the copious amounts of data available. As the saying goes, "rubbish in, rubbish out".
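The risk of spurious correlations can be made concrete with a short simulation. The following is a minimal Python sketch, not something drawn from Yew's research: it assumes purely random, unrelated data series (the function names, the number of series, and the 0.8 threshold are all illustrative choices) and counts how many pairs nonetheless look strongly correlated by chance.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_chance_correlations(num_series=200, length=10, threshold=0.8, seed=0):
    """Generate independent random series and count strongly correlated pairs.

    All parameters are hypothetical defaults chosen for illustration.
    Returns (number of pairs with |r| above threshold, total pairs compared).
    """
    rng = random.Random(seed)
    # Each series is independent Gaussian noise: no real relationship exists.
    series = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(num_series)]
    pairs = list(itertools.combinations(range(num_series), 2))
    strong = sum(
        1 for i, j in pairs if abs(pearson(series[i], series[j])) > threshold
    )
    return strong, len(pairs)


if __name__ == "__main__":
    strong, total = count_chance_correlations()
    print(f"{strong} of {total} unrelated pairs exceed the correlation threshold")
```

Because every series here is random noise, any pair that clears the threshold is a coincidence. The more variables are compared against each other, the more such coincidences turn up, which is why significance found by trawling a large dataset deserves scrutiny.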
A truly fundamental challenge in big data is knowledge about the data itself--that the data points essentially represent people as well as their interactions and behavior. It is important to understand this data in the context in which it was collected. For instance, what tools were used to collect or generate the data? Is there some ground knowledge about the behavior or interaction being logged? Do we know whose data is being captured?
These are some fundamental questions to ask of big data. Often, because the data is abstracted from reality, we tend to lose sight of the ground truth which is being represented in the data.