People think the cloud can overcome the problems that have derailed big data-sharing efforts in the past. Those hopes may be misplaced, says Lori MacVittie.
The term big data has come to mean big headaches for IT organisations and big problems for consumers. Privacy is a growing concern as more and more data is not only collected but voluntarily shared by consumers in exchange for free access to applications and functionality.
Those wondering how much sites such as Facebook might know about them have to jump through hoops to find out and are likely to be surprised by how many personal details websites actually store.
The TV documentary Erasing David, screened on More4 in 2010, detailed an attempt by filmmaker David Bond to do just that: find out how private his identity really is. After deliberately disappearing for a month, he hired detectives to track him down.
Before his disappearing act, Bond spent weeks trying to find out just how much information various websites held on him. Big data took on a whole new meaning as he sat at a desk, poring over more than 1,000 printed pages from Facebook alone.
The UK's Midata initiative
The UK government is proposing to make part of that discovery process easier on consumers and their wallets with its Midata initiative, under which consumers would have access to some of their data held by private organisations.
The government is promising protocols to handle privacy or consumer protection issues — but also stresses that this is a private-sector initiative and it will not be hamstrung by rules and regulations.
Given the amount of data stored by private organisations — consider the amount of storage required to maintain data on spending patterns in supermarkets — such an effort raises many questions, from how such vast amounts of data might be transferred to where they might end up.
Perhaps most interesting from a technology perspective is whether or not the cloud makes it possible to overcome the problems that have derailed these kinds of big data-sharing efforts in the past.
The cloud as a data warehouse
Whether it's sharing of data by the private sector, or attempts at similar initiatives within the US government or vertical industries such as healthcare, data-sharing has issues.
The main problems relate to the format of the data, both for immediate integration and for long-term accessibility.
Cloud computing is often touted as a solution for the storage and processing of big data, thanks to an illusion of infinite capacity. But the reality is that if...