There's a logical fallacy that mathematicians are fond of quoting when humans exercise their considerable built-in pattern-recognition abilities to draw conclusions that could just be coincidence: correlation does not imply causality. But, as Kenneth Cukier and Viktor Mayer-Schönberger argue in Big Data: A Revolution That Will Transform How We Live, Work, and Think, what Big Data brings with it is a profound shift in our attempts to understand How the World Works. In their view, correlation may now be good enough all by itself.
For centuries we have focused on causation as a way of deriving general principles from specific cases. For example, once we understood that plants grew in response to ready supplies of sunlight, water and nutrients in the soil, we were able to apply this knowledge to promote more rapid and reliable growth.
What's happening now is that by churning through huge masses of data we can find patterns that would not be trustworthy in smaller samples, and derive value from them whether or not we understand the underlying causality.
If studying millions of patient records shows that this weird complex of symptoms indicates a particular rare illness and this particular drug ameliorates it, does it matter why? The result will be to kill off disciplines like sampling and habits of mind like the desire for exactitude and causality. Being approximately right is good enough; we don't need to risk being exactly wrong.
Cukier is the data editor of The Economist; Mayer-Schönberger, an Oxford professor, is best known for his 2009 book Delete, in which he proposed the "right to be forgotten". This book seems to reflect their disparate interests. The first half talks about the state of Big Data, the kinds of new insights it's bringing and the changes it's making in various industries, while the second studies its risks. It's tempting to attribute them to Cukier and Mayer-Schönberger respectively, but it's always dangerous to guess the mechanics of collaboration — the sample size is too small.
The state-of-the-art story is relatively familiar. Quantity can compensate for some lack of quality. Medical diagnostics. Spotting flu outbreaks using Google's search data. Moneyball (about which I pause to complain that a book citing a non-fiction work should cite the original book rather than the movie).
Big Data profoundly changes the problem of privacy — another reason why the US's data-driven companies are lobbying so hard to use the review of data protection law to weaken it. One of the fundamental data protection principles is that consent should be obtained for a change of use. But secondary uses are where much of the value of Big Data is derived. No one, for example, consented to the use of their search engine queries to track flu outbreaks, yet using the data in this way is clearly a public benefit. At least, it is until or unless some enterprising government decides that putting all the people in those areas under quarantine is a good idea.
Cukier and Mayer-Schönberger end up suggesting that we need to shift from the present approach of restricting how data may be used to one of holding those who use it accountable. This is an idea we hear a lot these days, and it suffers from the problem that sometimes the damage of disclosure may be bad enough that no amount of accountability can fix it. Plus, as Simon Davies, the founder of Privacy International, is so fond of saying, "Companies are pathologically unable to regulate themselves".
Overall, this is probably the best-rounded book on Big Data to date. Most just cheerlead, while a few are all doom and gloom. This one aims at balance and provides a thorough grounding.
Big Data: A Revolution That Will Transform How We Live, Work, and Think
By Viktor Mayer-Schönberger and Kenneth Cukier