It was inevitable that some would begin questioning the term and idea of "big data", since it has become the hype word of the year. But what exactly are we talking about when discussing the huge surge in volume and variety of information moving through our enterprises?
Just over the past few days, in fact, some articles have suggested that the hype cycle of big data may have peaked.
First, Precog CEO John De Goes wrote in VentureBeat that "big data" — at least as it's hyped up by vendors — is dead. We've reached the point where every vendor software offering is about "big data":
The phrase "big data" is now beyond completely meaningless. For those of us who have been in the industry long enough, the mere mention of the phrase is enough to induce a big data headache — please pass the big data Advil.
That's the semantic issue. But the troubles with big data may go deeper than that. Nassim Taleb cautions in an article in Wired that researchers and analysts working with big data run the risk of cherry-picking information:
"Big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal)."
In other words, big data analytics can deliver the results you want to see, rather than reflect real-life situations.
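Taleb's point can be illustrated with a small simulation (a hypothetical sketch, not from his article): generate many columns of pure random noise and correlate each one with an equally random target. None of the columns carries any signal, yet with enough of them, the strongest correlation found will look impressive purely by chance.

```python
import random

random.seed(0)

def corr(xs, ys):
    # Pearson correlation of two equal-length sequences
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

n_rows, n_cols = 50, 1000
target = [random.gauss(0, 1) for _ in range(n_rows)]
noise_cols = [[random.gauss(0, 1) for _ in range(n_rows)]
              for _ in range(n_cols)]

# Every column is independent noise, yet the best of 1,000 candidates
# still "correlates" with the target well above zero.
best = max(abs(corr(col, target)) for col in noise_cols)
print(f"strongest spurious correlation among {n_cols} noise columns: {best:.2f}")
```

The more columns you search, the larger the best spurious correlation becomes: exactly Taleb's warning that in large data sets, the extreme deviations that rise to the surface are driven by variance, not information.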
Similarly, Brian Bergstein of MIT Technology Review suggests that growing reliance on big data analytics is creating a corporate bubble of overconfidence.
... a future in which such "intuitive knowledge" about how to deploy resources is overruled by algorithms that can work only with hard data and can't, of course, account for the data they don't have ... While it might seem obvious that data, no matter how "big", cannot perfectly represent life in all its complexity, information technology produces so much information that it is easy to forget just how much is missing.
History is full of examples of the incomplete picture data provides, compared with human observations on the ground. The US military's overreliance on data during the Vietnam War (1959-1975) is a classic example, Bergstein pointed out.
De Goes said that the broad trend that has been branded big data is actually several concurrent developments:
Predictive analytics: "If you can predict the future, you can also change it," De Goes observed. "Predictive analytics are behind everything from recommendation engines to fraud detection, to, yes, predicting which parolees are most likely to commit murder. The field calls upon techniques in statistics, machine learning, modeling, and other fields to identify and exploit patterns."
Smart data: This is the term that seems to be poised to replace "big data", De Goes said — which means it may soon get overused itself. He described the move toward smart data as concentrating on the "monetization of machine-captured data through predictive analytics".
Data science: A description for the emerging field "that employs advanced techniques in statistics, machine learning, natural language processing, and computer science to extract meaning from large amounts of data".
NewSQL: "A moniker for describing highly-scalable, horizontally distributed SQL systems."
De Goes and Bergstein both cautioned against the hype and hope of big data analytics, but the two pull in opposite directions. De Goes' predictions point toward greater reliance on machine-generated data and analysis, which runs counter to Bergstein's warnings about overreliance on such systems to deliver insights on the ground.