Faced with the ongoing confusion over the term 'Big Data,' here's a handy - and somewhat cynical - guide to some of the key definitions that you might see out there.
(1) Big Data as Exponential Data Growth
Despite what Wikipedia says, most people in the industry agree that Big Data isn't just about having more data. That was, however, the term's first meaning in the late 1990s (even though warnings about the exponential rise of data volumes go back to at least the 1940s).
(2) Big Data as Data Characteristics
Big Data as the three Vs: Volume, Velocity, and Variety. This is the best-known definition, coined by Doug Laney of Gartner over twelve years ago. Since then, many others have tried to take it to 11 with additional Vs including Validity, Veracity, Value, and Visibility.
(3) Big Data as New Technology
Why did Big Data suddenly become such a widely used term? It wasn't simply because we do indeed now have a lot more volume, velocity, and variety than a decade ago. Instead, it was fueled by new technology, and in particular the fast rise of open source technologies such as Hadoop and other NoSQL ways of storing and manipulating data.
The users of these new tools needed a term that differentiated them from previous technologies, and somehow ended up settling on the woefully inadequate term Big Data. If you go to a big data conference, you can be assured that sessions featuring relational databases, no matter how many Vs they boast, will be in the minority.
(4) Big Data as Different Data Sources
The problem with big-data-as-technology is that (a) it's vague enough that every vendor in the industry jumped in to claim it for themselves and (b) everybody 'knew' that they were supposed to elevate the debate and talk about something more business-y and useful.
Here are two good attempts to help organizations understand why Big Data now is different from mere big data in the past:
- Transactions, Interactions, and Observations. This one is from Shaun Connolly of Hortonworks. Transactions make up the majority of what we have collected, stored and analyzed in the past. Interactions are data that comes from things like people clicking on web pages. Observations are data collected automatically.
- Process-Mediated Data, Human-Sourced Information, and Machine-Generated Data. This is brought to us by Barry Devlin, who co-wrote the first paper on data warehousing. It is basically the same as the above, but with clearer names.
(5) Big Data as Signals
This is another business-y approach that divides the world by intent and timing rather than the type of data, courtesy of SAP's Steve Lucas. The 'old world' is about transactions, and by the time these transactions are recorded, it's too late to do anything about them: companies are constantly 'managing out of the rear-view mirror'. In the 'new world,' companies can instead use new 'signal' data to anticipate what's going to happen, and intervene to improve the situation.
Examples include tracking brand sentiment on social media (if your 'likes' fall off a cliff, your sales will surely follow) and predictive maintenance (complex algorithms determine when you need to replace an aircraft part, before the plane gets expensively stuck on the runway).
(6) Big Data as Opportunity
This one is from 451 Research's Matt Aslett and broadly defines big data as 'analyzing data that was previously ignored because of technology limitations.' (OK, so technically, Matt used the term 'Dark Data' rather than Big Data, but it's close enough.) This version lines up pretty well with how the term is actually used on most vendor web sites and in articles, presentations, and discussions.
(7) Big Data as Awareness
Visualization expert Stephen Few points out that "because Big Data has no commonly accepted definition, discussions about it are rarely meaningful or useful." He believes that the definition should be anchored in what's actually new: "Big Data is a rapid increase in public awareness that data is a valuable resource for discovering useful and sometimes potentially harmful knowledge."
(8) Big Data as Metaphor
In his wonderful book The Human Face of Big Data, journalist Rick Smolan says big data is "the process of helping the planet grow a nervous system, one in which we are just another, human, type of sensor." Deep, huh? But by the time you've read some of the stories in the book or the mobile app, you'll be nodding your head in agreement.
(9) Big Data as The New Term for Analytics and BI
Despite the gnashing of teeth by some, Big Data is becoming an umbrella term for any type of data analysis, including analysis that was possible with previous technology and would have been called BI or analytics in the past.
Still not enough for you? Here are 30+ more definitions of Big Data!
The bottom line: it is pointless to squabble over the "true" definition of Big Data. Instead, we should embrace the opportunity that the term gives us to educate a big new audience on the power of data to transform the way we work and live.
[A version of this was first posted on the Business Intelligence Blog]