Big Data: Defining its definition
Summary: Big Data is all the rage these days, as are its constituent technologies like Hadoop, NoSQL, and the mystical discipline of data science. But it turns out that understanding of, and a consensus definition for, Big Data are rather elusive. This blog is here to address that.
This blog is about an industry area that has come to be called “Big Data.” The excitement around Big Data is huge; the mere fact that the term is capitalized implies a lot of respect. A number of technologies and terms get mentioned in the context of Big Data, with Hadoop chief among them, “data scientist” often not far behind and sometimes NoSQL thrown in for good measure.
It’s a bit unorthodox to start a blog post – especially a first post for a new blog – with a bunch of terms unaccompanied by definitions. But that’s a perfect metaphor for Big Data itself because, frankly, it’s not rigorously defined. Meanwhile the term is already entrenched – not just in the industry lexicon but in the mainstream vernacular as well.
What about Big Data is concrete and certain? We can safely say that Big Data is about the technologies and practice of handling data sets so large that conventional database management systems cannot handle them efficiently, and sometimes cannot handle them at all. Often these data sets are fast-streaming too, meaning practitioners don’t have lots of time to analyze them in a slow, deliberate manner, because the data just keeps coming.
Sources for Big Data include financial markets, sensors in manufacturing or logistics environments, cell towers, or traffic cameras throughout a major metropolis. Another source is the Web, including Web server log data, social media material (tweets, status messages, likes, follows, etc.), e-commerce transactions and site crawling output, to list just a few examples.
Really, Big Data can come from anywhere, as long as it’s disruptive to today's operational, transactional database systems. And while those systems will be able to handle larger data sets in the future, Big Data volumes will grow as well, so the disruptions will continue. The technologies used for creating and maintaining data, it turns out, are just not that well-suited to gathering data from a variety of systems, triaging it and consolidating it for precise analysis.
Perhaps you’ve heard of other terms, like Business Intelligence, Decision Support, Data Mining and Analytics, and wondered whether they’re part of Big Data or technically distinct from it. While these fields may have started out as distinct endeavors, they are often folded into the Big Data discussion. Sometimes when that happens, it may seem that people are merely conflating things. But it turns out that Big Data is still evolving, and as a term it’s malleable. In a way, Big Data is a startup that’s still working out its business model.
I’ve been working with database, data access and business intelligence technologies since the mid-1980s, so Big Data quickly became a logical interest for me. What’s interesting, though, is that Big Data purists sometimes seem unaware of the data technologies that have come before, and miss out on the knowledge and experience those technologies represent. A few of Big Data’s wheels are reinvented ones.
This blog will investigate and explain what Big Data is about, based on the premise that there’s no perfect consensus on that definition and that it is, in any case, changing. I’ll be talking to customers, vendors and implementers, and I’ll be getting hands-on with the technology as well. I’ll look at some combination of terms, technologies, algorithms, languages, products, services, vendors, industry alliances, and more. Whatever the area of coverage, I’ll do my best to report it, explain it, analyze it, and monitor how it changes.
That’s not the whole story, though. I’d very much like your input on which areas interest you the most, and on any additional ideas you have. I’d like to hear your thoughts on Big Data in general, as well. My goal is to create a blog that deepens understanding of Big Data, and deepens interest in it too. Your feedback will do the most to help achieve that goal.
Talkback
Excellent opening post
"mystical discipline of data science"
I don't see any data that is "disruptive" to current models of data management at present. Apart of course from the massive duplication and lack of constraint checking in "big data" DBMSs drastically disrupting data quality.
Good Points
Looking forward to this.
Obviously I'm also hoping you'll be my number one source of information on what our competition is doing, the varieties of ways businesses are using big data and what you see are some of the problems that need to be addressed.
Great first article Andrew. Looking forward to the next already.
Defining Big Data
Then this 'harnessing the massive volume of data' is accomplished using a variety of technologies and tools (e.g. Hadoop, NoSQL, R, ...) and techniques (e.g. data scientists as subject matter experts). In this sense 'harnessing' means bringing data under control, in order to gain insights from it - in new ways.
Nice one
I don't get Big Data
As far as I can see I am being offered products with far less functionality than RDBMSs and which lack a coherent logical model.
People keep saying that relational isn't scalable but this statement is logically absurd. The relational model is a mathematical model for defining and manipulating data. Saying it isn't scalable is a little bit like saying long division isn't scalable because the only implementation you have is pencil and paper.
In my opinion the right way forward is a better implementation of the RDBMS. I see no viable alternative model at present.
Big Bigger Biggest
At first I thought it was just another industry buzzword. Like "turbo", "enterprise", "paradigm", and some of the other ones we've had across the years.
The best answer I received is that BIG DATA is anything large enough to make you consider putting it in the CLOUD.
Big Data for Advertising and Marketing, Finance, and Healthcare
http://www.recruiter.com/articles/recruiting-in-the-age-of-big-data-a-guide-for-recruiters/
Hey Andrew, congrats on your article, and please also say hello to your dad for me