Big Data: Defining its definition

Summary: Big Data is all the rage these days, as are its constituent technologies like Hadoop, NoSQL, and the mystical discipline of data science. But it turns out that understanding of, and a consensus definition for, Big Data are rather elusive. This blog is here to address that.

TOPICS: Big Data

This blog is about an industry area that has come to be called “Big Data.”  The excitement around Big Data is huge; the mere fact that the term is capitalized implies a lot of respect.  A number of technologies and terms get mentioned in the context of Big Data, with Hadoop chief among them, “data scientist” often not far behind and sometimes NoSQL thrown in for good measure.

It’s a bit unorthodox to start a blog post – especially a first post for a new blog – with a bunch of terms unaccompanied by definitions.  But that’s a perfect metaphor for Big Data itself because, frankly, it’s not rigorously defined.  Meanwhile the term is already entrenched – not just in the industry lexicon but in the mainstream vernacular as well.

What about Big Data is concrete and certain?  We can safely say that Big Data is about the technologies and practice of handling data sets so large that conventional database management systems cannot handle them efficiently, and sometimes cannot handle them at all.  Often these data sets are fast-streaming too, meaning practitioners don’t have lots of time to analyze them in a slow, deliberate manner, because the data just keeps coming.

Sources for Big Data include financial markets, sensors in manufacturing or logistics environments, cell towers, and traffic cameras throughout a major metropolis.  Another source is the Web, including Web server log data, social media material (tweets, status messages, likes, follows, etc.), e-commerce transactions and site crawling output, to list just a few examples.

Really, Big Data can come from anywhere, as long as it’s disruptive to today's operational, transactional database systems.  And while those systems will be able to handle larger data sets in the future, Big Data volumes will grow as well, so the disruptions will continue.  The technologies used for creating and maintaining data, it turns out, are just not that well-suited to gathering data from a variety of systems, triaging it and consolidating it for precise analysis.

Perhaps you’ve heard of other terms, like Business Intelligence, Decision Support, Data Mining and Analytics, and wondered whether they’re part of Big Data or technically distinct from it.  While these fields may have started out as distinct endeavors, they are often folded into the Big Data discussion.  Sometimes when that happens, it may seem that people are merely conflating things.  But it turns out that Big Data is still evolving, and as a term it’s malleable.  In a way, Big Data is a startup that’s still working out its business model.

I’ve been working with database, data access and business intelligence technologies since the mid-1980s, so Big Data quickly became a logical interest for me.  What’s interesting, though, is that Big Data purists sometimes seem unaware of the data technologies that have come before, and miss out on the knowledge and experience those technologies represent.  A few Big Data wheels are, in fact, re-invented ones.

This blog will investigate and explain what Big Data is about, based on the premise that there’s no perfect consensus on that definition and that it is, in any case, changing.  I’ll be talking to customers, vendors and implementers, and I’ll be getting hands-on with the technology as well.  I’ll look at some combination of terms, technologies, algorithms, languages, products, services, vendors, industry alliances, and more.  Whatever the area of coverage, I’ll do my best to report it, explain it, analyze it, and monitor how it changes.

That’s not the whole story, though.  I’d very much like your input on which areas interest you the most, and on any additional ideas you have.  I’d like to hear your thoughts on Big Data in general, as well.  My goal is to create a blog that deepens understanding of Big Data, and deepens interest in it too.  Your feedback will be the best enabler in achieving that goal.

Andrew Brust

About Andrew Brust

Andrew J. Brust has worked in the software industry for 25 years as a developer, consultant, entrepreneur and CTO, specializing in application development, databases and business intelligence technology.

  • Excellent opening post

    Excellent opening post. I look forward to its coverage of Big Data, a topic that has become quite hot lately.
  • "mystical discipline of data science"

    Data science is all about logic and absolutely nothing to do with mysticism.

    I don't see any data that is "disruptive" to current models of data management at present. Apart of course from the massive duplication and lack of constraint checking in "big data" DBMSs drastically disrupting data quality.
  • Good Points

    Congrats on a great article, Andrew. I'll be curious to see how the definition of big data evolves. What was big data 5 years ago is now "desktop data" and what is big data today will no doubt be desktop-worthy 5 years from now. - DKM
  • Looking forward to this.

    Can't wait to see how this discussion evolves, Andrew. You couldn't have picked a better time to start this topic. I work for Sybase, which was acquired by SAP. Its new Database and Technologies group is all about Big Data, and from where I'm sitting we couldn't be better situated to provide the most comprehensive solutions available in this space. I'll be following intently to see if you agree.

    Obviously I'm also hoping you'll be my number one source of information on what our competition is doing, the varieties of ways businesses are using big data and what you see are some of the problems that need to be addressed.

    Great first article Andrew. Looking forward to the next already.
  • Defining Big Data

    Let's keep Big Data simple, so that everyone in business and on the street can instantly understand what it means. I would say that Business Intelligence and Data Warehousing are things you can do with Big Data - and Big Data is all about 'harnessing the massive volume of data' we now have - inside and outside of the business - especially from social media (Twitter, Facebook, LinkedIn, Chatter, etc.).

    Then this 'harnessing the massive volume of data' is accomplished using a variety of technologies and tools (e.g. Hadoop, NoSQL, R, ...) and techniques (e.g. data scientists as subject matter experts). In this sense 'harnessing' means bringing data under control, in order to gain insights from it - in new ways.
    Being Guided
  • Nice one

    Nice to see you starting by acknowledging how this is being hijacked by every man and his dog. I'm always suspicious of terms that suddenly spring up and get popularised yet drift into meaninglessness. Keep that stake firmly in the ground.
  • I don't get Big Data

    which is probably just as well as I am fairly certain I don't want it.

    As far as I can see I am being offered products with far less functionality than RDBMSs and which lack a coherent logical model.

    People keep saying that relational isn't scaleable but this statement is logically absurd. The relational model is a mathematical model for defining and manipulating data. Saying it isn't scaleable is a little bit like saying long division isn't scaleable because the only implementation you have is pencil and paper.

    In my opinion the right way forward is a better implementation of the RDBMS. I see no viable alternative model at present.
  • Big Bigger Biggest

    Thanks for addressing the (unknown, varying) definition of Big Data. I have asked this question myself for several months, literally, and sometimes with humorous results.

    At first I thought it was just another industry buzzword, like "turbo", "enterprise", "paradigm", and some of the other ones we've had across the years.

    The best answer I received is that BIG DATA is anything large enough to make you consider putting it in the CLOUD.
    stupid user name
  • Big Data for Advertising and Marketing, Finance, and Healthcare

    For anyone interested in Big Data, see

    Hey Andrew, congrats on your article, and please also say hello to your dad for me