What is "Big Data?"

Summary: Look out, Big Data is coming your way.


"Big Data" is a catch phrase that has been bubbling up from the high-performance computing niche of the IT market. Increasingly, suppliers of processing virtualization and storage virtualization software have begun to flog "Big Data" in their presentations. What, exactly, does this phrase mean?

If one sits through the presentations from ten suppliers of technology, fifteen or so different definitions are likely to come forward. Each definition, of course, tends to support the need for that supplier's products and services. Imagine that.

In simplest terms, the phrase refers to the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets and storage facilities. Does this mean terabytes, petabytes or even larger collections of data? The answer offered by these suppliers is "yes." They would go on to say, "you need our product to manage and make best use of that mass of data." Just thinking about the problems created by the maintenance of huge, dynamic sets of data gives me a headache.

An example often cited is how much weather data is collected on a daily basis by the U.S. National Oceanic and Atmospheric Administration (NOAA) to aid in climate, ecosystem, weather and commercial research. Add that to the masses of data collected by the U.S. National Aeronautics and Space Administration (NASA) for its research and the numbers get pretty big. The commercial sector has its poster children as well. Energy companies have amassed huge amounts of geophysical data. Pharmaceutical companies routinely munch their way through enormous amounts of drug testing data. What about the data your organization maintains in all of its datacenters, regional offices and on all of its user-facing systems (desktops, laptops and handheld devices)?

Large organizations increasingly face the need to maintain large amounts of structured and unstructured data to comply with government regulations. Recent court cases have also led them to keep large masses of documents, email messages and other forms of electronic communication that may be required if they face litigation.

Like the term virtualization, big data is likely to be increasingly part of the IT world. It would be a good idea for your organization to consider the implications of the emergence of this catch phrase.

Topics: Data Centers, Hardware, Storage, Virtualization


Daniel Kusnetzky, a reformed software engineer and product manager, founded Kusnetzky Group LLC in 2006. He's literally written the book on virtualization and often comments on cloud computing, mobility and systems software. In his spare time, he's also the managing partner of Lux Sonus LLC, an investment firm.



  • re: What is big data?

    Certainly the amount of data makes it "big", but even small amounts of data can be leveraged by some new techniques.

    The way I think about "big data" is that it's not just big in terms of size, but also "big" in terms of comparative advantage, like "big fish". Data is changing the game in a lot of industries, and those who have the data (and ability to wield it) can perform more efficiently than those who don't. In some sense, this is what "experience" is for humans - the more you've seen, the more you're likely to be able to predict what you're going to see.

    Hope this offers another way of looking at it.
    Ignacio
  • Not only size matters

    We had many discussions about Big Data, and some circled around a definition of the Big Data problem.
    In the end we concluded that a Big Data problem is a data collection, processing and/or analytics problem whose solution you seriously have to think about before building it.
    Although this sounds vague, I think it is a good conclusion, because a Big Data problem does not only exist if you have a lot of data. The type of data (many rows and/or many columns), the query type (how frequent, how complex) and the real-time requirements (latency in import and querying, query response times) are at least some of the factors which influence the solution design.

    One thing should be clear to all who have to deal with a Big Data problem for the first time. Oracle and Hadoop are not a solution by definition. Just because they have a big name and are well known, they don't fit every problem, and there are many more alternatives (many of them much better) that have to be considered.

    Mike (parstream.com)
    • Handling Bigdata

      What other solutions can we use to handle big data besides Oracle and Hadoop?
  • RE: What is

    Better than Oracle? Please cite some
  • Big Data

    Vectorwise is also a great open source big data product. http://www.actian.com/products/vectorwise
  • Overview of Big Data and NoSQL Technologies as of January 2013.

    Overview of Big Data and NoSQL Technologies as of January 2013.
    Ladislav Urban
  • Big Data in Two Words

    Great article! Here's another article you may find interesting - Big Data analysed in two words: