Comments on NY Times Big Data article

Summary: The concept of Big Data is breaking through into the New York Times. The article only scratches the surface, however.

TOPICS: Storage

During my morning news scan, I came across "For Start-Ups That Aim at Giants, Sorting the Data Cloud Is the Next Big Thing" by Malia Wollan. Although the article starts off as a general treatment of Big Data and why it is becoming an increasingly important area of technology, it then turns into a review of Splunk and a few profiles of Splunk customers. This, of course, doesn't tell the whole story.

In my article 'What is "Big Data"?', I pointed out that "Big Data" is a catchphrase that first appeared in the high-performance/technical computing segment of the overall IT market. Suppliers of many different types of virtualization technology have since been explaining why their product or service is a critical requirement for any customer interested in deploying a "Big Data" application.

Some of these suppliers are talking about the use of a distributed cache mechanism (one supplier calls this "memory virtualization") to allow a large number of systems to access a huge, rapidly changing data set. The key here is rapidly changing: the data sets in question are updated so frequently that a traditional database engine could not possibly keep up.
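To make the idea concrete, here is a minimal sketch (in Python, with in-process dictionaries standing in for separate machines) of how a distributed cache partitions keys across nodes; the class and method names are illustrative, not taken from any vendor's product:

```python
import hashlib

class DistributedCache:
    """Toy distributed cache: each key is hashed to one of several
    node-local dictionaries, which stand in for separate machines."""

    def __init__(self, num_nodes=4):
        # Each dict represents the in-memory store on one node.
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Hash the key so the same key always lands on the same node.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

cache = DistributedCache()
cache.put("sensor-42", 98.6)
print(cache.get("sensor-42"))  # 98.6
```

Because reads and writes go straight to memory on whichever node owns the key, updates never queue up behind a disk-bound database engine, which is the property these suppliers are selling.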

Other suppliers, such as DataStax and MapR, are focused one level higher up the stack. They're presenting their version of Apache Hadoop, a popular framework for "distributed processing of large data sets across clusters of computers."
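The processing model behind Hadoop can be sketched in a few lines of plain Python; this is not Hadoop's API, just an illustration of the map/shuffle/reduce pattern it distributes across a cluster, using word counting as the classic example:

```python
from collections import defaultdict

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word, as a mapper would.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + reduce step: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big clusters", "big frameworks"]
print(reduce_phase(map_phase(docs)))
# {'big': 3, 'data': 1, 'clusters': 1, 'frameworks': 1}
```

In a real deployment, Hadoop runs many mappers and reducers in parallel on different machines and handles the shuffle over the network; the per-record logic, however, stays this simple.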

Still others focus on how Big Data applications can be mapped onto storage virtualization solutions. The goal of these suppliers is to make sure that this huge amount of rapidly changing data can somehow make it safely onto more traditional storage systems.

After speaking with quite a number of these suppliers, it becomes clear that each is working from a different, rather self-serving definition of what the concept entails. In an attempt to come up with something a bit more useful, I described Big Data as:

In simplest terms, the phrase refers to the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets and storage facilities.

Does your organization have a need for "Big Data" applications? What tools are you using?

My thanks go out to Malia Wollan for bringing the topic up in the New York Times. 



Daniel Kusnetzky, a reformed software engineer and product manager, founded Kusnetzky Group LLC in 2006. He's literally written the book on virtualization and often comments on cloud computing, mobility and systems software. In his spare time, he's also the managing partner of Lux Sonus LLC, an investment firm.



  • RE: Comments on NY Times Big Data article

    Good article, Dan. While there is still some confusion about articulating and defining Big Data, there is no doubt that managing large volumes of data is the next big IT wave, and it is going to be very big business for the next few years, starting in 2012, as the field moves from R&D mode to production mode.
  • RE: Comments on NY Times Big Data article

    Great article, Dan. It is worth mentioning the HPCC Systems platform as a great fit for all data models. HPCC is a mature platform that provides a data delivery engine together with a data transformation and linking system equivalent to Hadoop. Its main advantages over the alternatives are the real-time delivery of data queries and the extremely powerful ECL programming model. Also, the ROI for HPCC is significantly better than Hadoop's in that it requires fewer nodes and fewer programmers. Visit
  • Definition of Big Data

    Hi Dan, our firm, IA Ventures ( is 100% focused on investing in early-stage Big Data companies. We've spent countless hours contemplating the term Big Data. In our experience, Big Data refers to extracting insight from [b]complex data[/b]. Complex data does not just mean 'large amounts of data'; it refers to data that exhibits some combination of the following characteristics: massive scale, unstructured, and/or real time.

    Of course, there is no 'right' definition of Big Data, but I figured I'd share the way we think about it at IA Ventures.