Comments on NY Times Big Data article

The concept of Big Data is breaking through into the New York Times. The article only scratches the surface, however.
Written by Dan Kusnetzky, Contributor

During my morning news scan, I came across "For Start-Ups That Aim at Giants, Sorting the Data Cloud Is the Next Big Thing" by Malia Wollan. Although the article starts off as a general treatment of Big Data and why it is becoming an increasingly important area of technology, it then turns into a review of Splunk and a few profiles of Splunk customers. This, of course, doesn't tell the whole story.

In my article "What is 'Big Data?'," I pointed out that "Big Data" is a catchphrase that first appeared in the high performance/technical computing segment of the overall IT market. Suppliers of many different types of virtualization technology have been making the case that their product or service is a critical requirement for any potential customer interested in deploying a "Big Data" application.

Some of these suppliers are speaking about the use of a distributed cache mechanism (one supplier calls this "memory virtualization") to allow a large number of systems to access a huge, rapidly changing data set. The key here is "rapidly changing." The data sets in question are updated so frequently that a traditional database engine could not possibly keep up.

Other suppliers, such as DataStax and MapR, are focused one level higher up the stack. They're presenting their version of Apache Hadoop, a popular framework for "distributed processing of large data sets across clusters of computers."

Still others focus on how Big Data applications can be mapped onto storage virtualization solutions. The goal of these suppliers is to make sure that this huge amount of rapidly changing data can somehow make it safely to more traditional storage systems.

After speaking with quite a number of these suppliers, it has become clear to me that each is working from a different, rather self-serving definition of what the concept entails. In an attempt to come up with something a bit more useful, I described Big Data as:

"In simplest terms, the phrase refers to the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets and storage facilities."

Does your organization have a need for "Big Data" applications? What tools are you using?

My thanks go out to Malia Wollan for bringing the topic up in the New York Times. 
