Transforming the Datacenter

Big Data Invades the Datacenter

Summary: Big data is high-volume, high-velocity, real-time data that comes from all kinds of sources and ends up in a datacenter. To take full advantage of all this data, organizations need highly scalable storage and servers as well as the applications and frameworks to process all of the incoming data.

TOPICS: Data Centers

I recently ran across an old press release of a study by IDC which stated that at that time (June 2011), there were 1.8 zettabytes of data in the world. A zettabyte is a trillion gigabytes, which is hard to picture, but the press release helped put it in perspective by saying that 1.8 zettabytes is equivalent to 200 billion feature-length HD movies.

Oh yes, and there was one more thing. The press release said that the amount of data in the world is doubling every 2 years. If that rate has held, there must be about 3.6 zettabytes out there now, two years later. That's seriously big data.
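The doubling arithmetic can be made explicit with a few lines of Python. This is only a sketch: it takes the IDC figures quoted above (1.8 ZB in June 2011, doubling every 2 years) and assumes the growth stays perfectly exponential, which is a simplification rather than a forecast. The constant and function names are illustrative.

```python
# Sketch of the growth arithmetic, assuming steady exponential growth.
# Baseline and doubling period come from the IDC press release quoted above.

BASELINE_ZB = 1.8           # zettabytes of data in the world, June 2011
BASELINE_YEAR = 2011
DOUBLING_PERIOD_YEARS = 2   # "doubling every 2 years"
BYTES_PER_ZB = 10**21       # one zettabyte = 10^21 bytes (a trillion GB)


def projected_zettabytes(year: float) -> float:
    """Extrapolate the world's data volume for a given year."""
    elapsed = year - BASELINE_YEAR
    return BASELINE_ZB * 2 ** (elapsed / DOUBLING_PERIOD_YEARS)


# Spreading 1.8 ZB over 200 billion movies gives the implied movie size.
bytes_per_movie = BASELINE_ZB * BYTES_PER_ZB / 200e9

if __name__ == "__main__":
    print(f"Implied size per HD movie: {bytes_per_movie / 1e9:.0f} GB")
    for year in (2011, 2013, 2015):
        print(year, round(projected_zettabytes(year), 1), "ZB")
```

Run as a script, it reports roughly 9 GB per movie and 1.8, 3.6, and 7.2 ZB for 2011, 2013, and 2015, which is how "3.6 zettabytes now" follows from the press release two years on.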

Big data is high-volume, high-velocity, real-time data that comes from all kinds of sources: people collecting information as part of their work, people interacting with search engines and social sites, consumers generating data when they make online and offline purchases, and systems capturing data generated by financial services, businesses, health care organizations, government, and all forms of media. Machines themselves also generate huge amounts of data. For example, modern jet engines have self-monitoring capabilities, sending continuous streams of information and alerts about their own operating parameters.

All this data at some point ends up in a datacenter. Remember when I said a few posts back that IT budgets, measured in inflation-adjusted dollars, have actually declined over the past 10 years? Now set that against the amount of data in the world doubling every 2 years. But I digress.

The demand for big data is growing because businesses are learning how to analyze data strategically in ways that give them a competitive advantage. They analyze data to understand buying patterns among their customers and link those patterns to other market events. They use analytics to streamline their supply chain and reduce unnecessary overhead. Increasingly, big data and predictive analytics are being used in highly sophisticated personalization strategies that identify individuals and make timely offers based on their location and other information that is known about them. All these applications place huge demands on the datacenters where the data is stored and analyzed.

The tendency today is to retain all data rather than summarizing or discarding anything not considered essential. This is partly due to reductions in storage costs, but it is also happening because data analysis is advancing so rapidly that it is no longer possible to say what data is not important or valuable.

To take full advantage of all this data, organizations need highly scalable storage and servers as well as the applications and frameworks to process all of the incoming data. Traditional relational databases are queried with SQL, which lends itself well to transactional processing but is less well optimized for high-performance analysis. Nonetheless, SQL databases have the advantage that they already hold a very large part of the relevant information. Extending them is therefore often the quickest and easiest way to add more data processing capacity.

Most big data activity is now focused on Hadoop, an open-source software framework that supports data-intensive distributed applications. Hadoop implements a scale-out computational paradigm called map/reduce (MapReduce): it splits the data into many small fragments, distributes the application logic to all the nodes in a cluster so that each node processes its own fragments in parallel (the map phase), and then combines the intermediate results into the final answer (the reduce phase). Some applications combine Hadoop with in-memory computing for ultra-fast, real-time analytics on high volumes of data.
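The map/reduce flow is easiest to see in the classic word-count example. The sketch below is a single-process Python illustration, not Hadoop's API: the function names are hypothetical, and each "node" is just a list element. It walks through the stages described above: split the input into fragments, map each fragment independently, group the intermediate results by key (the shuffle step the framework performs between phases), and reduce each group.

```python
from collections import defaultdict
from itertools import chain


def split_input(lines, fragment_size):
    """Split the input into small fragments, one per (hypothetical) node."""
    return [lines[i:i + fragment_size] for i in range(0, len(lines), fragment_size)]


def map_fragment(fragment):
    """Map phase: emit a (word, 1) pair for every word in one fragment."""
    return [(word.lower(), 1) for line in fragment for word in line.split()]


def shuffle(mapped_pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines, fragment_size=2):
    fragments = split_input(lines, fragment_size)
    # On a real cluster, each fragment would be mapped on a different node.
    mapped = [map_fragment(f) for f in fragments]
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_group(k, v) for k, v in grouped.items())


docs = [
    "big data invades the datacenter",
    "the datacenter stores big data",
    "hadoop processes big data",
]

if __name__ == "__main__":
    print(word_count(docs))
```

Because each map call sees only its own fragment and each reduce call sees only one key, both phases can be spread across many machines without coordination beyond the shuffle. That independence is what makes the paradigm scale out, and the result is the same whatever fragment size is chosen.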

There are many implementations of Hadoop, including Microsoft's HDInsight, which is available both on Windows Server 2012 and as a Windows Azure service. This ability to process data on internal infrastructure as well as in a public cloud is fundamental to the discussion. There are advantages to running big data applications in the public cloud, especially as a proof of concept or for one-off analyses. At the same time, latency and security considerations may require an on-premises process. It is valuable to develop a strategy that supports both delivery models.

John Rhoton

About John Rhoton

John Rhoton is a contributor to CBS Interactive's custom content group, which powers this Microsoft sponsored blog. He is a technology strategist who specializes in consulting to global enterprise customers with a focus on cloud computing. His tenure in the IT industry spans more than twenty-five years at major technology companies, defining and implementing business strategy. He has recently led corporate technical strategy development, business development, and adoption of cloud services, datacenter transformation, mobility, security, and next-generation networking, while also driving key corporate knowledge management and community-building programs. John is the author of six books.

John Rhoton's views are his alone and do not necessarily represent those of Microsoft or CBSi.

  • Nice intro of BigData

    Thanks for the article. It is a very good executive summary of the usefulness of BigData. I will mention your article to my management. The technical part about the Hadoop offerings at the end is lacking, though. Microsoft doesn't have a good implementation of Hadoop. The leaders in Hadoop are Hortonworks, Cloudera, and MapR. The article would be more complete if you could give a summary of each of the main Hadoop leaders.
  • Does Microsoft Have A Facebook-Class Customer Yet?

    Facebook serves about a billion users with a combination of MySQL, memcached, and other common pluggable components of a Linux-based LAMP stack. Is there any equivalent powerhouse showcase for Microsoft products?
    • Re: Does Microsoft Have A Facebook-Class Customer Yet?

      Hi ldo17, that is an interesting question!

      I am not employed by Microsoft, so I am probably not the best person to answer it. I would point out that Microsoft hosts a number of sizable online services itself, including Windows Azure, Office 365, Xbox LIVE, SkyDrive, etc.

      Since every service offering is different, I'm not sure that the absolute number of subscribers is the best point of comparison. Still, there is no question that the Facebook architecture is formidable in terms of the scalability that they have achieved.

      Thanks for the question and best regards,

      John (@johnrhoton)
  • Re: Nice intro of BigData

    Hi RelaxWalk, thanks for the feedback!

    As you point out, this is not intended to be a technical article about Hadoop but rather just an introduction to the business case.

    I do believe it would be useful to provide a competitive comparison of the Hadoop implementations. I will probably not be doing that in this column but will keep your suggestion in mind for when I can find an appropriate opportunity.

    Thanks again and best regards,

    John (@johnrhoton)