Big data: How the revolution may play out

Summary: The big data pilots of 2012 will go into production in 2013 and 2014. Then the real fun begins.


If 2012 was the year of big data hype, interest and pilot projects, 2013 will bring production deployments, early returns on investment and a bit of disruption. By 2014, big data projects and systems are likely to be commonplace.

This year, big data became a tech term on par with cloud computing. The term means a lot, yet it is being used so much that it is losing its definition. By the way, that definition typically revolves around velocity (data is moving fast), volume (there's too much of it) and variety (unstructured and structured information).

Does big data live up to the hype? Yes. To me, big data means technology and business alignment---that Holy Grail endlessly pursued by CIOs---becomes a no-brainer. Big data projects by nature are about revenue, risk and profits. In other words, IT and the business can't help but be aligned.

Clearly, we're in a big data hype cycle that I put on par with the Linux and open source software craze in the late 1990s and early 2000s. Back then, Linux was going to change the world, kill Microsoft and do plenty of other things. In many respects, Linux and open source software (Android, for instance) did change everything. But a funny thing happened on the way to revolution---open source software became commonplace in every data center and is now taken for granted. The revolution happened, but we just stopped talking about it as much. Cloud computing is playing out in a similar fashion.

Big data will follow this cycle too. Sure, millions of jobs will be created. And yes, talent pools will be stretched for a bit. Companies will also reinvent their industries. The vendor pecking order will be altered as startups like Cloudera become the new Red Hats. There will probably be a big data backlash of some sort (see cloud, sustainability etc).

Here's how I see the big data progression as we look ahead.

2013: Those 2012 pilots become production systems. Every vertical will have a big data success story. Oddly enough, success stories will be everywhere. Why? The big data projects are initiated by the business---CEOs, CFOs, CMOs---and IT is seen as an enabler, not a cost center.

2014: Based on 2013 success stories and customer case studies, the fast followers will enter the big data game. Industries will all follow a big data playbook. Initially, these early returns will look good. Companies will primarily focus on internal data because there's a lot to mine there. Incorporating external data will be a nice to have, but nothing more at this stage.

2015: Companies will begin to look at external data in their big data plans. Before 2015, consumer-facing companies will have spent the most time gathering and using external information. Every analytics and data warehousing stack will have a Hadoop cluster and big data layer. Technologies like Hadoop cease to be a focus: they remain important, but fade into the software stack as a given. Big data mergers and acquisitions pick up steam.



2016: By this point, big data is seen as a utopia of sorts and companies become cocky---they always do. Data-driven decisions replace gut feel and common sense. Early wins and common business cases are played out. Now companies have to start really thinking about the data and avoiding errors and correlations that aren't meaningful. There will be spectacular errors as companies incorrectly reject hypotheses, adopt others and mistakenly conclude that relationships in the data are meaningful.
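
To make the risk of meaningless correlations concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the function names (`pearson`, `count_strong_pairs`), the number of metrics, the number of observations and the 0.7 cutoff are all hypothetical. It generates metrics that are pure random noise, so by construction none of them is related to any other, and then counts how many pairs still look strongly correlated.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_strong_pairs(n_metrics, n_points, threshold, seed=0):
    """Count pairs of independent random metrics with |r| >= threshold.

    Every metric is independent Gaussian noise, so any pair that clears
    the threshold is a coincidence, not a real relationship.
    """
    rng = random.Random(seed)
    metrics = [
        [rng.gauss(0, 1) for _ in range(n_points)]
        for _ in range(n_metrics)
    ]
    return sum(
        1
        for a, b in itertools.combinations(metrics, 2)
        if abs(pearson(a, b)) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical setup: 40 unrelated metrics, 10 observations each.
    n_metrics, n_points = 40, 10
    total_pairs = n_metrics * (n_metrics - 1) // 2
    strong = count_strong_pairs(n_metrics, n_points, threshold=0.7)
    print(f"{strong} of {total_pairs} unrelated pairs have |r| >= 0.7")
```

The point of the sketch is that the number of pairs grows much faster than the number of metrics, so a team that scans every pair with only a handful of observations should expect some impressive-looking correlations by chance alone.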

2017: Cloud combines with big data, and data warehousing as a service, analytics as a service and data as a service become the norm. Few companies actually think of building their own Hadoop clusters or doing the integration work. Big data infrastructure is just there. Note: 2017 is a guess on when these big data as a service efforts will be common to the masses. The big data as a service game is starting now, but will hit critical mass later.

How does big data play out for the IT buying cycle? By their very nature, big data projects require more C-level types in the ball game. CIOs are still important---and arguably the center of the technology decisions---but there's a gaggle of execs at the table. Here's a breakdown:

  • CIO: Big data projects allow CIOs to finally break past that "are we aligned?" phase. 
  • CFO: All of this information flow is utopia for CFOs who rally behind the cause as a way to control costs and maximize revenue. One risk is that companies lose that human element that inspires big bets. 
  • Chief Marketing Officer: In 2012, CMOs became the belles of the IT spending ball. That focus is likely to be premature. Why? CMOs will primarily rely on external data and signals for their projects. Companies just aren't there yet unless they're consumer facing. CMOs have budget though. Also: Can big data engineer marketing influence?
  • Chief Operating Officers, Procurement officers: Big data will allow inventory, supplies and manufacturing processes to be tracked from beginning to end. Efficiency will improve once the analysis is figured out.
  • Data scientists: These folks will increasingly be seen as C-level material. Career wise, data wonks can write their own tickets.



  • The more likely scenario

    In 2014 Big Data will be replaced by the next fad.

    Numerous failed projects using Hadoop, MongoDB etc, etc will be airbrushed out of corporate history.

    The rest of the world will continue to represent its data the modern way - in a relational DBMS - giving the huge advantages of uniform data representation allowing highly flexible manipulation with a very small number of operators.

    Beats the programming intensive world of big data tools hands down.
    • It's already a success

      Worked on some pilot projects where it was successful. I don't know what big data did to you to make you this angry :P.
      I agree with this article.
      • I'm not angry

        just trying to persuade people that they are wasting their time and energy with these approaches.

        As I say, relational DBMSs will still be around when the big data approaches are buried and forgotten, so why bother with "big data" at all?

        Back in the 90s object oriented DBMS were going to take over everything. Does anyone know of a major company running core systems on an OODBMS?

        In the 2000s it was XML DBMS. Same question.

        The big data tools are a regression to long dead pre-relational methods. They will be shown to lead to all the same problems in due course and will be abandoned.
        • "don't stop, believing"

          We use RDBMSs for tons of stuff they are horrible at, because for years it was all we had. My first exposure outside was BerkeleyDB, which, mind you, Oracle thought enough of to buy. Mike Olson (Sleepycat's former CEO) is now at Cloudera.

          RDBMSs are great at what they do well, just awful at scaling.

          Where I think the value of BigData lies is in turning what was the domain of the BI/DW tools into something real time that drives applications. Amazon's product recommendations, LinkedIn's people you may know, etc. We've had great success using Hadoop/Hive (EMR) and Mahout at AWS in conjunction with S3 to do things for our customers we never dreamed of before this stack was available.

          Don't worry, jorwell, the traditional RDBMSs ain't going anywhere. I'm glad they fulfill your needs.
          • I'm not the slightest bit worried

            I just want to use modern methods, not antiquated big data approaches. I last used key value pairs on 1970s vintage mini-computers that didn't have the power to run ISAM. Hadoop is basically a piece of industrial archeology.

            You will be back with relational in a year or two, believe me.
          • show me your modern methods crunching TBs of data daily ...

            for a few grand a month. I am with Hive. What's better is that I can side load 1TB of data from S3 into Hive on Hadoop in a few minutes. Try that with your relational tools.

            I love RDBMSs (MySQL is my personal choice) but you have to admit there are use cases out there where they are just massive overkill. Otherwise we wouldn't need DWs, we'd just keep everything around forever (which is what S3/Glacier allow me to do).

            The traditional funnel model (of paring data down over time) is no longer necessary.

            Not everything needs ACID, sometimes giant hash tables are just what the doctor ordered. Otherwise keep paying $750K for the solution I am doing for $2K/mo. That is a head to head we did recently.
          • All this has nothing to do with the relational model

            The relational model is a purely mathematical model of how to represent data.

            To say that the relational model isn't scalable is similar to saying that long division isn't scalable because your only implementation is pencil and paper.

            I cannot see the sense of abandoning something as articulate and flexible as the relational model and I don't think it is necessary to do so. The current supposed limitations of RDBMSs are to do with current implementations and nothing to do with the model.

            Brewer's CAP theorem proves that distributed systems are fundamentally inconsistent - so to my mind distribution should be the tool of last resort for scalability.
        • RDMBS sucks at most non-traditional applications

          Good luck with managing petabytes with RDBMS
          • What is an "RDMBS"?

            I've never heard of such a thing.

            What is a "non-traditional application" for that matter?

            What for that matter is an "application"? Are you sure you can define the term clearly?

            If you meant RDBMS then it cannot be said often enough that the relational model is a logical model for representing data. It makes no sense to talk about it not being scalable. Scalability is a question of how you choose to implement the logical model. It's just another problem to be solved.

            I know everyone has decided they hate Oracle lately because of them having got hold of Java and MySQL, but Oracle didn't invent the RDBMS. Oracle (and the other SQL-DBMSs) are not a very faithful implementation of the model and you can also argue that their physical implementation is far from optimal.

            From my perspective the relational model is one of the most powerful intellectual tools we have available today. It makes no sense whatsoever to abandon it - especially not for methods that appear to be a rerun of all the methods relational replaced because they were so complex and inflexible.
        • There are companies that use Big Data

          jorwell, you make some very important points. The relational model has been around for forty years, and it's not going anywhere soon. In most cases it is the hands-down best choice for managing data, and this will not change. While I agree with the article somewhat, what I see happening in the near future is that companies will make a poor choice and use a Big Data solution instead of a good RDBMS to handle a data management issue that's pretty conventional. I think Big Data solutions will suffer a few more black eyes than what is described here in the article.

          At LexisNexis, we have used a Big Data solution--the High Performance Computing Cluster (HPCC) coupled with the ECL programming language--for over a decade, and it is by far and away the best solution for managing and consuming our data. Much of our data is unstructured, and this is where HPCC really shines: it accomplishes in minutes what an RDBMS does in hours or days. I've been an ECL programmer for over five years, but I've also designed and built 3NF relational databases and SPs, so I agree with you without hesitation that RDBMSs are the way to go. Just not in all cases.

          I think that the next few years will be crucial for the Big Data industry because we will be discovering what the best usages for it will be, while debunking a lot of hype. I suspect a lot of companies will use it badly, but I suspect also that we'll find applications for Big Data solutions that will be surprising.

          For more information concerning the Big Data solution we use here at LexisNexis, take a look at Some good whitepapers there (some of which describe the performance benefits), along with a downloadable platform and ancillary tools, and a developer forum.
          Chris Albee
  • Big Data has legs

    After seeing customer work first-hand, Big Data has legs beyond the hype cycle. Whether it plays out to the timing you've laid out, we'll see, but I think you have the elements right.
  • The Approach to Data Will Change as Well

    I think there's something missing from the prognostication and that's how our approach to data will change. Right now, big data is looked at like data has always been looked at: structured. It's in columns and rows. It's name/value pairs. The issue with that is that structure inherently limits insight because it presupposes the result (i.e., you put it into a structure because you were looking for certain answers). As "data scientists" begin to emerge, so too will a new perspective on data. Think about it like a river or stream. And not just data that a company "thinks" is important (that is imposed structure again) but ALL data. This will support having a "conversation" with data. What if sales of a product weren't impacted so much by a price drop as by traffic on neighboring streets, or by a holiday celebration or the weather? And when this fundamental change happens, businesses will realize the need to empower their workforce to carry out these conversations. People, not automation, will uncover patterns. Look at projects like Fold.It. Computers can crunch (Hadoop clusters can process) but ultimately, machines cannot mimic human ingenuity. And when the perspective on data changes (from "analysis" to "conversations"), this will become paramount.

  • Great

    Great article, you make really good points. Please go and read my blog.