Big data: How the revolution may play out
Summary: Big data pilots in 2012 will go production in 2013 and 2014. Then the real fun begins.
If 2012 was the year of big data hype, interest and pilot projects, 2013 will bring production deployments, early returns on investment and a bit of disruption. By 2014, big data projects and systems are likely to be commonplace.
This year, big data became a tech term on par with cloud computing. The term means a lot, yet it is used so often that it is losing its definition. That definition typically revolves around velocity (data is moving fast), volume (there's too much of it) and variety (unstructured and structured information).
Does big data live up to the hype? Yes. To me, big data means technology and business alignment---that Holy Grail endlessly pursued by CIOs---becomes a no brainer. Big data projects by nature are about revenue, risk and profits. In other words, IT and the business can't help but be aligned.
Clearly, we're in a big data hype cycle that I put on par with the Linux and open source software craze in the late 1990s and early 2000s. Back then, Linux was going to change the world and kill Microsoft, among other things. In many respects, Linux and open source software (Android for instance) did change everything. But a funny thing happened on the way to the revolution: open source software became commonplace in every data center and is now taken for granted. The revolution happened, but we just stopped talking about it as much. Cloud computing is playing out in a similar fashion.
Big data will follow this cycle too. Sure, millions of jobs will be created. And yes, talent pools will be stretched for a bit. Companies will also reinvent their industries. The vendor pecking order will be altered as startups like Cloudera become the new Red Hats. There will probably be a big data backlash of some sort (see cloud, sustainability, etc.).
Here's how I see the big data progression as we look ahead.
2013: Those 2012 pilots become production systems. Every vertical will have a big data success story. Oddly enough, success stories will be everywhere. Why? The big data projects are initiated by the business---CEOs, CFOs, CMOs---and IT is seen as an enabler not a cost center.
2014: Based on 2013 success stories and customer case studies, the fast followers will enter the big data game. Industries will all follow a big data playbook. Initially, these early returns will look good. Companies will primarily focus on internal data because there's a lot to mine there. Incorporating external data will be a nice-to-have, but nothing more at this stage.
2015: Companies will begin to look at external data in their big data plans. Before 2015, consumer-facing companies will have spent the most time with external information. Every analytics and data warehousing stack will have a Hadoop cluster and big data layer. Technologies like Hadoop cease to be a focus: they remain important, but fade into the software stack as a given. Big data mergers and acquisitions pick up steam.

2016: By this point, big data is seen as a utopia of sorts and companies become cocky, as they always do. Data-driven decisions replace gut feel and common sense. Early wins and common business cases are played out. Now companies have to start really thinking about the data and avoiding errors and correlations that aren't meaningful. There will be spectacular errors as companies incorrectly reject hypotheses, adopt others and mistakenly conclude that relationships in the data are meaningful.
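To make the risk of meaningless correlations concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method from any vendor: every series is pure random noise, the function names are hypothetical, and the 0.444 cutoff is roughly the conventional 5% significance threshold for a correlation measured on 20 observations. Test enough unrelated series against one target and some will look "significant" by chance alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_obs=20, n_candidates=1000, threshold=0.444, seed=0):
    """Correlate one random target against many unrelated random series.

    Returns (number of candidates with |r| above the threshold,
    strongest |r| seen). All data is noise, so every hit is spurious.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    strongest = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = abs(pearson(target, candidate))
        strongest = max(strongest, r)
        if r > threshold:
            hits += 1
    return hits, strongest


if __name__ == "__main__":
    hits, strongest = count_spurious()
    print(f"{hits} of 1000 unrelated series cleared the cutoff; "
          f"strongest |r| = {strongest:.2f}")
```

With a 5% cutoff and 1,000 candidates, roughly 50 hits would be expected by chance, which is why a data-driven shop needs to correct for how many relationships it went looking for.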
2017: Cloud combines with big data, and data warehousing as a service, analytics as a service and data as a service become the norm. Few companies actually think of building their own Hadoop clusters and doing the integration work. Big data infrastructure is just there. Note: 2017 is a guess on when these big data as a service efforts will be common to the masses. The big data as a service game is starting now, but will hit critical mass later.
How does big data play out for the IT buying cycle? By their very nature, big data projects require more C-level types in the ball game. CIOs are still important, and arguably the center of the technology decisions, but there's a gaggle of execs at the table. Here's a breakdown:
- CIO: Big data projects allow CIOs to finally break past that "are we aligned?" phase.
- CFO: All of this information flow is utopia for CFOs who rally behind the cause as a way to control costs and maximize revenue. One risk is that companies lose that human element that inspires big bets.
- Chief Marketing Officer: In 2012, CMOs became the belles of the IT spending ball. That focus is likely to be premature. Why? CMOs will primarily rely on external data and signals for their projects. Companies just aren't there yet unless they're consumer facing. CMOs have budget though. Also: Can big data engineer marketing influence?
- Chief Operating Officers, Procurement officers: Big data will allow inventory, supplies and manufacturing processes to be tracked from beginning to end. Efficiency will improve once the analysis is figured out.
- Data scientists: These folks will increasingly be seen as C-level material. Career wise, data wonks can write their own tickets.

Talkback
The more likely scenario
Numerous failed projects using Hadoop, MongoDB, etc. will be airbrushed out of corporate history.
The rest of the world will continue to represent its data the modern way - in a relational DBMS - giving the huge advantages of uniform data representation allowing highly flexible manipulation with a very small number of operators.
Beats the programming intensive world of big data tools hands down.
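The commenter's point about "a very small number of operators" can be sketched in a few lines of Python. This is a toy illustration, not how any real DBMS is implemented: relations are modeled as lists of dicts, and the table and column names (customers, orders, total) are hypothetical. Three operators (select, project and natural join) are enough to answer a question that spans two tables.

```python
def select(relation, predicate):
    """Restriction: keep only the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: pair rows that agree on every shared attribute."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if all(l_row[a] == r_row[a] for a in shared)
    ]


# Hypothetical sample data.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250},
    {"order_id": 11, "customer_id": 2, "total": 40},
    {"order_id": 12, "customer_id": 1, "total": 300},
]

# Which customers placed an order over 100?
big_spenders = project(
    select(natural_join(customers, orders), lambda row: row["total"] > 100),
    ["name"],
)
```

Because each operator takes relations in and hands a relation back, the operators compose freely, which is the flexibility the comment is pointing at.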
It's already a success
I agree with this article.
I'm not angry
As I say, relational DBMSs will still be around when the big data approaches are buried and forgotten, so why bother with "big data" at all?
Back in the 90s object oriented DBMS were going to take over everything. Does anyone know of a major company running core systems on an OODBMS?
In the 2000s it was XML DBMS. Same question.
The big data tools are a regression to long dead pre-relational methods. They will be shown to lead to all the same problems in due course and will be abandoned.
"don't stop, believing"
RDBMSs are great at what they do well, just awful at scaling.
Where I think the value of BigData lies is in turning what was the domain of the BI/DW tools into something real time that drives applications. Amazon's product recommendations, LinkedIn's people you may know, etc. We've had great success using Hadoop/Hive (EMR) and Mahout at AWS in conjunction with S3 to do things for our customers we never dreamed of before this stack was available.
Don't worry, jorwell, the traditional RDBMSs ain't going anywhere. I'm glad they fulfill your needs.
I'm not the slightest bit worried
You will be back with relational in a year or two, believe me.
show me your modern methods crunching TBs of data daily ...
I love RDBMSs (MySQL is my personal choice) but you have to admit there are use cases out there where they are just massive overkill. Otherwise we wouldn't need DWs, we'd just keep everything around forever (which is what S3/Glacier allow me to do).
The traditional funnel model (of paring data down over time) is no longer necessary.
Not everything needs ACID, sometimes giant hash tables are just what the doctor ordered. Otherwise keep paying $750K for the solution I am doing for $2K/mo. That is a head to head we did recently.
All this has nothing to do with the relational model
To say that the relational model isn't scalable is similar to saying that long division isn't scalable because your only implementation is pencil and paper.
I cannot see the sense of abandoning something as articulate and flexible as the relational model and I don't think it is necessary to do so. The current supposed limitations of RDBMSs are to do with current implementations and nothing to do with the model.
Brewer's CAP theorem shows that a distributed system cannot guarantee both consistency and availability when the network partitions, so to my mind distributed should be the tool of last resort for scalability.
RDMBS sucks at most non-traditional applications
What is an "RDMBS"?
What is a "non-traditional application" for that matter?
What for that matter is an "application"? Are you sure you can define the term clearly?
If you meant RDBMS then it does not need saying often enough that the relational model is a logical model for representing data. It makes no sense to talk about it not being scalable. Scalability is a question of how you choose to implement the logical model. It's just another problem to be solved.
I know everyone has decided they hate Oracle lately because it got hold of Java and MySQL, but Oracle didn't invent the RDBMS. Oracle (and the other SQL-DBMSs) are not very faithful implementations of the model and you can also argue that their physical implementation is far from optimal.
From my perspective the relational model is one of the most powerful intellectual tools we have available today. It makes no sense whatsoever to abandon it - especially not for methods that appear to be a rerun of all the methods relational replaced because they were so complex and inflexible.
There are companies that use Big Data
At LexisNexis, we have used a Big Data solution--the High Performance Computing Cluster (HPCC) coupled with the ECL programming language--for over a decade, and it is by far and away the best solution for managing and consuming our data. Much of our data is unstructured, and this is where HPCC really shines: it accomplishes in minutes what an RDBMS does in hours or days. I've been an ECL programmer for over five years, but I've also designed and built 3NF relational databases and SPs, so I agree with you without hesitation that RDBMSs are the way to go. Just not in all cases.
I think that the next few years will be crucial for the Big Data industry because we will be discovering what its best uses are, while debunking a lot of hype. I suspect a lot of companies will use it badly, but I suspect also that we'll find applications for Big Data solutions that will be surprising.
For more information concerning the Big Data solution we use here at LexisNexis, take a look at http://hpccsystems.com/. Some good whitepapers there (some of which describe the performance benefits), along with a downloadable platform and ancillary tools, and a developer forum.
Big Data has legs
The Approach to Data Will Change as Well
http://jasonthibeault.com/mind/2012/09/14/are-you-having-conversations-with-your-data/
@_jasonthibeault
Great