There’s no shortage of year-ahead predictions in the tech industry. This is especially true in the Big Data world, but I’ve had little desire to write a post around any one Big Data company’s predictions. As it turns out, however, a number of companies in the space sent me their opinions on what’s going to happen next year. So I thought a roundup of some of these Big Data 2013 predictions, along with my opinions of them, might be fun.
Let’s start with Hadoop itself. Since Hadoop is the poster child of Big Data technologies, you won’t be surprised to learn that a number of companies offered predictions focused exclusively on it. John Schroeder, CEO of MapR, predicts that "hardware will become optimized for use with Hadoop" and Mike Hoskins, CTO at Pervasive Software, says that "demand for enterprise-friendly Hadoop will reach a fever pitch."
I think both of these predictions make sense, and speak to the same overall need: not just harnessing the power of Hadoop, but making its provisioning and integration in the corporate data center much more seamless. Perhaps that’s why MapR’s Schroeder also observed that "Hadoop expertise is growing rapidly, but a shortage of talent remains" and predicts that "SQL-based tools for Hadoop will continue to expand." The latter prediction is almost impossible to disagree with, as such tools have grown enormously in just the last quarter of this year, and show no signs of slowing down.
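To give a flavor of what these SQL-based tools offer, here's the kind of aggregation an analyst might submit to a SQL-on-Hadoop layer such as Hive, rather than writing a MapReduce job by hand. The table name and data are hypothetical, and the query runs here against an in-memory SQLite database purely as a local stand-in for a Hadoop-backed engine:

```python
import sqlite3

# In-memory SQLite stands in for a SQL-on-Hadoop engine (e.g. Hive);
# the page_views table and its rows are purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", 120), ("pricing", 45), ("home", 90), ("docs", 200)],
)

# The same declarative query could be handed to a SQL-on-Hadoop tool,
# which would translate it into work over data stored in the cluster.
rows = conn.execute(
    "SELECT page, SUM(views) AS total "
    "FROM page_views GROUP BY page ORDER BY total DESC"
).fetchall()
print(rows)  # -> [('home', 210), ('docs', 200), ('pricing', 45)]
```

The point is the interface: analysts who already know SQL can query Big Data stores without learning a new programming model, which is exactly why these tools are spreading so quickly.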
Not all the opinions were in agreement, however. Rainstor, which itself offers a SQL-Hadoop hybrid product, predicts that companies will look to technologies other than Hadoop when managing Big Data. That contrasts with Pervasive’s prediction that "Existing data warehouses will fail" and MapR’s take on the market, wherein "Hadoop pulled away from the other Big Data analytics alternatives."
Moving past Hadoop takes us to some more nuanced predictions. Rainstor says "Enterprise Big Data initiatives will move out of the sandbox and define a clear set of business and technology requirements." MapR says "revenue generating use cases [will] trump cost saving applications." Essentially, these companies are predicting that customers will move to the next layer in the Big Data maturity model.
They’re right, but moving out of the sandbox brings with it a requirement for rigor, which many Big Data users haven’t yet addressed comprehensively. Some of our fortune tellers seem to agree. Pervasive says "Data Quality will continue to be the 'Hot Potato' of the enterprise" while the folks at Progress DataDirect say "People will get overwhelmed by all of their data" and "fragmented data will creep into the picture."
All of these devilish details sound correct to me, and the BI world has been dealing with them for years now. If the Big Data world thought itself immune, that could only be due to the clouded vision arising from a technology hype cycle. Once you get beyond the gee-whiz stage, formidable problems can no longer escape focus.
Data Analysis and Visualization
Rainstor, Pervasive, MapR and Progress DataDirect were not the only industry players who sent me predictions. I also received some pontifications from marketing analytics-focused BlueKai and iOS data visualization vendor Roambi. The predictions from each of these companies were somewhat self-serving, and yet consistent with the theme of customers getting more sophisticated and dealing with the issues that arise as a consequence.
For example, Roambi says "Businesses are finding that half of their business data is not accessible for easy review, affecting decision-making and accuracy of projections." And Omar Tawakol, CEO of BlueKai, says that "forward-thinking brands will…re-evaluate their agency based on ability to identify ways to maximize the use of cross channel audience data and to measure data effectiveness and ROI beyond media performance."
Down with Complexity
My take on where Big Data technology is going comes down to two themes: a lessening dependency on MapReduce and a pushing down of Hadoop deeper into the enterprise software stack.
By the lessening dependency on MapReduce, I mean that products like Cloudera’s Impala and Microsoft’s PolyBase, which bypass MapReduce and work directly against data stored in the Hadoop Distributed File System (HDFS), will gain momentum.
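For readers who haven’t written one, a MapReduce job expresses computation as a map phase, a shuffle that groups intermediate results by key, and a reduce phase. The toy word count below sketches that model in-process (the names and data are illustrative; a real job runs distributed across a Hadoop cluster). It is exactly this batch-oriented layer that engines like Impala skip in order to answer queries interactively:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data is big", "hadoop handles big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

Writing even a simple aggregation this way, and paying the batch-job latency that comes with it, is the overhead that direct-to-HDFS SQL engines are designed to eliminate.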
MapR’s prediction about the continued rise of SQL-based tools aligns well with this, as does another prediction from Pervasive that "YARN [Yet Another Resource Negotiator] changes the Hadoop game." Pervasive explains that "YARN allows running not only MapReduce applications [on Hadoop] but multiple other application types as well."
And what do I mean by my prediction that Hadoop will be pushed deeper into the software stack? Simply that (a) Hadoop has gained such significant adoption that it has in effect become an industry standard and that (b) standards tend to become the foundation of higher-valued software tools, rather than tools in their own right. As such, I think we’ll see more BI and analytics tools integrate Hadoop’s functionality internally, and that our dependency on specialists who work with Hadoop directly will diminish.
I’m excited to enter the second calendar year of Big on Data and to see which of these predictions turn out to be correct.