New Hadoop survey makes big data predictions for 2016

The results from the second annual Syncsort Hadoop survey are in, and the 250 high-level respondents predict some surprising big data trends.
Written by Ken Hess, Contributor

In a new survey conducted by Syncsort, 250 prominent respondents including data architects, IT managers, developers, business intelligence/data analysts, and data scientists weigh in on big data trends to watch in 2016. Two-thirds of those surveyed work in companies with over $100 million in annual revenue. Industries represented are financial services, healthcare, government, and retail. The big trend for 2016 is the move away from Hadoop experimentation into full production with big data analytics.

2016's big three trends are:

  • Apache Spark production deployments
  • Conversion from other platforms to Hadoop
  • Leveraging Hadoop for advanced use cases

The uptick in interest in Apache Spark is a bit of a surprise: a full 70 percent of respondents state that Spark is the platform they're most interested in. MapReduce came in a distant second at 55 percent. However, Syncsort's big data analysts predict that MapReduce will remain the primary compute framework for production deployments. But the numbers tell a different story. With 70 percent of the respondents expressing a keen interest in Apache Spark, MapReduce deployments may in fact reduce over the next twelve months.

The two primary factors behind this interest in Spark are its ease of deployment and its speed. Because Spark runs in memory, it requires big iron. Its speed also highlights one of MapReduce's biggest problems: its high-latency, batch-mode response.

But like the Syncsort experts, I believe that people will hang onto MapReduce for a while longer.

The conversion or offload from expensive platforms to open source Hadoop is a significant shift. The old mainstays of the mainframe and the enterprise data warehouse are becoming too expensive to deal with when cheaper alternatives are screaming for attention. The respondents agree: 63 percent state that Hadoop will help them increase business and IT agility. Fifty-five percent expect to increase operational efficiency and reduce costs. And 51 percent want to use Hadoop to make more data available to business users.

More than half the respondents view Hadoop as a way to innovate by using social media data and data from IoT sources. Oddly, only 4.9 percent reported interest in advanced use cases involving mobile apps and software.

Tendü Yoğurtçu, General Manager of Syncsort's Big Data business, stated, "As Hadoop adoption becomes mainstream, the number of applications in production increases and the use cases, frameworks and data sources become more varied and complex. Organizations realize significant benefits from Hadoop; however, they also cite challenges in keeping up with new tools and skills, connectivity and data movement, and unforeseen costs."

Syncsort also made two other trend predictions for 2016:

Businesses will adopt and leverage real-time data sources. IoT, of course, will play a major role in this adoption, but other use cases will as well, such as fraud detection, telemetry analytics, security data, and insurance claim validation. I'll add my own prediction for social media as one of those real-time data sources.

Syncsort's analysts also predict that data governance and security will be major areas of focus this year. I'm not convinced that it requires a mainframe computer to predict that security will be a focus as dependence on data increases.

I'm going to add another prediction to the mix for this year. I foresee a new rise in data broker businesses which sell data to other companies. Data collection, storage, and analysis have big potential for startups wishing to sell cleverly consolidated or correlated data. Think about it. Rather than guessing where the best location is to set up a new cupcake shop, ask the data. Realtors should have a field day with the prospect of using big data to predict when and where to buy houses, how much the client should pay, and how rapidly a particular market will rise or fall.

2016 just might be the year of big data analytics with Apache Spark leading the way. What I'm hoping will come true is an easier way for would-be data consumers to consume data. I'm also hoping that people stop talking about big data and someone starts doing something with all that big data. If not, I can make another prediction -- the person who creates a big data shredder will be the world's next billionaire.

What do you think of these predictions? Will Apache Spark displace MapReduce as a processing framework? Has MapReduce run its course? Talk back and let me know.
