Cloudera and Hortonworks' merger closes; quo vadis Big Data?

The two biggest Hadoop distribution vendors are now one. What does this mean for the Big Data world, for customers and for Apache Hadoop?
Written by Andrew Brust, Contributor

Yesterday, in the immediate aftermath of the holidays, and without much fanfare, the Cloudera/Hortonworks merger closed.  The two biggest Hadoop distribution vendors, each of which could trace its lineage back to the original Hadoop team at Yahoo, and which maintained a sometimes tiresome rivalry, are now one.

The new entity is called Cloudera, will trade under that company's original ticker symbol (CLDR) and will be led by its erstwhile CEO, Tom Reilly.  And whereas the old Cloudera pushed its Enterprise Data Hub, Reilly says the new one will "...become the leading enterprise data cloud provider" (emphasis mine).

Many in the industry have cited this merger as proof positive that Hadoop is in some sort of death spiral.  If two leading vendors in the Hadoop space are joining forces, the thinking goes, that must mean there isn't enough business in the space to sustain them as separate entities.

Same thing, only different

I would say that argument gets things backwards.  The more germane observation is that, as separate entities, Cloudera and Hortonworks were having a net negative impact on the Hadoop and Big Data market.  By combining, they will be eliminating headwinds for the industry and for their own book of business.

Cloudera and Hortonworks had a destructive rivalry.  Above the core Hadoop platform itself, each company developed and/or backed competing components in the Big Data stack.  Hortonworks backed Apache Atlas for governance, Ambari for cluster management, Ranger for access controls, and Hive for data warehousing, while Cloudera Navigator, Cloudera Manager, the Cloudera-backed Apache Sentry and Apache Impala focused, respectively, on the same territorial patches.  Even the core Hadoop platform saw fragmentation as Hortonworks put its weight behind an open source component called Tez, while Cloudera thought Apache Spark was the better candidate there.

All of this bickering created a tale of two Hadoop platforms; two elephants creating significant customer risk of betting on the wrong horse.  The consequences were diluted ecosystem cohesiveness that begat compatibility nightmares for software vendors, as well as fragmented community and standards that resulted in customer confusion, frustration and indecision. In short, it was bad for business.

Unified front; array of competitors

The new Cloudera will have its work cut out for it, in terms of anointing the winning entrant in each of these categories. Helping customers transition through the changes won't be easy but, if it's handled correctly, the eventual impact will be stabilizing and customer satisfaction should improve.

Cloudera will still face stiff competition. On the Hadoop DNA side, MapR has been busy for years building an Enterprise-oriented stack for analytics and AI with a blend of proprietary technology powered by open source technologies and APIs.  In the Spark world, where both Hortonworks and the legacy Cloudera also played, Databricks is a formidable challenger.

Moving to the cloud, Amazon and Google each offer Hadoop services based on their own distributions.  And whereas Microsoft's cloud Hadoop service, Azure HDInsight, was developed with Hortonworks and based on that company's Hadoop distro, maybe Microsoft will bite the bullet and go the DIY route, too.  Cloudera will offer multi-cloud Hadoop to counter each of the major cloud providers' house brands.  But it will still compete with the likes of Snowflake, Vertica (Micro Focus) and Greenplum (Pivotal) in the independent data warehouse arena.

Microsoft is also blending Hadoop and Spark technologies into the next release of its flagship SQL Server on-premises database platform.  And IBM, SAP and Oracle have integrated Hadoop and Spark into their platforms too, sometimes in partnership with Hortonworks or Cloudera, and sometimes on their own.  Yes, there are pockets of resistance everywhere.

David or Goliath?

Suddenly, Cloudera, the newly unified Hadoop juggernaut, seems like the underdog.  What if Big Data is just a capability set and not a new platform category after all? And with competition from startups, cloud providers and Enterprise megavendors, how will the new Cloudera fare?

First, take a deep breath.  Now consider a few things that should give pause to Cloudera's competitors.  The new Cloudera has the dream team of Doug Cutting, Arun Murthy, Mike Olson and Hilary Mason under one roof.  It's under the leadership of Tom Reilly, with former Hortonworks CEO Rob Bearden on the Board of Directors.  The combined brain trust and ambition of that group is going to be tough to beat.

The new Cloudera will also offer a stack that covers data lake, data warehouse, IoT, and AI.  On-premises and in the cloud.  For data at rest and in-motion.  On bare metal, VMs and soon in containers.  All without the multi-decade vintage technology baggage weighing down many of its competitors.  Plus, it's the company that's helped many of those competitors step into the Big Data world in the first place.  So while these companies are competitors who will compete, they'd be well-advised to avoid the route of "haters gonna hate."

Dark clouds; silver lining

One could say it's do or die time for Big Data, Hadoop and Data Lakes as a category.  One could also say that the space has had too many vendors and that consolidation is overdue.  The Enterprise software companies have real competency in this arena now, making it hard for pure-play startups to remain viable, and for any hubris on their part to be much more than a liability.

All that's true.  But we're in the age of data.  We're also in the age of AI and machine learning, which are, ultimately, data disciplines.  Decisively doubling down on data ain't dumb.  Doing it with a team of data rock stars doesn't suck.  Competing as a relatively young, agile company against Enterprise behemoths isn't crazy.  And offering customers an all-up data, analytics, IoT and AI portfolio, in an era when so many other companies are still playing catch-up, sounds like an opportunity to me.

This isn't a terminus, it's a commencement.  And while life as an adult may be scary compared to being a student, it's also where the big opportunities arise.  Training is over, rivalries are set aside and opportunities are arrayed and waiting.  Let's see how this combined team rises to the occasion.  The outcome will have an impact on the tech industry overall.
