Hortonworks plays balancing act in the cloud

Among the big elephant platform providers, Hortonworks remains just about the last one not shy about having Hadoop associated with its name. But after this week's DataWorks Summit, it's apparent that Hortonworks and data warehousing providers are aiming to play the same Switzerland-of-analytics role in an increasingly cloud-centric world.


In some ways Hortonworks is old-fashioned in that it still clings to the stretch goal of managing half of the world's data in an era when cloud object stores and bespoke analytic services are adding more alternatives to the mix. Hortonworks' aspirational goal may not be realistic, but never mind; there are bigger fish to fry.

The underlying message from this year's North American DataWorks Summit and analyst briefings is that the company is holding its own competitively while facing the challenges of navigating a multipolar cloud world.

My big on data bro Andrew Brust reported the headlines coming out earlier in the week: Hortonworks is releasing the 3.0 version of its data platform that, confusingly, is based on Hadoop 3.1. As we reported back at the start of the year, the 3.x generation of Apache Hadoop marks a watershed for containerization and storage. HDP 3.0 adds YARN support for running Docker containers, meaning you can run containerized jobs with all dependencies and configurations rolled in. It also supports erasure coding, providing a path to tiering data, and begins abstracting support for specialized hardware such as GPUs.
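For readers who want a concrete sense of those two features, here is a minimal sketch based on the Apache Hadoop 3.x documentation rather than any Hortonworks-specific tooling; the image name, directory path, and job are illustrative placeholders.

```shell
# Prerequisite (yarn-site.xml on each NodeManager): enable the Docker runtime
#   yarn.nodemanager.runtime.linux.allowed-runtimes = default,docker

# Run a MapReduce job whose tasks execute inside a Docker image, so the
# job's dependencies travel with the image instead of the cluster nodes.
# "my-registry/etl-job:1.0" is a hypothetical image name.
yarn jar hadoop-mapreduce-examples.jar pi \
  -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-registry/etl-job:1.0" \
  -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-registry/etl-job:1.0" \
  10 100

# Apply a Reed-Solomon erasure coding policy to a cold-data directory,
# trading HDFS's default 3x replication for roughly 1.5x storage overhead.
hdfs ec -setPolicy -path /data/archive -policy RS-6-3-1024k
```

The erasure coding command is what enables the tiering story mentioned above: hot data stays replicated for fast reads, while colder directories can be switched to a cheaper encoded layout.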

The company is stabilizing financially. Coming atop a Q4 that beat the Street, the most recent quarter, Q1, which ended in May, showed positive cash flow. Unlike Cloudera, Hortonworks has pulled off this feat so far without cutting R&D. But among all vendors whose platforms are based on Hadoop, there's a common thread of battening down the hatches, especially when it comes to customer acquisition; Hortonworks is just less vocal about it.

Following last year's announcement of a significantly ramped-up relationship with IBM, extending to joint go-to-market and product, the open question was whether this year would show progression toward marriage. After a year, there was a noticeable impact on new HDP customers among the IBM base, but in the grand scheme of things, not yet a very impressive one. The slow ramp-up of the relationship speaks to the challenge of turning a huge organization like IBM on a dime, and to the reality that the Hortonworks customer base still values independence. But there was one new development in the IBM relationship: IBM is opening a new IBM Hosted Analytics with Hortonworks (IHAH) cloud service that will also bundle IBM Db2 Big SQL and the IBM Data Science Experience. By the way, we didn't come up with that acronym.

Even with Hortonworks and IBM trying to become BFFs, Microsoft is hardly folding its cards. This is the company whose Azure HDInsight service provided Hortonworks its first major OEM channel. Hortonworks and Microsoft re-upped the Azure relationship, expanding it to the IaaS side, where there is new joint development and support for optimizing HDP on core Azure infrastructure. On the horizon, we expect expanded support of Azure Data Lake Storage (ADLS), a more optimized form of cloud storage, matching a strategy that Cloudera has already signed onto.

Let's not forget Google Cloud. Hortonworks has taken the first major step toward optimizing for the GCP platform with support for Google Cloud Storage. That puts Google on par with what Hortonworks already does with AWS and Azure.

But with the flurry of cloud announcements comes a more measured attitude from the Hortonworks customer base. While the company does not break out cloud revenues, it estimates that roughly 20% of its base has at least one HDP implementation in the cloud. Given that Hadoop players like Hortonworks are doubling down on expanding business with existing clients, the relatively deliberate pace of cloud adoption is understandable: that expansion would largely entail migrating existing workloads from early adopters who likely already have the skills to manage their own clusters. Sure, as more workloads involve data that lives in the cloud, you'll see a higher percentage of the installed base implement there. But remember that with the sweet spot of the Hortonworks installed base being early Hadoop adopters, this is not the primary cohort demanding cloud simplification.

And with cloud, Hortonworks and other providers of Hadoop platforms are no longer the only games in town for big data analytics. There are plenty of a la carte services for running R or Python projects, not to mention machine learning and deep learning workloads, and with cloud storage becoming the de facto data lake, you don't necessarily need Hadoop to run them. The differentiator that Hadoop provides is governance, but that is also the domain of data warehouse incumbents, who are likewise eyeing more diverse analytics workloads.

That sets the stage for the frenemy relationships of all incumbent providers with the AWSs, Azures, and GCPs of the world. Strange as it seems to imagine Hortonworks, or Cloudera and MapR for that matter, grouped as part of the on-premises "legacy," they face the challenge of countering the perception that cloud-native platforms like EMR, Cloud Dataproc, or point services are becoming the new big data default in the cloud.

For Hortonworks, that's where Dataplane Services (DPS) comes in. As we reported last fall, DPS is, in essence, a catalog of catalogs for registering and cataloging data services. To make DPS more usable, Hortonworks is beginning to roll out a series of task- or role-oriented plug-ins, beginning with Data Analytics Studio, which lets you explore Hive metadata, and Data Steward Studio, which has just been released in preview for discovering which clusters get access to the NameNode and checking for outliers such as PII data that has not been properly tagged or masked. But that's just the beginning: we expect DPS will play a growing role in making HDP more cloud-agnostic.

With the tone of the conference keynotes shifting from noises about Apache zoo animals to outtakes from a data warehousing conference (emphasizing themes like the importance of data quality), Hortonworks is striving for a message of enterprise normalcy. Hadoop should not be that strange outlier platform sitting in the corner. Keep your eyes on projects like Apache Ozone, which, finally after years of gestation, starts making Hadoop look like a normal citizen, not only in the cloud, but also in the enterprise data center.

Clarification: Thank you, Roman V Shaposhnik, for correcting the record. Hortonworks has had the Ozone proposal on the table for several years for making Hadoop's file system compatible with cloud object stores. It is not currently a formal open source project with Apache or any other entity. But given that object stores are increasingly supplanting HDFS in the cloud, never say never.