
Hortonworks revamps its stack, further embraces Apache Spark

A partnership with Hewlett Packard Enterprise, enhancements to Hortonworks DataFlow and a new release cadence all mean one thing: Hortonworks is serious about making Hadoop fit into enterprise IT environments.

Yesterday, in a public Webinar, Hadoop distribution vendor Hortonworks made a number of announcements. Lawrence Dignan had a news item ready right as the Webinar began. Meanwhile, I tuned into the Webinar and took a briefing with Hortonworks about an hour after it ended, to get a detailed understanding of just what was announced and what it all means.

The announcements, which I'll cover now, break down into three areas:

Apache Spark, including a partnership with Hewlett Packard Enterprise
Technologists at Hewlett Packard Enterprise (HPE) Labs have optimized Apache Spark's "Shuffle Engine" by rewriting it in C++. They claim these optimizations have yielded up to 15x performance improvements for certain workloads. HPE wishes to open source this code (ostensibly to check it in as part of the standard Spark codebase) and will work with Hortonworks to do so. In addition, Hortonworks is now including Spark 1.6 in its Hortonworks Data Platform (HDP) Hadoop distribution, and claims it is the first Hadoop vendor to do so.

Hortonworks DataFlow
Hortonworks DataFlow (HDF), Hortonworks' data in motion (streaming data) package, based on Apache NiFi, now includes Apache Storm and Apache Kafka. Previously, customers needed to get these two components from HDP, thus requiring both HDP and HDF subscriptions to obtain support from Hortonworks when using NiFi with Storm and/or Kafka. Now it's possible to use HDF with other vendors' Hadoop distributions and still receive support on the HDF side from Hortonworks.

Also noteworthy is that Hortonworks announced a partnership with Impetus Technologies to make that company's StreamAnalytix product (which I wrote about recently) integrate smoothly with HDF.

New packaging and release schedule/cadence for Hortonworks Data Platform
Hortonworks is aligning HDP with the Hadoop core and extended delineations defined by the Open Data Platform initiative (ODPi), of which Hortonworks is a founding member. The company will reduce the core components' release cadence to once annually. Core components include YARN, HDFS, MapReduce and Zookeeper. Other components, like Hive, Pig and Spark will release more frequently.

On the extended side, Hortonworks is pushing out a new release of Apache Ambari, along with SmartSense, which helps make Hadoop more manageable and operational issue resolution more automated. Hortonworks used the term "single pane of glass" to describe how Ambari simplifies management of Hadoop. If you doubted Hortonworks' Enterprise ambitions before, the use of that very Enterprise-y term should remove any confusion.

What it all means
With the details of the announcements established, let's try to find some commonalities in them, and correlate them with the need Hortonworks has to boost its sagging stock price and otherwise revamp/reboot its market approach.

The first and most important commonality in all of this is Enterprise readiness. Less frequent releases are much easier and less costly to manage for corporate IT departments. While Hadoop was still a laboratory tool, keeping up with the "latest and greatest" bits was something customers wanted. But once Hadoop clusters have been stood up across an organization or are centralized and relied upon operationally by numerous corporate constituencies, stability becomes important and upgrade volatility becomes expensive and risky. Moving to annual releases of the core platform, therefore, makes a great deal of sense. Doing that under the cover of the ODPi makes it feel like an industry standard, even if the other big Hadoop vendors, Cloudera and MapR, will have nothing to do with ODPi.

In terms of Apache Spark, Hortonworks recognizes that it's now an industry standard. Meanwhile, the fact remains that -- as a memory-oriented technology -- it doesn't scale to the same data volumes as do MapReduce and Tez. That leaves Hortonworks with two requirements: embrace Spark wholeheartedly, and work hard to make it scale better. That's the story behind the HPE partnership, and aligning with an Enterprise software and server company doesn't hurt either.

On the DataFlow end of things, there are two takeaways: streaming data is hot, and so is Kafka. While NiFi is cool technology too, shipping a streaming data package without Kafka likely looked anomalous to customers, and needing an HDP subscription to put NiFi and Kafka together may have rubbed customers the wrong way, smacking of vendor lock-in. Putting Kafka (and Storm) into HDF removes this objection, and it has the added upside of letting Hortonworks sell HDF into accounts that are standardized on Cloudera's Distribution including Apache Hadoop (CDH).

Carpe diem
Put all these announcements together and what you get is a rationalized stack, with a manageable release cadence, a good corporate data center manageability story, streaming data access that is enhanced and simplified, and enhancements to Apache Spark to make it more ready for Enterprise Big Data workloads.

Now the combination of HDP and HDF becomes a true Enterprise IT sale, and HDF on its own becomes a wedge into competitive accounts. While this may not work miracles on Hortonworks' stock price, which is now less than half of its IPO level, it's a good collection of sensible, well thought-out moves in the right direction, and most likely precedent-setting for others in the industry.