With last month's launch of the Open Data Platform initiative, the time is fast approaching when larger vendors involved in Hadoop will have to pick an alliance, because going it alone is too risky, according to Hortonworks president Herb Cunitz.
Cunitz, whose firm jointly founded the Open Data Platform with EMC and VMware spinoff Pivotal, believes market pressures will oblige Hadoop firms to decide on their allegiances if they want to exert any influence on the direction of the big-data technology.
"The larger vendors have to proclaim. This market is moving very quickly. They really only have two choices of how to play the market," Cunitz said.
"[They can either] work with what we're driving around Hortonworks Data Platform and the Open Data Platform and the alliance; or they can say, 'Hey, I don't want to align with anybody. I'm just going to take whatever comes out of Apache Software Foundation', and then they're beholden to package it, support it, distribute it themselves."
However, a number of businesses have already tried that second approach, only to fail, Cunitz said.
"We've seen in the market company after company do that and then exit the space because they realise it's very hard to do. Unless you have the committers, you have no influence on what's being driven into Apache. You're just a downstream packager and then it's a very difficult business to monetise. Unless you have influence, it's not a monetisable business," he said.
"It's very hard for anyone else to influence [Apache Hadoop] because all those people either work for us, or the next company would be Cloudera, or somebody else in the space. There are only a handful of those people around."
The Open Data Platform aims to define a core set of Apache technologies to speed Hadoop adoption. Its founding members, GE, Hortonworks, IBM, Infosys, Pivotal, SAS and AltiScale, plus Capgemini, CenturyLink, EMC, Telstra, Teradata, Splunk, Verizon and VMware, will test and certify a number of primary Apache components, which will then form the basis of their Hadoop platforms.
"What everyone agreed on is a Hadoop kernel. Think of it like the Linux kernel, so this doesn't impact upstream software development in the Apache Software Foundation. That goes on exactly as it's always done and we'll continue to be the leader in doing that," Cunitz said.
"This is downstream from that. These vendors just agree on what the kernel is that they're going to package and build on top of. That drives alignment across the industry to say if we've agreed on a common kernel, it's a lot easier for us to build the APIs, the interfaces and focus on the things on top of the kernel than to argue what the parts are in the kernel."
Asked why some big companies, such as Microsoft and HP, are notably absent from the Open Data Platform, Cunitz said they already get the same core technology through their partnerships with Hortonworks.
"The kernel of YARN, Ambari and Hadoop are the foundation of Hortonworks Data Platform. It's the same bits [as the Open Data Platform]. Many of the other vendors we've already partnered with said, 'I don't need to go pay to join that because I'm already getting that by being partnered with Hortonworks'," he said.
"[Open Data Platform] was many of the others, who are not partnered today, saying, 'Let's adopt a standardised kernel'. What I do expect is more market uptake, [and] less arguing across vendors."
Whether any of those already in a relationship with Hortonworks will ultimately join the Open Data Platform remains to be seen, but Cunitz expects more companies to sign up.
"If it stays in the format it is now, it's fine. I would think naturally over time we would see others who would either join in one of two ways. They would either join the Open Data Platform and the kernel or join like Microsoft and HP have with us on the broader, full Hortonworks Data Platform," he said.
"My expectation would be you will see others in the market this year proclaim and join into one of those paths, or candidly join an alternative path. Some would argue there is a competing initiative. It's called Cloudera-Intel. It's never been positioned that way, but obviously they're aligned and they're not part of this and in some ways you could argue that's already a competing initiative."
According to Cunitz, Microsoft's existing involvement with Hortonworks would not preclude it from joining the Open Data Platform one day.
"They may. Don't read anything into what I'm saying here. They can make their own choice. They are already getting what they would want in terms of driving standardisation through our partnership with Microsoft today because they've not only adopted the kernel, they've adopted all of Hortonworks Data Platform as their standard, their standard inside Azure, their standard inside HDInsight [the Azure Hadoop cloud service].
"They're already building what I would call a bigger kernel jointly with us. So for them to say we've also agreed on the smaller kernel [of the Open Data Platform], well, they already did it on the bigger one. It doesn't really buy them anything."
Last week Hortonworks reported its first quarterly results since its initial public offering in December, which raised $100m and saw its stock rise 65 percent.
"In the last quarter the number that's most impressive is the 99 new subscription customers. That means 99 new companies have become customers and are paying us to go support them and work with them on Hortonworks as a platform. The quarter prior we had 232 total customers. So to go from 232 to add 99 in one quarter, you can do the math on the slope of that curve," Cunitz said.
"Those 99 new customers are by default using the exact same kernel that's in the Open Data Platform. So, as an example, if they choose to say, 'I want to run the Pivotal HAWQ SQL engine', that can now plug right into Hortonworks Data Platform and the kernel and they can take advantage of that. It's certified and it runs out of the box."
"We're in the early stages of it, and this is far from over for anybody, but we're very comfortable on the direction and the strategy and how this is playing out."