Some three weeks after its launch, the Open Data Platform, the Hadoop initiative driven by Pivotal and Hortonworks, has gained another member: distributed computing specialist WANdisco.
The San Ramon, California, and Sheffield, UK-based company, which is a corporate contributor to various open-source projects including Apache Hadoop and Apache Subversion, has allied itself to the Open Data Platform cause of defining a core set of Apache technologies to speed adoption of Hadoop.
The Open Data Platform's founding members, GE, Hortonworks, IBM, Infosys, Pivotal, SAS and Altiscale, plus Capgemini, CenturyLink, EMC, Telstra, Teradata, Splunk, Verizon and VMware, will test and certify a number of primary Apache components, which will then form the basis of their Hadoop platforms.
WANdisco co-founder and CEO David Richards said in a statement that the decision to join the Open Data Platform underlines his firm's "commitment to software based on open standards that give customers a choice, instead of locking them in to a proprietary platform".
WANdisco sells clustering products based on an engine that allows multiple instances of the same app to run on independent hardware without shared resources. This patented technology enables, for example, Hadoop to be available across datacentres regardless of their distance apart, while eliminating downtime and data loss, the company said.
Pivotal, spun out of EMC and VMware in 2013, said at the time of the Open Data Platform launch that it will work directly with specific Apache projects, adhering to the Apache Software Foundation guidelines on contributing ideas and code. The goal is to increase compatibility and make it easier for apps and tools to run on any compliant system.
Hortonworks president Herb Cunitz this week likened the Open Data Platform's attempt to create a core set of Hadoop components to the Linux kernel, and said it would have no impact upstream on software development in the Apache Software Foundation.
However, Mike Olson, chief strategy officer and co-founder of major Hadoop provider Cloudera, which is not part of the initiative, recently suggested that the Open Data Platform may well turn out "to be no more than a retrograde marketing effort".
Hadoop creator Doug Cutting, who is Cloudera's chief architect, has also queried the use of the word 'open' in the new initiative's title, while pointing to the existing Apache Bigtop community effort to create a standard distribution.
Another major Hadoop provider not participating in the initiative, MapR, has said interoperability across projects is not a major issue and that the market, especially the Apache Software Foundation, works well. It considers the Open Data Platform more of a partner program than a community initiative.
Against those positions, Hadoop veteran Raymie Stata, founder and CEO of Open Data Platform member Altiscale and former Yahoo CTO, recently wrote a blog post setting out the problem that the Open Data Platform initiative seeks to solve.
He said that because each Apache Software Foundation project that makes up Hadoop "is governed independently, somebody needs to take responsibility for releasing the ecosystem as a whole".
"To certify on Hortonworks, you need to test on at least Hadoop 2.2.0 and Hive 0.12.0, and Hadoop 2.4.0 and Hive 0.13.0. For Cloudera's CDH 5.1, turns out the combinations would be Hadoop 2.3.0 and Hive 0.12.0, while for CDH 5.2 it would be Hadoop 2.5.0 and Hive 0.13.1. For Pivotal and IBM you'd be looking at slightly different combinations again," Stata said.
"This proliferation of baskets creates significant drag when it comes to building reliable applications running on top of Hadoop. All these shifting complexities, across many, many customers, makes it harder for customers to assess which basket of Hadoop they need and harder for application developers to create solutions that work broadly."
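As a rough illustration of the testing burden Stata describes, the version pairs he cites can be tallied in a few lines of Python. This is only a sketch: the dictionary layout and the names `cited_baskets` and `distinct_combinations` are invented for the example, and the data is limited to the combinations quoted above, since the Pivotal and IBM combinations are not given.

```python
# Hypothetical sketch: only the (Hadoop, Hive) pairs quoted by Stata are listed.
# The Pivotal and IBM combinations are not stated, so they are left out.
cited_baskets = {
    "Hortonworks": [("2.2.0", "0.12.0"), ("2.4.0", "0.13.0")],
    "Cloudera CDH 5.1": [("2.3.0", "0.12.0")],
    "Cloudera CDH 5.2": [("2.5.0", "0.13.1")],
}


def distinct_combinations(baskets):
    """Return the set of unique (Hadoop, Hive) pairs across all distributions."""
    return {pair for pairs in baskets.values() for pair in pairs}


combos = distinct_combinations(cited_baskets)
for hadoop, hive in sorted(combos):
    print(f"Hadoop {hadoop} + Hive {hive}")
print(f"{len(combos)} distinct combinations to test")
```

Even with only the two vendors whose versions are quoted, no pair is shared between distributions, so an application developer would face four separate test targets.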