Six months after its creation, the Open Data Platform Hadoop initiative driven by Pivotal and Hortonworks has today unveiled new members, work on a core spec and reference implementation, and a formal governance structure.
The initiative caused controversy at its launch in February because of its declared aim of defining a core set of open-source Apache technologies to speed adoption of Hadoop.
Opponents dismissed it as a marketing effort and argued that interoperability across projects is not a major issue.
In a move that could further grate with those not in the Open Data Platform camp, the initiative is also now being hosted at the Linux Foundation as a collaborative project.
The original 14 members (GE, Hortonworks, IBM, Infosys, Pivotal, SAS, AltiScale, Capgemini, CenturyLink, EMC, Teradata, Splunk, Verizon and VMware) have now been joined by Ampool, DataTorrent, Linaro, Squid Solutions, SyncSort, Toshiba, UNIFi, Xiilab, zData and Zettaset. WANdisco joined in March.
"The state of Apache Hadoop demands open standardisation and integration that can accelerate and ease deployments among its massive user community," Linux Foundation executive director Jim Zemlin said in a statement.
"We've seen this model work with open-source technologies experiencing rapid growth - projects like Debian, among others - and know it can increase adoption and open up opportunities for innovation on top of an already-strong Hadoop community."
On GitHub, the initiative describes itself as "a non-profit defining, testing and promoting open standards and technologies based on the big-data ecosystem".
The Open Data Platform initiative, or ODPi as it now styles itself, said more than 35 maintainers from 25 companies have collaborated on the creation of an initial ODPi core specification and reference implementation.
Their goal is to simplify upstream and downstream qualification efforts, the initiative said.
A certification program is also being developed "to ensure consistency and compatibility across the big-data ecosystem".
Under the initiative's governance model, a group of developers will form a technical steering committee chosen on the basis of expertise and the value of their contributions.
All initiative members will have an equal voice on core decisions, whatever their level of investment, to ensure "equality among all participants and an industry-wide consolidation of enterprise requirements".
The Open Data Platform will also elect a board of directors, which will be responsible for the financial, legal and promotional aspects of the initiative.
Earlier this year Hortonworks president Herb Cunitz likened the Open Data Platform's attempt to create a core set of Hadoop components to the Linux kernel. He argued there will be no impact upstream on software development in the Apache Software Foundation.
However, Mike Olson, chief strategy officer and co-founder of major Hadoop provider Cloudera, which is not part of the Open Data Platform, said in response to today's announcement that his company's position on the initiative has not changed.
"The Apache Software Foundation is the home of the Apache Hadoop ecosystem, and the place that the work on the projects belongs. Setting up separate governance and code at the Linux Foundation is confusing and unnecessary," he said.
"We'll continue to make our substantial contributions to Apache. The community there, now embracing Apache Spark, Apache Kafka and more, is vibrant and innovating wonderfully."