Yahoo has spun off its internal Hadoop development team into an independent company, named Hortonworks.
The company, which counts both Yahoo and Benchmark Capital as investors, was announced on Tuesday.
"With Apache Hadoop, companies can connect thousands of servers to process and analyse data at supercomputing speed," Yahoo said in a statement. "Yahoo pioneered, is the primary contributor to, and one of the leading users of Apache Hadoop."
"We anticipate that within five years, more than half the world's data will be stored in Apache Hadoop," Eric Baldeschwieler, Hortonworks' chief executive and former head of software engineering for the Hadoop team at Yahoo, said in the statement. "We've assembled a top calibre team committed to the Apache open-source community."
Hadoop is an open-source data analytics framework that companies can use to analyse, mine and process large amounts of data from disparate sources. It is based on Google's internal tools, MapReduce and the Google File System. Yahoo was Hadoop's primary developer in 2005, though the Apache Software Foundation project has since broadened to include contributions from many major web companies.
Yahoo already runs Hadoop across 40,000 servers, processing five billion jobs per month.
"Hadoop has gone from a powerful solution for specific problems to a foundational technology for almost any business," Raymie Stata, Yahoo's chief technology officer, wrote in a blog post. "And there is a wealth of important work happening, not just on core Hadoop, but on the greater stack as well, with projects like HBase, Hive, Pig, and Oozie making Hadoop ever more useful for a broad set of applications, from crunching web search results and global retail transactions to analysing genetic code."
Facebook, Twitter, Yahoo, Adobe and eBay all use Hadoop, according to the Hadoop wiki. Hortonworks will have an initial staff of about 25 employees, Jay Rossiter, senior vice president of Yahoo's Cloud Platform Group, told InfoWorld.
A range of other companies already offer commercial distributions of the framework. MapR has become EMC's preferred partner for Hadoop with its high-availability distribution; Cloudera's Hadoop distribution offers high resiliency and broad open-source application support; and Syncsort has a version of its DMExpress software that supports the Hadoop Distributed File System (HDFS).
On Wednesday, Platform Computing also announced its own distribution, Platform MapReduce, which offers high availability, policy-driven workload scheduling and support for multiple file systems besides HDFS.