
Cloudera releases Hadoop distro with open-source extras

The third version of Cloudera's Distribution including Apache Hadoop wraps a range of extra Hadoop-based applications for a 'pure' open-source software stack
Written by Jack Clark, Contributor

Cloudera has released the third version of its distributed number-crunching Hadoop product — Cloudera's Distribution including Apache Hadoop — along with a variety of other Hadoop-boosting open-source applications.

Cloudera's Distribution including Apache Hadoop Version 3 (CDH3) wraps the core services in secondary Hadoop applications to give "a 100-percent open-source enterprise data platform", Cloudera announced on Tuesday.


"CDH3 includes everything [companies] need to use Hadoop for real work right out of the box," Cloudera's chief executive Mike Olson said in a statement. "We put CDH3 through an exhaustive beta cycle with thousands of enterprises, including some of the most demanding production environments in the world... Make no mistake: this is a pure open-source software stack, 100-percent Apache licences, but with unmatched levels of testing, integration and production experience".

According to Cloudera, CDH3 supports recent Red Hat, CentOS, SUSE and Ubuntu Linux distributions with revised kernels that boost performance. Small MapReduce jobs now run up to three times faster, and file-system input and output is 20 percent faster than in the previous version.

CDH3 can also be run from the Amazon and Rackspace clouds.

Cloudera provides a commercial distribution of Hadoop, targeted at the enterprise. It is a major contributor of code to the project, along with Facebook and Yahoo, which originally developed aspects of the system.

Hadoop is an open-source project administered by the Apache Software Foundation. It combines the Hadoop Distributed File System (HDFS) for storage with MapReduce, a high-performance parallel data-processing tool. The system runs on commodity servers and reacts to hardware failures by moving loads automatically.

Both HDFS and MapReduce are heavily influenced by Google's proprietary Google File System and Google MapReduce technologies, respectively.

Wrapped services

CDH3 wraps the core Hadoop components in a variety of open-source applications to form a complete stack. Along with HDFS and MapReduce, CDH3 comes with Hive, Pig, Oozie, Hue, Sqoop, HBase and ZooKeeper.

Oozie is a workflow co-ordination service for running jobs on Hadoop. Hive is a batch-processing data-warehouse infrastructure that allows for SQL-like queries and tables on Hadoop datasets, and Pig is a dataflow language and compiler. Hue, or Hadoop User Environment, is a browser-based desktop interface for Hadoop.

Sqoop is a Cloudera-developed command-line tool for moving data between Hadoop and relational databases. HBase is a large distributed database modelled on Google's BigTable.

ZooKeeper is a co-ordination tool for applications running on Hadoop.

Additionally, Cloudera has made sure that these applications integrate well together.

"CDH3 integrates all components and functions to interoperate through standard APIs, manages required component versions and dependencies and is maintained by Cloudera with regular patches for enterprise-class reliability," Cloudera said.

