Cloudera, maker of the most widely implemented Hadoop distribution, has announced the release of version 4 of its open source Hadoop distribution, known as CDH (Cloudera's Distribution including Apache Hadoop), as well as Cloudera Enterprise 4.0.
And this isn't just an increment of Cloudera's Hadoop distribution, since CDH 4 includes Hadoop 2.0, bringing a number of new core Hadoop features to general availability, in a stable, supported production release.
Here are some highlights of the new capabilities offered by CDH 4:
- High Availability HDFS, wherein the Hadoop Distributed File System's NameNode can be backed by a standby, so that it is no longer a single point of failure
- MapReduce 2.0, which, ironically, allows for data processing algorithms beyond MapReduce and provides a more integrated option for writing MapReduce code in languages other than Java
- The ability to continue using MapReduce 1.0 while using the other components of Hadoop 2.0
- Table and column-level permissions for HBase
- "Co-processors" for HBase, offering functionality similar to relational database insert triggers
- More granular job scheduling, providing better support for multi-tenant cluster use
- A RESTful Web service interface to HDFS
- A Web browser-based shell for Apache Pig and HBase
- Numerous performance improvements in MapReduce, HDFS and Flume
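To make the RESTful HDFS interface a little more concrete, here is a minimal sketch of how a client might build a request URL for it. It assumes the interface follows the Apache WebHDFS convention of a `/webhdfs/v1` path prefix and an `op` query parameter; the host name, port and helper function name are illustrative assumptions, not details taken from Cloudera's announcement.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS-style URL (hypothetical helper, for illustration only)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # The operation name goes in the query string, followed by any extra parameters.
    query = urlencode({"op": op, **params})
    # quote() percent-encodes unsafe characters but leaves "/" separators intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Assumed host and port; substitute the values for your own cluster.
print(webhdfs_url("namenode.example.com", 50070, "/user/alice", "LISTSTATUS"))
```

An HTTP GET against a URL of this shape would be expected to return a JSON directory listing, which means any language with an HTTP client can work with HDFS without Java libraries.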
And here are some highlights of what's new and interesting in Cloudera Enterprise 4.0:
- Wizard-based setup and management of multiple clusters from one console
- Heatmaps for health status reporting of your Hadoop clusters
- Support for Oracle 11g, MySQL or PostgreSQL as the database for storing cluster metadata
One of the remarkable things about Cloudera, beyond the capabilities of its Hadoop distro, is the number of partnerships the company has built with Business Intelligence and other Big Data firms. Today, Cloudera announced that the number of such partnerships has reached 250.
The company explained to me that it has a dedicated partner engineering team, ensuring that the partnerships are not merely business relationships, but truly reliable integrations that benefit the ecosystem. That must explain why almost everyone in the Big Data business wants a Cloudera partnership.
This is a big day for Big Data. The ecosystem is robust, the core platform is increasing in sophistication and reliability, and the manageability tools are getting much closer to contemporary data center standards. Let's keep an eye on how these new features get absorbed and exploited by ecosystem companies and products.