EMC taps MapR technology for Hadoop distro

The storage specialist will use technology from MapR for its own enterprise Hadoop distribution and for its Greenplum data-analytics appliances, turning its back on partner Cloudera

Leading storage vendor EMC plans to use technology developed by MapR to power the Hadoop data-analytics platform in its Greenplum products.

The licensing agreement, announced on Wednesday, will see proprietary Hadoop-based software developed by MapR integrated into EMC's Greenplum data-analytics appliances and its own enterprise-class Hadoop distribution, the EMC Greenplum HD Enterprise Edition.

"With MapR we are able to provide [a] solution for high availability, fault tolerance, and enterprise-class support and service," Scott Yara, vice president of EMC's data-computing division, said in a statement. "Combined with the EMC Greenplum Database, we will enable the co-processing of both structured and unstructured data within a single... solution."

EMC has made Hadoop a priority for its data-analytics strategy. Bill Cook, general manager for the company's data computing division, told ZDNet UK earlier in May that the company hopes to salt data analytics — and, by extension, Hadoop — throughout its IT stack.

Hadoop is an open-source data-analytics software package administered by the Apache Software Foundation. Its core components — the Hadoop Distributed File System (HDFS) and the MapReduce distributed computing framework — are based on a data-analytics framework developed by Google. Hadoop is integral to many major web companies, such as Facebook, Twitter and Yahoo.

Silicon Valley-based MapR said in a statement on Wednesday that the technology it has developed on top of Hadoop eliminates single points of failure, supports snapshots for data protection and recovery, and dramatically raises the performance of the basic Hadoop framework.

Cloudera partnership

The main commercial distributor of Hadoop today is US-based Cloudera, with its Cloudera Distribution including Apache Hadoop Version 3 (CDH3). It has a solid foothold among major companies; for example, Twitter uses the previous version, CDH2.

In September, Cloudera formed an alliance with EMC to integrate Cloudera's distribution with the Greenplum Chorus platform. By making MapR's proprietary technology the key to its enterprise Hadoop distribution, EMC appears to have sidelined Cloudera.

At the time of writing, Cloudera had not responded to requests for comment on the status of its alliance with EMC.

EMC did not give a timeline for the incorporation of MapR technology into its products. However, EMC said it plans to release its Hadoop software within the next few months.
