
EMC tunes Hadoop for Greenplum data analytics

Written by Jack Clark, Contributor

EMC has crafted a distribution of the Hadoop data processing project and announced compatible hardware for its Greenplum product range.

The storage company hopes the hardware and software products, announced on Monday at EMC World in Las Vegas, will allow it to provide an enterprise-grade Hadoop platform for large-scale data processing and analytics.

Hadoop is an open-source project, administered by the Apache Software Foundation, which combines a file system and a high-performance parallel data processing tool along with various other modules to provide a platform for analysing large unstructured datasets.

Hadoop is suited to data analysis tasks like fraud detection, web crawling and behavioural analysis, EMC said.

"Information, in itself, is not a means to an end. What you need to do is get intelligence from it, analyse it, and that's where data analytics plays", EMC chief executive Joe Tucci said in a keynote speech on Monday alongside the announcement.

Hadoop is used by an ever-growing roster of companies across the world, including Facebook, Yahoo, Adobe, eBay and Rackspace.

EMC says it believes its Greenplum portfolio can carry Hadoop beyond the current crop of web-focused companies and into enterprises that need scalable data analytics services, because its Hadoop distribution has been tuned for EMC's own data analysis hardware.

The EMC Greenplum HD Data Computing Appliance is a data processing appliance that fuses Hadoop with the Greenplum database and sits on top of a variant of EMC's existing Greenplum data computing appliance. By combining Greenplum with Hadoop, EMC says it has created a platform that can analyse structured and unstructured datasets in parallel and in real time.

EMC Greenplum HD Enterprise Edition is EMC's commercial Hadoop distribution for enterprises. Some of its features are, for the time being, not open source. These include simplified Hadoop cluster deployment, automatic failure detection, data management features and multi-site management.

EMC claims the Enterprise Edition delivers two to five times the performance of standard Hadoop distributions. Cloudera, one of the larger players in the space, released Cloudera's Distribution including Apache Hadoop Version 3 (CDH3) in April.

The open-source community has not been forsaken, as EMC also announced EMC Greenplum HD Community Edition. This will be an entirely open-source Hadoop distribution that has been tuned for Greenplum, EMC said. The distribution comprises the Hadoop Distributed File System (HDFS), MapReduce, ZooKeeper, Hive and HBase.

The hardware appliance will be available in the third quarter of 2011 and the software distributions will become available within a quarter, EMC said. Prices were not disclosed.

Now in competition with Cloudera

The announcement sees EMC change its Hadoop implementation strategy. Formerly, the company had a partnership with Hadoop distribution and integration company Cloudera to develop Hadoop for the forthcoming Greenplum Chorus software along with other products in the Greenplum portfolio.

With Monday's announcement, EMC is branching out on its own by distributing, integrating and supporting a version of Hadoop tailored for its Greenplum product range.

As a result, Cloudera is now a competitor for the company, Luke Lonergan, chief technology officer for EMC's Greenplum division, confirmed to ZDNet UK at a press conference.

At the time of writing, ZDNet UK had not received a response from Cloudera on how the announcement might alter its Hadoop partnership with EMC.
