Amazon puts Hadoop data-crunching in the cloud

Customers can use the new Amazon Elastic MapReduce service to pay only for the capacity they use as they perform tasks such as indexing the web, mining data or conducting financial analysis

Written by Larry Dignan, Contributor April 3, 2009 at 5:04 a.m. PT

Amazon on Thursday announced a new cloud-computing service that uses Hadoop, an open-source software framework, to crunch large amounts of data.

The service, called Amazon Elastic MapReduce, is designed for businesses, researchers and analysts who need to run data-intensive number crunching. Hadoop, an Apache-run distributed-computing technology used by companies such as Yahoo, is being promoted for the enterprise datacentre by startups such as Cloudera.

Amazon's Hadoop service runs on the company's Elastic Compute Cloud (EC2) and Simple Storage Service (S3). Customers that use Elastic MapReduce will pay only for the capacity they use for tasks such as indexing the web, data mining, financial analysis, simulation and bioinformatics research.

Amazon Elastic MapReduce works by creating data-processing jobs that Hadoop software carries out on EC2, the company said in its announcement. The service automatically launches and configures EC2 instances according to the customer's specifications. Hadoop then loads large amounts of data from S3 and divides it up for parallel processing across those EC2 instances. Once processing is done, the results are recombined and written back to S3.
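The split, process and recombine flow described above is the MapReduce pattern that Hadoop implements. What follows is a minimal local sketch in Python, assuming a simple word count as the job. A thread pool stands in for EC2 instances and an in-memory list stands in for data stored in S3. The function names (split_input, map_chunk, reduce_pairs, run_job) are hypothetical and are not part of Amazon's or Hadoop's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(records, num_chunks):
    """Divide the input into interleaved chunks, one per worker."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in this chunk."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def reduce_pairs(mapped_chunks):
    """Shuffle and reduce step: group pairs by word and sum the counts."""
    totals = defaultdict(int)
    for pairs in mapped_chunks:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


def run_job(records, num_workers=3):
    """Hypothetical job runner: split, map chunks in parallel, then recombine."""
    chunks = split_input(records, num_workers)
    # Local threads stand in for the EC2 instances a real job would use.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_pairs(mapped)


if __name__ == "__main__":
    # In-memory list standing in for input data that would live in S3.
    input_data = [
        "the quick brown fox",
        "the lazy dog",
        "the fox jumps",
    ]
    result = run_job(input_data)
    print(result["the"], result["fox"])  # 3 2
```

In a real deployment the recombined result would be written back to storage such as S3 rather than printed; the sketch only illustrates the shape of the computation.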
