
Amazon puts Hadoop data-crunching in the cloud

Customers can use the new Amazon Elastic MapReduce service to pay only for the capacity they use as they perform tasks such as indexing the web, mining data or conducting financial analysis
Written by Larry Dignan, Contributor

Amazon on Thursday announced a new cloud-computing service that uses Hadoop, an open-source software framework, to crunch large amounts of data.

The service, called Amazon Elastic MapReduce, is designed for businesses, researchers and analysts trying to conduct data-intensive number crunching. Hadoop, an Apache-run distributed-computing technology used by companies such as Yahoo, is being promoted for the enterprise datacentre by startups such as Cloudera.

Amazon's Hadoop framework runs on the company's Elastic Compute Cloud (EC2) and Simple Storage Service (S3). Customers who use Elastic MapReduce pay only for the capacity they consume as they perform tasks such as indexing the web, mining data or conducting financial analysis, simulation and bioinformatics research.

Amazon Elastic MapReduce works by creating data-processing job flows that are carried out by Hadoop software on EC2, the company said in its announcement. The service automatically launches and configures EC2 instances according to the customer's specifications. Next, it uses Hadoop to load large amounts of data from S3, and that data is then divided for parallel processing across the EC2 instances. Once processing is complete, the results are recombined and written back into S3.
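For readers who want a concrete picture of that workflow, the sketch below shows how such a job flow might be submitted programmatically. It uses the boto3 SDK's EMR client, which postdates this announcement; the bucket names, instance types, counts and the trivial streaming step are illustrative assumptions, not details from Amazon's announcement.

```python
import boto3

# Sketch only: region, bucket names and instance sizes are hypothetical.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="example-line-count",          # hypothetical job flow name
    ReleaseLabel="emr-6.15.0",          # assumed EMR release
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,                    # one master plus two worker nodes
        "KeepJobFlowAliveWhenNoSteps": False,  # shut the cluster down when the step finishes
        "TerminationProtected": False,
    },
    Steps=[
        {
            "Name": "streaming-line-count",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                # Hadoop streaming lets plain shell commands act as mapper and reducer.
                "Jar": "command-runner.jar",
                "Args": [
                    "hadoop-streaming",
                    "-input", "s3://example-bucket/input/",    # data loaded from S3
                    "-output", "s3://example-bucket/output/",  # results written back to S3
                    "-mapper", "cat",      # identity mapper: pass records through unchanged
                    "-reducer", "wc -l",   # reducer counts the lines it receives
                ],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",  # default EC2 instance profile
    ServiceRole="EMR_DefaultRole",      # default service role
)

print("Started job flow:", response["JobFlowId"])
```

The job flow launches the requested EC2 instances, runs the step against the S3 input, writes its output back to S3 and then terminates, so charges stop once the work is done, mirroring the pay-for-what-you-use model described above.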
