
Amazon puts Hadoop data-crunching in the cloud

Customers can use the new Amazon Elastic MapReduce service to pay only for the capacity they use as they perform tasks such as indexing the web, mining data or conducting financial analysis
Written by Larry Dignan, Contributor

Amazon on Thursday announced a new cloud-computing service that uses Hadoop, an open-source software framework, to crunch large amounts of data.

The service, called Amazon Elastic MapReduce, is designed for businesses, researchers and analysts trying to conduct data-intensive number crunching. Hadoop, an Apache-run distributed-computing technology used by companies such as Yahoo, is being promoted for the enterprise datacentre by startups such as Cloudera.

Amazon's Hadoop framework runs on the company's Elastic Compute Cloud (EC2) and Simple Storage Service (S3). Customers who use Elastic MapReduce pay only for the capacity they consume as they perform tasks such as indexing the web, mining data or conducting financial analysis, simulation and bioinformatics research.

Amazon Elastic MapReduce works by creating data-processing job flows that are carried out by Hadoop software on EC2, the company said in its announcement. The service automatically launches and configures EC2 instances according to the customer's specifications. Next, it uses Hadoop to load large amounts of data from S3, and that data is then divided for parallel processing across the EC2 instances. Once processing is complete, the results are recombined and written back into S3.
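For readers who want a concrete picture of that workflow, the sketch below shows how such a job flow might be submitted programmatically. It uses the boto3 SDK's EMR client, which postdates this announcement; the bucket names, instance types, counts and the trivial streaming step are illustrative assumptions, not details from Amazon's announcement.

```python
import boto3

# Sketch only: region, bucket names and instance sizes are hypothetical.
emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="example-line-count",          # hypothetical job flow name
    ReleaseLabel="emr-6.15.0",          # assumed EMR release
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,                    # one master plus two worker nodes
        "KeepJobFlowAliveWhenNoSteps": False,  # shut the cluster down when the step finishes
        "TerminationProtected": False,
    },
    Steps=[
        {
            "Name": "streaming-line-count",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                # Hadoop streaming lets plain shell commands act as mapper and reducer.
                "Jar": "command-runner.jar",
                "Args": [
                    "hadoop-streaming",
                    "-input", "s3://example-bucket/input/",    # data loaded from S3
                    "-output", "s3://example-bucket/output/",  # results written back to S3
                    "-mapper", "cat",      # identity mapper: pass records through unchanged
                    "-reducer", "wc -l",   # reducer counts the lines it receives
                ],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",  # default EC2 instance profile
    ServiceRole="EMR_DefaultRole",      # default service role
)

print("Started job flow:", response["JobFlowId"])
```

The job flow launches the requested EC2 instances, runs the step against the S3 input, writes its output back to S3 and then terminates, so charges stop once the work is done, mirroring the pay-for-what-you-use model described above.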
