Hadoop on Amazon's Cloud
There are lots of ways to run Hadoop, but what if you want to start working with it right away, without the distraction of building a cluster yourself? Your best bet is probably a cloud-based Hadoop cluster, and the Elastic MapReduce (EMR) service on Amazon Web Services (AWS) can get you there pretty speedily.
To get an EMR cluster up and running, you'll need to create an AWS account at http://aws.amazon.com, and you'll want to create a security key pair too. There are several other steps of course, and we'll cover them, one by one, in this gallery.
Pick a distro
Amazon refers to the process of standing up an EMR cluster as creating a "job flow." You can do this from the command line, using a technique we'll detail later, but you can also do it from your browser. Just navigate to the EMR home page in the AWS console at https://console.aws.amazon.com/elasticmapreduce, and click the Create New Job Flow button at the top left. Doing so will bring up the Create a New Job Flow dialog box (a wizard, essentially), the first screen of which is shown here.
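If you'd rather script this step than click through the wizard, the same kind of request can be made from code. What follows is only a minimal sketch, assuming the boto3 Python SDK, which this gallery doesn't cover. The helper names (`build_job_flow_request`, `launch_job_flow`), the release label, the instance types, the region, and the default role names are all assumptions for illustration, so swap in whatever your own account uses.

```python
def build_job_flow_request(name, key_pair_name, instance_count=3):
    """Assemble keyword arguments for boto3's EMR run_job_flow call.

    Every concrete value below is an assumption chosen for illustration.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick your own
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": instance_count,
            "Ec2KeyName": key_pair_name,  # the key pair created earlier
            # Keep the cluster running even when it has no steps queued.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed to be the default EMR roles in your account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_job_flow(request, region="us-east-1"):
    """Send the request to EMR and return the new job flow's ID.

    Needs boto3 and valid AWS credentials, and it starts billable
    resources, so it is not called automatically in this sketch.
    """
    import boto3  # imported here so building a request works without it

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


if __name__ == "__main__":
    request = build_job_flow_request("my-first-job-flow", "my-key-pair")
    print(request["Name"], request["Instances"]["InstanceCount"])
```

Building the request is kept separate from sending it so you can inspect the settings before anything billable starts. Call `launch_job_flow(request)` only once you're happy with them.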
An EMR cluster can use Amazon's own distribution of Hadoop, or MapR's M3 or M5 distribution instead. M5 carries a premium billing rate, as it is not MapR's open source distro.
Those just experimenting with Amazon's Elastic MapReduce can get started immediately by running a sample application, rather than running their own code on their own data. Amazon offers WordCount (the ubiquitous Hadoop sample application) as well as a Hive-based contextual advertising sample, Java- and Pig-based log analysis samples, and another Java-based sample that looks at data from Amazon's CloudBurst service.
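If you haven't met WordCount before, it simply tallies how often each word appears in its input. Here is a rough, single-machine sketch of the idea in plain Python. It is not Amazon's sample code and doesn't run on Hadoop. The function names are hypothetical, and splitting on whitespace after lowercasing is an assumed simplification.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(pairs):
    """Reduce phase: add up the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map phase over every line, then reduce the results."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

On a real cluster, the map and reduce phases run in parallel across many machines, with the framework grouping pairs by word in between. This sketch does all of that in one process.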