Big Data on Amazon: Elastic MapReduce, step by step

Summary: Curious how to go about doing Hadoop in Amazon's cloud? Here's some guidance.

Pick a distro

Amazon refers to the process of standing up an EMR cluster as creating a "job flow." You can do this from the command line, using a technique we'll detail later, but you can also do it from your browser. Just navigate to the EMR home page in the AWS console at https://console.aws.amazon.com/elasticmapreduce and click the Create New Job Flow button at the top left. Doing so brings up the Create a New Job Flow dialog box (essentially a wizard), the first screen of which is shown here.

An EMR cluster can use Amazon's own distribution of Hadoop, or MapR's M3 or M5 distribution instead. M5 carries a premium billing rate, as it is not MapR's open source distro.
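For a rough idea of what the scripted alternative to the wizard looks like, here is a minimal sketch using boto3, the AWS SDK for Python (not necessarily the command-line technique referenced above). The log bucket, key pair, instance types and release label are placeholder values; selecting a MapR edition rather than Amazon's distribution is handled through a separate supported-products option not shown here.

```python
# Minimal sketch: create an EMR "job flow" from code instead of the console wizard.
# Bucket, key pair, instance types and release label below are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="my-first-job-flow",
    LogUri="s3://my-log-bucket/emr-logs/",       # hypothetical log bucket
    ReleaseLabel="emr-5.36.0",                   # Amazon's own Hadoop distribution
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,     # keep the cluster alive so steps can be added later
        "Ec2KeyName": "my-keypair",              # hypothetical EC2 key pair for SSH access
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)

print("Job flow started:", response["JobFlowId"])
```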

Sample applications

Those just experimenting with Amazon's Elastic MapReduce can get started immediately by running a sample application, rather than running their own code on their own data. Amazon offers WordCount (the ubiquitous Hadoop sample application), as well as a Hive-based contextual advertising sample, Java- and Pig-based log analysis samples, and another Java-based sample built around CloudBurst.
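If you would rather script the WordCount sample than pick it in the wizard, a step along the following lines can be queued on a running job flow. This is again a boto3 sketch; the mapper and input paths are the S3 locations Amazon has published for this demo, while the job flow ID and output bucket are placeholders you would supply.

```python
# Sketch: queue the WordCount streaming sample as a step on an existing job flow.
# The job flow ID and output bucket are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",                 # ID returned by run_job_flow
    Steps=[{
        "Name": "WordCount sample",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",         # runs the named command on the master node
            "Args": [
                "hadoop-streaming",
                "-files", "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
                "-mapper", "wordSplitter.py",
                "-reducer", "aggregate",
                "-input", "s3://elasticmapreduce/samples/wordcount/input",
                "-output", "s3://my-output-bucket/wordcount/",   # hypothetical output bucket
            ],
        },
    }],
)
```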

Run your own app

If you need to do production work, or just want to conduct a more free-form Hadoop experiment, you'll want to select the option to run your own application. Picking HBase and clicking Continue is best, as this lets you add Hive and Pig as well.
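A rough scripted equivalent of the "run your own application" path might look like the sketch below: install HBase, Hive and Pig at cluster creation and point a custom JAR step at your own code. The JAR location, step arguments, instance types and release label are all hypothetical values, not anything the wizard prescribes.

```python
# Sketch: create a cluster with HBase, Hive and Pig installed, then run your own JAR.
# The JAR path, arguments and sizing below are hypothetical.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="my-own-app",
    ReleaseLabel="emr-5.36.0",
    Applications=[{"Name": "HBase"}, {"Name": "Hive"}, {"Name": "Pig"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    Steps=[{
        "Name": "My custom JAR",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "s3://my-code-bucket/my-app.jar",              # your own MapReduce job
            "Args": ["s3://my-data-bucket/input/", "s3://my-data-bucket/output/"],
        },
    }],
)
```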

Andrew Brust

About Andrew Brust

Andrew J. Brust has worked in the software industry for 25 years as a developer, consultant, entrepreneur and CTO, specializing in application development, databases and business intelligence technology.
