
Hadoop on your PC: Cloudera's CDH4 virtual machine

  • download-vm.jpg

    Everyone's talking Big Data and Hadoop, but how can you start to learn the technology if you have to set up a whole cluster first? This isn't just about the servers, but also the Hadoop software itself, and various companion products like Hive and Pig.  If you're a Windows person without Linux/Unix skills, this may be more daunting still.  But there is a solution.

    Cloudera, maker of perhaps the most widely-deployed Hadoop distribution, offers a pre-built, training-appropriate, 1-node Hadoop cluster in virtual machine (VM) form, and it's free.  In this gallery, I'll walk you, step by step, through how to download, configure and use the VM, and how to get at the various Hadoop distro components within it.

    To start, visit Cloudera's Web site to download the CDH4 (Cloudera Distribution including Apache Hadoop, version 4) VM, as shown here.  The VM image is available in VMware, VirtualBox and KVM formats.  Our work here is done under VMware.

    If you're using Windows Server Hyper-V, you can download the VMware image and convert it to VHD format.  Windows 7 Virtual PC won't work, as it only supports 32-bit VMs and the CDH4 image is a 64-bit VM.

    Published: August 27, 2012 -- 13:00 GMT (06:00 PDT)

    Caption by: Andrew Brust

  • download-vmware-player.jpg

    If you're not running the full-fledged VMware product, never fear.  Just download the free VMware Player, then use it to run the CDH4 image.


  • connecting-to-network.jpg

    If you put the VM on your network, you'll be able to use Hadoop from your own PC (I'll show you how in the next two screens).  To make this work, just open VMware's Virtual Machine Settings dialog box, select the Network Adapter device and then select the Connected check box.

    Now you're ready to get the VM's IP address and surf to it from your own browser.  Continue to the next screen for details on getting the IP address.


  • determine-ip-address.jpg

    Cloudera's CDH4 runs its own Web server and a Web-based user interface, called Hue, sporting consoles for MapReduce, HDFS and Hive, along with browser-based command line shells for HBase and Pig.

    You can run Hue from your own PC's (i.e. the host's) Web browser, as long as you know the IP address of the VM.  Inside the VM, open an X terminal emulator command prompt (by clicking the highlighted icon at the bottom of the screen) and enter the ifconfig command to determine the address.  You'll see what to do with that address in the next screen...
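    In case it helps, here's roughly what that looks like at the VM's prompt (the interface name eth0 and the sample address are assumptions; yours will differ):

        # Inside the VM's X terminal: show network interfaces and addresses
        ifconfig
        # In the output, find the active adapter's "inet addr:" field, e.g.:
        #   eth0   Link encap:Ethernet ...
        #          inet addr:192.168.1.35  Bcast:192.168.1.255  Mask:255.255.255.0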


  • hue-login.jpg

    To get to Hue, just open a browser on your own host PC (use Firefox or Chrome -- I had trouble with IE), and navigate to port 8888 at the IP address you just discovered.  For example, if the IP address were 192.168.1.35 (as shown in the previous screen), you'd type the following into your browser's address bar:
    http://192.168.1.35:8888

    This will take you to the Hue login screen, pictured above.  Log in using "cloudera" as both the user name and password.


  • beeswax.jpg

    After logging in to Hue, its browser-based user interface for Hive, called Beeswax, comes up by default.  Beeswax allows you to enter SQL (well, technically, HiveQL) queries and see the results.  You can also save queries, upload files, edit Hive's settings, or create user-defined functions.
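    As a sketch, a first query in Beeswax might look like the following (the table name and columns are hypothetical; substitute a table of your own):

        -- HiveQL: top earners from a hypothetical employees table
        SELECT name, salary
        FROM employees
        WHERE salary > 100000
        ORDER BY salary DESC
        LIMIT 10;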


  • hdfs-browser.jpg

    Used to a file system GUI?  Don't feel like typing commands like "lsr" and "mkdir" at the shell prompt?  Never fear, because Hue's browser-based HDFS user interface is here.  Just click the file cabinet icon in Hue's toolbar to get there.

    Shown above is Hive's root folder.  Clicking the "warehouse" link lets you view the files that are queryable as tables in Hive.
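    If you do end up at the shell prompt instead, the equivalent HDFS commands look like this (paths are illustrative; "hadoop fs -lsr" is the recursive listing in CDH4's Hadoop):

        # From an X terminal in the VM: browse and modify HDFS by hand
        hadoop fs -lsr /user/hive/warehouse     # recursively list Hive's warehouse
        hadoop fs -mkdir /user/cloudera/mydata  # create a new HDFS directory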


  • mr-job-design.jpg

    Click the Job Designer icon (4th from the left in Hue's toolbar), then the Create Mapreduce Design button, and up comes a screen that lets you run a jar-based MapReduce job.  Give the job a name and a description, specify the jar file, add properties (parameters) and property values, then click Save.

    You'll be automatically redirected to the Job Designs screen, where you can click the Submit button for the job design you just built.  You can also click the Create Streaming Design button to design a job whose mapper and reducer code is written in a language other than Java.
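    For comparison, the command-line equivalent of a jar-based job design is a plain "hadoop jar" invocation.  Here's a sketch using the wordcount sample; the jar path is typical of CDH4, but treat it (and the HDFS paths) as assumptions:

        # Run the bundled wordcount example against HDFS input/output paths
        hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar \
            wordcount /user/cloudera/input /user/cloudera/output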


  • grunt.jpg

    Click the Hue Shell Toolbar button to get to browser-based command line shells for Pig and HBase.  Below the toolbar, you'll find one clickable link for each.  The Pig shell (called "Grunt") is shown here.  At this prompt you can enter Pig Latin commands interactively or run .pig scripts.
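    A minimal Grunt session might look like this (the file name and schema are made up for illustration):

        -- Pig Latin: load a tab-delimited HDFS file, filter it, show results
        emps = LOAD '/user/cloudera/emps.txt' AS (name:chararray, salary:int);
        high = FILTER emps BY salary > 100000;
        DUMP high;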


  • hbase-shell.jpg

    Click the HBase Shell link beneath Hue's toolbar and you'll come to HBase's command line interface.  The HBase documentation describes the full set of commands you can use in this shell.
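    To get a feel for it, here are a few core commands, with made-up table and column names:

        # HBase shell: create a table with one column family, write, then read
        create 'people', 'info'
        put 'people', 'row1', 'info:name', 'Andrew'
        get 'people', 'row1'
        scan 'people'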

    ---

    That's pretty much all there is to working with the CDH4 VM!  Of course, if you prefer the command line for everything, then you can go back to an X terminal window and do all your work from there.

    No matter which way you work, you can now start to learn Hadoop, Pig, Hive and more.  And you can build your own clusters later on, if you're into that sort of thing :-)


Want to learn Hadoop without building your own cluster or paying for cloud resources? Then download Cloudera's Hadoop distro and run it in a virtual machine on your PC. I'll show you how.

