Everyone's talking about Big Data and Hadoop, but how can you start to learn the technology if you have to set up a whole cluster first? The hurdle isn't just the servers, but also the Hadoop software itself and companion products like Hive and Pig. If you're a Windows person without Linux/Unix skills, this may be more daunting still. But there is a solution.
Cloudera, maker of perhaps the most widely deployed Hadoop distribution, offers a pre-built, training-appropriate, single-node Hadoop cluster in virtual machine (VM) form, and it's free. In this gallery, I'll show you, step by step, how to download, configure and use the VM, and how to get at the various Hadoop distro components within it.
To start, visit Cloudera's Web site to download the CDH4 (Cloudera Distribution including Apache Hadoop, version 4) VM, as shown here. The VM image is available in VMware, VirtualBox and KVM formats. Our work here is done under VMware.
If you're using Windows Server Hyper-V, you can download the VMware image and convert it to VHD format. Windows 7 Virtual PC won't work, as it supports only 32-bit VMs and the CDH4 image is a 64-bit VM.
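As a rough sketch of one way to do that conversion: the open-source qemu-img tool can read VMware's VMDK disks and write VHD, which it calls `vpc`. The Python below only builds (and, if you call the second function, runs) that command. It assumes qemu-img is installed and on your PATH, the file names are placeholders, and this is not necessarily the tool used for this gallery.

```python
import shutil
import subprocess


def build_convert_command(vmdk_path, vhd_path):
    """Return the qemu-img argument list that converts a VMDK disk to VHD."""
    # "vpc" is qemu-img's name for the VHD (Virtual PC) disk format.
    return ["qemu-img", "convert", "-f", "vmdk", "-O", "vpc", vmdk_path, vhd_path]


def convert_vmdk_to_vhd(vmdk_path, vhd_path):
    """Run the conversion; assumes qemu-img is installed and on the PATH."""
    if shutil.which("qemu-img") is None:
        raise RuntimeError("qemu-img was not found on the PATH")
    subprocess.run(build_convert_command(vmdk_path, vhd_path), check=True)


if __name__ == "__main__":
    # Placeholder file names; substitute the disk file from the downloaded image.
    print(" ".join(build_convert_command("cdh4-vm.vmdk", "cdh4-vm.vhd")))
```

Run as-is, the script just prints the command so you can inspect it before converting anything.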
If you're not running the full-fledged VMware product, never fear. Just download the free VMware Player, then use it to run the CDH4 image.
If you put the VM on your network, you'll be able to use Hadoop from your own PC (I'll show you how in the next two screens). To make this work, open VMware's Virtual Machine Settings dialog box, select the Network Adapter device and then check the Connected check box.
Now you're ready to get the VM's IP address and surf to it from your own browser. Continue to the next screen for details on getting the IP address.
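Once you have the address, a quick way to confirm that your PC can actually reach the VM is to attempt a plain TCP connection. The snippet below is a minimal sketch using only Python's standard library. The IP address and port shown are placeholders, not values from the CDH4 image, so substitute the VM's real address and the port of whichever web interface you want to open.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    # Both values are placeholders: use the VM's real IP address and the port
    # of the web interface you want to reach.
    vm_ip = "192.168.1.50"
    port = 80
    print(f"{vm_ip}:{port} reachable: {is_reachable(vm_ip, port)}")
```

If this reports that the VM is not reachable, the network adapter setting in the VM's configuration is the first thing to recheck.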