Running Hadoop on a Raspberry Pi 2 cluster

Last week I wrote about a 300-node cluster built from Raspberry Pi (RPi) microcomputers. But can you do useful work on such a low-cost, low-power cluster? Yes, you can. Hadoop runs on massive clusters, but you can also run it on your own highly scalable RPi cluster.

I've been involved with cluster computing ever since DEC introduced VAXcluster in 1984. In those days, a three-node VAXcluster cost about $1 million. Today you can build a much more powerful cluster for under $1,000, including much more storage than anyone could afford back then.

Hadoop is the open-source counterpart to Google's MapReduce and Google File System (GFS), widely used for large data-crunching applications. It is a shared-nothing architecture: nodes share neither memory nor disks, so as you add cluster nodes, performance scales up smoothly.
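To make the MapReduce model concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop code, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only mimics the three steps a Hadoop job goes through: map each input to key/value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    # On a real cluster each document (or block) would be mapped on a
    # different node; this sketch simply loops over them in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))
```

Because each map call touches only its own input and each reduce call touches only its own key, the work can be spread over nodes that share nothing, which is why adding nodes adds capacity.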

In the paper Performance of a Low Cost Hadoop Cluster for Image Analysis, researchers Basit Qureshi, Yasir Javed, Anis Koubaa, Mohamed-Foued Sriti, and Maram Alajlan built a 20-node RPi Model 2 cluster, brought up Hadoop on it, and used it for surveillance drone image analysis. They also benchmarked the RPi cluster against a 4-node PC cluster based on 3GHz Intel i7 CPUs, each with 4GB of RAM.

Configuration

The 20-node cluster was divided into four 5-node subnets, each attached to a 16-port switch that is, in turn, networked to a managed 24-port core switch. The extra switch ports enable easy cluster expansion.
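A quick back-of-the-envelope calculation shows how much room that layout leaves. The sketch below assumes one uplink port per 16-port switch and ignores any core-switch port used by the master node. Neither detail is stated here, so treat the numbers as illustrative.

```python
EDGE_SWITCHES = 4       # one 16-port switch per subnet
NODES_PER_SWITCH = 5    # RPi nodes per subnet
EDGE_PORTS = 16
CORE_PORTS = 24
UPLINKS_PER_EDGE = 1    # assumption: a single uplink to the core switch


def free_edge_ports():
    """Unused ports on each 16-port switch."""
    return EDGE_PORTS - NODES_PER_SWITCH - UPLINKS_PER_EDGE


def extra_nodes_without_new_switches():
    """Nodes that could be added using only the existing edge switches."""
    return EDGE_SWITCHES * free_edge_ports()


def free_core_ports():
    """Core-switch ports left over for additional subnets."""
    return CORE_PORTS - EDGE_SWITCHES * UPLINKS_PER_EDGE


if __name__ == "__main__":
    print("Free ports per edge switch:", free_edge_ports())
    print("Extra nodes on existing switches:", extra_nodes_without_new_switches())
    print("Free core ports:", free_core_ports())
```

Under those assumptions, the existing switches could take 40 more nodes, tripling the cluster to 60, before another switch is needed.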

Each 700MHz RPi B runs Raspbian, an ARM-optimized version of Debian Linux. Each RPi has a Class 10, 16GB SD card capable of up to 80MB/s read/write speeds. An image of the OS with Hadoop 2.6.2 was copied onto the SD cards. The Hadoop master node, which runs only the NameNode, was installed on a PC running Ubuntu 14.04 and Hadoop.

Performance results

You'd expect a cluster of 64-bit, 3GHz x86 CPUs to be much faster than one of 700MHz, 32-bit ARM CPUs, and you'd be right. The team ran a series of tests that were a) compute-intensive (calculating pi), b) I/O-intensive (document word counts), and c) both (large image file pixel counts).
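For a sense of what a compute-intensive job looks like, here is a hedged Python sketch of a pi estimate split into independent tasks. It uses plain Monte Carlo sampling, which is one common way to parallelize the calculation and not necessarily the method the authors ran. The function names and the task and sample counts are chosen only for illustration.

```python
import random


def map_task(seed, samples):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # a separate generator per task, as on separate nodes
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_task(results):
    """Combine per-task tallies into a single estimate of pi."""
    inside = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    return 4.0 * inside / total


def estimate_pi(num_tasks=20, samples_per_task=10_000):
    # Each task is independent, so a cluster could run one per node.
    results = [map_task(seed, samples_per_task) for seed in range(num_tasks)]
    return reduce_task(results)


if __name__ == "__main__":
    print(estimate_pi())
```

A job like this moves almost no data, so it stresses the CPUs, whereas a word count spends most of its time reading input from storage.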

Here are the word count results, taken from a figure in the paper.

[Figure: word count results for the RPi and PC clusters. Courtesy of the authors.]

In general, the x86 cluster was 10 to 20 times faster. However, the ability to put a Hadoop cluster in a backpack with a battery opens up possibilities for powerful edge computing, such as the drone video pre-processing the authors explore in their paper. Also, today we have the RPi Model 3, whose processor has almost double the clock speed of the RPi tested by the researchers.

The Storage Bits take

Mobile edge clusters aren't a thing today, but they will be, because our ability to gather data at the edge is growing much faster than network bandwidth to the edge. We'll have to pre-process IoT data, for example, to compact it for network transmission.

When will they be economically viable? Three things have to happen:

  • Mobile processors have to get faster, while remaining power efficient.
  • More power-efficient memory, whether low-power DRAM or NVRAM, must enable larger memory capacities on mobile processors.
  • Mobile processors must support Universal Flash Storage (UFS), removing the current storage bottleneck of micro-SD cards.

All three will happen in the next five years. Then backpack clusters will be capable of real work out in the wild.

Courteous comments welcome, of course.
