Running Hadoop on a Raspberry Pi 2 cluster

Last week I wrote about a 300-node cluster built from Raspberry Pi (RPi) microcomputers. But can you do useful work on such a low-cost, low-power cluster? Yes, you can. Hadoop runs on massive clusters, but you can also run it on your own highly scalable RPi cluster.
Written by Robin Harris, Contributor

I've been involved with cluster computing ever since DEC introduced VAXcluster in 1984. In those days, a three-node VAXcluster cost about $1 million. Today you can build a far more powerful cluster for under $1,000, with more storage than anyone could afford back then.

Hadoop is the open-source counterpart of Google's MapReduce and Google File System (GFS), and it is widely used for large data-crunching applications. It uses a shared-nothing architecture: each node has its own CPU, memory, and disk, so as you add nodes, performance scales up smoothly.
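To make the map/reduce idea concrete, here is a minimal, single-process Python sketch of a word count, the same kind of job the researchers used as their I/O-intensive benchmark. It is not Hadoop's API: the function names map_words, shuffle, reduce_counts, and word_count are hypothetical, and in a real Hadoop job the framework distributes the input splits across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(split: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts for one word."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Splits are mapped sequentially here; because map calls share no state,
    # a shared-nothing cluster could map each split on a different node.
    mapped = (pair for split in splits for pair in map_words(split))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

The shuffle is the only step that needs to see data from every split, which is why it is the network-heavy part of a real cluster job.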

In the paper "Performance of a Low Cost Hadoop Cluster for Image Analysis," researchers Basit Qureshi, Yasir Javed, Anis Koubaa, Mohamed-Foued Sriti, and Maram Alajlan built a 20-node RPi Model 2 cluster, brought up Hadoop on it, and used it for surveillance drone image analysis. They also benchmarked the RPi cluster against a 4-node PC cluster based on 3GHz Intel i7 CPUs, each with 4GB of RAM.


The 20-node cluster was divided into four 5-node subnets, each attached to a 16-port switch; those switches are, in turn, networked to a managed 24-port core switch. The extra switch ports make it easy to expand the cluster.
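The expansion headroom comes down to simple port arithmetic. The sketch below assumes, hypothetically, that each 16-port edge switch spends one port on its uplink to the core switch and that nothing else (such as the master PC) occupies those ports; the paper's actual cabling may differ, and the function name free_ports is illustrative.

```python
def free_ports(subnets: int, nodes_per_subnet: int,
               edge_ports: int, core_ports: int,
               uplinks_per_edge: int = 1) -> dict:
    """Count spare ports in a two-tier (edge + core) switch topology.

    Assumption: each edge switch uses `uplinks_per_edge` ports to reach the
    core switch, and every other used edge port hosts one cluster node.
    """
    edge_used = nodes_per_subnet + uplinks_per_edge
    if edge_used > edge_ports:
        raise ValueError("a subnet does not fit on one edge switch")
    core_used = subnets * uplinks_per_edge
    if core_used > core_ports:
        raise ValueError("not enough core ports for the edge switches")
    return {
        "nodes": subnets * nodes_per_subnet,
        "spare_edge_ports": subnets * (edge_ports - edge_used),
        "spare_core_ports": core_ports - core_used,
    }


if __name__ == "__main__":
    # prints {'nodes': 20, 'spare_edge_ports': 40, 'spare_core_ports': 20}
    print(free_ports(subnets=4, nodes_per_subnet=5, edge_ports=16, core_ports=24))
```

Under those assumptions, the four edge switches could take 40 more Pis without any new hardware, and the core switch has 20 free ports for additional edge switches.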

Each 700MHz RPi B runs Raspbian, an ARM-optimized version of Debian Linux, and has a Class 10, 16GB SD card capable of up to 80MB/s read/write speeds. An OS image with Hadoop 2.6.2 was copied onto the SD cards. The Hadoop master node, which runs only the NameNode, was installed on a PC running Ubuntu 14.04 and Hadoop.

Performance results

You'd expect a cluster of 64-bit, 3GHz x86 CPUs to be much faster than one built from 700MHz, 32-bit ARM CPUs, and you'd be right. The team ran a series of tests that were a) compute-intensive (calculating Pi), b) I/O-intensive (document word counts), and c) both (large image file pixel counts).
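Calculating Pi is compute-intensive because each task does a lot of arithmetic on almost no input data. As an illustration, here is a plain Monte Carlo estimate split into independent map tasks whose counts are combined by a reduce step. This is a swapped-in sketch, not the authors' job: Hadoop's bundled pi example uses a quasi-Monte Carlo method, and the task and sample counts below are arbitrary.

```python
import random
from typing import List, Tuple


def map_task(task_id: int, samples: int) -> Tuple[int, int]:
    """Map: throw `samples` random points into the unit square and count
    how many land inside the quarter circle x^2 + y^2 <= 1."""
    rng = random.Random(task_id)  # reproducible stream seeded by task id
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_pi(results: List[Tuple[int, int]]) -> float:
    """Reduce: the quarter circle covers pi/4 of the square, so scale by 4."""
    inside = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    return 4.0 * inside / total


if __name__ == "__main__":
    # For example, one map task per node of a 20-node cluster.
    results = [map_task(t, 50_000) for t in range(20)]
    print(reduce_pi(results))
```

Each map task returns just two integers, so almost nothing crosses the network; the run time is dominated by CPU speed, which is where the 3GHz i7 nodes have the biggest edge.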

Here are the word count results, taken from a figure in the paper.

Courtesy of the authors

In general, the x86 cluster was 10 to 20 times faster. However, the ability to put a Hadoop cluster in a backpack with a battery opens up possibilities for powerful edge computing, such as the drone video pre-processing the authors explore in their paper. Also, today we have the RPi Model 3, whose processor runs at almost double the clock speed of the RPi the researchers tested.

The Storage Bits take

Mobile edge clusters aren't a thing today, but they will be, because our ability to gather data at the edge is growing much faster than network bandwidth to the edge. We'll have to pre-process data, IoT sensor data for example, to compact it for network transmission.
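One simple form of that pre-processing is windowed summarization: instead of shipping every reading, an edge node sends a few statistics per window. The sketch below is hypothetical (the window size, the (min, max, mean, count) summary, and the byte encoding are all assumptions) and only shows how the data-reduction arithmetic works.

```python
import struct
from statistics import fmean
from typing import List, Tuple

Summary = Tuple[float, float, float, int]  # (min, max, mean, count)


def summarize(readings: List[float], window: int) -> List[Summary]:
    """Collapse each run of `window` readings into one summary tuple."""
    out: List[Summary] = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        out.append((min(chunk), max(chunk), fmean(chunk), len(chunk)))
    return out


def raw_bytes(readings: List[float]) -> int:
    """Size of the raw readings as uncompressed 8-byte little-endian doubles."""
    return len(struct.pack(f"<{len(readings)}d", *readings))


def summary_bytes(summaries: List[Summary]) -> int:
    """Size of the summaries: three doubles plus a 4-byte count per window."""
    return sum(len(struct.pack("<dddI", *s)) for s in summaries)


if __name__ == "__main__":
    readings = [float(i % 50) for i in range(1000)]  # stand-in sensor data
    summaries = summarize(readings, window=100)
    print(raw_bytes(readings), summary_bytes(summaries))  # prints "8000 280"
```

For 1,000 readings summarized in windows of 100, the raw encoding takes 8,000 bytes and the summaries take 280, roughly a 28x reduction, at the cost of discarding the individual samples.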

When will they be economically viable? Three things have to happen:

  • Mobile processors have to get faster while remaining power efficient.
  • More power-efficient memory, whether low-power DRAM or NVRAM, must enable larger memory capacities on mobile processors.
  • Mobile processors must support Universal Flash Storage (UFS), removing the current storage bottleneck of micro-SD cards.

All three will happen in the next five years. Then backpack clusters will be capable of real work out in the wild.

Courteous comments welcome, of course.
