When you think of do-it-yourself (DIY) computing, you probably think of setting up a screaming gaming computer or putting together the best possible components for the least amount of money. You're almost certainly not considering putting together a supercomputer. Maybe you should. Joshua Kiepert, a doctoral student at Boise State's Electrical and Computer Engineering department, has managed to create a mini-supercomputer using Raspberry Pi (RPi) computers for less than $2,000.
The Raspberry Pi is a single-board, Linux-powered computer. Each one is powered by a 700MHz ARM11 processor and includes a VideoCore IV GPU. The Model B, which is what Kiepert is using, comes with 512MB of RAM, two USB ports and a 10/100 BaseT Ethernet port. For his project, Kiepert overclocked the processors to 1GHz.
By itself, the Raspberry Pi is interesting, but it seems an unlikely supercomputer component. Kiepert, however, had a problem. He was doing his doctoral research on data sharing for wireless sensor networks by simulating these networks on Boise State's Linux-powered Onyx Beowulf-cluster supercomputer. This cluster, modest by supercomputer standards, currently has 32 nodes, each of which has a 3.1GHz Intel Xeon E3-1225 quad-core processor and 8GB of RAM.
A Beowulf cluster is simply a collection of inexpensive commercial off-the-shelf (COTS) computers that are networked together and run Linux and parallel processing software. First designed by Don Becker and Thomas Sterling at Goddard Space Flight Center in 1994, this design has since become one of the core supercomputer architectures.
So with a perfectly good Beowulf-style supercomputer at hand, why did Kiepert start to put together his own Beowulf cluster? In a white paper, "Creating a Raspberry Pi-Based Beowulf Cluster" (PDF), he explained:
"First, while the Onyx cluster has an excellent uptime rating, it could be taken down for any number of reasons. When you have a project that requires the use of such a cluster and Onyx is unavailable, there are not really any other options on campus available to students aside from waiting for it to become available again. The RPiCluster provides another option for continuing development of projects that require MPI [Message Passing Interface] or Java in a cluster environment.
Second, RPis provide a unique feature in that they have external low-level hardware interfaces for embedded systems use, such as I2C, SPI, UART, and GPIO. This is very useful to electrical engineers requiring testing of embedded hardware on a large scale.
Third, having user only access to a cluster is fine if the cluster has all the necessary tools installed. If not however, you must then work with the cluster administrator to get things working. Thus, by building my own cluster I could outfit it with anything I might need directly.
Finally, RPis are cheap! The RPi platform has to be one of the cheapest ways to create a cluster of 32 nodes. The cost for an RPi with an 8GB SD card is ~$45. For comparison, each node in the Onyx cluster was somewhere between $1,000 and $1,500. So, for near the price of one PC-based node, we can create a 32 node Raspberry Pi cluster!"
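The arithmetic behind that last claim is easy to check. The short Python sketch below uses only the figures quoted above (32 nodes, roughly $45 per RPi with SD card, $1,000 to $1,500 per Onyx node). The function and constant names are illustrative, and the total covers the boards and SD cards only, not the switch, power supplies, or case.

```python
# Figures quoted from the white paper; names here are illustrative only.
NODES = 32
COST_PER_RPI_USD = 45           # "~$45" for an RPi with an 8GB SD card
ONYX_NODE_USD = (1000, 1500)    # quoted price range for one Onyx node


def rpi_boards_cost(nodes: int = NODES, unit_cost: float = COST_PER_RPI_USD) -> float:
    """Cost of the boards and SD cards only (no switch, power, or case)."""
    return nodes * unit_cost


total = rpi_boards_cost()
low, high = ONYX_NODE_USD
print(f"32 RPis with SD cards: ${total:,.0f}")
print(f"Within the price range of one Onyx node: {low <= total <= high}")
```

At $1,440, the 32 boards do land inside the quoted price range for a single Onyx node.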
In an e-mail, Kiepert added, "This project was started because there was one week (Spring break) in which I could not use the Onyx Beowulf cluster I had been using. The Onyx cluster was down due to some renovations on the computer lab in which it resides. That got me thinking. I needed to continue testing my Ph.D. work, but if I didn't have access to Onyx I didn't have any options.
Previously, I had spent a lot of time playing with Raspberry Pis (RPis), and I have also been a long time Linux user (Fedora and Mint primarily). Additionally, in the research lab where I work, we use RPis as servers for our custom-built wireless sensor network systems, to up-link sensor data to our central database. So, this project allowed me to take my previous experience with clusters and RPis to another level, and it gave me some options for continuing my dissertation work. One thing for sure is it definitely adds something to the experience when you get to use a cluster you built."
For his baby-supercomputer, Kiepert elected to use Arch Linux. He explained, "Arch Linux ... takes the minimalist approach. The image is tiny at ~150MB. It boots in around 10 seconds. The install image has nothing extra included. The default installation provides a bare bones, minimal environment, that boots to a command line interface (CLI) with network support. The beauty of this approach is that you can start with the cleanest, fastest setup and only add the things you need for your application. The downside is you have to be willing to wade through the learning process of a different, but elegant, approach to Linux."
Of course, his RPi cluster isn't ideal. Kiepert admitted, "the overall value proposition is pretty good, particularly if cluster program development is focused on distributed computing rather than parallel processing. That is, if the programs being developed for the cluster are distributed in nature, but not terribly CPU intensive. Compute-intensive applications will need to look elsewhere, as there simply is not enough 'horse power' available to make the RPi a terribly useful choice for cluster computing."
In our e-mail conversation, Kiepert added that, "Perhaps the most annoying problem I had [with setting up the cluster] was SD-card corruption. Initially, I had a lot of file system corruptions when I powered down the cluster (nicely using: shutdown -h now) and attempted to start it again. This seems to be a known problem with the RPi that you are more likely to experience when you overclock. The weird thing was it was only occurring on the slave nodes, not the master. [The master node was a Samsung Chromebook Series 3 with a 1.7GHz dual-core ARM Cortex-A15 processor.]
Eventually, I found that if I just manually un-mounted the NFS shares before powering down the problem seems to be reduced. As part of the development I created a script for writing the SD-card images when re-imaging is needed. I just provide the host name and IP address, and the script does the rest. This greatly simplifies re-imaging, especially the first time I had to write all 32 of them while putting the initial image on the cards!"
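Kiepert's imaging script is not reproduced here, but the per-node bookkeeping it relies on (one host name and one IP address per card) is simple to sketch. The Python below is a hypothetical illustration, not his script: the `rpi` name prefix, the `192.168.0.101` starting address, and both function names are assumptions made for the example, and nothing is written to an SD card.

```python
import ipaddress


def node_table(count, first_ip="192.168.0.101", prefix="rpi"):
    """Return (hostname, ip) pairs for `count` nodes with consecutive addresses.

    The prefix and starting address are hypothetical defaults.
    """
    start = ipaddress.IPv4Address(first_ip)
    return [(f"{prefix}{i}", str(start + i)) for i in range(count)]


def hosts_file(nodes):
    """Render the pairs as /etc/hosts-style lines (IP, tab, host name)."""
    return "\n".join(f"{ip}\t{name}" for name, ip in nodes) + "\n"


nodes = node_table(32)
print(hosts_file(nodes), end="")
```

A real imaging script would feed each pair into whatever step writes the card and sets the node's host name and network configuration. Generating the table once keeps all 32 cards consistent.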
At day's end, Kiepert has a cheap, working supercomputer, albeit one that still uses "electrical tape to hold the fans on the case!" So now for the 64-bit question: "How fast does it run?"
Kiepert ran High Performance Linpack (HPL), the standard supercomputer benchmark, on his homemade computer and found that his RPiCluster, with its 32 Broadcom BCM2708 ARM11 processors running at 1GHz and 14.6GB of usable RAM, turned in an HPL peak performance of 10.13 GFLOPS. That's not going to get this cluster into the TOP500 supercomputer list, but as Kiepert observed, "the first Cray-2 supercomputer in 1985 did 1.9 GFLOPS. How times have changed!"
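To put those numbers in perspective, the following Python sketch divides the reported totals across the 32 nodes and compares the cluster with the Cray-2 figure Kiepert cites. The per-node values are simple averages of the aggregate result, not separate measurements, and the variable names are illustrative.

```python
# Figures reported in the article.
NODES = 32
HPL_GFLOPS = 10.13       # cluster-wide HPL peak
USABLE_RAM_GB = 14.6     # cluster-wide usable RAM
CRAY2_GFLOPS = 1.9       # Cray-2 figure quoted by Kiepert

# Averages across nodes; these are derived, not measured per node.
per_node_mflops = HPL_GFLOPS / NODES * 1000
per_node_ram_gb = USABLE_RAM_GB / NODES
speedup_vs_cray2 = HPL_GFLOPS / CRAY2_GFLOPS

print(f"Average per node: {per_node_mflops:.1f} MFLOPS, {per_node_ram_gb:.3f} GB usable RAM")
print(f"Cluster vs. Cray-2: {speedup_vs_cray2:.1f}x")
```

By this arithmetic each overclocked RPi contributes a little over 300 MFLOPS on average, and the whole cluster comes out at roughly five times the quoted Cray-2 figure.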