Raspberry Pi: How I built an OctaPi-style computing cluster

Assembling my motley crew of Raspberry Pi systems into a cluster computing group.

The inspiration for this post (and this project) came from something that I recently read in the Raspberry Pi blog. I would like to start out by saying if you are interested in computers, programming, DIY electronics, space exploration, or just technology in general, you are very likely to find the Raspberry Pi blog interesting and entertaining.

OctaPi Raspberry Pi Cluster

Image: Raspberry Pi Foundation

The project I am starting here is based on OctaPi: Cluster Computing and Cryptology. It's basically a way of connecting and configuring a number of Raspberry Pi systems into a cluster of servers with a single client to control them and dispatch jobs to them. Of course, the project as described in that blog post, and in the detailed Build an OctaPi instructions, is very professional looking, using nine identical Raspberry Pi 3 systems, each with an LED HAT (blinking lights always make a project look better), and the whole thing is mounted on a colorful perspex board.

What I hope to build will be made up of the various Raspberry Pi systems I have scattered about on my desk, ranging from the original Model B to the latest Zero W, and no two are the same. Also, the original project uses a 10-port power hub -- wouldn't that be nice to have? I will be using whatever power supplies I can find around here, so it will be a challenge to get them all plugged in and keep the cables reasonably neat (and the circuit breakers in my house intact).

Getting started

The first step in this project is simply getting together the necessary materials, and making sure the software is up to date.

It is not necessary to create an 'OctaPi' cluster with exactly eight Raspberry Pi servers. You can use pretty much as many as you like, although only one server couldn't really be called a cluster, and by the time you get to eight or so you are likely to be struggling to power them all. I am going to use the following, which is determined by one simple criterion -- it's all that I've got:

  • Model B
  • Model B+
  • Pi 2
  • Pi 3
  • Zero v1.2
  • Zero v1.3
  • Zero W

My motley crew

Some of these are already running Raspbian, so I will just have to make sure they are updated. Others are running Fedora, PiCore, Manjaro, Kali, or openSUSE, so I have more work to do to prepare them.

I will be using a TP-Link WR802N Nano router, because it is small, simple and inexpensive (less than 30 CHF).

I strongly (very strongly) recommend you do not use your home wi-fi network for this project. While it might well work for you, it is likely to expose you to some serious security risks. The instructions for this project use open SSH connections with default user names and passwords, for example. Not a good idea at all.

One of the more mundane aspects of this build is going to be powering all of the Pi systems. The original project mentions a very spiffy Universal 10 port USB charging Hub, which is capable of delivering 2.4A simultaneously to all ten ports. Zowie! That's a nice piece of kit for a project like this -- unfortunately, I can't find anything like it here in Switzerland.

To be sure what capacity power supplies I will need, I checked the Raspberry Pi FAQs, where there is a very handy table listing the power requirements for each model. Now I just have to see if I can find some reasonable multi-port power supply, or if I will have to cobble together a bunch of separate power supplies.

Finally, each of the Raspberry Pis that I use will need a wireless network connection. The Pi 3 and Zero W have that built in, of course, but for the other five I will need USB wi-fi dongles. I'm not sure that I have enough of those lying around, but if necessary I can get a very low-priced 802.11b/g/n dongle from my friends at Pi-Shop.ch. One small piece of advice on this -- when buying a USB wi-fi dongle, make sure that it has already been tested with Raspbian.

First step - the client

The first Pi system I will set up is the one which will be used to manage the cluster -- the project description refers to this one as the OctaPi Client. It is the only system which will have a keyboard, mouse, and display in the final setup, and it will be used to monitor the cluster servers, dispatch and manage jobs to them, and to reboot or shut them down.

The Client - Raspberry Pi Model B

Image: J.A. Watson

I am going to use my oldest Raspberry Pi Model B for this task, because there isn't much work to be done by this system. I want to keep the systems which have faster CPUs and more memory for use as servers in the cluster.

This one is already running Raspbian, so I just have to make sure that it has all the latest updates installed. That requires an internet connection (duh), then I just use this:

sudo sh -c "apt-get update && apt-get dist-upgrade && apt-get autoremove"

Because this is the oldest/slowest of my Raspberry Pi systems, and I tend not to use it all that often, there were a lot of updates to be installed. Then those updates took a long time to download and install, so this was a time-consuming process.

Next I have to install the utilities and application software needed for this project. This is kind of interesting, because there are three different things to be installed, and they are downloaded and installed in three different ways.

The cluster operation in this project is done using a small group of Python packages. These are installed using the Python pip3 installer:

sudo pip3 install dispy==4.7.1

Note the double-equals notation -- that is not a typo. Note the version specification as well: without it you will get a later version of the package, which does not yet work on Raspbian.

Next, the client needs the nmap utility, which will be used to find the cluster members. It is installed using the apt-get utility that we are already familiar with:

sudo apt-get install nmap

Finally, I need the OctaPi software, which is available on the GitHub development platform, so it is downloaded using the git clone command:

git clone https://github.com/raspberrypilearning/octapi-setup.git

This actually gets both the client and server files. We only need the client side on this system, so I can move them to my home directory:

mv octapi-setup/client/* /home/pi

Believe it or not, that's all the client needs at this time, so I can shut it down and set it aside, and turn my attention to the first of the cluster servers.

Second step - a cluster server

This is where I start to see the first consequences of my decision about which Raspberry Pi systems to use for cluster servers. In the project description they use eight identical Pi 3 systems. Besides the aesthetic appeal of that, there is a huge benefit in the fact that the setup procedure for every server is identical. So identical, in fact, that you actually only have to go through the setup and configuration once, then you can simply clone the SD card for each of the other servers. Nice.

My systems have all sorts of different SD cards, and some of them are currently not even running Raspbian -- I have PiCore, Manjaro, Kali, Fedora and openSUSE loaded on various of them. So I am going to have to check each one, update those which are already running Raspbian, and on those running something else I will have to either prepare a new SD card or overwrite their existing card -- either way, those will require a fresh installation.

The First Server - Raspberry Pi 3 Model B

Image: J.A. Watson

I am going to use my Raspberry Pi 3 for the first cluster server system, mostly because the Pi 3 has built-in wireless networking, so that's one less thing to have to fiddle with during this first setup.

While preparing the first SD card, I remembered something I had written above -- the server systems will not have a monitor, keyboard, or mouse. That made me consider using Raspbian Jessie Lite rather than the full Jessie with PIXEL, because it is considerably smaller, so it is faster to download and copy to the SD card.

There are a few other things to consider when deciding between Jessie Lite and Jessie with PIXEL, though. First, some of the operating system configuration that has to be made is a lot easier to do on the GUI than from the CLI -- things like wi-fi connection, for example. Second, the PIXEL GUI is not the only thing that is not included in Jessie Lite -- python3 is not included, so you would have to install that as well.

In the end, even though I am a pretty dedicated and experienced CLI user, I would say that it is better (or at least easier) to install the full Raspbian with PIXEL on all of the Raspberry Pi systems used in this project.

Even if you made a fresh installation from the latest image, there probably have been updates released since that image was created, so you have to make sure you're up to date:

sudo sh -c "apt-get update && apt-get dist-upgrade && apt-get autoremove"

The server will also need the Python dispy package:

sudo pip3 install dispy==4.7.1

It also needs the psutil package, which is used to report server load to the client, so the client can decide how to distribute the work:

sudo pip3 install psutil

This time we don't need to specify the version, because whatever the latest is will work.

Next you need to add dispy to the system startup process. I'm not going to try to teach you how to use a text editor here; I will just say add these three lines to the end of /etc/rc.local, just before the "exit 0" line:

sleep 20
_IP=$(hostname -I)
/usr/local/bin/dispynode.py -i "$_IP" --daemon --client_shutdown

Before a bunch of people jump in here to point out how lame those three lines are, let me say first that I agree with you. To be fair, we have to keep in mind that this is a home/hobby project, and the intent is to set up a Pi cluster, not to produce a production-ready system. But still, I just have to comment on each one of those lines:

  • sleep 20 - the point here is that the dispynode.py program needs to be given the IP address on the command line. But the system doesn't even have an IP address until it connects to the router and one is assigned to it. So this line is an attempt to delay long enough for that to happen. The weakness here is obvious: if it takes longer than 20 seconds to get an IP address, you're in trouble. There should at least be some simple error-checking included.
  • _IP=$(hostname -I) - This assumes that hostname -I is going to return exactly one IP address. Besides the obvious problem that it might return none, as described in the previous point, it also turns out that a lot of routers in Switzerland will actually return two addresses - an IPv4 and an IPv6 like this:
pi@ModelB:~ $ hostname -I
192.168.0.100 2a02:120b:c3d9:ba20:7d78:7107:6570:4500

In most cases, these are not serious problems. But you should be aware of them, just in case you run into some situation where the client doesn't use one or more of the cluster servers in the following tests.
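For anyone who wants to see what "some simple error-checking" might look like, here is a minimal Python sketch. It is not part of the OctaPi project; the function names first_ipv4 and wait_for_ipv4 and the 60-second timeout are my own assumptions. It polls hostname -I until an IPv4 address shows up, and it ignores any IPv6 address in the output.

```python
import ipaddress
import subprocess
import time


def first_ipv4(hostname_output):
    """Return the first IPv4 address found in `hostname -I` output, or None."""
    for token in hostname_output.split():
        try:
            address = ipaddress.ip_address(token)
        except ValueError:
            continue  # token is not an IP address at all
        if address.version == 4:
            return token
    return None


def wait_for_ipv4(timeout=60.0, poll=2.0):
    """Poll `hostname -I` until an IPv4 address appears or the timeout expires.

    The timeout and poll interval are arbitrary illustrative values.
    Returns the address, or None if none was assigned in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        output = subprocess.check_output(["hostname", "-I"]).decode()
        address = first_ipv4(output)
        if address is not None or time.monotonic() >= deadline:
            return address
        time.sleep(poll)


if __name__ == "__main__":
    # The two-address output shown above, used here as sample input.
    sample = "192.168.0.100 2a02:120b:c3d9:ba20:7d78:7107:6570:4500"
    print(first_ipv4(sample))
```

A startup script could call wait_for_ipv4() and only start dispynode.py when it returns an address, rather than hoping that 20 seconds was enough.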

Next, you need to make sure that SSH is enabled on the server systems.

IMPORTANT! (It's soapbox time again, sorry.) Enabling SSH on a system with the default user name and password is such a bad idea that I can't even think of words to describe it. The latest versions of Raspbian will actually complain when it is set up this way, telling you over and over again that you need to change the password for the pi user.

I don't care that this project is supposed to be implemented on a dedicated router that does not have internet access. I don't care about any other "special circumstances" that are supposed to make this OK 'just in this case'. It's not OK. Period. At least change the password -- and if you have any sense, use a different login name as well, and either disable or delete the 'pi' login.

Soapbox end. Thank you for your attention. Now back to your regularly scheduled project.

If you are running a full Jessie with Desktop system, you can enable SSH by going to the PIXEL GUI menu, Preferences, Raspberry Pi Configuration, Interfaces. If you have a text-only system (Raspbian Lite, no PIXEL), you can get a text-mode version of the Pi Configuration utility with sudo raspi-config, then go to Interfacing Options, SSH.

There is one other way you can get SSH enabled, by putting an empty file named ssh in the boot directory, and then rebooting.

sudo touch /boot/ssh

The file will be gone after the reboot, indicating that SSH has been enabled.

While you are making changes in the Raspberry Pi Configuration, it is also a good idea to give each of the server systems a different name. This is not absolutely required, but it will make it a bit easier to read and understand some of the results later in the project.

At this point the server setup is complete, so I can put it aside with the client, and get the router set up for this project.

Third step - router setup

TP-Link TL-WR802N Wireless N Nano Router

Image: J.A. Watson

The OctaPi project description includes a lot of details about setting up the router. This is an admirable effort, and there is some good information included there. But my experience with routers has been that they are all so different from each other that including detailed instructions and screenshots only causes confusion when the actual router being set up differs significantly from the test unit. So I will just say this:

  • Don't connect the router used for this project to the internet. See warnings above
  • Set the SSID (wireless network name) to something that identifies this project, and that you will remember. You will need this name to configure all of the Raspberry Pi systems. The project uses OctaPi and I decided to use PiCluster, but you are free to choose whatever makes you happy.
  • Configure the wireless network for WPA encryption, and with a reasonable WPA Pre-Shared Key. (The SSID is not acceptable as a key. Even if you spell it backwards.) You will need this key to configure all of the Raspberry Pi systems.
  • Make sure that the router is configured to act as a DHCP server, and that it has a reasonable range (pool) of IP addresses to use. Every router I have ever used was set up this way by default, so you will probably not have to change anything, but check it just in case.
  • If you see anything in the DHCP configuration about Lease Time, set it to the largest number it will take. One way to see how fragile the rc.local code mentioned above really is, is to have the servers frequently getting different IP addresses.
  • After configuring the router, reboot it and then check it again to make sure whatever changes you made are still there. If they aren't, then look a little harder for wherever they have hidden the 'save changes' or 'save and reboot' button, and then try again.

The OctaPi project description, and the program files picked up from GitHub, assume that the wireless network addresses will be in the subnet 192.168.1.*. This is not always the case -- the router I am using here, for example, uses 192.168.0.*. When this happens you have to decide if you want to change the router configuration so that it assigns the 'correct' addresses, or you want to edit the OctaPi scripts so that they match the addresses your router is giving out.
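If you are not sure which case applies to you, a few lines of Python can answer the question. This is only a sketch: the function names are mine, and it assumes the usual home-router /24 netmask (255.255.255.0), which you should check against your own router.

```python
import ipaddress


def scan_target(host_ip, prefix_len=24):
    """Return the network that contains host_ip, in CIDR notation.

    prefix_len=24 is an assumption (netmask 255.255.255.0).
    """
    network = ipaddress.ip_network("{}/{}".format(host_ip, prefix_len), strict=False)
    return str(network)


def in_expected_subnet(host_ip, expected="192.168.1.0/24"):
    """True if host_ip lies in the subnet the OctaPi scripts assume."""
    return ipaddress.ip_address(host_ip) in ipaddress.ip_network(expected)


if __name__ == "__main__":
    print(scan_target("192.168.0.101"))         # 192.168.0.0/24
    print(in_expected_subnet("192.168.0.101"))  # False
    print(in_expected_subnet("192.168.1.2"))    # True
```

The CIDR form (192.168.0.0/24) covers the same addresses as the 192.168.0.* wildcard, and nmap accepts either.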

Fourth step - Hooray! Connecting the client and server to the router (and to each other)

The objective for this step is to get the Raspberry Pi systems connected to the OctaPi router, and then to make sure that they will always connect to that router, and not to your home network router or whatever else might be around. It turns out that it is actually a bit easier to accomplish the second part of that first.

So, boot one Raspberry Pi (client or server, it doesn't matter at this point). I am assuming that it will be connected to your home network, because you have been downloading software needed for this project above. Go to the Network Manager icon in the bottom panel, and tell it to disconnect from that network.

I believe that forcing a disconnect like this will actually remove that network from your wireless configuration files (it certainly does so on my systems). Just to be sure, check the file /etc/wpa_supplicant/wpa_supplicant.conf. If there are any text blocks which look like this, use your favorite text editor and delete them.

network={
blah
blah
blah
}
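If you would rather not do this by hand on every system, the same cleanup can be sketched in Python. The function name strip_network_blocks and the sample configuration are made up for illustration. The code assumes each block starts on a line beginning with network={ and ends on a line containing only a closing brace.

```python
def strip_network_blocks(config_text):
    """Return config_text with every network={ ... } block removed."""
    kept = []
    inside_block = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if not inside_block and stripped.startswith("network={"):
            inside_block = True   # drop the opening line of the block
            continue
        if inside_block:
            if stripped == "}":
                inside_block = False  # drop the closing brace, then resume keeping
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Invented sample; the SSID and key are placeholders, not real values.
SAMPLE_CONFIG = """ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="HomeNetwork"
    psk="not-a-real-key"
}
"""

if __name__ == "__main__":
    print(strip_network_blocks(SAMPLE_CONFIG))
```

This only transforms text. Writing the result back to /etc/wpa_supplicant/wpa_supplicant.conf needs root, and I would keep a backup copy of the original first.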

Shut down that Raspberry Pi, and then boot the other one, do the same thing to it, and shut it down. Now you have both of your systems configured so that they will not connect to any wireless network automatically.

Boot the Raspberry Pi client system, and connect it to the OctaPi router wireless network with the network name and key you defined previously in the router setup.

Find the client's IP address, using the command:

pi@ModelB:~ $ hostname -I
192.168.0.101

This will vary depending on the router you are using; in the project description it is 192.168.1.2. Whatever it is, make a note of it.

Boot the Raspberry Pi Server system, connect it to the OctaPi router wireless network, and note its IP address.

WARNING! The following step uses the nmap utility, which performs a scan of the network to find anything else connected to it. You should only use nmap on your closed, private, dedicated, not-internet-connected network. Running it on the internet can make you very unpopular, at the least; running it on the corporate network of your employer is a good way to lose your job.

Open a terminal window on the client system, and use nmap to try to locate the server. In the following command, replace the IP subnet address with whatever your client and server reported in the steps above:

pi@ModelB:~ $ nmap -sP 192.168.0.*
Starting Nmap 6.47 ( http://nmap.org ) at 2017-07-15 06:25 CEST
Nmap scan report for 192.168.0.1
Host is up (0.011s latency).
Nmap scan report for 192.168.0.100
Host is up (0.093s latency).
Nmap scan report for 192.168.0.101
Host is up (0.0024s latency).
Nmap done: 256 IP addresses (3 hosts up) scanned in 9.33 seconds

Note, you should see three IP addresses here -- the router, the client and the server. If you don't get all three, go back to the router setup and figure out what went wrong.
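Counting hosts by eye is fine with three systems, but it gets tedious with more. Here is a rough Python sketch that pulls the addresses out of nmap's text output. The regular expression is my own guess at the two report-line forms I have seen (a bare address, or a host name followed by the address in parentheses), so treat it as an illustration and not a complete parser.

```python
import re

# Matches "Nmap scan report for 192.168.0.1" and
# "Nmap scan report for somehost (192.168.0.1)".
REPORT_LINE = re.compile(
    r"^Nmap scan report for (?:\S+ \()?(\d{1,3}(?:\.\d{1,3}){3})\)?$"
)


def hosts_up(nmap_output):
    """Return the IPv4 addresses listed in `nmap -sP` output, in order."""
    hosts = []
    for line in nmap_output.splitlines():
        match = REPORT_LINE.match(line.strip())
        if match:
            hosts.append(match.group(1))
    return hosts


# The scan output shown above, reused as sample input.
SAMPLE_OUTPUT = """Starting Nmap 6.47 ( http://nmap.org ) at 2017-07-15 06:25 CEST
Nmap scan report for 192.168.0.1
Host is up (0.011s latency).
Nmap scan report for 192.168.0.100
Host is up (0.093s latency).
Nmap scan report for 192.168.0.101
Host is up (0.0024s latency).
Nmap done: 256 IP addresses (3 hosts up) scanned in 9.33 seconds
"""

if __name__ == "__main__":
    print(hosts_up(SAMPLE_OUTPUT))
```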

If you get both the client and server addresses from the nmap command, and they match what you got from the hostname command when setting them up, woohoo! You're done with this part and ready to go to SSH configuration.

With the client and server both connected to the router, the last part of this step is to get them talking to each other. This project uses ssh to do that, so we have to set that up now. We will generate a public/private set of keys on the client, and then copy the public one to the server.

First, generate an SSH key pair. You do not need (or want) to be root to do this:

pi@ModelB:~ $ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pi/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pi/.ssh
Your public key has been saved in /home/pi/.ssh/id_rsa.pub.
The key fingerprint is:
............ pi@ModelB
The key's randomart image is:
+---[RSA 2048]----+
............

You will be prompted for input three times by this command: once for the file in which the key will be saved (the default in ~/.ssh is fine), and twice for a passphrase (you don't want one). Just press return each time.

Copy the public key you just generated to the Raspberry Pi server. You will need the IP address for the server that you noted above, and because this is the first time that these two systems have connected via ssh, you will also be prompted to accept the fingerprint of the remote system -- answer 'yes' to this question.

You will then need to provide password authentication for the pi account on the server.

pi@ModelB:~ $ ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.0.100

Once this is done, you should be able to run remote commands on the server without providing additional authentication. You can make a simple test with a command like this:

pi@ModelB:~ $ ssh 192.168.0.100 hostname
Pi3

If you get the remote host name, and not some error messages or login/password prompts, then you have things set up properly, hooray!
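Once there are several servers, the same test can be wrapped in a small Python helper. This is a sketch under my own assumptions: the function names are invented, the login is assumed to be pi, and the five-second timeout is arbitrary. The ssh option BatchMode=yes makes ssh fail instead of prompting for a password, which is exactly the failure we want to detect.

```python
import subprocess


def ssh_check_command(host, user="pi"):
    """Build the ssh command that asks a server for its host name."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt; fail if key login is not set up
        "-o", "ConnectTimeout=5",   # arbitrary short timeout for unreachable hosts
        "{}@{}".format(user, host),
        "hostname",
    ]


def remote_hostname(host, user="pi"):
    """Return the server's host name, or None if key-based login failed."""
    try:
        output = subprocess.check_output(
            ssh_check_command(host, user), stderr=subprocess.DEVNULL
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    return output.decode().strip()


if __name__ == "__main__":
    # Only prints the command; call remote_hostname() on the client to run it.
    print(" ".join(ssh_check_command("192.168.0.100")))
```

On the client, remote_hostname("192.168.0.100") should come back with the server's name if the key was copied correctly, and None otherwise.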

That completes the setup of the first cluster server. You can disconnect the keyboard, mouse, and display, and leave it running with just the power connection.

Fifth Step - test client/server operation

This step should be a snooze - we have basically configured and tested everything that it is going to do, so now we just need to make sure that it all works properly together. But I can tell you now, that the first time I tried this it didn't work. The best laid plans...

On the client system, make sure that your current directory is /home/pi, and run this command:

pi@ModelB:~ $ sudo python3 compute.py
2017-07-15 07:26:21 asyncoro - version 4.5.6 with epoll I/O notifier
2017-07-15 07:26:21 dispy - dispy client version: 4.7.1
2017-07-15 07:26:21 dispy - Storing fault recovery information in "_dispy_20170715072621"
Pi3 executed job 0 at 1499973771.8287275 with 13
Pi3 executed job 1 at 1499973771.8292212 with 13
Pi3 executed job 2 at 1499973771.8202548 with 7
Pi3 executed job 3 at 1499973771.845238 with 6
Pi3 executed job 4 at 1499973777.9198005 with 14
Pi3 executed job 5 at 1499973778.8888695 with 7
Pi3 executed job 6 at 1499973784.9767196 with 5
Pi3 executed job 7 at 1499973784.9907656 with 12
Pi3 executed job 8 at 1499973785.9577928 with 17
Pi3 executed job 9 at 1499973790.0447996 with 11
Pi3 executed job 10 at 1499973792.026735 with 14
Pi3 executed job 11 at 1499973797.1099284 with 19
Pi3 executed job 12 at 1499973801.1476002 with 10
Pi3 executed job 13 at 1499973803.0565348 with 12
Pi3 executed job 14 at 1499973806.1220703 with 17
Pi3 executed job 15 at 1499973811.2433848 with 6

The compute.py script uses nmap to locate active servers, and then dispatches 16 jobs to them. In this case, of course, they will all be sent to the one active server. You should see the first results reported within 15 seconds or so.
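To give an idea of what dispatching jobs with dispy looks like, here is a minimal sketch modelled on the introductory example in the dispy documentation. It is not the project's compute.py: the node list is passed in by hand and is not discovered with nmap, and the 5 to 20 second range for each job is my assumption. The function name run_jobs is also mine.

```python
def compute(seconds):
    # dispy ships this function to the servers, so it imports what it needs itself.
    import socket
    import time
    time.sleep(seconds)                      # stand-in for real work
    return socket.gethostname(), seconds


def run_jobs(nodes, job_count=16):
    """Submit job_count jobs to the given servers and print each result."""
    import random
    import dispy  # assumes dispy is installed on the client as described earlier

    cluster = dispy.JobCluster(compute, nodes=nodes)
    jobs = []
    for job_id in range(job_count):
        job = cluster.submit(random.randint(5, 20))
        job.id = job_id                      # label the job for the report
        jobs.append(job)
    for job in jobs:
        host, seconds = job()                # blocks until this job has finished
        print("{} executed job {} at {} with {}".format(
            host, job.id, job.start_time, seconds))
    cluster.print_status()                   # per-node summary
    cluster.close()
```

On the client, something like run_jobs(["192.168.0.100"]) would send all 16 jobs to the single server at that address.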

The entire job should take about a minute or two, and when it is finished it will report that all jobs were executed on the same server:

Node                   CPUs   Jobs   Sec/Job   Node Time Sec
192.168.0.100 (Pi3)       4     16    11.464         183.420

Total job time: 183.420 sec, wall time: 51.608 sec, speedup: 3.554
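The speedup figure is simply the total job time divided by the wall-clock time, which is easy to check in Python (the function name is mine):

```python
def speedup(total_job_seconds, wall_seconds):
    """Ratio of summed job run time to elapsed wall-clock time."""
    return total_job_seconds / wall_seconds


if __name__ == "__main__":
    # Figures taken from the summary line above.
    print(round(speedup(183.420, 51.608), 3))  # 3.554
```

With the four CPUs of the Pi 3 working in parallel, a value somewhat below 4 is about what I would expect.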

If this test doesn't work, go back and check each step of the configuration again. The most likely cause is that the IP addresses assigned by the wireless router do not correspond to what the OctaPi scripts expect (that was my problem). Other likely places to look are the wireless network connection of either system, or the ssh configuration.

When the compute.py script is producing the results shown above, your first client and server are complete. If you are going to use identical Raspberry Pi systems, with identical SD cards, you could shut down the server at this point, and simply copy its SD card as many times as necessary. If your servers are not going to be identical (like mine), you still need to go through the server setup procedure for the rest of the systems.

Sixth Step - setting up the rest of the servers

At this point you've pretty much got the project done. The only thing left to do is to set up and configure the rest of the servers. If you are lucky enough to be using identical Raspberry Pi systems, with identical SD cards, then this step is also trivial. Just shut down the first server and clone the SD card as many times as you need to.

If you are less fortunate (like me), every Pi system (and every SD card) is different. So you have to slog through the server setup again for each one. The steps you need to take are:

  • sudo sh -c "apt-get update && apt-get dist-upgrade && apt-get autoremove"
  • sudo pip3 install dispy==4.7.1
  • sudo pip3 install psutil
  • sudo vi /etc/rc.local (add the three lines just before the "exit 0" line)
  • disconnect from the public network
  • remove the public network block from /etc/wpa_supplicant/wpa_supplicant.conf
  • connect to the private OctaPi network
  • set hostname
  • enable ssh
  • copy the ssh public key from the client to the new server (ssh-copy-id)

Seventh step - testing the entire cluster

When everything is configured, connected and running properly, the compute.py test program produces this output for my cluster:

pi@ModelB:~ $ sudo python3 compute.py
2017-07-15 09:59:11 asyncoro - version 4.5.6 with epoll I/O notifier
2017-07-15 09:59:11 dispy - dispy client version: 4.7.1
2017-07-15 09:59:11 dispy - Storing fault recovery information in "_dispy_20170715095911"
Zero13 executed job 0 at 1500107840.514999 with 9
Zero12 executed job 1 at 1500104041.402219 with 7
Pi3 executed job 2 at 1499975128.8935144 with 8
ZeroW executed job 3 at 1500058945.8551314 with 7
Pi3 executed job 4 at 1499975128.8720584 with 10
Pi3 executed job 5 at 1499975128.8723927 with 13
Pi3 executed job 6 at 1499975128.8620682 with 6
BPlus executed job 7 at 1499975885.883818 with 8
Pi2 executed job 8 at 1499976437.6603177 with 16
Pi2 executed job 9 at 1499976437.6613452 with 7
Pi2 executed job 10 at 1499976437.6610622 with 17
Pi2 executed job 11 at 1499976437.692981 with 7
Pi3 executed job 12 at 1499975134.9797184 with 5
Zero12 executed job 13 at 1500104048.525442 with 15
ZeroW executed job 14 at 1500058953.0012572 with 13
Pi2 executed job 15 at 1499976444.7784967 with 11

Node                     CPUs   Jobs   Sec/Job   Node Time Sec
192.168.0.100 (Zero13)      1      1     9.051           9.051
192.168.0.105 (Pi2)         4      5    11.640          58.199
192.168.0.102 (Zero12)      1      2    11.045          22.089
192.168.0.106 (Pi3)         4      5     8.425          42.125
192.168.0.103 (ZeroW)       1      2    10.049          20.097
192.168.0.104 (BPlus)       1      1     8.052           8.052

Total job time: 159.614 sec, wall time: 22.649 sec, speedup: 7.047
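The per-node figures can be cross-checked against the summary line with a short Python sketch. The rows are copied from the table above, and the variable and function names are mine. The total comes out one thousandth of a second lower than the reported 159.614, presumably because the node times in the table are already rounded.

```python
# (node, cpus, jobs, node_time_sec), copied from the table above
NODE_ROWS = [
    ("Zero13", 1, 1, 9.051),
    ("Pi2",    4, 5, 58.199),
    ("Zero12", 1, 2, 22.089),
    ("Pi3",    4, 5, 42.125),
    ("ZeroW",  1, 2, 20.097),
    ("BPlus",  1, 1, 8.052),
]


def summarize(rows, wall_seconds):
    """Return (total CPUs, total jobs, total job time, speedup)."""
    total_cpus = sum(cpus for _, cpus, _, _ in rows)
    total_jobs = sum(jobs for _, _, jobs, _ in rows)
    total_time = sum(node_time for _, _, _, node_time in rows)
    return total_cpus, total_jobs, total_time, total_time / wall_seconds


if __name__ == "__main__":
    cpus, jobs, total_time, ratio = summarize(NODE_ROWS, 22.649)
    print("{} CPUs, {} jobs, {:.3f} sec, speedup {:.3f}".format(
        cpus, jobs, total_time, ratio))
```

So the six servers contribute 12 CPUs between them, and a speedup of about 7 means a good share of that capacity was actually in use.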

Eighth step - controlling the servers

If you got the results shown above, the Pi cluster is now complete and functioning. There is just one thing left to do -- give the client a way to control the cluster servers, because we have been planning all along to construct this cluster with no keyboard, mouse, and display on the servers.

Because we have set up SSH connectivity between the client and each of the servers, they can be controlled individually in a trivial way like this:

pi@ModelB:~ $ ssh 192.168.0.100 sudo shutdown -HP now

But that requires a separate command for each server, and you have to get the IP address right for each one. Not my idea of a good time.

The project includes a script called cluster_action.sh, which gives you control over all of the cluster servers as a group. It was included in the client files that you installed and copied to your home directory at the beginning of the project, so all you have to do now is make it executable:

pi@ModelB:~ $ chmod 700 cluster_action.sh

You can then shut down all of the servers at once:

pi@ModelB:~ $ ./cluster_action.sh shutdown
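I have not reproduced the contents of cluster_action.sh here, but the general idea of a group action is easy to sketch in Python: loop over the server addresses and run the same command on each one via ssh. Everything in this example (the function names, the pi login, the exact remote commands) is my own assumption and not taken from the project script.

```python
import subprocess

# Hypothetical mapping from action name to the command run on each server.
REMOTE_COMMANDS = {
    "shutdown": ["sudo", "shutdown", "-h", "now"],
    "reboot": ["sudo", "reboot"],
}


def group_commands(servers, action, user="pi"):
    """Build one ssh command per server for the requested action."""
    if action not in REMOTE_COMMANDS:
        raise ValueError("unknown action: {}".format(action))
    return [
        ["ssh", "{}@{}".format(user, host)] + REMOTE_COMMANDS[action]
        for host in servers
    ]


def run_group(servers, action):
    """Run the action on every server, one after another."""
    for command in group_commands(servers, action):
        subprocess.call(command)


if __name__ == "__main__":
    # Only prints the commands; run_group() would actually execute them.
    for command in group_commands(["192.168.0.100", "192.168.0.102"], "shutdown"):
        print(" ".join(command))
```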

Of course you are still left with the old Raspberry Pi problem of knowing when each server is really down and safe to power off. All I do is shut down the client after the command has completed, and because I am using the slowest Raspberry Pi for the client, by the time it is down I can pretty safely assume that all the servers are down as well.

You can reboot all of the servers in the same way, just change shutdown to reboot on the command line above. Then you have a bit of a different question -- how to know when all of the servers have rebooted and are ready to use? I just run the nmap command as we did earlier in the project, and when all of the servers are shown in its output, you're ready to go.
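That wait-and-check routine can also be automated. The sketch below is hypothetical; the function names, the two-minute timeout and the five-second polling interval are my choices. It runs the same nmap scan repeatedly and reports which of the expected servers have not yet reappeared.

```python
import subprocess
import time


def missing_servers(expected, nmap_output):
    """Return the expected addresses that do not appear in the nmap output."""
    reported = set()
    for line in nmap_output.splitlines():
        if line.startswith("Nmap scan report for "):
            # The address is the last token, sometimes wrapped in parentheses.
            reported.add(line.split()[-1].strip("()"))
    return [host for host in expected if host not in reported]


def wait_for_servers(expected, subnet, timeout=120.0, poll=5.0):
    """Rescan until every expected server answers or the timeout expires.

    Returns the list of servers still missing (empty when all are back).
    """
    deadline = time.monotonic() + timeout
    while True:
        output = subprocess.check_output(["nmap", "-sP", subnet]).decode()
        missing = missing_servers(expected, output)
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(poll)


if __name__ == "__main__":
    sample = "Nmap scan report for 192.168.0.1\nNmap scan report for 192.168.0.100\n"
    print(missing_servers(["192.168.0.100", "192.168.0.102"], sample))
```

As with every other use of nmap in this project, this belongs on the private cluster network only.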

The other useful thing this control script can do is send the current date/time to all of the servers.

Summary

My Final Pi Cluster

Image: J.A. Watson

When set up on my desk, with the servers having only a power connection and wi-fi dongles when necessary, and the client with display, keyboard, and mouse, it all looks like this.

I know, it's not exactly a professional-looking setup. But it works, and it shows that you can create something useful and interesting from a random array of Raspberry Pi hardware.

Most of all, it was fun!
