Build a $2,500 supercomputer

Summary: Supercomputing, Costco-style. In 1997, IBM's Deep Blue supercomputer beat world chess champion Garry Kasparov. Today you can build a more powerful machine for less than $2,500 in an 11" x 12" x 17" box.

TOPICS: Hardware

Supercomputing Costco-style

In 1997, IBM's Deep Blue supercomputer beat world chess champion Garry Kasparov. Today you can build a more powerful machine for less than $2,500 in an 11" x 12" x 17" box. That works out to less than $100 per gigaflop as of January 2007.

More good news: pricing out the components today, the machine would cost only about $1,300!

The recipe

Professor Joel Adams and undergraduate Tim Brom built the machine at Calvin College in Grand Rapids, MI. Using the Beowulf cluster model, the Microwulf design includes:

  • 4 microATX motherboards, each with a dual-core AMD Athlon 64 X2 3800+ (socket AM2) processor
  • 8 GigE ports - 1 built-in port on each motherboard, plus 1 added GigE PCI-express NIC
  • 8 GB RAM - half of what a balanced system should have, but 16 GB would have busted their budget.
  • 4 microATX power supplies
  • 1 8-port GigE switch
  • 250 GB hard drive & a CD/DVD drive
  • 3 polycarbonate plastic shelves to mount the kit on plus 5 threaded rods to support the shelves

Here's a schematic diagram:


The architecture

Beowulf clusters are based on a message-passing infrastructure (MPI, the Message Passing Interface) that uses a commodity network to interconnect the nodes. Some Beowulf clusters have hundreds of nodes and scale nicely with the right workloads.

Microwulf has an economical version of the same architecture, built on Ubuntu Linux and MPI libraries.
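Running real MPI code needs an MPI runtime (mpirun and friends), so as a rough sketch of the programming model only — scatter work to nodes, compute locally, reduce the results back at rank 0 — here is the same idea using nothing but Python's standard multiprocessing pipes. The process-per-"node" structure and function names are illustrative assumptions, not Microwulf's actual software:

```python
# Sketch of the message-passing model a Beowulf cluster uses.
# A real cluster would use MPI calls (MPI_Send/MPI_Recv/MPI_Reduce)
# over GigE; here the "nodes" are local processes talking over pipes.
from multiprocessing import Process, Pipe

def worker(rank, chunk, conn):
    """Each 'node' computes its partial sum and sends it to rank 0."""
    conn.send((rank, sum(chunk)))
    conn.close()

def parallel_sum(data, nodes=4):
    # Rank 0 scatters the data, the workers compute, rank 0 reduces.
    chunks = [data[i::nodes] for i in range(nodes)]
    pipes, procs = [], []
    for rank, chunk in enumerate(chunks):
        parent, child = Pipe()
        p = Process(target=worker, args=(rank, chunk, child))
        p.start()
        pipes.append(parent)
        procs.append(p)
    total = sum(conn.recv()[1] for conn in pipes)  # the "reduce" step
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))  # → 499500
```

The point of the pattern is that each node only ever sees its own chunk of the data plus small messages — which is why a cheap GigE switch is enough to hold a cluster like this together.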


The result

Performance is a many-splendored thing. In the world of supercomputing the standard benchmark is Linpack, which solves a dense system of linear equations in 64-bit double-precision arithmetic. Learn more about Linpack, HPL and their parameters here.

It is worth noting that with a 250 GB SATA drive, HPL doesn't do much I/O. The benchmark is testing floating-point performance on an in-memory problem. Above a problem size of 30,000, the machine ran out of memory. Here are Microwulf's stats:


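That 30,000 ceiling follows directly from HPL's memory footprint: the benchmark factors a dense N x N matrix of 64-bit doubles, so the matrix alone takes 8N² bytes. A quick back-of-the-envelope check against Microwulf's 8 GB of RAM:

```python
# HPL solves a dense N x N system in 64-bit doubles (8 bytes each),
# so the coefficient matrix alone needs 8 * N**2 bytes.
def hpl_matrix_gb(n):
    return 8 * n * n / 1e9  # decimal gigabytes

print(hpl_matrix_gb(30_000))  # 7.2 GB -- most of the 8 GB installed
```

With the OS and MPI buffers taking their share of the remaining memory, problem sizes much past 30,000 simply don't fit.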
While unexceptional today, this performance would have made Microwulf the world's 6th-fastest supercomputer in 1993, at less than $100 per gigaflop. Update: at today's prices, about $50 per gigaflop.
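The price/performance claims are simple division. Taking the builders' approximate published figures — on the order of 26 Gflops sustained on HPL for roughly $2,470 in parts (both numbers are assumptions here, recalled from Adams' writeup rather than stated in this article) — the arithmetic works out as:

```python
# Price/performance, using the builders' approximate published numbers:
# ~$2,470 in parts, ~26.25 Gflops sustained on HPL (assumed figures).
def dollars_per_gflop(cost_usd, gflops):
    return cost_usd / gflops

print(round(dollars_per_gflop(2470, 26.25), 2))  # ~94 -- under $100/Gflop
print(round(dollars_per_gflop(1300, 26.25), 2))  # ~50 -- at today's prices
```

Which matches the article's two headline numbers: just under $100 per gigaflop at build time, and about $50 per gigaflop at repriced-component cost.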

The Storage Bits take

Humans aren't very good at forecasting exponential functions like Moore's Law. Microwulf is a good excuse to take stock of just how much computing has advanced in the last 15 years.

Millicomputing is the name of a related initiative to build powerful clusters out of very power-efficient processors and low-cost components. In another 10 years you'll be able to have the equivalent of a 5,000 node Google cluster in your den. Cluster-based virtual reality, anyone?

Update: Lots of great comments from some very experienced people. Thanks! A couple of folks pointed to a detailed tutorial written by Professor Adams - who graciously permitted me to use his copyrighted diagram - that I'd linked to but without flagging its importance.

Let me rectify that oversight. If you want to get into the details of the hardware and software this article on the Microwulf architecture and construction should suffice.

Comments welcome. Personally, I'm very happy with my quad-core Xeon, but I don't do much with computational fluid dynamics or protein folding.



  • You do need a day off!

    Monday is Labor Day! lol!
    • Re:Monday is Labor Day! lol!

      That's still a day off, whether it's Labor Day or not. lol!
  • Add a Terabyte or four Raid Array to the system.

    Add a Terabyte or four RAID array to the system. Available now for $600 in refurb units at Don't forget the advent of quad-core CPUs, out now with Intel and available soon from AMD (at least from AMD, some of these will use the same motherboards as the duals). Should be possible to design motherboards for this type of computer, and an enclosure that would allow nearly infinite expansion, and set them up in the place of blade server racks etc.
    • Good point

      It is a little odd that the I/O is so unbalanced. I understand wanting to keep costs
      down, but I'd love to see what kind of storage the professor could come up with for
      an additional $1000.

      R Harris
  • Will Vista SP1 allow me to run this?

    /just checking...
    • Vista SP1 'Soopercomputing Edition'

      Microsoft don't do scaleable clusters. But if they did, they'd probably be the most expensive clusters in the world (1 license per processor)
      • Try again...

        [b]Microsoft don't do scaleable clusters. But if they did, they'd probably be the most expensive clusters in the world (1 license per processor) [/b]

        Microsoft DOES do scaleable clusters. Search for "Scalable Cluster" at and check the results for yourself.
    • Not Yet ...

      SP1 is still a little too constrained, but Vista SP2 ought to fail on this hardware at least as well as Vista does on everything else.
    • Easy...

      Just dedicate one node to handle Activation, one to handle WGA, one to handle DRM, one to handle Re-Activation, one for driver issues and one to re-route for automatic re-booting after critical updates.
    • Yes, Vista SP1 Will Run On It..But

      You need the special Microsoft Bob Limited ME Edition of Vista for it to run. ;).
    • No...

      You need SERVER software for this kind of beast.
  • Many Cray machines are not AMD based.

    I thought everyone knew that most Cray machines are now based on AMD chips. Yep, the same ones that go into your personal computer. Interesting to say the least.
    • not vs now?

      Hey Man, I'm not quite following you -- first you say that many Cray machines are <i>not</i> AMD based (boldfaced/subject line), then you go on to say that they are <i>now</i> AMD based. The latter seems correct, yeah. Would you agree? Maybe it was just a typo for the subject line?

      Some poor schmuck who doesn't know about Cray and their niche might just go by your subject line and avoid AMD, unfortunately
  • Ok how do i do this..

    This is one of the many projects I've always wanted to have a go at. I've got the kit, so how do I do the rest, i.e. connect the boards, drive, etc. and install an OS?

    Can you or anyone help?

    • Nice piece of work

      I always try to impress my computer science students that the laptop they carry around so carelessly would have ruled the world when I started working in the field. I'll refer them to your article to show them where things are heading.
      Just Watching Now
    • RTFM

      The people who built it have a very good tutorial about how to do it at
  • Details Here

    For the technical details straight from the builders, see
  • A bit of Cray history from a Cray-on

    We designed our own VLSI parts for the Y-MP. 16 gate arrays were used for the X-MP. Everything at ECL levels and speed. I have noted that the quad core systems match our performance levels ( we ran a 64 bit O/S ) but we ran into some interesting hardware problems that added instability at that speed. Consult with Bott for the details.
    I'm working on a similar line of experimentation. The 20-year-old internal structure of the systems we designed is the key. This will be interesting.....and my idea may be cheaper....8-P....
    Old Timer 8080
  • I doubt it would outperform Deep Blue

    Deep Blue in 1997 was a 30-node SP, with each node comprising 16 CPUs custom made and tuned specifically for the chess program. That's 480 CPUs. Each node had 1 GB of RAM, for 30 GB total, and communicated via the high-speed SP Switch.

    Now, trying to compare that Deep Blue system with the AMD dual core system suggested by the author is going to be literally a case of comparing apples and oranges. None of the performance benchmark programs would apply to both system types.

    The AMD box would have far faster CPUS, but far less of them (8 versus 480). The massively parallel chess program would not work as effectively on so few processors, in spite of their greatly improved speed.

    The SP Switch ran at something like 300 MBytes per sec, I seem to recall, far faster than GigE, which tops out around 100 MBytes per sec. However, network speed would not be a great factor in this, since the node-to-node communication is sending relatively small packets of data. Ditto disk performance; there's not a huge amount of disk access going on for this app.

    If you take a look at the TOP500 Supercomputing site, you will see a lot of SP systems still in there, but their numbers are dwindling. However, you won't see *any* 8-CPU AMD or Intel machines in the list, heck, I'd bet that wouldn't even make the top 5000 :-)

    All in all, a nice little article, and it's very nice to think that you can build a basic, decently performing cluster for a few grand, but it's not going to run anywhere near the performance of Deep Blue.

    The last SP that I was personally the Admin for was a 54 node SP back in 2002. Sweet box I have to say. Today I run linux clusters, but not as big as those SP's of old.
    • Scalar vs Vector Processing

      You might want to consider that in your assessment. Architecture is the key here. How easy is it to add extra computing "nodes" to the system?
      But, I just competed with IBM in the 1980's...and when JR turned bean-counter, IBM got my boss & SSI..

      Look at the big picture here. I am....
      Old Timer 8080