After you've gotten used to your new computer, the first thing you, and everyone else, want, whether you're just playing games or chasing down the Higgs boson, is a faster computer. The ultimate search for speed happens in supercomputers.
Supercomputers are designed to be orders of magnitude faster than the current generation of computers. So when Seymour Cray designed the Control Data Corporation (CDC) 6600 in 1964, it was the world's fastest computer: a 40MHz machine capable of about 3 million floating-point operations per second (3 MegaFlops). By comparison, an early model Raspberry Pi with an ARM1176JZF-S 700MHz processor can run at 42 MegaFlops.
If you know only one name in supercomputing, it's probably "Cray." Seymour Cray, the first major supercomputer architect, and without doubt the greatest, created the first of his eponymous supercomputers in 1976.
This 80MHz system used integrated circuits to achieve speeds as high as 136 MegaFlops. Part of the Cray-1's performance, remarkable for its time, came from its unusual "C" shape. The look wasn't chosen for aesthetic reasons; the shape gave the most speed-dependent circuit boards shorter, and hence faster, circuit lengths.
This attention to every last detail of design from the CPU up is a distinguishing mark of Cray's work. In the long run, as we shall see, this custom design approach would prove a dead end.
After decades of US designs ruling supercomputing, Japanese systems such as the NEC SX-3 and Hitachi SR2201 dominated the field. From 1993 to 1996, Fujitsu's Numerical Wind Tunnel was the world's fastest supercomputer, with speeds of up to 600 GigaFlops. A GigaFlop is one billion Flops.
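These units all stack by factors of a thousand, which is easy to lose track of across the article. Here's a small sketch, using only numbers quoted above, that encodes the ladder and compares two of the machines; the `speedup` helper is illustrative, not from any source.

```python
# The Flops prefixes used in this article, each 1,000x the previous one.
FLOPS_UNITS = {
    "MegaFlop": 10**6,   # million Flops
    "GigaFlop": 10**9,   # billion Flops
    "TeraFlop": 10**12,  # trillion Flops
    "PetaFlop": 10**15,  # thousand trillion Flops
    "ExaFlop":  10**18,  # thousand PetaFlops
}

def speedup(new_flops, old_flops):
    # How many times faster one machine is than another.
    return new_flops / old_flops

# The Numerical Wind Tunnel (600 GigaFlops) vs. the Cray-1 (136 MegaFlops):
ratio = speedup(600 * FLOPS_UNITS["GigaFlop"] / 10**9 * 10**9,
                136 * FLOPS_UNITS["MegaFlop"])
```

Running the comparison shows the Numerical Wind Tunnel was roughly 4,400 times faster than the Cray-1, two decades on.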
These machines relied on two new developments for their performance. First, they used vector computing: dedicated chips work with one-dimensional arrays of data instead of the single data items that standard scalar processors handle. Second, instead of using a single shared data bus, they used multiple buses, which enables many processors to work on a single problem at the same time. This approach to computing is called massive parallelism. It's also the ancestor of the multiple instruction, multiple data (MIMD) approach that lets today's CPUs use multiple cores.
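The scalar-versus-vector distinction can be sketched in plain Python. The point is the instruction count, not the syntax: the `width` parameter below is a stand-in for a hardware vector register's width, and the "lanes" are simulated, not real machine instructions.

```python
def scalar_add(a, b):
    # A scalar processor issues one add instruction per element pair:
    # N elements cost N instructions.
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out

def vector_add(a, b, width=4):
    # A vector unit applies a single instruction to a whole "lane" of
    # `width` elements at once, so N elements cost only N / width
    # vector instructions.
    out = []
    for i in range(0, len(a), width):
        # one simulated vector add over an entire lane
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
    return out
```

Both functions produce the same result; the win is that the vector version issues a quarter as many (simulated) instructions for the same work.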
Intel, which had stayed out of supercomputing, thought that MIMD might let it create more affordable supercomputers without specialized vector processors. In 1996, ASCI Red proved Intel right.
ASCI Red used over 6,000 200MHz Pentium Pro processors to break the 1 TeraFlop (one trillion Flops) barrier. For years, ASCI Red would be both the fastest and the most reliable supercomputer in the world.
At about the same time that Intel was spending millions on ASCI Red, some underfunded contractors at NASA's Goddard Space Flight Center (GSFC) decided to build their own "supercomputer" from 16 486DX processors, with 10Mbps Ethernet as the "bus," for thousands of dollars instead of millions. Little did they know that by creating the first Beowulf cluster, they were creating the ancestor of today's most popular supercomputer design: the Linux-powered Beowulf cluster.
Beowulf, designed by NASA contractors Don Becker and Thomas Sterling in 1994, was meant to be a do-it-yourself parallel-processing computer that cost only a few thousand dollars. While its speed was only in the single-digit GigaFlops, Beowulf demonstrated that you could build supercomputers from commercial off-the-shelf (COTS) hardware. Heck, you can even build a Beowulf "supercomputer" from Raspberry Pi boards!
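The Beowulf idea, scatter the work across cheap commodity processors, then gather and combine the results, can be sketched with Python's standard multiprocessing module. Here OS processes stand in for Ethernet-linked nodes, and the node count and toy summing workload are purely illustrative.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each "node" works on its own slice of the data, independently
    # of all the others.
    return sum(chunk)

def cluster_sum(data, nodes=4):
    # Scatter: split the data into one chunk per node.
    size = (len(data) + nodes - 1) // nodes
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Compute in parallel, then gather and combine the partial results.
    with Pool(processes=nodes) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(cluster_sum(list(range(1_000_000))))
```

On a real Beowulf cluster the scatter and gather steps travel over the network (classically via message-passing libraries such as MPI) rather than between local processes, but the divide-and-conquer shape is the same.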
Tianhe-2, or Milky Way-2, is 2014's fastest supercomputer. With a performance of 33.86 PetaFlops (a PetaFlop is a thousand trillion Flops), it blows the first Beowulf out of the water, but it uses the same basic design. Its Chinese designers hope that it will reach 100 PetaFlops by 2018.
Instead of 16 486DX CPUs, Tianhe-2 has 16,000 nodes. Each node has two Intel Xeon Ivy Bridge processors and three Xeon Phi co-processors, for a total of 3,120,000 computing cores.
The only major architectural difference between Tianhe-2 and the first Beowulf is that Tianhe-2 mixes processor architectures. Besides the general-purpose Xeon Ivy Bridge CPUs, it uses Xeon Phi chips, which specialize in floating-point calculations.
This style of combining two types of COTS processors is becoming more common. A total of 62 systems on the June 2014 Top 500 supercomputer list use accelerator/co-processor technology, up from 53 in November 2013. Forty-four of these use NVIDIA chips, two use ATI Radeon, and 17 use Intel MIC technology (Xeon Phi).
As the supercomputing race continues, the next goal is the ExaFlop supercomputer: a thousand PetaFlops. Most experts expect the first of these to appear by 2018. I don't know who will make it, but I expect it will run Linux, use two processor types, and still be based on the tried-and-true Beowulf design.
And what will we be able to do with that kind of speed? How does real-time gene sequencing sound? Or, dare I say it, truly accurate weather prediction? The day is coming, and it's not that far away.