Last week IBM announced that it would open source
some Cell libraries. A bit earlier it had provided an extensive overview
of the Cell processor architecture.
In brief, the standard unit has one primary processor:
- 64-bit Power Architecture core
- Two-way, statically scheduled, superscalar pipeline
- 64-bit ALU, 64-bit double-precision FMAC
- 128-bit VMX (aka "AltiVec")
- 32KB I + 32KB D L1 cache
- 512KB L2 cache
- Two-way hardware multithreading
- Supports logical partitioning
and eight SPEs - or Synergistic Processing Elements.
The SPEs are extremely powerful devices in themselves: each implements a general-purpose SIMD core with a 256KB local store fed by DMA, and communicates with the other SPEs and the primary processor via hardware memory management and a shared, cache-coherent SMP bus.
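That architecture implies an unusual programming model: you explicitly stage a tile of data into the small local store with DMA, run wide SIMD arithmetic over it, and DMA the results back out. Here's a toy sketch of that flow in plain Python - the names (`dma_get`, `dma_put`, `simd_scale`) are illustrative stand-ins, not the real Cell SDK API:

```python
# Toy model of the SPE flow: DMA a tile into a small "local store",
# operate on it four floats at a time, DMA it back to main memory.
LOCAL_STORE_BYTES = 256 * 1024          # each SPE's local store size
FLOATS_PER_TILE = LOCAL_STORE_BYTES // 4

def dma_get(main_memory, offset, count):
    """Stand-in for a DMA transfer from main memory into local store."""
    return main_memory[offset:offset + count]

def dma_put(main_memory, offset, tile):
    """Stand-in for a DMA transfer from local store back to main memory."""
    main_memory[offset:offset + len(tile)] = tile

def simd_scale(tile, factor):
    """Pretend 4-wide SIMD: process the tile in groups of four lanes."""
    out = []
    for i in range(0, len(tile), 4):
        out.extend(x * factor for x in tile[i:i + 4])
    return out

data = [float(i) for i in range(16)]
tile = dma_get(data, 0, 8)              # stage 8 floats into "local store"
dma_put(data, 0, simd_scale(tile, 2.0)) # compute, then write back
print(data[:4])                         # [0.0, 2.0, 4.0, 6.0]
```

The point of the sketch is the structure, not the arithmetic: on the real chip nothing gets computed until it has been explicitly moved into the 256KB local store, which is exactly why the programming model feels foreign to people used to cache-managed processors.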
This thing, a high-end grid reduced to a single chip, is incredible. Promised at 3.2GHz, it's expected to reach 3.9GHz almost immediately and delivers an estimated 25.4GB/second of memory bandwidth. With the right code it should run better than ten times faster than a Xeon, and double that on purely floating-point applications.
Some people still think this is intended just as a games engine, but it's not. As the overview noted above puts it: "Intended to be the next generation standard architecture." And it will be, too: it has the security features of RISC, the hardware partitioning IBM loyalists demand, and awesome performance potential.
In fact, IBM engineers privately demonstrated a blade server built from Cell processors during the 2005 E3 conference in Los Angeles. Here's a bit from the report on the nikkeibp English-language site:
The prototype, called the Cell Processor Based Blade Server, measured approximately 23 x 43 cm. Each board featured two Cell processors, two 512 Mb XDR DRAM chips and two South Bridge LSIs. The Cell processors were demonstrated running at 2.4-2.8 GHz. "We are driving the Cell processors at higher rates in the laboratory," said the engineer. "If operated at 3 GHz, Cell's theoretical performance reaches about 200 GFLOPS, which works out to about 400 GFLOPS per board," he added. IBM plans to release a rack product capable of storing seven of these boards.
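The engineer's figure holds up as back-of-the-envelope arithmetic, if you assume each of the eight SPEs can issue one 4-wide single-precision fused multiply-add per cycle (an FMA conventionally counts as two floating-point operations):

```python
# Sanity check of the "about 200 GFLOPS at 3 GHz" claim (single precision).
# Assumption: each SPE sustains one 4-wide fused multiply-add per cycle.
SPES = 8
LANES = 4            # 128-bit registers / 32-bit floats
FLOPS_PER_FMA = 2    # a fused multiply-add counts as two operations
CLOCK_HZ = 3.0e9     # the 3 GHz rate quoted by the engineer

gflops = SPES * LANES * FLOPS_PER_FMA * CLOCK_HZ / 1e9
print(gflops)        # 192.0 -- "about 200 GFLOPS", so ~400 per two-chip board
```

At the promised 3.2GHz clock the same arithmetic gives 204.8 GFLOPS per chip, which is where the round "200 GFLOPS" number comes from.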
Put seven of these boards in a rack, add Linux, and you have a serious supercomputer. As far as I know there's just one catch: it's easy to port Linux applications to it so they'll amble along, but apparently quite difficult for people who don't already work in grid-style supercomputing to adapt them to the full programming model so they'll run fast.