PS3 chip powers world's fastest computer
Some scoffed at the 8 PS3 supercomputer. But not the scientists at Los Alamos National Labs. They used the idea to build a 1 petaflop computer named Roadrunner - the world's fastest. Here's how.
1,000 trillion floating point operations per second
Fine-grained simulation of aging nuclear weapons is the new computer's ultimate gig. They couldn't just string 14,000 PS3's together - who'd believe the results?
Besides, it's American to want something better - and way faster.
First they built a new Cell Broadband Engine
The new version of the PS3 chip - called a PowerXCell 8i Processor - features 8x faster double-precision floating point and over 25 GB/sec of memory bandwidth. That is the building block of a new and really honking compute node.
[PowerXCell 8i photo courtesy of IBM Systems and Technology Group]
Each compute node consists of 2 dual-core AMD Opterons and 4 PowerXCell 8i's. Each Opteron has a fast connection to 2 PowerXCells enabling a theoretical 25x boost in floating point performance over a stock Opteron.
No one mentioned how much RAM they gave each PowerXCell, but the chip can address 64 GB of RAM, so each compute node could easily support 264 GB of RAM (4x64GB + 2x4GB or more on the Opterons), with over 100 GB/sec of memory bandwidth across the 4 PowerXCells.
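The arithmetic behind that ceiling, as a quick sketch. The 64 GB, 4 GB and 25 GB/sec figures are the ones quoted above; the names are made up for illustration, and the result is the addressable maximum, not what LANL actually installed.

```python
# Per-node upper bounds, using only figures quoted in the article.
POWERXCELLS_PER_NODE = 4
OPTERONS_PER_NODE = 2
GB_PER_POWERXCELL = 64        # maximum the chip can address
GB_PER_OPTERON = 4            # "or more", so treat this as a floor
GBPS_PER_POWERXCELL = 25      # "over 25 GB/sec", so also a floor


def max_node_ram_gb():
    """Largest RAM a node could hold if every chip were fully populated."""
    return (POWERXCELLS_PER_NODE * GB_PER_POWERXCELL
            + OPTERONS_PER_NODE * GB_PER_OPTERON)


def node_cell_bandwidth_gbps():
    """Aggregate memory bandwidth of the four PowerXCells in one node."""
    return POWERXCELLS_PER_NODE * GBPS_PER_POWERXCELL


print(max_node_ram_gb(), "GB of RAM per node, at most")
print(node_cell_bandwidth_gbps(), "GB/sec of PowerXCell memory bandwidth, at least")
```

Opteron memory bandwidth is left out of the second number because the article gives no figure for it.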
We'll take 3,250 of them
That's about how many nodes are in the completed Roadrunner. They're interconnected by a standard - for HPC clusters - Infiniband DDR network.
Infiniband is a switched fabric interconnect featuring microsecond latencies and data rates of 2 GB/sec for 4x-DDR. That's about as much as you can get out of a PCI-Express x8 bus anyway.
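Where the 2 GB/sec comes from, roughly. This sketch assumes the usual published link parameters: 5 Gbit/sec signaling per Infiniband DDR lane, 2.5 Gbit/sec per first-generation PCI-Express lane, and 8b/10b encoding on both. The article doesn't say which PCI-Express generation the nodes use, so the first-generation figure is my assumption.

```python
def usable_gbytes_per_sec(lanes, gbit_per_lane, payload_bits=8, line_bits=10):
    """Usable one-way data rate of a serial link, in GB/sec.

    8b/10b encoding sends 10 bits on the wire for every 8 bits of data,
    so only payload_bits/line_bits of the signaling rate carries payload.
    """
    usable_gbit = lanes * gbit_per_lane * payload_bits / line_bits
    return usable_gbit / 8  # bits to bytes


# Infiniband 4x DDR: 4 lanes at 5 Gbit/sec signaling each.
infiniband_4x_ddr = usable_gbytes_per_sec(lanes=4, gbit_per_lane=5)

# PCI-Express x8, assuming first-generation lanes at 2.5 Gbit/sec each.
pcie_gen1_x8 = usable_gbytes_per_sec(lanes=8, gbit_per_lane=2.5)

print(infiniband_4x_ddr, pcie_gen1_x8)
```

Under those assumptions both links top out at the same rate, which is why a faster network card would not help much.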
Update: I learned more about the storage infrastructure behind Roadrunner - 2,000 terabytes of file server - and wrote it up in my other blog StorageMojo.
The money quote:
Roadrunner currently has about 80TB of RAM, roughly 24 GB per compute node. That works out to about 4 GB RAM per processor.
The jobs these machines run are huge. A simulation can run 6 months or more. Depending on criticality a job gets checkpointed every hour or maybe once a day.
The Panasas installation at LANL, begun in 2003, is currently 2 PB. Assuming an average of 500 GB drives, that means 4,000 disk drives.
Big computers require big storage. End update.
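For anyone checking the numbers in the update, here is the back-of-envelope math. It assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB) and the roughly 3,250 nodes mentioned earlier; the variable names are mine.

```python
# Figures quoted in the article.
TOTAL_RAM_TB = 80
NODES = 3250                  # "about how many nodes"
PROCESSORS_PER_NODE = 2 + 4   # 2 Opterons + 4 PowerXCell 8i's
STORAGE_PB = 2
AVG_DRIVE_GB = 500            # the article's assumed average drive size

# RAM per node and per processor, in GB.
ram_per_node_gb = TOTAL_RAM_TB * 1000 / NODES
ram_per_processor_gb = ram_per_node_gb / PROCESSORS_PER_NODE

# Number of drives needed to reach the quoted capacity.
drive_count = STORAGE_PB * 1_000_000 // AVG_DRIVE_GB

print(f"{ram_per_node_gb:.1f} GB per node")
print(f"{ram_per_processor_gb:.1f} GB per processor")
print(f"{drive_count} drives")
```

The per-node result lands a little above the quoted 24 GB, which is about what you'd expect from rounded inputs.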
Software is the problem
The hardware specs are drool-worthy, but without the right codes it is just an expensive furnace. As the best single article on Roadrunner I found explains:
For the Cell, the programmer must know exactly what's needed to do one computation and then specify that the necessary instructions and data for that one computation are fetched from the Cell's off-chip memory in a single step. . . . IBM's Peter Hofstee, the Cell's chief architect, describes this process as “a shopping list approach,” likening off-chip memory to Home Depot. You save time if you get all the supplies in one trip, rather than making multiple trips for each piece just when you need it.
The programmers optimized codes for a variety of applications, including radiation and neutron transport, molecular dynamics, fluid turbulence and plasma behavior. With the optimized codes they got a real-world 6-10x performance boost over the standard Opterons.
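A toy cost model shows why the shopping list approach pays off. This is only a sketch: every trip to off-chip memory is assumed to cost a fixed latency plus transfer time, the 0.5 microsecond latency is a hypothetical number chosen for illustration, and the bandwidth is the 25 GB/sec quoted earlier. It is not how Roadrunner's codes are actually written.

```python
def fetch_time_us(total_bytes, trips, latency_us, bytes_per_us):
    """Time to move total_bytes from off-chip memory in `trips` transfers.

    Each trip pays the fixed latency once; the bytes themselves take the
    same time to move no matter how they are split up.
    """
    return trips * latency_us + total_bytes / bytes_per_us


LATENCY_US = 0.5          # hypothetical per-trip latency, for illustration
BYTES_PER_US = 25_000     # 25 GB/sec expressed in bytes per microsecond
WORKING_SET = 16 * 1024   # hypothetical 16 KB needed for one computation

# One trip with the whole shopping list...
one_trip = fetch_time_us(WORKING_SET, 1, LATENCY_US, BYTES_PER_US)

# ...versus 128 trips of 128 bytes each, fetched just when needed.
many_trips = fetch_time_us(WORKING_SET, 128, LATENCY_US, BYTES_PER_US)

print(f"one trip:  {one_trip:.2f} us")
print(f"128 trips: {many_trips:.2f} us")
```

With any non-trivial latency the piecemeal version is dominated by the trips, not the data, which is the Home Depot point in a nutshell.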
The Storage Bits take
Back when I was hawking vector processors a Gigaflop was considered respectable. A couple of decades later and we have a machine 1 million times faster. Cool!
We won't be able to shrink feature sizes forever though, so architecture and bandwidth will be key to further speed-ups. Hopefully that time is still a few decades away.
Comments welcome, of course.
Talkback
.... i believe the link is wrong
Oops! Fixed.
Robin
And it runs..
RE: PS3 chip powers world's fastest computer
Runs Linux... ]:)
IBM never
Other OSs
It's not the chip used in the PS3...
processor.
Misleading
Also, as olePigeon pointed out, this chip has little in common with the PS3. This is like claiming that my PC (Core2Duo) is powered by a Xbox processor (PIII). Beyond sensationalism, was there any reason to even mention the PSIII in this article?
I doubt anyone would be here if he didn't mention PS3
I don't know
"Computer, while I'm at lunch, kindly finish the audit report and send it to the board members, make reservations for my trip next week, and download all nine Star Wars movies for my kids. I'll be back in an hour."
What's a PS3?
On a more interesting note...
have CELL 2 processors in our desktop and laptops relatively
soon. Toshiba already demoed a prototype laptop using
CELL processors.
Amiga fans are getting ready for a collective "I told you so."
Co-processors: an old idea
Wasn't aware of the Toshiba demo, but the GPU companies
are moving to enable their highly parallel processors to be
used for general purpose computing.
And Apple just announced that OS X.6 will have:
". . . OpenCL (Open Compute Library), makes it possible for
developers to efficiently tap the vast gigaflops of
computing power currently locked up in the graphics
processing unit (GPU). With GPUs approaching processing
speeds of a trillion operations per second, they're capable
of considerably more than just drawing pictures. OpenCL
takes that power and redirects it for general-purpose
computing."
That's in addition to "Grand Central" that
". . . makes it much easier for developers to create
programs that squeeze every last drop of power from
multicore systems."
Cool.
Robin
It's on your UK sister site...
http://crave.cnet.co.uk/video/0,139101587,49295004,00.htm
You already know this, but for everyone else on here:
when computers were really becoming mainstream and
started to hit the home, most computers had separate
dedicated coprocessors for the various functions of the
computer (including the floating point.)
Amiga was famous for its parallel processing, it had a
dedicated CPU, FP, GPU, and DSP which allowed it to do a
lot of things at once with little hit on performance. It was
a huge hit in broadcast television, wouldn't surprise me if
some of those machines are still in use.
Opteron the controller, work done by the Cell
Opterons do the essential work
Which is a HPC
The xFLOPS part is mostly CELL
As you are well aware AMD and CELL are two entirely different instruction sets.
Magnetic memory chip
Hmmm.