PS3 chip powers world's fastest computer

Some scoffed at the 8-PS3 supercomputer. But not the scientists at Los Alamos National Laboratory. They used the idea to build a 1-petaflop computer named Roadrunner - the world's fastest. Here's how.

1,000 trillion floating point operations per second

Fine-grained simulation of aging nuclear weapons is the new computer's ultimate gig. They couldn't just string 14,000 PS3s together - who'd believe the results?

Besides, it's American to want something better - and way faster.

First they built a new Cell Broadband Engine

The new version of the PS3 chip - called the PowerXCell 8i processor - features 8x faster double-precision floating point and over 25 GB/sec of memory bandwidth. That is the building block of a new and really honking compute node.

[PowerXCell 8i photo courtesy of IBM Systems and Technology Group]

Each compute node consists of 2 dual-core AMD Opterons and 4 PowerXCell 8i's. Each Opteron has a fast connection to 2 PowerXCells enabling a theoretical 25x boost in floating point performance over a stock Opteron.

No one mentioned how much RAM they gave each PowerXCell, but the chip can address 64 GB of RAM, so each compute node could easily support 264 GB of RAM (4x64 GB + 2x4 GB or more on the Opterons), with over 100 GB/sec of aggregate memory bandwidth.
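The per-node arithmetic is easy to check. Here is a minimal sketch using only the figures quoted above; the 4 GB per Opteron is the guess made in this post, not a published spec, and the function names are made up for illustration.

```python
# Back-of-envelope check of the per-node figures quoted in the post.
CELLS_PER_NODE = 4
OPTERONS_PER_NODE = 2
CELL_MAX_RAM_GB = 64      # RAM one PowerXCell 8i can address
OPTERON_RAM_GB = 4        # assumed minimum per Opteron (the post's guess)
CELL_BANDWIDTH_GBS = 25   # "over 25 GB/sec" per PowerXCell 8i


def node_max_ram_gb() -> int:
    """Upper bound on RAM per compute node, in GB."""
    return CELLS_PER_NODE * CELL_MAX_RAM_GB + OPTERONS_PER_NODE * OPTERON_RAM_GB


def node_cell_bandwidth_gbs() -> int:
    """Aggregate PowerXCell memory bandwidth per node, in GB/sec."""
    return CELLS_PER_NODE * CELL_BANDWIDTH_GBS


if __name__ == "__main__":
    print(node_max_ram_gb())          # 264
    print(node_cell_bandwidth_gbs())  # 100
```

Both numbers are ceilings, not what the lab actually installed.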

We'll take 3,250 of them

That's about how many nodes are in the completed Roadrunner. They're interconnected by an InfiniBand DDR network, which is standard for HPC clusters.

InfiniBand is a switched-fabric interconnect featuring microsecond latencies and data rates of 2 GB/sec for 4x DDR. That's about as much as you can get out of a PCI-Express x8 bus anyway.
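Where does 2 GB/sec come from? A rough sketch, assuming the usual published figures that are not in this post: 5 Gbit/sec of signaling per lane for InfiniBand DDR, 2.5 Gbit/sec per lane for first-generation PCI Express (the post doesn't say which PCIe generation the nodes use), and 8b/10b encoding on both, so only 8 of every 10 bits on the wire carry data. The function name is hypothetical.

```python
def usable_gbytes_per_sec(lanes: int, gbits_per_lane: float,
                          encoding_efficiency: float = 0.8) -> float:
    """Payload rate of a serial link in GB/sec.

    encoding_efficiency=0.8 models 8b/10b encoding (assumed for both links).
    """
    return lanes * gbits_per_lane * encoding_efficiency / 8  # 8 bits per byte


if __name__ == "__main__":
    # InfiniBand 4x DDR: 4 lanes at 5 Gbit/sec each (assumed figures)
    print(f"{usable_gbytes_per_sec(4, 5.0):.1f}")  # 2.0
    # PCI Express 1.x x8: 8 lanes at 2.5 Gbit/sec each (assumed figures)
    print(f"{usable_gbytes_per_sec(8, 2.5):.1f}")  # 2.0
```

Under those assumptions the two links come out even, which is the point of the comparison.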

Update: I learned more about the storage infrastructure behind Roadrunner - 2,000 terabytes of file server - and wrote it up in my other blog StorageMojo.

The money quote:

Roadrunner currently has about 80TB of RAM, roughly 24 GB per compute node. That works out to about 4 GB RAM per processor.

The jobs these machines run are huge. A simulation can run 6 months or more. Depending on criticality a job gets checkpointed every hour or maybe once a day.

The Panasas installation at LANL, begun in 2003, is currently 2 PB. Assuming an average of 500 GB drives, that means 4,000 disk drives.

Big computers require big storage. End update.
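The numbers in the update hang together. A quick sketch, assuming decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB) and counting 6 processors per node (2 Opterons + 4 PowerXCells); the function names are made up for illustration.

```python
# Sanity check of the RAM and storage figures quoted in the update.
TOTAL_RAM_TB = 80
NODES = 3250               # "about" 3,250 compute nodes
PROCESSORS_PER_NODE = 6    # 2 Opterons + 4 PowerXCell 8i
STORAGE_PB = 2
AVG_DRIVE_GB = 500         # the update's assumed average drive size


def ram_per_node_gb() -> float:
    """Average RAM per compute node, in GB."""
    return TOTAL_RAM_TB * 1000 / NODES


def ram_per_processor_gb() -> float:
    """Average RAM per processor, in GB."""
    return ram_per_node_gb() / PROCESSORS_PER_NODE


def drive_count() -> int:
    """Number of drives needed to reach the quoted storage capacity."""
    return STORAGE_PB * 1_000_000 // AVG_DRIVE_GB


if __name__ == "__main__":
    print(f"{ram_per_node_gb():.1f}")       # 24.6
    print(f"{ram_per_processor_gb():.1f}")  # 4.1
    print(drive_count())                    # 4000
```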

Software is the problem

The hardware specs are drool-worthy, but without the right codes it is just an expensive furnace. As the best single article on Roadrunner I found explains:

For the Cell, the programmer must know exactly what's needed to do one computation and then specify that the necessary instructions and data for that one computation are fetched from the Cell's off-chip memory in a single step. . . . IBM's Peter Hofstee, the Cell's chief architect, describes this process as “a shopping list approach,” likening off-chip memory to Home Depot. You save time if you get all the supplies in one trip, rather than making multiple trips for each piece just when you need it.
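The shopping list idea can be sketched in a few lines. This is a toy model, not Cell code: the OffChipMemory class, its trip counter, and both dot-product functions are hypothetical, and plain Python lists stand in for the chip's local storage. It only shows the difference between fetching each operand when it is needed and fetching everything in one trip.

```python
class OffChipMemory:
    """Toy stand-in for off-chip memory that counts round trips."""

    def __init__(self, data):
        self._data = list(data)
        self.trips = 0

    def fetch(self, indices):
        """Return the requested items in one round trip."""
        self.trips += 1
        return [self._data[i] for i in indices]


def dot_many_trips(memory, a_idx, b_idx):
    """Fetch each operand only when it is needed: two trips per multiply."""
    total = 0.0
    for i, j in zip(a_idx, b_idx):
        (x,) = memory.fetch([i])
        (y,) = memory.fetch([j])
        total += x * y
    return total


def dot_shopping_list(memory, a_idx, b_idx):
    """Write the whole list up front and fetch it all in a single trip."""
    a_idx, b_idx = list(a_idx), list(b_idx)
    items = memory.fetch(a_idx + b_idx)
    local_a, local_b = items[:len(a_idx)], items[len(a_idx):]
    return sum(x * y for x, y in zip(local_a, local_b))


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

    slow = OffChipMemory(values)
    print(dot_many_trips(slow, [0, 1, 2], [3, 4, 5]), slow.trips)     # 32.0 6

    fast = OffChipMemory(values)
    print(dot_shopping_list(fast, [0, 1, 2], [3, 4, 5]), fast.trips)  # 32.0 1
```

Same answer, one trip instead of six. The catch, as the quote says, is that the programmer has to know the whole list before leaving for the store.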

The programmers optimized codes for a variety of applications, including radiation and neutron transport, molecular dynamics, fluid turbulence and plasma behavior. With the optimized codes they got a real-world 6-10x performance boost over the standard Opterons.

The Storage Bits take

Back when I was hawking vector processors, a gigaflop was considered respectable. A couple of decades later, we have a machine 1 million times faster. Cool!

We won't be able to shrink feature sizes forever though, so architecture and bandwidth will be key to further speed-ups. Hopefully that time is still a few decades away.

Comments welcome, of course.

  • .... i believe the link is wrong

    nice article. i think it's a fantastic achievement by IBM. but i believe the link you supplied to the best article you found on roadrunner is incorrect, as it merely links back to your own blog posting....
    • Oops! Fixed.


      R Harris
  • And it runs..

    All of that talk about the 'codes' and you failed to mention that the world's fastest super computer runs on a modified version of Red Hat Linux.
    Tim Patterson
  • RE: PS3 chip powers world's fastest computer

    Here are more details on the system

    Runs Linux... ]:)
    Linux User 147560
    • IBM never

      ported AIX to the Cell, which is not particularly oriented toward HPC, nor did they follow the Blue Gene architecture. perhaps that was a requirement of the CFT by DoE... Too bad only linux has support for Cell as of today, but perhaps other OSes supporting Power will be ported to Cell, open or closed source ... if the architecture is successful, which seems less and less likely as time goes by and as massively multicore chips appear...
      • Other OSs

        If you're waiting for a Windows port, forget it !
        Hemlock Stones
  • It's not the chip used in the PS3...

    It's not the same chip used in the PS3, this is the CELL 2
  • Misleading

    It appears to me that the Roadrunner is "powered" by AMD Opterons, with the new Cell processor playing the role of a math coprocessor. While that is not an insignificant role, it is the Opteron that is the brains of the system, not the Cell.

    Also, as olePigeon pointed out, this chip has little in common with the PS3. This is like claiming that my PC (Core2Duo) is powered by an Xbox processor (PIII). Beyond sensationalism, was there any reason to even mention the PS3 in this article?
    • I doubt anyone would be here if he didn't mention PS3

      I know I wouldn't have clicked on the link...
      Kid Icarus
      • I don't know

        I follow the super computer field because it is a look ahead. Just think, in a few years we will have that kind of power on our desktops. Perhaps that will be enough power to finally get voice recognition to work.

        "Computer, while I'm at lunch, kindly finish the audit report and send it to the board members, make reservations for my trip next week, and download all nine Star Wars movies for my kids. I'll be back in an hour."
      • What's a PS3?

        I didn't immediately associate the terms "PS3" and "fastest supercomputer." It was the latter that caught my eye.
    • On a more interesting note...

      On a more interesting note, there's no reason why we can't
      have CELL 2 processors in our desktop and laptops relatively
      soon. Toshiba already demoed a prototype laptop using
      CELL processors.

      Amiga fans are getting ready for a collective "I told you so."
      • Co-processors: an old idea

        that is new again.

        Wasn't aware of the Toshiba demo, but the GPU companies
        are moving to enable their highly parallel processors to be
        used for general purpose computing.

        And Apple just announced that OS X.6 will have:

        ". . . OpenCL (Open Compute Library), makes it possible for
        developers to efficiently tap the vast gigaflops of
        computing power currently locked up in the graphics
        processing unit (GPU). With GPUs approaching processing
        speeds of a trillion operations per second, they're capable
        of considerably more than just drawing pictures. OpenCL
        takes that power and redirects it for general-purpose
        computing."

        That's in addition to "Grand Central" that
        ". . . makes it much easier for developers to create
        programs that squeeze every last drop of power from
        multicore systems."


        R Harris
        • It's on your UK sister site...

          It's on your UK sister site:


          You already know this, but for everyone else on here:
          when computers were really becoming mainstream and
          started to hit the home, most computers had separate
          dedicated coprocessors for the various functions of the
          computer (including the floating point.)

          Amiga was famous for its parallel processing, it had a
          dedicated CPU, FP, GPU, and DSP which allowed it to do a
          lot of things at once with little hit on performance. It was
          a huge hit in broadcast television, wouldn't surprise me if
          some of those machines are still in use.
    • Opteron the controller, work done by the Cell

      The work is what makes HPC.
      Richard Flude
      • Opterons do the essential work

        No, you've got it backwards. The Opterons do most of the essential work (run the OS, distribute the workload on the node, distribute the workload among the other nodes, handle the data needed by the cell processors, communicate with the other nodes, and just do all the OS stuff needed to make it all work). The Cell processors just do the math. Think advanced math coprocessor.
        Hemlock Stones
        • Which is a HPC

          "Think advanced math coprocessor."

          Which is a HPC.
          Richard Flude
        • The xFLOPS part is mostly CELL

          When it comes to the petaFLOPS most of the Floating Point Ops are being handled in CELL. That's the part they will brag about.

          As you are well aware AMD and CELL are two entirely different instruction sets.
  • Magnetic memory chip

    Somebody stole digital and saturated time and space with lies about digital. They will never tell you how their code works in their flop.
    • Hmmm.

      The truth is analog. Shhhhh.