Building your own supercomputer

Summary: My advice: rent - until IBM gets the second-generation Cell out the door

Suppose you wanted to build your own supercomputer - one capable of running realistic problems using standard "codes" in areas like bio-computing, climatology, or geophysics - what would it take?

The currently favored approach is to get as many Opteron cores as you can pay for, rack 'em up with appropriate storage, and run the whole thing as either a Linux or Solaris grid.

The plus side of this is simply that you can get performance well into the 100TF (Linpack teraflops) range; the downside is that operating costs will kill you in the long run, while some intrinsic bottlenecks limit the top end you can reach.

Specifically, the waste heat alone could warm grow-ops big enough to cover your capital and staffing costs, but cabling and storage limits will stop performance growth long before you get into the exaflop range.

Worse, the software combines with those hardware bottlenecks to produce diminishing returns to scale - meaning that a 10,000-core machine will need considerably more than half the time a 5,000-core unit needs to complete a large run.

What that means is that if you generally plan on having two programs running concurrently, you'll get more performance per dollar if you're willing to forget about bragging rights and build two smaller systems instead of one big one.
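
To put rough numbers on that diminishing-returns claim, here's a back-of-the-envelope sketch in Python. The 5% serial fraction and the 10,000 single-core hours are purely illustrative assumptions - this is plain Amdahl's law, ignoring the interconnect and I/O effects that only make things worse:

    # Fixed-size job under Amdahl's law: time = T1 * (s + (1 - s) / N)
    # where s is an assumed serial fraction and T1 an assumed single-core time.

    def run_time(cores, serial_fraction=0.05, single_core_hours=10000.0):
        return single_core_hours * (serial_fraction + (1.0 - serial_fraction) / cores)

    t_small = run_time(5000)    # ~502 hours
    t_big = run_time(10000)     # ~501 hours - nowhere near half of t_small

    # Two concurrent jobs: one 10,000-core machine running them back to back
    # needs ~1,002 hours; two 5,000-core machines finish both in ~502 hours.
    print(t_small, t_big)

Pick almost any serial fraction you like and the bigger machine still needs far more than half the smaller machine's time for a single run - which is exactly why two boxes can beat one for concurrent workloads.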

There are alternatives to x86 grid computing: specifically, IBM has long offered PPC-based grids and has now moved, with Cell, to putting a grid directly into the silicon. Right now, for example, you could make yourself a supercomputer that would have placed number one on the supercomputer list for the year 2000 by combining 12 IBM QS-20 dual-Cell blades with two Sun X4500 storage servers, a big UPS, and a 16-way InfiniBand-class fabric switch in one IBM blade server chassis.

The upside here is cost: about $400,000 for a theoretical 5TF machine with 44TB of mirrored storage - buy your X4500s with only two 500GB disks, then fill the remaining 22 slots in each machine with third-party 1TB disks.
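
That 5TF figure is peak single-precision arithmetic, and the back-of-the-envelope math roughly checks out - assuming the commonly quoted 25.6 GFLOPS peak per SPE for a 3.2GHz Cell BE, eight SPEs per Cell, and two Cells per QS-20 blade:

    # Peak single-precision throughput for twelve dual-Cell blades.
    # 25.6 GFLOPS per SPE is the usual quoted peak for a 3.2GHz Cell BE;
    # sustained Linpack numbers come in well below this.

    blades = 12
    cells_per_blade = 2
    spes_per_cell = 8
    gflops_per_spe = 25.6

    peak_gflops = blades * cells_per_blade * spes_per_cell * gflops_per_spe
    print("Peak: %.1f TF" % (peak_gflops / 1000.0))   # ~4.9 TF

Double-precision peak on the current Cell is roughly an order of magnitude lower - which is the hardware limit mentioned below.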

The downside is a combination of software limitations, hardware limitations, and communications bottlenecks. In particular, software limitations mean that performance on typical tasks scales less than linearly with the number of processors, hardware limits in the present-generation Cell reduce double-precision throughput, and overall storage and I/O limits mean you just can't shove enough data and instructions through the available pipes to push this kind of system much beyond 10TF per concurrent run.

So if that's not enough, what do you do?

My advice: rent - until IBM gets the second-generation Cell out the door. No more double-precision limit, faster I/O and, above all, multiple Cells per chip - I don't know how many they'll allow, but "slices" containing nine Cells running at 5+ GHz should be possible, and the ultra-dense memory needed to support that is on the way.

OK, the 10TF grid CPU is probably in the 2010 time frame, but it's coming; so if you really want your own supercomputer before then, consider a couple of smaller units or renting space on someone else's machine, because three years won't get you to cost recovery on an x86 super grid, and by 2011 nothing else is even going to be in the same ballpark as Cell.

Unless, of course, someone gets in-RAM array processing (write an op and two vectors to RAM, read the result back) or a compilable hardware language to work...
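
The in-RAM idea is easier to sketch than to build: the host writes an opcode and two operand vectors into a mapped region, and the memory hands back the result. A toy Python model, purely illustrative - no such device or API exists in this form:

    # Toy simulation of "in-RAM array processing": write an op and two
    # vectors, read the result back. The class only models the interface;
    # a real device would do the arithmetic inside the memory array itself.

    class ComputeRAM:
        def __init__(self):
            self.result = []

        def submit(self, op, vec_a, vec_b):
            ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
            self.result = [ops[op](a, b) for a, b in zip(vec_a, vec_b)]

        def read(self):
            return self.result

    cram = ComputeRAM()
    cram.submit("mul", [1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0])
    print(cram.read())   # [10.0, 20.0, 30.0, 40.0]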

 


Talkback

  • Okay I have a horse in this race but...

    I just came from the BARC conference (Boston Area ARChitecture - dumb acronym, good little conference) last week. We always get a higher-up rep from Intel or AMD along with folks from the universities around Boston; it's a very good way to get a snapshot of where things are going (and it's cheap, $50 for the day with lunch and snacks).

    To stop the plug and bring it back to Paul's vision of a desktop supercomputer: the big two are definitely heading in the grid-on-chip direction.

    I think the big question is going to be homogeneous or heterogeneous computing.

    Last year, Intel's slides showed ever-increasing numbers of CPU cores on a chip (big surprise), and I recall some talk of an architecture consisting of many-many floating-point units plus little bits of memory connected by a heavy-duty network (a la Merrimac).

    This year, the AMD guy talked about many-many accelerator cores on a chip, with a less complex network. An MIT fellow gave a talk about an architecture for a contentionless optical network by integrating optical components into silicon chips. The pieces are falling into place.

    check out the full scoop in the proceedings...

    http://www.bu.edu/barc2007/

    Both approaches still need a good programming model (or good parallel programmers - I predict the job market for physics grads is about to take off), which is the hardest part, and that's where some of the work in our lab comes in. We're looking at using FPGAs for the HPC world:
    http://www.bu.edu/caadlab/.

    The biggest challenges here are convincing the scientists they don't REALLY need floating point :) and on-chip memory limitations (though these are getting better).
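
    To illustrate the floating-point point: many kernels can run on scaled integers - fixed point - which maps onto FPGA fabric far more cheaply than IEEE floating point. A toy Python comparison; the 16-bit fraction here is an arbitrary choice, just for illustration:

        # The same dot product in floating point and in 16.16 fixed point
        # (scaled integers). The 16-bit fraction is an arbitrary choice.

        SCALE = 1 << 16

        def to_fixed(x):
            return int(round(x * SCALE))

        def fixed_dot(xs, ys):
            acc = 0
            for a, b in zip(xs, ys):
                acc += (to_fixed(a) * to_fixed(b)) >> 16   # rescale after each multiply
            return acc / float(SCALE)

        xs = [0.5, 1.25, -2.0]
        ys = [4.0, 0.75, 1.5]
        print(sum(a * b for a, b in zip(xs, ys)))   # floating point: -0.0625
        print(fixed_dot(xs, ys))                    # fixed point:    -0.0625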

    Other folks are looking at GPUs, where the small-data-in, large-data-out model and the programming model for these chips are the big problems (it's hard to turn every problem into a rendering one).

    I think both are viable approaches for realizing at least a piece of a desktop supercomputer, or at the least, a useful one for $10k-$20k.
    --Josh
    jtmodel
    • Thanks! (and a prediction)

      While I agree progress is being made, and that physics students should command higher salaries...
      I don't think we'll see Intel actually try to take more than four cores mainstream. They're dependent on Microsoft for software, and the software simply isn't there. That doesn't mean they won't make an eight-core part - and offer compelling benchmarks for it - but it won't be a real product and it won't sell.

      AMD is betting on using GPUs as accelerators - but MS couldn't get even light multi-threading to work, never even tried hard to get AltiVec to work, and so I don't think they'll get this to work either.

      Cell works and so does Sun's CMT - and if they ever get the SPARC/FP array working it'll be a world beater. What's different about these is that sales don't depend on Microsoft.
      murph_z
      • A fish looking up..

        Funny, from way down here in the architecture world, I hadn't given a thought to the OS. That being said, I'm not sure I'd need to. Who cares if MS can't get their applications or OS to run threaded, as long as there is some application developer who can?

        Parallelism is going to be application-driven, and if the applications you talked about are really going to be done at the desktop at some point in the future (I'm thinking data mining for sure, maybe some biological simulation - wouldn't it be neat if everyone's desktop was used to design custom drugs or check interactions or something?), then we're talking about fundamentally different machines for fundamentally different applications. If something in the OS is preventing you from using the HW resources you need to get a compelling application running, then either

        1) that OS is going to adapt next rev, or
        2) it's going to lose market share, or
        3) the app really isn't all that compelling

        If I'm being honest with myself, the market for desktop supercomputers is relatively small, even for the medium term, say 2013, so anyone who's bothering probably isn't going MS anyhow.


        However, the other thing we have going for us is that there's nowhere else to go (though I'm not sure how the Penryn announcement plays into this). Sun's already there, IBM sort of, with AltiVec and with Cell (I still think the memory arch is a beast, though). They're backed up against a 4GHz wall, and are going to be selling many-core (>4) processors because of marketing and incremental cost/performance. There's only so much L3/L2 you can use.

        I see it as similar to about 5 years ago, when processor speeds were leapfrogging like nobody's business, but real application performance wasn't improving that much because of the memory wall. That's why Processor-in-memory was such a hot idea. People kept lapping up the faster chips, for little good reason.

        Application-wise, there may be some other arguments for multi-core. Right now, media PCs are separate devices from your desktop, and functional convergence, along with the advent of enormous displays (and the required processing), may have some role in making trivially parallel tasks happen in one box, driving the need for many cores. But that may be hope talking.

        --Josh
        jtmodel
        • That OS issue

          Linux on Cell works - and yes, it's a cast-iron bitch to get anywhere close to rated performance with a real application, but it can be done - and done today, knowing that the work will still have value as later Cell generations hit: 4GHz next year, 128-bit, 5GHz in 2009... multiple microgrids to a CPU - and better, faster, denser memory to support it.

          Solaris on SPARC works too - not highly rated for supercomputing, but there's hope for a slew of next-gen products tied to the idea that you can take a CMT mask and "plug in" additional components for manufacture.

          I think you'll see Intel go back to the megahertz wars with their next-gen products - and the key reason is simply that they live or die by Microsoft software and have no choice but to work on making single threads go faster.
          murph_z
          • Re: OS issue

            What about using an Apple OS? They were using the PPC with AltiVec in the G4 and G5 chips. I am not aware of the Apple OS having a four-core limit like MS. Since Apple is using the Intel x86 architecture now, the two of them should play well together. Apple did make the development boxes that were used to write software for the Cell chips on Xbox 360s. Maybe Apple is keeping the PPC side of OS X alive for just such an occasion.
            Mr_Dave
          • 4 core limit vs PPC

            1) it's not a hard limit - it's just that compilers become less and less effective as you go past two cores and fall off a cliff at four.

            2) it's a compiler issue, not a hardware issue - i.e. it's the same for PPC as for x86; even though Apple has AltiVec experience, most of the apps writers don't.

            3) Apple used to have a high precision lab that worked on this - gone now.
            murph_z