Supercomputing's future: Is it CPU or GPU?

Graphics processing units are a hot topic, but that does not assure them a place in supercomputing's future, says Andrew Jones

In the world of high-performance computing, graphics processing units are the talk of the town. But all that debate makes it no easier to identify their future place in the greater scheme of things, says Andrew Jones.

We're desperate to know the future of high-performance computing (HPC) as soon as we can, so we can best direct our efforts towards our medium-term supercomputing needs. Without a crystal ball, we look to the past, starting with the massive assumption that how we got here is a good indication of where we are going next. We assume that how Product X succeeded, why Technology Y became dominant and the story behind Company Z's failure are all useful in showing us which of the current crop of products, technologies or companies will fail and which will go on to change our industry.

The big question in HPC now is: "What about graphics processing units (GPUs)?" The bigger question is "What about the software?", but GPUs are getting more attention.

Many-core devices
For brevity, I'll use the term GPU here to cover all many-core computing devices, meaning GPUs or accelerated processing units (APUs). Are GPUs here to stay, perhaps even to become the dominant HPC processor? Will GPUs go away as people realise they are hard to program? Which kind of GPU or accelerator is going to emerge as the market choice?

The dominant processor for the HPC market today, taking the Top500 as the market indicator, is the Intel Xeon x86-64. Of course, x86 only really took off in HPC when AMD came up with AMD64. AMD created and led the adoption of the type of processor that dominates the HPC market now, but it was Intel, with Xeon, that developed the most prolific product family of that x86-64 type, and that family now dominates the Top500.

Eventually, Intel won despite AMD's head start of several years and early performance lead with Opteron. Our first observation from history is that those who end up dominating the market may not be the ones who pioneered the underlying technology.

Thus, although it is reasonably safe to say that Nvidia, especially with Cuda, pioneered the GPU for technical computing and HPC, we cannot be at all sure that Nvidia will be the prevailing technology provider if and when GPUs become the dominant HPC processor. AMD, Intel and Nvidia all have plans to deliver products that meet HPC needs beyond what current CPUs offer.

Critical momentum
The second observation is that for both AMD with x86-64 and Nvidia with general-purpose GPU computing (GPGPU), the provision of a software ecosystem of compilers, ACML, Cuda and community sites was critical to creating momentum behind the technology. Some might even say that the success was because of the software and community rather than because of any advantage of the hardware over other similar solutions.

The third observation is that when early adopters started to promote x86/x86-64 as a real technology for HPC, offering better price-performance against high-end supercomputers, many high-end computing advocates argued that price-performance was not everything and that real supercomputers would remain the best solution, even suggesting that commodity clusters would remain niche.

The CPU vs GPU debate sounds similar to me so far. It seems that price won over competing arguments in the past. For example, another processor trying to take on the RISC dominance at the time was Intel's IA64 Itanium. It offered something potentially better but at a price premium. x86-64 offered good enough, but cheaper.

This time it's different
Are there any key differences, then, between the emergence of x86-64 in the Top500 and the CPU vs GPU contest? Perhaps. The new x86-64 processors were different from the dominant RISC processors, but not radically so in terms of programming style. Sure, optimisations were different, but they also differed from one supercomputer to another, so the community was used to that.

Tools were readily available, due to the existing consumer demand for software development beyond HPC on x86, of which x86-64 was a nice evolution. Code had to be ported from RISC to x86, but we did it. GPUs are, however, very different to CPUs in terms of programming. The ecosystem for GPU-HPC is still immature, but growing in the areas of compilers, libraries and community.

Perhaps most critically, the various GPU-like options of, say, Fusion, Knights and Fermi are diverse enough that the choice is not simply CPU vs GPU. The lesson from the past was that good code with long life expectancy could be developed without knowing upfront if it was to run on Opteron or Xeon.

Investment of effort
If we had to develop a major application now, for a longish use-life, we'd have to gamble on OpenMP, Cuda, OpenCL or one of the various products that hope to bridge the gap. Until that is fixed, and GPU is generic enough that it doesn't matter at develop-time whose product will be used at run-time, the investment of effort to get the performance and cost rewards is a hard call.

But once standardisation happens, the final lesson from history is that cost-of-deployment — and sometimes cost-of-ownership — wins against the best option. RISC to x86 was fortunate not to be a drastic programming transition. The painful programming transition from the dominant CPU to the promising GPU might slow down adoption of potentially better price-performance, but is unlikely to stop the eventual changeover.

As vice president of HPC at the Numerical Algorithms Group, Andrew Jones leads the company's HPC services and consulting business, providing expertise in parallel, scalable and robust software development. Jones is well known in the supercomputing community. He is a former head of HPC at the University of Manchester and has more than 10 years' experience in HPC as an end user.

