The world's most powerful supercomputers can require many megawatts of electricity to operate. But what if the next 1,000-fold increase in performance needs 100MW, asks Andrew Jones.
I was recently interviewed about deploying the world's largest supercomputers for The Exascale Report, a magazine focused on the evolution of supercomputing and the targeted 1,000-fold increase in compute power in the next 10 years.
Inevitably the interview covered the huge costs and especially energy. It got me thinking. What follows is not yet my opinion — but it might start an interesting discussion.
There is a range of estimates for the likely power consumption of the first exaflops supercomputers, which are expected at some point between 2018 and 2020. But probably the most accepted estimate is 120MW, as set out in the Darpa Exascale Study edited by Peter Kogge.
At this figure, the supercomputing community panics and says it is far too much: we must get it down to between 20MW and 60MW, depending on who you ask. And we worry that even that is too much. But is it?
Supercomputers as scientific instruments
First an aside. In my opinion, the largest supercomputers at any time, including the first exaflops, should not be thought of as computers. They are strategic scientific instruments that happen to be built from computer technology. Their usage patterns and scientific impact are closer to major research facilities such as Cern, Iter, or Hubble.
Back to the question of power consumption. I looked at other major scientific facilities for comparison. Some quick web searching shows that Cern idles at 35MW and peaks at 180MW when everything is running. It consumes about 1,000GWh per year, the equivalent of about 120MW in steady state.
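The steady-state figure is a simple unit conversion, and a rough check is easy to sketch. The function name below is hypothetical, and the calculation assumes a non-leap year of 8,760 hours. On that basis, 1,000GWh per year works out at roughly 114MW of continuous draw, which is in the same range as the rounded figure of about 120MW.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def average_power_mw(energy_gwh_per_year: float) -> float:
    """Convert annual energy use (GWh) into average continuous power (MW)."""
    # 1 GWh = 1,000 MWh, so divide the MWh total by the hours in a year.
    return energy_gwh_per_year * 1000 / HOURS_PER_YEAR


# Cern's approximate annual consumption, as quoted in the article.
cern_mw = average_power_mw(1000)
print(f"Cern steady-state equivalent: about {cern_mw:.0f} MW")
```

The same function can be used the other way round as a sanity check on any proposed exaflops machine: a facility drawing a constant 120MW would use a little over 1,000GWh in a year.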
In terms of construction costs, one estimate of the cost to design and deploy the first exaflops supercomputer facility is between $1bn (£650m) and $2bn, with subsequent procurements of early exaflops machines of the order of a few $100m each. By comparison, the LHC at Cern is a $9bn project, Iter has a $5bn build budget and a further $5bn operational budget over 35 years, and so on with Hubble and Hugo.
So our power requirements are not that outrageous compared with other major scientific facilities. Neither are our overall costs. My question is: are we making such a poor case for supercomputers that we get scared by 20MW to 60MW and a few $100m for the biggest?
Major impact on disparate sciences
One of supercomputing's greatest strengths is its ability to have a major impact on research across a huge range of disparate sciences from climate to medicine to aerodynamics to cosmology. Is this also one of its weaknesses — that any statement of its value is always a list of many sciences, rather than one simple message as it can be for a major facility owned by a single discipline?
Another clue may lie in the extreme pace at which supercomputing technology evolves, and thus we need a new supercomputer every year, not once every decade, as in the case of Cern or Iter. And, over the first two to four years of a new technology range such as petaflops, we want one for each of the major supercomputing countries, or several for the USA.
What if we compared two paths? The first option is to carry on as now, so for each year between 2020 and 2030, different members of the high-performance computing (HPC) community buy the largest machine viable for about $100m to $200m, starting at about 1 exaflops, with each machine having a three-year lifetime.
The second option is to follow the other major scientific facilities. The global HPC community collaborates for the next 10 years to design a 50-exaflops supercomputer for 2020, with all the required operating software, algorithms and applications. Then, in 2020, the collaborating global community deploys one 50-exaflops supercomputer, and that is the only facility until 2030. No more buying supercomputers for a decade.
Total exaflops-years delivered
Now count total exaflops-years delivered over the decade for each of the two options. Because of the exponential performance increases of supercomputing, the first option wins by the end of the decade in terms of both total exaflops-years delivered and peak exaflops on the ground. So maybe our present game is right.
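The comparison can be sketched in a few lines. This is only a back-of-envelope model under stated assumptions, none of which are precise figures from the article: one new machine per year for 10 years, performance doubling each year (an approximation of a 1,000-fold increase per decade), and only the years of service that fall inside the decade being counted. The function names are hypothetical.

```python
def annual_purchases(start_ef=1.0, growth=2.0, years=10, lifetime=3):
    """Option 1: buy a new machine every year.

    Assumes each year's machine is `growth` times faster than the last
    (2.0 approximates a 1,000-fold increase per decade) and serves for
    `lifetime` years, truncated at the end of the decade.
    Returns (total exaflops-years, peak exaflops of the newest machine).
    """
    total_ef_years = 0.0
    for year in range(years):
        perf = start_ef * growth ** year
        # Count only the years of service that fall inside the decade.
        in_service = min(lifetime, years - year)
        total_ef_years += perf * in_service
    peak_ef = start_ef * growth ** (years - 1)
    return total_ef_years, peak_ef


def single_facility(perf_ef=50.0, years=10):
    """Option 2: one machine deployed at the start and run for the decade."""
    return perf_ef * years, perf_ef


opt1_total, opt1_peak = annual_purchases()
opt2_total, opt2_peak = single_facility()
print(f"Option 1: {opt1_total:.0f} exaflops-years, peak {opt1_peak:.0f} exaflops")
print(f"Option 2: {opt2_total:.0f} exaflops-years, peak {opt2_peak:.0f} exaflops")
```

Under these assumptions the first option delivers 1,789 exaflops-years against 500 for the single facility, and finishes the decade with a 512-exaflops machine against 50. The result is sensitive to the growth rate and to how many machines are in service at once, so the numbers illustrate the shape of the argument rather than a forecast.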
But not so fast: this counts only the cost of deploying the facility, not the total cost of the science, which would include the costs of algorithm development, implementation and validation, and the cost of the scientists' time to perform the research.
The second option could require one large rewrite of codes and validation, while the first option seems to require almost continual rewrites. And a 50-exaflops machine in 2020 is more valuable than one in 2030 — in terms of getting results for science and society sooner. There is real value to society and science in having a particular set of climate, energy or medical research results 10 years earlier.
I can't estimate which is right without a lot more investigation — but the community seems locked into the first option without checking the second, which seems to work well for other major scientific facilities.
Of course, the budgets associated with major facilities that are international and are expected to serve a decade or more are much larger than our current HPC procurements.
As vice president of HPC at the Numerical Algorithms Group, Andrew Jones leads the company's HPC services and consulting business, providing expertise in parallel, scalable and robust software development. Jones is well known in the supercomputing community. He is a former head of HPC at the University of Manchester and has more than 10 years' experience in HPC as an end user.