The future direction of supercomputing may seem dazzling, but it raises difficult questions about the survival of important software at the heart of today's facilities, says Andrew Jones.
The supercomputing community generally agrees that the future holds a number of software challenges. The first of these is the massive increase in concurrency required, heading towards billion-way parallelism.
Then there is the complex hierarchy of parallelism: from vector-like parallelism at the local level, through multithreading, to massively parallel processing across many nodes. On top of those challenges comes the impending storm of verification, validation and resilience.
Evolving our applications and middleware to address these issues is going to be a difficult, but necessary, job over the coming years, as petascale computers become increasingly common for scientific use and in corporate high-performance computing (HPC) facilities. The technologies derived from these systems place many teraflops in the hands of individual researchers, and they raise the same programming issues.
As some parts of the community consider the prospect of hundreds of petaflops and exascale computing — which may only be a few years away — others are starting to ask whether some of their applications are ever going to make it.
Proponents of this view argue that some legacy applications are coded in ways, or rely on algorithms, that make evolution impossible: the code refactoring and algorithm development would require more effort than starting from scratch.
Others put it like this: "Don't let the code be the science." If you focus on the engineering challenge or the science, then the code constitutes an instrument. And as one instrument becomes incapable of addressing the scale of the problem required, move to a different instrument.
Even outside the computational arena, in traditional experimental science or industry test facilities, some research groups have enormous inertia and have become as attached to their instrument for investigating a problem as to the problem itself.
This issue is seen in computational science, too. Many researchers become so attached to their code that its capabilities control their scientific direction, instead of their scientific ambitions controlling the code.
In industry, many companies, or their R&D departments, will claim to be experts in a specific area, but in reality are only specialists in one method. That is perhaps a harsh view, but it is painfully true in some cases, where researchers come up with all sorts of flimsy reasons not to change instruments.
Of course, it is not that simple. A widely used code will have collected a large amount of data related to validation, knowledge of which methods match physical reality in different parts of parameter space, regions of numerical stability and so on — so that the code embodies much of the science. Thus moving to a new code potentially throws away hugely valuable scientific knowledge.
So perhaps two classes of applications will slowly evolve: those that will never be able to exploit future high-end supercomputers to the full, but will remain in use as their successors develop to comparable scientific maturity; and those that with appropriate investment can operate in the exascale regime or petascale personal HPC arena.
That situation creates a difficult balancing act for researchers, developers, and funding agencies or company heads. They have to continue to provide the essential investment in scaling, optimisation, algorithm evolution and scientific advances in existing codes, so that those codes can be used on current high-end and medium-term mid-scale HPC platforms and a possibly lethal competitive gap does not open.
At the same time, they must divert sufficient effort into the development of codes to enable the next step in science or engineering design by running on the most powerful supercomputers of the future. Both tracks of investment are necessary for short- and long-term survival.
In both cases, tools exist to help, but they will only help skilled HPC programmers. They will not replace the need for skilled HPC programmers. Thus, investment in people is equally critical.
Perhaps it is time to repeat the call for more balanced investment. For example, divert between 10 and 20 percent of hardware procurement money into people and software. Don't shy away at the last minute because your latest supercomputer might have 10 percent less hardware speed or, for the Top500 chasers, not make the Top5, Top50, or Top-Whatever.
An increase in supercomputing capability is sometimes business-critical. However, since that increase is usually an order of magnitude or more, shifting some hardware investment into people and software development, on top of your normal operations budget, is not only credible but likely to deliver a greater overall computational improvement than hardware alone.
You never know, it might even pay enormous dividends in the science. There are, after all, plenty of case studies around the world's supercomputer centres that show this benefit.
Or perhaps we delay the exascale hardware by a year or two and divert the money saved into software that can be used for science when it arrives?
As vice president of HPC at the Numerical Algorithms Group, Andrew Jones leads the company's HPC services and consulting business, providing expertise in parallel, scalable and robust software development. Jones is well known in the supercomputing community. He is a former head of HPC at the University of Manchester and has more than 10 years' experience in HPC as an end user.