PaRSEC: Designing software for the exascale supercomputer generation

Supercomputers are getting faster than ever, but the next generation, which will be able to do a quintillion floating point operations per second, needs software that can keep up.
Written by Steven Vaughan-Nichols, Senior Contributing Editor

Anyone who programs for high-performance computing (HPC) knows that what works for standard computing doesn't work for supercomputers. Now, with exascale supercomputers, which will be able to do a quintillion (10^18, or 1,000 quadrillion) floating point operations per second, in sight by decade's end, it's time for software to get ready to handle this new generation of speed.

Today, Tianhe-2 is the world's fastest supercomputer. By 2020, exascale supercomputers are expected to be one-thousand times faster. (Credit: TOP500)

While some experts, such as Horst Simon, the Deputy Director at the Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center (NERSC), are willing to bet we won't have exascale computers by 2020, other supercomputer pros are sure we will.

Exascale computers will be about 1,000 times faster than today's top Linux-powered petascale supercomputers. How big is that? Well, if you had a quintillion pennies, they would make up a cube about five miles to a side.

Leaving silliness aside, Simon pointed out that exascale supercomputing will fundamentally break "our current programming paradigm and computing ecosystem." In other words, even if we build the hardware, we won't be able to use it efficiently.

Jack Dongarra, distinguished professor of computer science at the University of Tennessee, Knoxville and creator of the TOP500 supercomputer list, thinks we will have exascale computers and doesn't disagree with the second part of Simon's position. That's why he's working on designing software that will make the next generation of supercomputers operational.

Dongarra recently received a million-dollar grant over three years from the US Department of Energy to find answers for these programming problems. The project, called the Parallel Runtime Scheduling and Execution Controller (PaRSEC), aims to address the critical situation facing the supercomputing community due to the introduction of more complex supercomputer designs.

"You can't wait for the exascale computers to be delivered and then start thinking about the software and algorithms," said Dongarra in a statement. "The exascale computers are going to be dramatically different than the computers we have today. We have to have the techniques and software to effectively use these machines on the most challenging science problems in the near future."

According to Dongarra, "Today's supercomputers have processor counts in the millions. Tomorrow's exascale computers will have roughly a billion processors. In addition, the general makeup of the machines will differ dramatically through the use of multiple central processing units and hybrid systems to overcome barriers limiting today's supercomputers. These barriers include large amounts of heat and power consumption, leaking voltage and a limited bandwidth of data through the pins on a single chip."

The PaRSEC site states that PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. In it, applications are expressed as a Directed Acyclic Graph (DAG) of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies in a totally distributed fashion. PaRSEC assigns computation threads to the cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on architectural features such as Non-Uniform Memory Access (NUMA) nodes and algorithmic features such as data reuse.
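To make the idea of a compact, queryable task graph more concrete, here is a minimal Python sketch. It is not PaRSEC code and does not use PaRSEC's API. The function names (`predecessors`, `successors`, `run_dag`) and the grid-shaped dependency rule are assumptions chosen for illustration. The point it shows is that the graph is never stored in full: dependencies are computed from a task's coordinates when the scheduler asks for them, so the description stays the same size no matter how large the problem gets.

```python
from collections import deque


def predecessors(task):
    """Hypothetical dependency rule: task (i, j) needs (i-1, j) and (i, j-1)."""
    i, j = task
    preds = []
    if i > 0:
        preds.append((i - 1, j))
    if j > 0:
        preds.append((i, j - 1))
    return preds


def successors(task, n):
    """Tasks that consume the output of `task` in an n x n grid."""
    i, j = task
    succs = []
    if i + 1 < n:
        succs.append((i + 1, j))
    if j + 1 < n:
        succs.append((i, j + 1))
    return succs


def run_dag(n):
    """Run an n x n grid of tasks in dependency order; return the order used."""
    unmet = {}                  # task -> unmet dependency count, filled lazily
    ready = deque([(0, 0)])     # (0, 0) is the only task with no dependencies
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)      # stand-in for actually executing the task
        for succ in successors(task, n):
            if succ not in unmet:
                # Query the dependency rule on demand instead of storing edges.
                unmet[succ] = len(predecessors(succ))
            unmet[succ] -= 1
            if unmet[succ] == 0:
                del unmet[succ]
                ready.append(succ)
    return order


if __name__ == "__main__":
    print(run_dag(3))
```

A real runtime would hand ready tasks to many cores and nodes at once and weigh data locality when choosing among them. This single-threaded sketch only shows the dependency-driven ordering.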

Dongarra is also developing an algorithm to overcome a reliability problem associated with the increasing number of processors. As things stand, when one processor fails, the calculation may have to be repeated partially or in full. The project aims to develop software that can survive failures.
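The article does not describe Dongarra's algorithm, so the Python sketch below illustrates a different, well-known technique instead: checkpoint and restart. It shows the difference between repeating a calculation in full and repeating only part of it after a failure. The function name `run_with_checkpoints`, its parameters, and the simulated failure are all assumptions made for this example.

```python
def run_with_checkpoints(chunks, fail_at=None, checkpoint_every=2):
    """Sum all chunks, surviving one simulated failure before chunk `fail_at`.

    Returns (result, chunks_executed) so the cost of recovery is visible.
    """
    saved_index, saved_total = 0, 0   # last checkpoint (initially the start)
    i, total = 0, 0
    executed = 0
    failed = False
    while i < len(chunks):
        if i == fail_at and not failed:
            # Simulated processor failure: in-progress state is lost,
            # so roll back to the most recent checkpoint.
            failed = True
            i, total = saved_index, saved_total
            continue
        total += sum(chunks[i])       # the "work" for this chunk
        executed += 1
        i += 1
        if i % checkpoint_every == 0:
            saved_index, saved_total = i, total
    return total, executed


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    print(run_with_checkpoints(data))                                    # no failure
    print(run_with_checkpoints(data, fail_at=3))                         # partial repeat
    print(run_with_checkpoints(data, fail_at=3, checkpoint_every=100))   # full repeat
```

With a checkpoint every two chunks, the failure costs one repeated chunk. With no checkpoint taken before the failure, all of the earlier work is redone. On a machine with a billion processors, where failures are expected to be frequent, that difference is the heart of the reliability problem.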

Additional work on exascale computing will be done at meetings hosted by the National Science Foundation, which are held around the world annually.
