PaRSEC: Designing software for the exascale supercomputer generation

Summary: Supercomputers are getting faster than ever, but the next generation, which will be able to do a quintillion floating point operations per second, needs software that can keep up.


Anyone who programs for high-performance computing (HPC) knows that what works for standard computing doesn't work for supercomputers. Now, with exascale supercomputers, which will be able to do a quintillion (10^18, or 1,000 quadrillion) floating point operations per second, in sight by decade's end, it's time for software to get ready to handle this new generation of speed.

Today, Tianhe-2 is the world's fastest supercomputer. By 2020, exascale supercomputers are expected to be one-thousand times faster. (Credit: TOP500)

While some experts, like Horst Simon, the Deputy Director at the Lawrence Berkeley National Laboratory's National Energy Research Scientific Computing Center (NERSC), are willing to bet we won't have exascale computers by 2020, other supercomputer pros are sure we will.

Exascale computers will be about 1,000 times faster than today's top Linux-powered petascale supercomputers. How big is that? Well, if you had a quintillion pennies, they would form a cube about five miles on a side.
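A quick back-of-the-envelope check of that figure, treating a US penny as a solid cylinder about 19.05 mm across and 1.52 mm thick (the dimensions and the gap-free packing are simplifying assumptions):

```python
import math

# Volume of one US penny, modeled as a solid cylinder (~19.05 mm diameter,
# ~1.52 mm thick); packing gaps are ignored for this rough estimate.
penny_volume_m3 = math.pi * (19.05e-3 / 2) ** 2 * 1.52e-3

# Total volume of a quintillion (10^18) pennies, rearranged into a cube.
total_m3 = 1e18 * penny_volume_m3
side_miles = total_m3 ** (1 / 3) / 1609.344  # meters per mile

print(round(side_miles, 1))  # roughly 4.7 -- "about five miles" checks out
```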

Leaving silliness aside, Simon pointed out that exascale supercomputing will fundamentally break "our current programming paradigm and computing ecosystem." In other words, even if we build the hardware, we won't be able to use it efficiently.

Jack Dongarra, distinguished professor of computer science at the University of Tennessee, Knoxville, and creator of the TOP500 supercomputer list, thinks we will have exascale computers and doesn't disagree with the second part of Simon's position. That's why he's working on designing software that will make the next generation of supercomputers operational.

Dongarra recently received a million-dollar, three-year grant from the US Department of Energy to find answers to these programming problems. The project, called the Parallel Runtime Scheduling and Execution Controller (PaRSEC), aims to address the critical situation facing the supercomputing community as more complex supercomputer designs are introduced.

"You can't wait for the exascale computers to be delivered and then start thinking about the software and algorithms," said Dongarra in a statement. "The exascale computers are going to be dramatically different than the computers we have today. We have to have the techniques and software to effectively use these machines on the most challenging science problems in the near future."

According to Dongarra, "Today's supercomputers have processor counts in the millions. Tomorrow's exascale computers will have roughly a billion processors. In addition, the general makeup of the machines will differ dramatically through the use of multiple central processing units and hybrid systems to overcome barriers limiting today's supercomputers. These barriers include large amounts of heat and power consumption, leaking voltage and a limited bandwidth of data through the pins on a single chip."

The PaRSEC site states that PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, many-core, heterogeneous architectures. In it, applications are expressed as a Directed Acyclic Graph (DAG) of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies in a totally distributed fashion. PaRSEC assigns computation threads to the cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on architectural features such as Non-Uniform Memory Access (NUMA) nodes and algorithmic features such as data reuse.
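To make the idea concrete, here is a minimal sketch (in Python, with made-up task names) of executing a dependency DAG: a task launches only once the tasks on its incoming edges have finished. PaRSEC's actual runtime is far more sophisticated — distributed, NUMA-aware, and never materializing the whole graph — but the scheduling principle is the same.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Toy DAG: task name -> (set of dependencies, work function).
# The task names and workloads here are invented for illustration.
def make_dag():
    return {
        "A": (set(), lambda: 1),
        "B": ({"A"}, lambda: 2),
        "C": ({"A"}, lambda: 3),
        "D": ({"B", "C"}, lambda: 4),
    }

def run_dag(dag, workers=4):
    """Run every task as soon as its dependencies are done (assumes the
    graph really is acyclic)."""
    done, results = set(), {}
    pending = dict(dag)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {}  # running future -> task name
        while pending or futures:
            # Launch every task whose dependencies are all satisfied.
            ready = [t for t, (deps, _) in pending.items() if deps <= done]
            for t in ready:
                _, fn = pending.pop(t)
                futures[pool.submit(fn)] = t
            # Block until at least one running task finishes.
            finished, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in finished:
                t = futures.pop(fut)
                results[t] = fut.result()
                done.add(t)
    return results
```

Here "B" and "C" can run in parallel once "A" completes, while "D" must wait for both — exactly the overlap a DAG scheduler is meant to exploit.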

Dongarra is also developing an algorithm to overcome a reliability problem that comes with the increasing number of processors: when one processor fails mid-run, the calculation may have to be repeated partially or in full. The project aims to develop software that can survive such failures.
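As an illustration of the basic idea — not of Dongarra's actual algorithm, which targets algorithm-level fault tolerance at vastly larger scale — a simple checkpoint/restart loop shows how a computation can resume after a crash instead of starting over. The function name, file path, and step function below are invented for the example.

```python
import os
import pickle

def run_resilient(step, n_steps, path):
    """Run n_steps of `step`, checkpointing after each one, so a crash
    loses at most one step of work instead of the whole calculation."""
    if os.path.exists(path):
        with open(path, "rb") as f:      # resume from the last checkpoint
            i, state = pickle.load(f)
    else:
        i, state = 0, 0                  # fresh start
    while i < n_steps:
        state = step(state)
        i += 1
        with open(path, "wb") as f:      # persist progress after every step
            pickle.dump((i, state), f)
    return state
```

Calling it a second time after an interruption picks up from the saved step rather than redoing the whole run.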

Additional work on exascale computing will be done at meetings hosted by the National Science Foundation, which are held around the world annually.




Discussion:
  • Are they really faster?

    These computers will do more things in parallel, but they are not really that much faster.
    Give them a non-parallel, sequential task and suddenly they are not much faster than a PC.
    • If based on POWER arch..

      it would be between 4 and 7 times faster.

      This is based on past and current POWER systems. They really do have fast single CPU cores.
  • Questions about the problem

    I get it that as the number of processors, nodes, etc., increases there will be problems of OPERATING SYSTEM software as well as heat, current leakage in ever-smaller circuitry, communication between nodes, etc.

    But regarding APPLICATION software, are there any fundamental problems other than managing the SCALE of massively parallel application software? If so, what are they (in general terms)? It seems that on the application side, for things like protein folding, meteorological and climate studies, analysis of geological sounding data, simulation of nuclear explosions, etc., scaling up shouldn't be that big a deal (no pun intended!) and shouldn't require FUNDAMENTALLY different techniques.
    • Not really. Linux will handle the OS base per node without issues.

      The problem isn't the operating system.

      It is in the communications manager. Right now, the maximum speed is limited by the number of communications paths.

      The fastest use a 3D torus, which requires 6 interconnects per node. It gives each node access to 6 adjacent nodes with a minimal latency. If the torus is treated as a bus... (or network segment - what is usually done) things slow down... a lot.

      What they have to work on is faster routing, faster networks... and then faster messaging.

      They also have to deal with better parallelism (automatic detection would be nice); unfortunately, the parallelism is determined by the algorithm used. There is no automatic way to translate a serial algorithm into a parallel one efficiently. The results have always sucked.
      • Perhaps Rick R has a point

        The fundamental architecture of Unix / Linux is decades old. While I believe Unix variants will be the first gen OS for the new generation, limitations of antiquated architectures will become apparent immediately.

        However, a new OS that tackles the capabilities of the new hardware can't really be created until the hardware exists and people have the opportunity to learn and understand its capabilities.

        No OS is eternal, and Linux on supercomputers may not be the best choice in 10 to 15 years. All things evolve. As old operating systems such as IRIS are long gone (no, that's not a typo, I worked on that ancient beast), perhaps UNIX and its variants are getting a bit long in the tooth, and the newest generations of hardware need more modern approaches to computing rather than a patched and cobbled OS.

        Only time will tell.
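The 3D-torus interconnect described in the comments above is easy to sketch: every node wraps around in each of the three dimensions, so each one has exactly six nearest neighbors. A minimal illustration (the dimension sizes are made up):

```python
def torus_neighbors(x, y, z, dims):
    """Return the 6 nearest neighbors of node (x, y, z) on a 3D torus,
    with wraparound in each dimension (dims gives the dimension sizes)."""
    X, Y, Z = dims
    return [
        ((x + 1) % X, y, z), ((x - 1) % X, y, z),
        (x, (y + 1) % Y, z), (x, (y - 1) % Y, z),
        (x, y, (z + 1) % Z), (x, y, (z - 1) % Z),
    ]

# Even a "corner" node has 6 neighbors, thanks to the wraparound links.
print(torus_neighbors(0, 0, 0, (4, 4, 4)))
```

That wraparound is what keeps latency low: every node talks directly to six adjacent nodes instead of routing everything over a shared bus.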