Building a 'Rosetta Stone' for supercomputing software

The project to make it simpler to write software for the increasingly complex mix of hardware inside the world's fastest computers.
Written by Nick Heath, Contributor

Simulating the world down to the interactions between individual atoms needs more than rack upon rack of the most expensive computers money can buy.

In trying to create systems capable of conjuring up digital approximations of incredibly complex real-world systems, supercomputer designers have turned to specialised hardware.

Beyond packing in thousands of CPUs, the world's fastest supercomputers, such as China's Tianhe-2, increasingly offload information processing to a diverse range of circuitry. Working in tandem with CPUs are graphics processing units (GPUs), many integrated core (MiC) accelerators like the Intel Xeon Phi, and, for some machines, field programmable gate arrays (FPGAs).

Each of these devices varies in the type of software algorithms it is suited to execute and the data structures it can handle efficiently. Allocate your algorithms to the right hardware and you can increase performance and lower energy costs – a crucial consideration when systems like the Tianhe-2 can consume 17.6 MW of power.

However, writing software to run on these different hardware devices is a complex and challenging process. Developers generally need to include hardware-specific code and annotations that limit the portability of their software.

A group of researchers from three UK universities are aiming to simplify the process of creating software that can run on a heterogeneous mix of computer hardware. The project has been described as an attempt to create a software 'Rosetta Stone', after the ancient Egyptian stele that displays the same text in three scripts and thus played a key role in deciphering hieroglyphs.

The £2m research project – carried out by the University of Glasgow, Heriot-Watt University in Edinburgh and Imperial College in London – will create a compiler that can decide for itself which hardware device is best suited to run a particular block of code.

This compiler would allow programmers to write software to run on these heterogeneous systems without having to make explicit which hardware devices should run different sets of instructions. In this way the programmer could write software without worrying about the diversity of the underlying hardware, as the compiler would take care of assigning code to be executed by the appropriate device.

Building the compiler

The researchers are designing a compiler that will work with Fortran, the venerable high-level programming language still widely used by the scientific and HPC community to build models of weather systems and for other simulations in scientific investigations.

The compiler will, as a first step, translate the Fortran code into Single Assignment C (SaC), a functional language that can efficiently handle large arrays of data.

Next, the compiler will carry out program transformation on the SaC instructions to generate tranches of intermediate code. This intermediate code will differ depending on whether the software instructions are to be run on CPUs, GPUs, MiC accelerators or FPGAs.

The nature of this intermediate code will depend on the target hardware: for multi-core CPUs it will be targeted at the LLVM compiler; for GPUs and many-core accelerators like Xeon Phi it will be targeted at the OpenCL framework; while for FPGAs the team will develop its own back-end compiler.
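The mapping from target hardware to back end described above can be pictured as a simple lookup. The following Python sketch is purely illustrative: the `Target` enum, the `BACKEND_FOR_TARGET` table and the `select_backend` function are hypothetical names invented here, not part of the researchers' compiler.

```python
from enum import Enum


class Target(Enum):
    """Hardware classes named in the article (labels are illustrative)."""
    CPU = "multi-core CPU"
    GPU = "GPU"
    MIC = "many-core accelerator"
    FPGA = "FPGA"


# Back end per target, as described in the article. The string labels are
# placeholders, not real compiler flags or API names.
BACKEND_FOR_TARGET = {
    Target.CPU: "LLVM",
    Target.GPU: "OpenCL",
    Target.MIC: "OpenCL",
    Target.FPGA: "custom FPGA back end",
}


def select_backend(target: Target) -> str:
    """Return the back end that intermediate code for `target` is aimed at."""
    return BACKEND_FOR_TARGET[target]


if __name__ == "__main__":
    for target in Target:
        print(f"{target.value}: {select_backend(target)}")
```

In the real compiler the choice of target is made per block of code rather than per program, so a table like this would be consulted many times during a single compilation.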

"We are posed with a very complex problem of program transformation if we want to tackle these heterogeneous systems and we can't afford, one, to do the transformations manually, and, two, to be wrong," said Dr Wim Vanderbauwhede of the University of Glasgow.

To help avoid errors during the program transformation process, the teams will use the Multiparty Session Types system developed by professor Nobuko Yoshida at Imperial College.

The type system will ensure that the code generated during the transformation process remains semantically equivalent to the code it was generated from, so that the transformed code does not produce different results when it is executed.
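To give a feel for what "semantically equivalent" means, the Python sketch below compares a simple loop with a hand-transformed version of it. This is only an illustration: the project relies on a type system to guarantee equivalence, whereas the sketch merely spot-checks a few inputs, which is a much weaker assurance. All function names are hypothetical.

```python
def scale_and_sum_reference(values, factor):
    """Original form: multiply each element, then accumulate."""
    total = 0
    for v in values:
        total += v * factor
    return total


def scale_and_sum_transformed(values, factor):
    """Transformed form: the multiplication is hoisted out of the loop."""
    return factor * sum(values)


def agree_on(inputs, factor):
    """Check that both forms give the same result on every sample input."""
    return all(
        scale_and_sum_reference(xs, factor) == scale_and_sum_transformed(xs, factor)
        for xs in inputs
    )


if __name__ == "__main__":
    # Integer samples only, so the comparison is exact.
    samples = [[1, 2, 3], [], [-4, 10]]
    print(agree_on(samples, 3))
```

The samples are integers on purpose. With floating-point data, reordering arithmetic in this way can change the rounded result, which is one reason a transformation that looks harmless still needs to be checked.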

Researchers plan to build into the compiler a cost model that will allow it to decide which type of underlying hardware, for example a multi-core CPU or GPU, would most efficiently execute a particular block of code.

"For every function, you can run the cost model on it, analyse the code, and get a graph that shows what the communication patterns are for that particular bit of code. It's that communication pattern that tells you basically where it should be best run," said professor Sven-Bodo Scholz of Heriot-Watt University.

"If it's very parallel it's better run on the GPU, if it's less parallel and heavy on control flow it's better run on your multi-core CPU, if it has deep pipelines it should be better run on a FPGA.

"This could, potentially, be a very time-consuming process so we will need some kind of machine learning techniques that will help guide this. Otherwise the number of possibilities is just too large.

"We anticipate our compiler will take longer than a normal compiler in order to deliver that performance with guaranteed correctness. Nobody has done this, built a compiler that runs on whatever you have in your system."

As part of the project researchers will build the type system and cost model into the SaC compiler developed at Heriot-Watt University under professor Scholz. Researchers at Heriot-Watt have already used this compiler to generate code that can run on a GPU or multi-core CPU from one Single Assignment C code base.
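The rule of thumb Scholz gives for the cost model (very parallel code to the GPU, control-heavy code to the CPU, deep pipelines to an FPGA) can be written down as a toy heuristic. The Python sketch below is an assumption-laden illustration, not the project's cost model: the `CodeProfile` fields, the thresholds and the order in which the rules are tested are all invented here.

```python
from dataclasses import dataclass


@dataclass
class CodeProfile:
    """Hypothetical summary of one block of code (all fields are assumptions)."""
    parallelism: float    # 0.0 (serial) to 1.0 (fully data-parallel)
    control_flow: float   # 0.0 (straight-line) to 1.0 (branch-heavy)
    pipeline_depth: int   # number of chained processing stages


def suggest_device(profile: CodeProfile,
                   deep_pipeline: int = 8,
                   high_parallelism: float = 0.7,
                   heavy_control_flow: float = 0.5) -> str:
    """Pick a device class using the rule of thumb quoted in the article.

    The thresholds and the priority given to the FPGA rule are arbitrary
    choices made for this sketch.
    """
    if profile.pipeline_depth >= deep_pipeline:
        return "FPGA"
    if (profile.parallelism >= high_parallelism
            and profile.control_flow < heavy_control_flow):
        return "GPU"
    # Less parallel or control-heavy code falls back to the multi-core CPU.
    return "CPU"


if __name__ == "__main__":
    print(suggest_device(CodeProfile(parallelism=0.9, control_flow=0.1, pipeline_depth=2)))
    print(suggest_device(CodeProfile(parallelism=0.2, control_flow=0.8, pipeline_depth=2)))
    print(suggest_device(CodeProfile(parallelism=0.5, control_flow=0.3, pipeline_depth=12)))
```

A model based on analysing communication patterns, as the researchers describe, would replace these hand-picked numbers with measurements derived from the code itself, and the search over possible placements is where they expect machine learning to help.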

Researchers intend to make the compiler available online, but anticipate it will take "one year or two" of the five-year project before the first release is published.
