GTC: A call to (parallel) arms

Written by Simon Bisson, Contributor and Mary Branscombe, Contributor

Moore’s law has hit the wall. Or rather, has hit one of its walls. We can still keep making transistors smaller and smaller, but heat and power considerations mean that they can’t get any faster. In fact the denser they get, the slower they’re going to be.

That’s why silicon architectures have changed. Intel’s tick and tock is adding more and more cores, with its first truly modular architecture in Sandy Bridge. The same is true of AMD’s Bulldozer and Fusion, all of which give the big chip vendors easily replicable chunks of silicon that can be scaled down as processes get finer and finer. It’s a lesson that’s also been learnt by GPU companies, with vast arrays of massively parallel cores in each chip.

So, let me present the future: one where processors get more and more parallel, and where price/performance curves are gauged in TFLOPS per watt. You’re going to get a lot of bang for your buck, with only one downside: how are you going to write applications that take advantage of all that horsepower? It’s not just parallel code; it’s also offloading CPU-intensive functions to GPUs, much like IE 9’s hardware-powered rendering engine. And it’s no longer only about complex simulation software and high-powered business tools: Adobe’s recently announced Elements consumer tools have new GPU-powered features too.

That’s why we’re in San Jose this week, at the smaller (but much more interesting) of this week’s two big tech conferences. NVIDIA’s GPU Technology Conference isn’t your usual tech event. It’s much more like an academic conference, where the people at the cutting edge of parallel programming get together to exchange ideas and to show the world just what general purpose GPU development is capable of delivering. Walk the halls between the conference sessions, and what you see is the hallmark of academia: conference posters that condense complex research topics into an A0 LaTeX printout. They’re fascinating pieces of work, covering computational biology, image processing, computational fluid dynamics, linear algebra, logic simulation, and the list goes on – but there’s one thing they all have in common: they’re the underpinnings of our modern world.

Those posters also show that there is a way to take advantage of all that processing power: it is possible to take real-world problems, extract the underlying parallelism, and turn it into code. Some of it is NVIDIA’s CUDA programming model at work, but much of it is new parallel implementations of familiar mathematical libraries. Scientific programmers can take the tools they’re familiar with, and quickly speed them up – 10x, 50x, even 500x.
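To make "extract the underlying parallelism" a little more concrete, here is a minimal sketch of our own, not anything shown at GTC and not CUDA. It takes the classic a*x + y operation from linear algebra, notes that every element can be computed independently, and splits the work into chunks handed to separate workers. The function names (`saxpy_chunk`, `parallel_saxpy`) and the `workers` parameter are hypothetical, chosen for illustration. In standard Python, threads will not actually speed up arithmetic like this; the point is the shape of the decomposition, which is the same idea a GPU applies across thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a, xs, ys):
    """Compute a*x + y for one independent slice of the data."""
    return [a * x + y for x, y in zip(xs, ys)]


def parallel_saxpy(a, xs, ys, workers=4):
    """Split the inputs into contiguous chunks, one task per chunk.

    No element depends on any other, so the chunks can be processed
    in any order (or all at once) without changing the result.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    size = max(1, -(-n // workers))  # ceiling division, never zero
    bounds = [(start, min(start + size, n)) for start in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in submission order, so the
        # chunks can simply be concatenated afterwards.
        parts = pool.map(
            lambda b: saxpy_chunk(a, xs[b[0]:b[1]], ys[b[0]:b[1]]),
            bounds,
        )
        return [value for part in parts for value in part]
```

Calling `parallel_saxpy(2, [1, 2, 3], [10, 20, 30])` gives the same list as the obvious sequential loop; only the way the work is carved up has changed.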

What happens in universities doesn’t take long to reach the rest of the computing world. The tools that sit in FORTRAN compilers today end up in Java and C# tomorrow – and JavaScript the day after. Once the concepts have been abstracted, it’s easy to move them from compiler to compiler and from programming model to programming model. Already C and C++ developers using Microsoft’s Visual Studio get access to CUDA, with a set of add-ins and new compiler directives – all that’s really needed now are tools that help us model our applications and determine the underlying concurrency and parallelism.

And that’s really the sticking point (that and the sad lack of parallel programming in many of the computer science degree courses out there). We need new and better ways of guiding developers into working with concurrency, finding the parallelism that will speed up their code, and identifying the sequencing that will help reduce the risk of race conditions. We need general purpose equivalents of Intel’s Parallel Studio (as good as it is, it’s still too wedded to the world of x86) that can generalise parallelism to multicore x86 and to GPGPU. And we need those tools soon, too.
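For readers who haven’t met a race condition, here is a small illustrative sketch of our own (the names `locked_sum` and `reduced_sum` are hypothetical). Several workers adding into one shared total can trample each other’s updates unless something sequences them. The first function does that sequencing with a lock; the second avoids the shared variable altogether by giving each worker a private slot and combining the results at the end, a pattern usually called a reduction.

```python
import threading


def locked_sum(values, workers=4):
    """Sum values with threads that share one total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for v in chunk:
            with lock:  # only one thread may update total at a time
                total += v

    chunks = [values[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def reduced_sum(values, workers=4):
    """Sum values with no shared total: each worker owns one slot."""
    partials = [0] * workers

    def work(idx):
        # Each thread writes only to its own index, so no lock is needed.
        partials[idx] = sum(values[idx::workers])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # combine step runs after all workers finish
```

Both return the same answer as a plain `sum(values)`. The second version is closer to how parallel hardware likes to work, because the workers never have to wait for each other until the final combine step.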

NVIDIA’s CEO Jen-Hsun Huang used his GTC keynote to reveal the next two generations of the company’s GPGPU technologies, codenamed Kepler and Maxwell, promising up to a 40x performance per watt increase over today’s technologies using 28nm and 22nm silicon processes. He also hinted at the possibility that the same technologies would find themselves in future Tegra mobile application processors, giving ARM a (err) shot in the arm of power that would bring complex image processing to mobile devices.

We live in a parallel world, and our computing devices are finally getting the capabilities they need to work in it. All we need to do is write the code to take advantage of them.

Simon Bisson
