We’re at Intel’s ISTEP software event in Barcelona, getting a deep dive into the world of parallel computing. It’s an interesting place to be, as the physics of silicon has put an end to the seemingly constant increase in speed and power of processors. Sure, you can keep packing in the transistors (especially with the latest 32nm foundries starting to kick out the chips), but thermal effects mean that clock rates won’t go any faster – and in many cases will be slower. 1.2GHz to 2GHz will be the norm, with performance processors peaking at 3GHz or so. You’ll buy the latest and greatest PC and it’ll be slower than the machine you’re currently using.
Well, unless your software is taking advantage of the extra cores in your CPU. Most operating systems handle some of the multitasking for you, scheduling different applications on different cores, giving you a performance boost on existing software, but there are applications which really need the extra power. Fancy building a massive multi-gigapixel panorama of a world city? You’re going to need plenty of processor to stitch those images together – so you’ll need software that can use those cores. That’s what Intel’s trying to get the development world to do – use new compiler technologies to build multiprocessor applications right from the very start.
It’s something that James Reinders from Intel thinks about a lot. As far as he’s concerned, “Parallelism is here to stay”, and it’s not just silicon that’s driving it. It’s also the explosion in data – ten times as much data stored and used over five years, and the rate is still accelerating. He’s clear that we need multicore systems to process it, just to keep up. Working with one thing at a time, he says, “is a dead end.”
And it’s up to software developers to change things. Lessons from the high performance computing world (where parallel computing has been the norm for decades) need to translate down on to the desktop, and even further, with embedded systems using Atom soon getting multicore capabilities.
Part of what Intel’s doing to encourage multicore development is its Parallel Studio, which plugs into Microsoft’s Visual Studio IDE, and adds tools for understanding and managing parallelism in applications (a beta for Visual Studio 2010 is due soon). What Parallel Studio adds, along with support for Intel’s low level parallel building blocks, are tools for debugging and analysing parallel code. Reinders believes that tools like this are increasingly important, as they provide a valuable level of abstraction. Instead of using native threading calls for parallel development, which tie applications to a specific number of cores, Intel’s tools like its Threading Building Blocks are able to work with any number of cores. With future silicon architectures likely to be an asymmetric mix of large and small cores (mixing smaller capabilities with bigger) or attached processing mixes of standard CPUs and accelerators, abstracting development away from the physical is going to be increasingly important.
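To make the "any number of cores" point concrete, here's a minimal sketch in plain standard C++. It is not Threading Building Blocks and not Intel's code; `parallel_sum` is a name we've made up for illustration. The only idea it borrows is that the worker count is discovered at run time rather than written into the source.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper for this sketch: sums `data` using however many
// worker threads the machine reports, not a count fixed in the source.
long long parallel_sum(const std::vector<int>& data) {
    // hardware_concurrency() may return 0 when the count is unknown,
    // so fall back to a single worker in that case.
    const std::size_t workers =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    // Round up so every element lands in some chunk.
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        // Each worker writes only its own slot, so no locking is needed.
        pool.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (std::thread& t : pool) {
        t.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Even this small example has to do its own chunking and joining by hand, which is the kind of bookkeeping a higher level library is meant to take off the developer's plate.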
Cilk is a compiler-centric implementation of a subset of Threading Building Blocks, making it easier to add parallel code to your applications. Ct takes a different approach, focusing on working with data, and is another C++ template library. Intel says that these solutions are intended to become standards, and is working with other compiler vendors to move them to other platforms – and to get feedback on its proposals.
We really like the idea of Cilk – it’s so very simple. All it does is slightly change a handful of keywords, and the compiler then handles the rest, producing code that will run sequentially on single processor machines, or in parallel on machines with more cores. The semantics of Cilk are straightforward, and just modify existing sequential code (changing loops from for to cilk_for). It’ll scale from Atom all the way up to multi-socket Nehalem EX beasts – though only using Intel’s compilers. But then, as Reinders says, “Someone has to be first.” There’s a common resource manager shared between Cilk and OpenMP, but the two remain very separate technologies, Cilk for C and C++, OpenMP for Fortran.
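As a rough sketch of that for to cilk_for swap, and not Intel's own sample code, consider a loop whose iterations are completely independent. `USE_CILK`, `PAR_FOR` and `square_all` are names we've invented for illustration, and we're assuming a Cilk-capable compiler that ships a `<cilk/cilk.h>` header defining `cilk_for`. Without the flag, the same source builds as ordinary sequential C++.

```cpp
#include <cstddef>
#include <vector>

// USE_CILK is a hypothetical build flag for this sketch: define it only
// when building with a compiler that supports Cilk (an assumption here).
#ifdef USE_CILK
#include <cilk/cilk.h>   // assumed to provide the cilk_for keyword
#define PAR_FOR cilk_for
#else
#define PAR_FOR for      // plain sequential loop on any C++ compiler
#endif

// Squares every element. Each iteration touches only its own index,
// so the iterations can safely run in any order or at the same time.
std::vector<double> square_all(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    PAR_FOR (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * in[i];
    }
    return out;
}
```

We picked a loop with no shared accumulator on purpose: a loop that adds into a single running total would race once its iterations run in parallel, and would need more than a keyword change.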
Intel hopes to standardise Cilk and Ct the same way the industry standardised OpenMP, with initial separate implementations coming together in an appropriate industry body. Reinders doesn’t think there’s any point in going straight to ISO (or any other general purpose standards body) with either approach. After all, the industry consensus/co-development route is the way C and C++ were developed, and Intel isn’t going to work on a standard with companies or individuals that aren’t going to implement it.
Reinders sees multiple models for parallel processing that form a three-level hierarchy. First there’s the typical high performance computing approach of co-ordination and message passing. Below that there’s task parallelism, and then at a lower level, data parallelism. Sometimes they’ll overlap, and the technologies used to deliver them will vary. In the future there’ll be many more technologies from more vendors – and they’ll all work together.