And I really want to emphasise that when you look to do parallelism, one of the things we want to avoid is approaching the problem by going to the lowest programming model available for parallelism, which is to start working with raw threads: Pthreads or Windows threads. We really want to avoid that. We really want to work with parallelism at an abstract level, where we’re describing a little bit about the parallelism of our application but mostly letting something else take care of the low-level details.
For me the ultimate thing to look at is libraries. Now there are some limitations here, but if you’re adding parallelism to a program and you can find a library that implements the functionality you want and does it in parallel, this is a no-brainer, and it’s the first thing you should be looking to do. So take a look at whether what you’re doing in your application is something you could make a library call for, and if so, whether there are any libraries available that do that in parallel. And even if there aren’t, if you have libraries of your own, you may want to focus on making those run in parallel. Some simple examples: Intel offers a library, MKL, for mathematical operations, and it is common to find libraries like that which operate in parallel; IMSL offers them too. Also in multimedia you’ll find libraries that can do codecs, JPEG and MPEG encode and decode, and do those in parallel; those are pretty obvious examples. You’ll see this a lot in graphics; on Apple platforms you’ll see the Core Animation library, which takes advantage of parallelism. So look around, look for libraries. This to me is always the number one thing to look for. It’s very easy, and it fits into the way your program is today: you call a library, and it mysteriously uses multiple processors.
The second suggestion I have is to take a very good look at OpenMP. OpenMP, if you haven’t heard of it, is a set of extensions for C, C++ and Fortran that add to the language some directives that give compilers enough hints to take part of your program and run it in parallel. The reason it surprises people is that either they haven’t heard of it or they don’t realise that virtually every compiler out there supports it now. It was first introduced in 1997, and a decade later you’ll find it in virtually every C, C++ and Fortran compiler out there. It is awfully easy to use.
Imagine in C you have a simple nest of for loops (for i, for j, for k) and you’ve got some data parallelism there. There is an opportunity to run this in parallel, but in general a compiler is not going to figure out how to run it in parallel automatically. Well, you can add one line of code:
#pragma omp parallel for
And that’s roughly all you need to do. Depending on your application there are some additional clauses you may need to give it. On the same line you’d say private(i,j,k). This gives the compiler a little additional hint that the indices in these for loops should be private to each thread; if they’re declared outside the loops, that’s what keeps the threads from stepping on each other’s counters. But with one line added to your program you’re able to tell it that the loop nest ought to run in parallel. OpenMP is somewhat limited in that it’s oriented around loop nests. It can do a little bit more than that, and some implementations take it further for task parallelism, and I think we’ll see OpenMP extended in the next few years to explicitly address task parallelism. For now it’s mostly oriented around loops, but it does a fantastic job: it’s easy to use, it’s very portable, and you can count on it being there on all platforms and virtually all compilers, so it’s worth a look. It’s a way to add parallelism that’s extremely simple and reliable.
The third one I have written down is Threading Building Blocks. I think we’re all reading a bit about Threading Building Blocks and I think we’ll be hearing more about it. But let me back up a little bit here and say that what I’m really after is finding another level of abstraction to avoid dipping into raw threads. Libraries and OpenMP are widely available, so we search around and ask: what else is available? Well, libraries are limited in scope, and OpenMP really addresses C and Fortran. It certainly gets used in C++ programs, but for a C++ programmer, Threading Building Blocks is the place to look, and it goes beyond the data parallelism of OpenMP and addresses task parallelism as well.
So these are the 1-2-3s. No matter who I talk to that’s looking at adding parallelism to a program, I always advise looking in this order: look for libraries, look at using OpenMP, look at Threading Building Blocks, and by all means try to avoid falling into the mode of programming raw threads. This is something worth avoiding because your program’s not going to be as future-proof, it’s not going to be as easy to write, and it’s going to be more prone to errors if you have to get down into what is essentially the assembly language of parallel programming. For more information on OpenMP you can go to OpenMP.org, and for more information on Threading Building Blocks, threadingbuildingblocks.org. So take a look at these three items, and good luck adding parallelism to your programming.