Foundations of parallel programming II

Intel's James Reinders delves further into threading building blocks - a C++ template library for parallel programming.
Written by ZDNet Staff, Contributor
Hi, I’m James Reinders and we’re back with part two on Threading Building Blocks which is a C++ template library that provides everything that we would need to take up an application and make it run in parallel on C++. I’m going to go through the rest of Threading Building Blocks and tell a little bit about them. It will complete the set which we started by looking at the algorithms that Threading Building Blocks supplies.

Now, you can either look at it as a complete package with everything that you would need or, as we go through these, you may notice some things that can be very useful when used in conjunction with other ways of getting parallelism. So, you can either use the whole package or you can use bits and pieces of it.

So, to give some more background rather than just go through and list what Threading Building Blocks has in it, I’ll try to give a perspective on each one about some of the complicated problems that these packages solve to give an appreciation for why it’s probably better to go and find a package, like Threading Building Blocks, to solve these problems than it is to try to solve them yourselves.

So, real quickly the five other things in addition to algorithms that we have in Threading Building Blocks are:

    (i) Containers - in this case highly concurrent containers. Containers that can hold data but can be accessed by different threads in a parallel program.

    (ii) A Scalable Memory Allocator - also, it’s just a memory allocator but again, it’s designed to be able to be handled by multiple threads.

    (iii) Mutual Exclusion

    (iv) Timing capabilities

    (v) Access to the Task Scheduler itself in Threading Building Blocks to do more advanced things.

The templates that we discussed in Part 1 are actually all implemented as calls to the “Task Scheduler”.

So, let’s start with “Containers”. Threading Building Blocks offers three different containers. It offers a “queue”, a “vector” and a “hash” table. These are all things that STL provides as well but STL does not provide thread-safe versions of these. In this case, queue, vector and hash map can be accessed. They have an API, they have a design, they are implemented in such a way that they can be used by multiple threads concurrently. Just to give a flavour for it, queue seems like it would be a relatively simple thing to implement concurrently but it turns out that there’s a problem known as “ABA problem”. This can occur when you use a simple thing such as an atomic operation like compare and swap to implement a queue, and you will actually still have some problems with it.

So, queue is rather sophisticated in what it has to do to provide guarantees that it can be used concurrently. It’s an interesting problem to look at, the ABA problem and how to implement a queue. When I’ve looked at it it’s given me an appreciation for how much I would rather just use a container someone else has written to take care of the problem for me.

“Scalable memory allocator”. Perhaps this is the simplest one to think about implementing yourself but when you get into the details of this of how to implement it efficiently, how to create thread pools so that multiple threads can be allocating memory at the same time and releasing it without stepping on each other, compensating for cache lines so that you keep things far enough apart so that you don’t get false sharing. A scalable memory allocator is not as easy as it would seem initially. What you really want to do is just be able to call a malloc, be able to use new and delete and have it taken care of, and the scalable memory allocator let’s you do that. Let’s use a scalable malloc, a scalable new and delete functionality within your C++ program. So, this is a really nice feature and one worth looking at. Each one of these things can be used independent of the other so you don’t need to use all of them.

In order to give portability, Threading Building Blocks also defines “mutual exclusion”. This is having threads being able to access data in a manner so that one thread can exclude another thread from making modification until it’s done. There are two types of mutual exclusion here. Operations 1 are the traditional “locks” and “unlocks”. The other is “atomic operations” (operations that conclude all at once but are guaranteed to happen together).

The real value of using mutual exclusion defined by Threading Building Blocks is that they are portable. You can program it once and compile it or run in on multiple different platforms (Windows, Linux, Mac OS10 and there are multiple others you can take a look at porting to easily without changing your program).

“Timing”. This isn’t a critical thing to have but it turns out I think that everybody wants to have, which is the ability to find out what time it is and then a little later look at what time it is and then do a subtraction and figure out how long your program took to run. Well, getting a thread-safe version of that, one that gives you fine granularity, one that you can count on even if your thread is rescheduled on multiple processors, gets a little complex as well so it’s really nice that Threading Building Blocks defines that.

And finally, the real guts of Threading Building Blocks is its “task scheduler”. The algorithms that we discussed in Part 1 are all about defining task. Whether it’s with a parallel_for, or reduce or the operations/algorithms that are available, it ultimately creates a lot of task. The task scheduler then maps those onto threads, which are processors and makes your application run. The task scheduler is actually what we could call a “non pre-emptive unfair scheduler”. It uses some very advanced algorithms to even out and load balance across multiple processors. Even if you have a system that has processors with different capabilities or perhaps there are different applications running no some of the processors and not others, the load balancing that the task scheduler can do is very sophisticated. It helps you run your program as fast as possible.

So, these are the five features of Threading Building Blocks, in addition to the algorithms that we discussed in Part 1. If you want to learn more information www.threadingbuildingblocks.org has the software and documentation to download.

Editorial standards