Intel director James Reinders explains the difference between task and data parallelism, and how there is a way around the limits imposed by Amdahl's Law...
I'm James Reinders, and I'm going to cover two key concepts involved with parallelism today. They are terms that you'll hear when you start working with parallel programming, when you start looking at multicore processors and start writing programs for them, and they're really important terms to thoroughly understand. They're very simple concepts. Let's go over them.
The first one I'll go over today is task and data parallelism — what are they and what do they mean? — and then I'll finish with talking a bit about Amdahl's Law. Amdahl's Law is a very interesting observation but sometimes it is used to predict things that simply don't make sense.
So let's start with task and data parallelism. Parallelism is doing multiple things at once, but there are fundamentally two types of parallelism that…
…people will talk about. One is task parallelism and the other is data parallelism.
Data parallelism is pretty simple. It is the concept that you have a lot of data that you want to process — perhaps a lot of pixels in an image, perhaps you have a whole lot of payroll cheques to update. Taking that data and dividing it up among multiple processors is a method of getting data parallelism. This is an area that supercomputers have excelled at for years. It is a type of problem that is pretty well understood and more emphasis has been put on data parallelism in the past than on task parallelism.
Task parallelism, on the other hand, is where you have multiple tasks that need to be done. So perhaps you have a large data set and you want to know the minimum value and you want to know the maximum and you want to know the average. This is a rather trivial example but you could have different processors each look at the same data set and compute different answers. So task parallelism is a different way of looking at things. Instead of dividing up the data and doing the same work on different processors, in task parallelism what you're doing is dividing up the task to apply.
The most common task parallelism is something called "pipelining". Pipelining is something where you have multiple tasks, say task one and task two and task three, and, instead of have each one of them operate on the data independently, what you actually do is take the data and give it to the first task and process it and the second task and the third task.
Image processing is often done in a pipeline fashion — graphics processors often do pipelining. You stream an image and some of the processing starts with the first task — a certain filter is applied and then it is passed on and passed on and passed on. This is a very common combination of task and data parallelism.
So this takes a little getting used to. It's worth getting into and understanding where the parallelism is in your application. In reality, usually, there's other types of parallelism and, as you start to look for that and understand it, you'll start to understand whether it's best to write your program for task parallelism or data parallelism or perhaps both, for pipelining.
Now, with that in mind, you've got the idea you're either going to divide up your task or your data or a little of both, and the question becomes: how much speed-up can you expect, how much performance are you going to get from these multicore processors?
There's a very famous observation by Gene Amdahl. It's known as Amdahl's Law and that is that the speed-up is limited by the serial portions of your program, serial meaning the parts that don't run in parallel.
So, let's take a look at a simple example here of a program that's got five different parts to it and each part takes 100 units of time. In this case, let's assume that the first, second and fifth parts are serial and we're not going to find any parallelism in it. But let's look at the second and fourth part and assume that we're finding data and task parallelism. So initially the program takes 500 units of time to execute, so, if we can change the second and fourth so they only take 50 units of time because we run them on two processors, this overall program is only going to take now 400 units of time to execute, which represents a 25 percent speed-up.
We can continue this and perhaps we have four processors and we make each one run in 25 units of time and we can see how this progresses. Each of these parts of the program now only take 25 units of time to execute because we have four processors doing the work we were doing before. We get the execution time of the program down to 350 units of time, representing a 40 percent speed-up.
Now the thing about Amdahl's Law is it continues to focus on the fact that the serial portions aren't getting smaller. We can envisage taking a huge number of processors and making these two parts of the program run in no time at all. Still the program takes 300 units of time to run because we've left the…
…serial portions behind. This is key to Amdahl's Law. Basically it says you're going to be limited by the serial portion of your code, and Amdahl's Law predicts for this particular simple example that, no matter how many processors you have, you're never going to get more than a 70 percent speed-up.
If you take this at face value, Amdahl's Law speeding up today's programs, you can see that multicore doesn't look very appealing, parallelism in general doesn't look appealing. So what's going on? We've built supercomputers with thousands of processors, so clearly Amdahl's Law isn't all there is to it. Gene Amdahl came up with his observation in 1967.
Surprisingly it took quite a long time but in 1988 Gustafson pointed out something about Amdahl's Law. He took quite a different look at it and he said: Look, computers don't do the same work year after year after year. As computers get more powerful, we tend to throw bigger and bigger workloads at them, with more and more data to process. And we see that today.
So, if we take a different look at this and say — our program that ran in 500 units of time — we're going to double the amount of work we're doing in our parallel sections and now we're going to get 200 units of work done in the same amount of time, the program still takes 500 units of time to execute but we get 700 units worth of work done and that's a 40 percent speed-up with just two processors. And, if we continue this trend and we add a couple more processors each getting the same amount of work done, we can get 1,100 units of work done in 500 units of time and that's a 2.2 speed-up.
Gustafson's observation basically says that, if we can keep increasing the amount of work we want to get done, which is the history of computing, that the serial portions actually have diminishing impact on us and that we actually can see speed-ups on the order of the number pf processors that we have.
So it may not be perfect use of parallelism, perfect speed-up — every processor doesn't increase our speed, so, if we have 10 processors, we don't get ten-fold speed-up — but we are getting linear speed-up; we can expect 20 processors to give us roughly double the performance that we'd get with 10 processors.
So, if you ever hear Amdahl's Law quoted as a reason that parallelism won't work out for us, you can come back and make the observation that Gustafson had an explanation for what to do. And this is the key behind why supercomputers have been successful with parallelism, because we continue to increase the data sets, and the same thing will happen with multicore processors.