The first one I’ll go over today is task and data parallelism. What are they and what do they mean and then I’ll finish with talking a bit about Amdahl’s Law. Amdahl’s law is a very interesting observation but sometimes it is used to predict things that simply don’t make sense.
So let’s start with task and data parallelism. Parallelism is doing multiple things at once, but there are fundamentally two types of parallelism that people will talk about. One is task parallelism and the other is data parallelism.
Task parallelism is pretty simple. It is the concept that you have a lot of data that you want to process - perhaps a lot of pixels in an image, perhaps you have a whole lot of payroll cheques to update. Taking that data and dividing it up among multiple processors is a method of getting data parallelism. This is an area that supercomputers have excelled at for years. It is a type of problem that is pretty well understood and more emphasis has been put on data parallelism the past than on task parallelism.
Task parallelism on the other hand is where you have multiple tasks that need to be done. So perhaps you have a large data set and you want to know the minimum value and you want to know the maximum and you want to know the average. This is a rather trivial example but you could have different processors each look at the same data set and compute different answers. So task parallelism is a different way of looking at things instead of dividing up the data and doing the same work on different processors, in task parallelism what you’re doing is dividing up the task to apply.
The most common task parallelism is something called pipelining. Pipelining is something where you have multiple tasks say task 1 and task 2 and task 3 and instead of have each one of them operate on the data independently what you actually do is take the data and give it to the first task and process it and the second task and the third task. Image processing is often done in a pipeline fashion – graphics processors often do pipelining. You stream an image and some of the processing starts with the first task – a certain filter is applied and then it is passed on and passed on and passed on. This is a very common combination of task and data parallelism. So this takes a little getting used to. It’s worth getting into and understanding where the parallelism is in your application. In reality usually there’s other types of parallelism and as you start to look for that and understand it you'll start to understand whether it’s best to write your program for task parallelism or data parallelism or perhaps both, for pipelining.
Now with that in mind you’ve got the idea you either going to divide up your task or your data or a little of both and the question becomes how much speed up can you expect, how much performance are you going to get from these multicore processors. There’s a very famous observation by Gene Amdahl. It’s known as Amdahl’s law and that is that the speed up is limited by the serial portions of your program, serial meaning the parts that don’t run in parallel. So let’s take a look at a simple example here of a program that’s got five different parts to it and each part takes 100 units of time. In this case lets assume that the first second and fifth parts are serial and we’re not going to find any parallelism in it. But let’s look at the second and fourth part and assume that we’re finding data and task parallelism. So initially the program takes 500 units of time to execute so if we can change the second and fourth so they only take 50 units of time because we run them on two processors, this overall program is only going to take now 400 units of time to execute which represents a 25 percent speed up. We can continue this and perhaps we have four processors and we make each one run in 25 units of time and we can see how this progresses. Each of these parts of the program now only take 25 units of time to execute because we have four processors doing the work we were doing before. We get the execution time of the program down to 350 units of time, representing a 40 percent speed up.
Now the thing bout Amdahl’s law is it continues to focus on the fact that the serial portions aren’t getting smaller. We can envisage taking a huge number of processors and making these two parts of the program run in no time at all. Still the program takes 300 units of time to run because we’ve left the serial portions behind. This is key to Amdahl’s law. Basically says you’re going to be limited by the serial portion of your code, and Amdahl's law predicts for this particular simple example that no matter how many processors you have, you're never going to get more than a 70 percent speed up.
If you take this at face value, Amdahl's law speeding up today’s programs you can see that multicore doesn’t look very appealing, parallelism in general doesn’t look appealing. So what's going on? We’ve built supercomputers with thousands of processors, so clearly Amdahl’s law isn’t all there is to it. Gene Amdahl came up with his observation in 1967.
Surprisingly it took quite a long time but in 1988 Gustafson pointed out something about Amdahl’s law. He took quite a different look at it and he said look, computers don’t do the same work year after year after year. As computers get more powerful we tend to throw bigger and bigger workloads at them, with more and more data to process. And we see that today. So if we take a different look at this, and say our program that runs in 500 unit of time, we’re going to double the amount of work we’re doing in our parallel sections and now we’re going to get 200 units of work done in the same amount of time. The program still takes 500 unit of time to execute but we get 700 units worth of work done and that’s a 40 percent speed up with just two processors. And if we continue this trend and we add a couple more processors each getting the same amount of work done we can get 1100 units of work done in the same amount of time and that’s a 2.2 speed-up. Gustafson’s observation basically says that if we can keep increasing the amount of work we want to get done, which is the history of computing, that the serial portions actually have diminishing impact on us and that we actually can see speedups on the order of the number pf processors that we have. So it may not be perfect use of parallelism, perfect speed-up - every processor doesn’t increase our speed, so if we have ten processors we don’t get ten-fold seed up, but we are getting linear speed-up, we can expect 20 processors to give us roughly double the processors that we’d get with ten processors.
So if you ever hear Amdahl's law quoted as a reason that parallelism won’t work out for us you can come back and make the observation that Gustafson, had an explanation for what to do. And this is the key behind why supercomputers have been successful with parallelism, because we continue to increase the datasets, and the same thing will happen with multi-core processors.