Understanding task and data parallelism

Summary:Intel director James Reinders explains the difference between task and data parallelism, and how there is a way around the limits imposed by Amdahl's Law

…people will talk about. One is task parallelism and the other is data parallelism.

Data parallelism is pretty simple. It is the concept that you have a lot of data that you want to process — perhaps a lot of pixels in an image, perhaps you have a whole lot of payroll cheques to update. Taking that data and dividing it up among multiple processors is a method of getting data parallelism. This is an area that supercomputers have excelled at for years. It is a type of problem that is pretty well understood and more emphasis has been put on data parallelism in the past than on task parallelism.

Task parallelism, on the other hand, is where you have multiple tasks that need to be done. So perhaps you have a large data set and you want to know the minimum value and you want to know the maximum and you want to know the average. This is a rather trivial example but you could have different processors each look at the same data set and compute different answers. So task parallelism is a different way of looking at things. Instead of dividing up the data and doing the same work on different processors, in task parallelism what you're doing is dividing up the task to apply.

The most common task parallelism is something called "pipelining". Pipelining is something where you have multiple tasks, say task one and task two and task three, and, instead of have each one of them operate on the data independently, what you actually do is take the data and give it to the first task and process it and the second task and the third task.

Image processing is often done in a pipeline fashion — graphics processors often do pipelining. You stream an image and some of the processing starts with the first task — a certain filter is applied and then it is passed on and passed on and passed on. This is a very common combination of task and data parallelism.

So this takes a little getting used to. It's worth getting into and understanding where the parallelism is in your application. In reality, usually, there's other types of parallelism and, as you start to look for that and understand it, you'll start to understand whether it's best to write your program for task parallelism or data parallelism or perhaps both, for pipelining.

Now, with that in mind, you've got the idea you're either going to divide up your task or your data or a little of both, and the question becomes: how much speed-up can you expect, how much performance are you going to get from these multicore processors?

There's a very famous observation by Gene Amdahl. It's known as Amdahl's Law and that is that the speed-up is limited by the serial portions of your program, serial meaning the parts that don't run in parallel.

So, let's take a look at a simple example here of a program that's got five different parts to it and each part takes 100 units of time. In this case, let's assume that the first, second and fifth parts are serial and we're not going to find any parallelism in it. But let's look at the second and fourth part and assume that we're finding data and task parallelism. So initially the program takes 500 units of time to execute, so, if we can change the second and fourth so they only take 50 units of time because we run them on two processors, this overall program is only going to take now 400 units of time to execute, which represents a 25 percent speed-up.

We can continue this and perhaps we have four processors and we make each one run in 25 units of time and we can see how this progresses. Each of these parts of the program now only take 25 units of time to execute because we have four processors doing the work we were doing before. We get the execution time of the program down to 350 units of time, representing a 40 percent speed-up.

Now the thing about Amdahl's Law is it continues to focus on the fact that the serial portions aren't getting smaller. We can envisage taking a huge number of processors and making these two parts of the program run in no time at all. Still the program takes 300 units of time to run because we've left the…

Topics: Tech & Work

Kick off your day with ZDNet's daily email newsletter. It's the freshest tech news and opinion, served hot. Get it.

The best of ZDNet, delivered

You have been successfully signed up. To sign up for more newsletters or to manage your account, visit the Newsletter Subscription Center.
Subscription failed.