
Intel opens up high performance multi-core library

Written by Ed Burnette, Contributor
The Intel Threading Building Blocks (TBB) library was released this week as open source under the GPLv2 (plus runtime exception) license. The library makes it much easier for programmers to take advantage of all those multiple-core CPUs on the market now (and in the future), including non-Intel machines! James Reinders, software evangelist and director of Intel’s Developer Products Division, says that the Linux, Mac OS X and Windows ports for IA32, IA64 and Intel 64 have been in use for over a year. But when asked about other operating systems, he replied:
The TBB download will build for a G5 on MacOS X, and it will build on Solaris and FreeBSD for x86. Absolutely! We did the work ourselves. These have not been tested for long nor used by customers – I know of no problems… but these would be “alpha” release status now. We’ll help others as they join the community – for instance, we’re working with some engineers at Sun to expand Solaris support (to build with their compiler, and for SPARC)… there is no schedule, just engineers working together to see if there are issues and get it working.
According to Intel, the runtime exception used for the license is the same one used by libstdc++, which allows commercial/proprietary use (though Intel will be happy to sell you a commercial version if you need it). TBB is currently specific to C++ programmers and makes heavy use of templates, though there is talk of porting it to other languages like Java in the future.

Conceptually it's similar to raw thread interfaces such as POSIX threads and Windows threads, but it operates at a much higher level of abstraction. For example, say you have to iterate over an array with a million elements in it. Using raw threads, you might create two threads (or a thread pool), assign each thread to work on half the array, start the threads, and wait for the result. Using TBB, you would instead create a 'task' class (not to be confused with an operating system task or process) with a method that operates on a subset of the array, then ask TBB to iterate over the array using the task you just defined. For example:

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

using namespace tbb;

void Foo( float& value );           // the per-element work, defined elsewhere
const size_t IdealGrainSize = 1000; // placeholder chunk-size hint; tune for your workload

class ApplyFoo {
    float *const my_a;
public:
    // Called by TBB on one chunk of the array; may run on any worker thread.
    void operator()( const blocked_range<size_t>& r ) const {
        float *a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i ) {
            Foo(a[i]);
        }
    }
    ApplyFoo( float a[] ) : my_a(a) {}
};

void ParallelApplyFoo( float a[], size_t n ) {
    // TBB splits [0,n) into sub-ranges and applies ApplyFoo to them in parallel.
    parallel_for( blocked_range<size_t>(0,n,IdealGrainSize), ApplyFoo(a) );
}

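One detail the snippet glosses over: in the TBB release described here you construct a task_scheduler_init object before calling any of the parallel algorithms, which is what starts the worker threads. Here's a minimal, hypothetical driver for the function above (the vector of input data is purely illustrative):

#include <vector>
#include "tbb/task_scheduler_init.h"

int main() {
    tbb::task_scheduler_init init;       // starts TBB's worker threads (one per core by default)
    std::vector<float> data( 1000000 );  // illustrative input; fill it however you like
    ParallelApplyFoo( &data[0], data.size() );
    return 0;
}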
This frees up TBB to subdivide the work and schedule the tasks however it sees fit, so it will likely be more efficient than anything you're apt to come up with on short notice (no offense). It also makes your life easier, because you've only written a couple of lines of easy-to-read code instead of dozens of lines of difficult code.

Another common paradigm implemented by TBB is the reduce operation. Reductions are expressions of the form x = x + y, where + is associative. For this case, you give TBB a class whose operator() computes the local sum for a given range, and whose join() function combines two local results; a sketch appears below.

Things get more interesting when you need to loop over something whose size you don't know in advance, or when the loop body may add more iterations before the loop exits. For example, consider code that calls a function on every element of a linked list. TBB provides a parallel_while class for this case. You pass it two classes: an item stream object that pops one element at a time off the linked list, and an apply object that does whatever it is you want done to each particular item. Again, a sketch follows below.
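Here's a rough sketch of the reduction pattern, summing the elements of an array. The class and function names are mine, not TBB's, and the grain size of 1000 is just a placeholder like IdealGrainSize above; the splitting constructor (the one taking a tbb::split argument) is what lets the library fork off a new body for part of the range, and join() merges the partial sums back together:

#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"

using namespace tbb;

class SumArray {
    const float* my_a;
public:
    float sum;
    // Accumulate a partial sum over one chunk of the array.
    void operator()( const blocked_range<size_t>& r ) {
        const float* a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i )
            sum += a[i];
    }
    // Splitting constructor: TBB calls this to hand part of the range to another task.
    SumArray( SumArray& x, split ) : my_a(x.my_a), sum(0) {}
    // Combine two partial results.
    void join( const SumArray& y ) { sum += y.sum; }
    SumArray( const float a[] ) : my_a(a), sum(0) {}
};

float ParallelSum( const float a[], size_t n ) {
    SumArray body(a);
    parallel_reduce( blocked_range<size_t>(0,n,1000), body );  // 1000 = grain size, tune to taste
    return body.sum;
}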
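And a sketch of the linked-list case. The Item, ItemStream and ApplyFooToItem names are made up for the illustration, and the exact signatures may differ between releases; as I read the interface, the stream class supplies pop_if_present() and the body class supplies operator() plus an argument_type typedef so parallel_while knows what the stream produces:

#include "tbb/parallel_while.h"

using namespace tbb;

struct Item {
    float value;
    Item* next;
};

void Foo( float& value );   // same per-element function as above, defined elsewhere

// Stream: hands parallel_while one list node at a time.
class ItemStream {
    Item* my_ptr;
public:
    ItemStream( Item* root ) : my_ptr(root) {}
    bool pop_if_present( Item*& item ) {
        if( !my_ptr ) return false;
        item = my_ptr;
        my_ptr = my_ptr->next;
        return true;
    }
};

// Body: what to do with each item.
class ApplyFooToItem {
public:
    typedef Item* argument_type;
    void operator()( Item* item ) const {
        Foo( item->value );
    }
};

void ParallelApplyFooToList( Item* root ) {
    parallel_while<ApplyFooToItem> w;
    ItemStream stream( root );
    ApplyFooToItem body;
    w.run( stream, body );
}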
Threading Building Blocks contains the following library components:

Generic Parallel Algorithms
  • parallel_for
  • parallel_reduce
  • parallel_scan
  • parallel_sort
  • parallel_while
  • pipeline
Assistant Classes to Use with Algorithms
  • blocked_range (for use with algorithms, containers, etc.)
  • blocked_range2d (for use with algorithms, containers, etc.)
Thread-Safe Containers
  • concurrent_hash_map
  • concurrent_queue
  • concurrent_vector
Synchronization Primitives
  • atomic
  • spin_mutex
  • spin_rw_mutex (reader-writer spin mutex)
  • queuing_mutex
  • queuing_rw_mutex (reader-writer queuing mutex)
  • mutex
Task Scheduler
Memory Allocation
  • scalable_allocator
  • cache_aligned_allocator
  • aligned_space
Timing
  • tick_count
Resources: