
Intel opens up high performance multi-core library

Written by Ed Burnette, Contributor
The Intel Threading Building Blocks (TBB) library was released this week as open source under the GPLv2 (plus runtime exception) license. The library makes it much easier for programmers to take advantage of all those multiple-core CPUs on the market now (and in the future), including non-Intel machines! James Reinders, software evangelist and director of Intel’s Developer Products Division, says that the Linux, Mac OS X and Windows ports for IA32, IA64 and Intel 64 have been in use for over a year. But when asked about other operating systems, he replied:
The TBB download will build for a G5 on MacOS X, and it will build on Solaris and FreeBSD for x86. Absolutely! We did the work ourselves. These have not been tested for long nor used by customers – I know of no problems… but these would be “alpha” release status now. We’ll help others as they join the community – for instance, we’re working with some engineers at Sun to expand Solaris support (to build with their compiler, and for SPARC)… there is no schedule, just engineers working together to see if there are issues and get it working.
According to Intel, the runtime exception used for the license is the same one used by libstdc++, which allows commercial/proprietary use (though Intel will be happy to sell you a commercial version if you need it). TBB is currently specific to C++ programmers and makes heavy use of templates, though there is talk of porting it to other languages like Java in the future.

Conceptually it's similar to raw thread interfaces such as POSIX threads and Windows threads, but it operates at a much higher level of abstraction. For example, say you have to iterate over an array with a million elements in it. Using raw threads, you might create two threads (or a thread pool), assign each thread to work on half the array, start the threads, and wait for the result. Using TBB, you would instead create a 'task' class (not to be confused with an operating system task or process) with a method that operates on a subset of the array, then ask TBB to iterate over the array using the task you just defined. For example:

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

using namespace tbb;

void Foo( float& value );           // the per-element work, defined elsewhere
const size_t IdealGrainSize = 1000; // placeholder chunk-size hint; tune for your workload

class ApplyFoo {
    float *const my_a;
public:
    // Called by TBB on one chunk of the array; may run on any worker thread.
    void operator()( const blocked_range<size_t>& r ) const {
        float *a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i ) {
            Foo(a[i]);
        }
    }
    ApplyFoo( float a[] ) : my_a(a) {}
};

void ParallelApplyFoo( float a[], size_t n ) {
    // TBB splits [0,n) into sub-ranges and applies ApplyFoo to them in parallel.
    parallel_for( blocked_range<size_t>(0,n,IdealGrainSize), ApplyFoo(a) );
}

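One detail the snippet glosses over: in the TBB release described here you construct a task_scheduler_init object before calling any of the parallel algorithms, which is what starts the worker threads. Here's a minimal, hypothetical driver for the function above (the vector of input data is purely illustrative):

#include <vector>
#include "tbb/task_scheduler_init.h"

int main() {
    tbb::task_scheduler_init init;       // starts TBB's worker threads (one per core by default)
    std::vector<float> data( 1000000 );  // illustrative input; fill it however you like
    ParallelApplyFoo( &data[0], data.size() );
    return 0;
}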
This frees up TBB to subdivide the work and schedule the tasks however it sees fit, so it will likely be more efficient than anything you're apt to come up with on short notice (no offense). It also makes your life easier, because you've only written a couple of lines of easy-to-read code instead of dozens of lines of difficult code.

Another common paradigm implemented by TBB is the reduce operation. Reductions are expressions of the form x = x + y, where + is associative. For this case, you give TBB a class whose operator() computes the local sum for a given range, and whose join() function combines two local results; a sketch appears below.

Things get more interesting when you need to loop over something whose size you don't know in advance, or when the loop body may add more iterations before the loop exits. For example, consider code that calls a function on every element of a linked list. TBB provides a parallel_while class for this case. You pass it two classes: an item stream object that pops one element at a time off the linked list, and an apply object that does whatever it is you want done to each particular item. Again, a sketch follows below.
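Here's a rough sketch of the reduction pattern, summing the elements of an array. The class and function names are mine, not TBB's, and the grain size of 1000 is just a placeholder like IdealGrainSize above; the splitting constructor (the one taking a tbb::split argument) is what lets the library fork off a new body for part of the range, and join() merges the partial sums back together:

#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"

using namespace tbb;

class SumArray {
    const float* my_a;
public:
    float sum;
    // Accumulate a partial sum over one chunk of the array.
    void operator()( const blocked_range<size_t>& r ) {
        const float* a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i )
            sum += a[i];
    }
    // Splitting constructor: TBB calls this to hand part of the range to another task.
    SumArray( SumArray& x, split ) : my_a(x.my_a), sum(0) {}
    // Combine two partial results.
    void join( const SumArray& y ) { sum += y.sum; }
    SumArray( const float a[] ) : my_a(a), sum(0) {}
};

float ParallelSum( const float a[], size_t n ) {
    SumArray body(a);
    parallel_reduce( blocked_range<size_t>(0,n,1000), body );  // 1000 = grain size, tune to taste
    return body.sum;
}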
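And a sketch of the linked-list case. The Item, ItemStream and ApplyFooToItem names are made up for the illustration, and the exact signatures may differ between releases; as I read the interface, the stream class supplies pop_if_present() and the body class supplies operator() plus an argument_type typedef so parallel_while knows what the stream produces:

#include "tbb/parallel_while.h"

using namespace tbb;

struct Item {
    float value;
    Item* next;
};

void Foo( float& value );   // same per-element function as above, defined elsewhere

// Stream: hands parallel_while one list node at a time.
class ItemStream {
    Item* my_ptr;
public:
    ItemStream( Item* root ) : my_ptr(root) {}
    bool pop_if_present( Item*& item ) {
        if( !my_ptr ) return false;
        item = my_ptr;
        my_ptr = my_ptr->next;
        return true;
    }
};

// Body: what to do with each item.
class ApplyFooToItem {
public:
    typedef Item* argument_type;
    void operator()( Item* item ) const {
        Foo( item->value );
    }
};

void ParallelApplyFooToList( Item* root ) {
    parallel_while<ApplyFooToItem> w;
    ItemStream stream( root );
    ApplyFooToItem body;
    w.run( stream, body );
}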
Threading Building Blocks contains the following library components:

Generic Parallel Algorithms
  • parallel_for
  • parallel_reduce
  • parallel_scan
  • parallel_sort
  • parallel_while
  • pipeline
Assistant Classes to Use with Algorithms
  • blocked_range (for use with algorithms, containers, etc.)
  • blocked_range2d (for use with algorithms, containers, etc.)
Thread-Safe Containers
  • concurrent_hash_map
  • concurrent_queue
  • concurrent_vector
Synchronization Primitives
  • atomic
  • spin_mutex
  • spin_rw_mutex (reader-writer spin mutex)
  • queuing_mutex
  • queuing_rw_mutex (reader-writer queuing mutex)
  • mutex
Task Scheduler
Memory Allocation
  • scalable_allocator
  • cache_aligned_allocator
  • aligned_space
Timing
  • tick_count
Resources: