How to get 520 GigaFlops for $600

Certain specialized compute-intensive tasks have long been coded to take advantage of vendor math libraries and special hardware vector acceleration. New techniques now make it possible to harness the massively parallel graphics processing unit (GPU) found in modern high-end 3D video cards originally designed for gaming.
Written by Ed Burnette, Contributor

[Update: This article has been corrected (see below). -Ed]

The two main graphics card vendors are NVIDIA and ATI. ATI was recently bought by AMD, but it still produces its own products that work in non-AMD computers, so one can still think of it as an independent company or subsidiary. There used to be a third competitor named 3dfx, which pioneered the 3D video card market with its Voodoo series several years ago. I still have an "I believe" T-shirt (a take-off on The X-Files) from a 3dfx conference that I attended. Eventually 3dfx was bought by its rival NVIDIA, leaving only NVIDIA and ATI as serious contenders.

Now both NVIDIA and ATI are trying to branch out of their gaming niche by allowing businesses and other high performance computing (HPC) users to tap into the enormous processing power in their GPUs. How much power are we talking about? The NVIDIA 8800GTX includes 128 stream processors running at 1.35GHz. According to the manufacturer, each processor supports the dual issue of a scalar MAD and MUL operation, for roughly 520 GFlops of raw horsepower (corrected from the 66.6 TFlops originally stated). Of course, this is just a theoretical maximum that is unlikely to be approached in a real-world application. The system supports thousands of independent, simultaneously executing threads and has 86.4GB/sec of bandwidth to its on-board 768MB of memory. Still not enough power? Two of the cards can be lashed together to double the performance to over 1 TFlops.
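The 520 GFlops figure can be reproduced from the numbers above. Here is a minimal sketch of the arithmetic, assuming a MAD counts as two floating-point operations (multiply plus add) and a MUL as one, so each stream processor retires three per clock. The function and parameter names are illustrative only, not anything from NVIDIA.

```python
def peak_gflops(stream_processors: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak in GFlops: processors x clock (GHz) x flops per clock."""
    return stream_processors * clock_ghz * flops_per_clock


# Assumption: MAD = 2 flops, MUL = 1 flop, both issued every clock.
FLOPS_PER_CLOCK = 2 + 1

one_card = peak_gflops(128, 1.35, FLOPS_PER_CLOCK)
two_cards = 2 * one_card  # assumes perfect scaling across two cards

print(f"one 8800GTX:  {one_card:.1f} GFlops")   # 518.4
print(f"two 8800GTXs: {two_cards:.1f} GFlops")  # 1036.8
```

Like the headline number, this is a ceiling: it assumes every processor issues both operations on every cycle and never waits on memory.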

NVIDIA CUDA (Compute Unified Device Architecture) technology enables the GPU to solve complex non-graphics computational problems using a modified version of the C language. You can also program it in assembler. The CUDA Toolkit includes standard FFT and BLAS libraries and numerous examples with source code. Windows and Linux are supported. (An 8800GTX card has an MSRP of $600.)

ATI's CTM (Close To the Metal) initiative is designed to open up the high-performance floating-point parallel processor array found in ATI graphics hardware. It exposes the instruction set of the ATI X1K Fragment Processor (X1K FP) used in the R580 GPU that powers the dedicated AMD Stream Processor card (MSRP $2215, evaluation $550).

Interestingly, 90% of all computers, especially business computers, are sold with cheap, integrated graphics controllers, usually made by Intel. At present, these are not suitable for high performance computing, but competition between Intel and AMD/ATI will likely change that situation in the years to come. Still, software that takes advantage of the GPU power will likely require the purchase of specific hardware to run it on, or come bundled in a hardware/software package.

Several concerns are likely to hold back this technology for all but the highest-end niche markets. First, both systems are very difficult to program. IBM's Cell architecture is child's play by comparison. See the related articles for more info. Second, there is no standard programming model, so you have to code your core algorithms specifically for each vendor. If you're programming in assembler, you might even have to code for specific model numbers. And finally, all current GPU hardware uses single-precision 32-bit floats instead of double-precision 64-bit arithmetic, and it may not fully implement the IEEE 754 floating-point specification. That said, there are many hard problems waiting to be solved that could benefit from this technology.
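To give a feel for why the single-precision limitation matters, here is a small sketch that rounds values to 32 bits on the CPU using Python's standard struct module. It only imitates how a 32-bit float stores a value; it does not model any particular GPU's rounding behavior, and the helper name to_float32 is made up for illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# 2**24 = 16,777,216 is where 32-bit floats stop being able to count by ones.
big = 2.0 ** 24

in_double = big + 1.0              # 64-bit arithmetic keeps the +1
in_single = to_float32(big + 1.0)  # 32-bit storage rounds it away

print(in_double - big)  # 1.0
print(in_single - big)  # 0.0
```

A long running sum, or a difference of two nearly equal large values, can lose small contributions this way, which is one reason some HPC codes insist on 64-bit arithmetic.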

Related articles