Gauging the gigaflops gap

Should PC users give a flip about the Mac's gigaflops?
Written by Don Granberry, Contributor
Here's what Apple Computer Inc. has to say on its Web site about the PowerPC G4 chip that powers its top-of-the-line Macs:

"The new PowerPC G4, architected by Apple, Motorola and IBM, is the first microprocessor that can deliver a sustained performance of over one gigaflop. In fact, it has a theoretical peak performance of 3.6 gigaflops."

The gigaflops concept is an essential one in the Apple (aapl) marketing canon; it's the foundation of Apple's claims of "supercomputer" performance for the Power Mac G4 and G4 Cube. Furthermore, as chips from Intel Corp. and AMD breeze past the 1GHz mark, Apple execs and loyalists alike have held up the gigaflops spec as evidence that the G4 is still in the running, performance-wise, even though the clock speeds on Motorola's Web site currently max out at 500MHz.

But what is a gigaflop, and should we give a flip? Perhaps we should figure out what a "flop" is -- let alone a gigaflop -- before deciding whether it matters.

One "flops" represents one floating-point operation completed per second. (A floating-point operation is an arithmetic calculation performed on numbers with fractional parts, as opposed to whole-number, or integer, arithmetic.)

As the prefix implies, one gigaflops denotes a billion floating-point operations completed in one measly second.

Floating-point operations are usually carried out in "double precision," which preserves roughly 15 significant decimal digits. Such operations are important because they place a disproportionate strain on the computer: The machine must work quite a bit harder to carry out arithmetic involving fractions than it does when performing arithmetic involving whole numbers, or integers.
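To make the definition concrete, here is a toy benchmark in Python. Everything in it -- the function name, the loop count, the multiply-add kernel -- is my own invention for illustration; interpreted Python runs orders of magnitude below a CPU's hardware peak, and real flops benchmarks use carefully tuned numerical kernels.

```python
import time

def estimate_flops(n=1_000_000):
    """Time n multiply-add operations and return an estimated rate in
    floating-point operations per second. A rough sketch only: real
    benchmarks (LINPACK and kin) use hand-tuned kernels, and Python's
    interpreter overhead dwarfs the arithmetic itself."""
    x = 1.000001
    acc = 0.0
    start = time.perf_counter()
    for _ in range(n):
        acc = acc * x + 1.0  # one multiply plus one add: 2 flops
    elapsed = time.perf_counter() - start
    return (2 * n) / max(elapsed, 1e-12)  # guard against a zero timer reading

rate = estimate_flops()
print(f"Roughly {rate / 1e9:.4f} gigaflops")
```

A chip that could sustain one gigaflops would finish that entire million-iteration loop in about two thousandths of a second.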

Measuring the performance of computers is almost as difficult as measuring the performance of vehicles, which is why pundits often liken PC benchmarks to tests designed to compare automotive performance.

In the automotive world, it is impossible to perform a meaningful comparison between a Formula One race car and a Peterbilt tractor: One is intended to travel at very high speed around a racing circuit, while the other is intended to haul large cargoes over the highway system.

The only meaningful comparison one can use to evaluate the performance of a Peterbilt tractor is to compare it to another tractor (say, a Mack) or an earlier model of Peterbilt. The same goes for the race car: You eventually take it down to the track and see if the McLaren can outrun the Ferrari, but first you check to see how the new car performs in comparison to your old car.

We try to gauge computer performance in much the same way. Nearly all the schemes used to measure computer performance use an older system, the "benchmark," to evaluate the performance of current systems.

These tests estimate the performance you can expect from a machine by running specific operating systems and common applications on it, then comparing its performance to that of an earlier model of computer in the same class.

This is a tricky business at best, especially when attempting to compare the performance and value of computers using different operating systems. Consider the classic -- and often controversial -- comparisons between Mac and Wintel hardware or, more recently, shootouts between Intel hardware running Linux or Windows.

So where do gigaflops come in? Finding out how many floating-point operations a chip can perform at a given clock rate is a handy benchmark for evaluating the performance of CPU chips.

It's like comparing automobile engines using a dynamometer. Dynamometer results are easy to grasp: You get this much brake horsepower at this many revolutions per minute at this amount of fuel consumption. Worries about handling, operator comfort, braking, and acceleration are not factored into this kind of evaluation.

There is -- wouldn't you know it? -- more than one fly in the gigaflop ointment: Floating-point operations vary considerably in difficulty. Different chips perform different floating-point operations at different rates, depending upon how the chip is organized internally.

Several of the major chip manufacturers have devised special modules within their CPU chips to enhance the performance of floating-point operations. The most powerful of these modules take advantage of the mathematical techniques used in matrix algebra to solve simultaneous equations. These modules, or "units," bear various names. Motorola has its AltiVec variant, which Apple marketing refers to as the Velocity Engine. AMD uses a module called 3DNow, and Intel's Pentium III carries SSE (Streaming SIMD Extensions), the successor to its earlier MMX technology.


Each uses essentially the same concepts expressed in silicon, and all such techniques are referred to as "single instruction, multiple data" or SIMD. None of these modules are exactly the same, so their performance varies, and none of them do you much good if your software hasn't been optimized to take advantage of their power.
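The SIMD idea itself -- one instruction applied to many data elements at once -- can be sketched in a few lines. The example below uses NumPy, whose vectorized arithmetic runs a single compiled loop over an array, the kind of loop modern CPUs can map onto SIMD hardware such as AltiVec or SSE; the sample values are invented for illustration.

```python
import numpy as np

# SIMD in spirit: one operation (a multiply) applied to many data
# elements in a single vectorized step, rather than one scalar
# multiply per element in a Python-level loop.
samples = np.array([0.5, -1.25, 2.0, 3.75], dtype=np.float32)
gain = np.float32(2.0)

scaled = samples * gain  # one "instruction," four data items
print(scaled.tolist())   # [1.0, -2.5, 4.0, 7.5]
```

This is exactly the shape of workload -- the same arithmetic repeated across long runs of audio samples, pixels, or matrix entries -- where the vendors' vector units earn their keep.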

All three SIMD designs are at their best when dealing with problems requiring a huge number of floating-point operations to be executed quickly, as in the processing of image files, sound files, or video. These units are also extremely useful in other fields, such as cryptography, fluid dynamics, and the calculation of orbits. (That's why scientists like to use gigaflops as a measure of computing performance.)

As Apple has been quick to point out, the official definition of "supercomputer" is a computer capable of a sustained performance greater than one gigaflops, or one billion floating-point operations per second. By the standards of the computing industry, this definition is superannuated, since current supercomputers commonly perform at 100 gigaflops or faster. Such systems are massively parallel, using a great many processors inside a single computer, or they are clusters of computers with one or more CPU chips on each motherboard, using Gigabit Ethernet to hook them all together.

Still, for a single-CPU desktop system, sustained computing rates greater than one gigaflops are nothing to sneeze at. Chips designed for desktop systems are just now reaching this level of performance, and Motorola is indeed the clear leader in a number of ways. I've included a chart that compares performance data for Motorola's 500MHz G4 with Intel's Pentium III running at 700MHz.

The Motorola chip uses only 27 percent of the clock cycles the Intel chip requires to do the same amount of work. Extrapolating from this data, the Intel chip would need to operate at 2.6GHz to achieve equivalent performance.
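That extrapolation is simple proportional arithmetic, sketched below. The key assumption -- that performance scales linearly with clock rate -- is an optimistic one that real chips only approximate.

```python
# Back-of-the-envelope version of the extrapolation above.
# Assumption: work done scales linearly with clock rate, which
# real processors (memory stalls, pipeline effects) rarely obey.
intel_clock_mhz = 700     # Pentium III clock in the comparison
g4_cycle_fraction = 0.27  # cycles the G4 needs relative to the Pentium III

equivalent_clock_ghz = (intel_clock_mhz / g4_cycle_fraction) / 1000
print(f"{equivalent_clock_ghz:.1f} GHz")  # → 2.6 GHz
```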

That's theoretical, of course; instead of trying to hit that clock speed any time soon, Intel will more likely make its chip more efficient as well as increasing its clock rate in order to match the floating-point performance of the G4.

On other fronts, the Intel chip uses 23 watts of power at 700MHz, while the Motorola chip uses 10 watts under full load at 500MHz. Meanwhile, Intel chips running near 1GHz draw a whopping 35 watts at full load.

While pricing on CPU chips is a constantly moving target, the 700MHz Pentium III and 500MHz PowerPC G4 seem comparable at around $190 and $195 each, respectively. The 1GHz flavor of Pentium III currently costs about $670 each when purchased in lots of a thousand.

All said and done, Motorola has good reason to be proud of the G4, but there is seldom time to rest on one's laurels in the CPU business. Advances by the competition are made on a daily basis.

The G4 is faced with substantial competition in the form of AMD's new Athlons. While the Athlon design is a bit power-hungry, consuming 39 watts at 700MHz and an eye-popping 65 watts at 1GHz, the University of Kentucky has constructed a 64-CPU cluster using 100Base-T connections that achieves sustained rates of 64 gigaflops by taking full advantage of the Athlon's 3DNow vector-processing module.

That tidbit suggests that the 700MHz Athlon is capable of sustained one-gigaflops rates when software takes full advantage of its design: 64 gigaflops spread across 64 CPUs works out to one gigaflops per chip.

It takes more than an engine to produce a vehicle, and it requires more than a CPU to build a computer. The OS and the software that runs atop it are both vital to the final performance mix. Software that fully exploits the PowerPC has been the glaring weak spot in Apple's game plan; legacy code is likewise the reason Intel and AMD are not building faster, less power-hungry chips. All these platforms carry huge bodies of legacy code that aren't easily abandoned. Apple has been struggling for years to optimize its operating system for PowerPC hardware without abandoning the mountains of code written for Apple computers by third-party developers such as Adobe and Microsoft. Now, it looks as though Apple is finally about to succeed in making its operating system take full advantage of the PowerPC architecture.

Once this is done, it will remain to be seen if software developers will take full advantage of the new combination, enabling Apple to sell its higher-end workstations despite the slower clock rates.

However, neither IBM nor Motorola should sit on its laurels, since a prolonged lag will cause untold grief for the single most visible customer for the PowerPC. Like it or not, the majority of the public has been educated to accept megahertz as the spec to look for when buying a computer. Just as important, the competition is not allowing grass to grow under its feet when it comes to the specs where the PowerPC currently holds the edge.

MacEdition.com Contributing Editor Don Granberry began his career as a construction scheduler in the petrochemicals industry, eventually becoming a company "computer guru" and later the supervisor of information services at a prominent construction company in the Houston area. Since leaving the construction industry, he has done consulting and free-lance database work for a number of Houston-area firms.
