[Update 9/16/2007 - After a little more digging, it appears that SPEC's rules (www.spec.org/fairuse.html, rule 3.b) require a clear statement of the basis of comparison. Without a basis of comparison, there's no telling what the marketing department is talking about. It's like calling someone the best pitcher in baseball without fully disclosing the basis of comparison:
"he is the best at E.R.A., but not strikeouts, during day games that are part of a doubleheader on the home field with temperatures above 90 degrees against one-legged batters whose mothers are named Susan"
So if I were to say Michael Jordan was the best basketball player of his time, I would not need to list any qualifiers, just as Intel doesn't need any qualifiers for posting the best published SPEC score. But if I said Yao Ming is the best player in the NBA today without any qualifiers, that would be a problem. If I put an * next to Yao Ming's name and then put "* Among Asians" on the bottom of the slide, that would be acceptable. AMD failed to put "* no auto parallelization" on the bottom of the slide, and that is a violation of SPEC's acceptable use rule.]
A lot of people in the press have been scratching their heads at some of the performance claims AMD made at the Barcelona launch Monday night, including my colleague Tom Krazit, who also covered the event for news.com (sister site to ZDNet). Not only were results omitted for questionable reasons, but they were omitted without proper disclosure. The media then picked up these slides and presented them to the public as is.
The slide in question (shown below with link to full size version) claimed to compare the best published SPEC CPU 2006 (Standard Performance Evaluation Corporation) results using the Intel compiler against the best results using the PGI compiler as of September 9, 2007. What AMD didn't tell you was that it omitted better Intel results because of an optimization technique called "auto parallelization". I only learned of this because I had compiled the officially published SPEC scores for the Barcelona launch, AMD's numbers didn't match mine, and I asked AMD about the disparities.
AMD used these slower Intel E5345 results, which had auto-parallelization turned off, instead of the best Intel E5345 results. Using the best results would have meant AMD's 2350 trailing by 13.4% and not just by 5% as shown in the slide. Nothing at the live event disclosed this omission. AMD spokesman Phil Hughes told me that the link to the slower SPEC results was included in some of the slides sent to the media, but you would have had to follow the link and then manually look up all the compiler flags, which are fairly cryptic.
[UPDATE 9/30/2007 - Arian Wong points out that the charts are badly skewed with a non-zero baseline.]
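To see why a non-zero baseline matters, here is a minimal sketch of the arithmetic. The scores of 100 and 105 and the baseline of 90 are made-up numbers chosen only to mirror a 5% gap; they are not taken from AMD's slides, and `drawn_ratio` is a hypothetical helper name.

```python
def drawn_ratio(taller, shorter, baseline=0.0):
    """Ratio of the two bar lengths as drawn when the axis starts at `baseline`."""
    if baseline >= min(taller, shorter):
        raise ValueError("baseline must be below both values")
    return (taller - baseline) / (shorter - baseline)

# Hypothetical scores with a 5% real gap (illustration only, not slide data).
honest = drawn_ratio(105.0, 100.0)                 # axis starts at zero
skewed = drawn_ratio(105.0, 100.0, baseline=90.0)  # axis starts at 90

print(f"zero baseline: bar looks {100 * (honest - 1):.0f}% longer")
print(f"baseline 90:   bar looks {100 * (skewed - 1):.0f}% longer")
```

With these assumed numbers the same 5% gap is drawn as a bar that is 50% longer, which is the kind of skew being pointed out.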
I asked AMD why they didn't disclose that they had omitted Intel's auto parallelization numbers (referred to as autopar in the quotes below). AMD spokesmen Phil Hughes and John Taylor sent me the following responses.
Phil Hughes: As I mentioned after I sent you those slides yesterday, I didn't include the backup that had the SPEC links. As John indicated below, it is something we will consider doing moving forward. Does Intel indicate Autopar is turned on for their performance charts?
John Taylor: Autopar is a (SPEC legal but must be explicitly disclosed) way to minimize the Intel memory bottleneck. Instead of running the full SPECint workload, it runs a fraction of the workload on multiple paths. We'll look into including a notation that we are sourcing against the more appropriate, but non-autopar, number. Because we don't have a memory bottleneck, we have no reason to do autopar testing that reduces the workload. So for AMD, autopar is inherently a non apples-to-apples SPECint comparison.
But wait just a minute: every minor detail is already spelled out ad nauseam on the actual SPEC disclosure page (example here). Turning the question around on Intel, as if it did something underhanded by including the best results with all the optimizations, seems strange to me. It is accepted practice to quote the best official results from SPEC on marketing slides without having to regurgitate every minor tweak that's already fully disclosed on SPEC. I once made the mistake of not labeling something as an "estimate" when it wasn't officially published, and I learned firsthand how rigorously SPEC enforces its policies when I received an email and formal snail mail asking me to fix my error immediately. SPEC does not require that you post the entire disclosure so long as the dates and the numbers are accurate, but to silently omit your competitor's best scores and then present what is left as the best score available as of September 9th 2007 is not ok. As a result, the media walked away with a false impression and reported it as news, which is why AMD needs to be called out on this one.
Switching gears to AMD's technical explanation for its choice, John's claim that auto parallelization helps minimize Intel's memory bottleneck seems strange to me, because the more the code is parallelized, the higher the memory utilization. But even if it did get around the memory bandwidth problem for Intel, why would someone who consciously purchased an Intel compiler choose to cripple a feature that gives better performance? If auto parallelization is an underhanded technique that improves application performance, PLEASE give me more of those underhanded techniques. The auto parallelization feature in Intel's 10.0 compiler will also boost AMD processor performance, and Intel has explained to me in the past that its compilers produce some of the best performance results for AMD processors. Furthermore, AMD's favorite compiler, PGI, also supports auto-parallelization; here's an excerpt.
"In addition to the data parallel capabilities of the PGHPF® compiler, the PGI CDK package includes the PGF95(tm) Fortran 95 compiler; the PGF77® FORTRAN 77 compiler; the PGCC® ANSI C compiler; and the PGC++(tm) ISO/ANSI-compliant C++ compiler. All of these compilers support automatic parallelization for SMP workstations using a simple compiler switch, and full native support for OpenMP directive-based SMP programming."
So is AMD promising a moratorium on auto parallelization from now on, or only when it hurts Intel CPUs? But cherry-picking Intel processor scores doesn't seem to be the end of it: AMD even seems willing to omit better Opteron dual-core scores to make its new generation Barcelona processor improvements look bigger than they actually are.
One of the first performance slides shown at the Barcelona launch showcased the performance boost of AMD's new "Barcelona" Opteron 2350 quad-core CPU over AMD's previous generation Opteron 2222 dual-core processors on various SPEC benchmarks. When I saw the numbers, I had to wonder how AMD turned a 46% improvement over the 2222 on SPECint_rate2006 into the 57% improvement showcased at the launch. The 46% figure is what I derived from officially published SPEC scores as of September 9th 2007, so I thought it was strange when the AMD Barcelona slides indicated 57%.
I contacted AMD for an explanation, and Phil Hughes explained to me that the best 2222 SPECint_rate2006 results from Sun Microsystems (published June 26th 2007) were omitted because they were "2222SE". "SE" is AMD's designation for high wattage 120W TDP (Thermal Design Power) CPUs, as opposed to the normal 95W TDP parts that scored 56.4 on SPECint_rate2006 from Fujitsu Siemens. Hughes argued that comparing the "Barcelona" Opteron 2350 quad-core with the Opteron 2222 dual-core is fair because both chips operate within the 95W envelope. But that would imply that the 120W 2222SE somehow runs faster because it can use more power, and that's ridiculous: there is no performance difference between an "SE" high-wattage part and a normal-wattage part.
The only difference between the Opteron 2222 and the 2222SE is that the lower-wattage 2222 is a better quality yield that happens to leak less power than the 2222SE. The performance difference comes from differences in the system and/or compiler Sun used in the system it benchmarked, and Sun's system showed the true potential of the Opteron 2222 processor. Had Sun used a regular 2222 in the exact same server with the same binaries used to benchmark the 2222SE version, it would have gotten the same results (within normal variations). By omitting the best 2222SE results, AMD can claim that its new Barcelona 2 GHz product is 57% better than the previous generation 3 GHz product when in fact it is only 46% better.
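For readers who want to check the arithmetic, here is a rough sketch of how the two percentages relate. The only published figure it uses from this post is the Fujitsu Siemens 2222 score of 56.4. The 2350 and 2222SE scores it prints are back-calculated from the rounded 57% and 46% figures, so treat them as estimates rather than published SPEC results; the variable and function names are my own.

```python
def percent_gain(new_score, old_score):
    """Percentage improvement of new_score over old_score."""
    return (new_score / old_score - 1.0) * 100.0

FUJITSU_2222 = 56.4    # 95W Opteron 2222 SPECint_rate2006 score quoted above
AMD_CLAIM_PCT = 57.0   # improvement shown on AMD's slide
DERIVED_PCT = 46.0     # improvement against the best published 2222SE result

# Estimates only: implied by the rounded percentages, not published scores.
implied_2350 = FUJITSU_2222 * (1.0 + AMD_CLAIM_PCT / 100.0)
implied_2222se = implied_2350 / (1.0 + DERIVED_PCT / 100.0)

print(f"implied 2350 score (estimate):   {implied_2350:.1f}")
print(f"implied 2222SE score (estimate): {implied_2222se:.1f}")
print(f"gain vs 95W 2222: {percent_gain(implied_2350, FUJITSU_2222):.0f}%")
print(f"gain vs 2222SE:   {percent_gain(implied_2350, implied_2222se):.0f}%")
```

The point of the sketch is simply that the same Barcelona score yields a 57% or a 46% improvement depending on which 2222 baseline goes in the denominator.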