Benchmarks: AMD's 45nm 'Shanghai' Opteron - Part 2
The 2.7GHz AMD Shanghai system drew 183W when idle and 320W when fully loaded. The equivalent figures for the 2.8GHz Intel Harpertown system were 190W (idle) and 320W (fully loaded). The 2.66GHz six-core Dunnington machine, with two SAS disks, drew 349W when idle and 451W when fully loaded.
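The rise from idle to full load can be worked out directly from these figures. The following is a minimal Python sketch that uses only the wattages quoted above; the dictionary keys and the function name `load_delta` are illustrative labels, not part of any benchmark tool.

```python
# Power draw in watts as quoted in the article: (idle, fully loaded).
POWER_W = {
    "Shanghai 2.7GHz": (183, 320),
    "Harpertown 2.8GHz": (190, 320),
    "Dunnington 2.66GHz": (349, 451),
}


def load_delta(idle_w, load_w):
    """Return the extra watts drawn under load and that rise as a fraction of idle draw."""
    delta = load_w - idle_w
    return delta, delta / idle_w


for name, (idle, load) in POWER_W.items():
    delta, rel = load_delta(idle, load)
    print(f"{name}: +{delta}W under load ({rel:.0%} above idle)")
```

On these figures the Shanghai system rises by 137W between idle and full load, the Harpertown system by 130W and the Dunnington system by 102W. The Dunnington machine's much higher idle draw reflects a different platform and disk configuration, so its figures are not directly comparable with the other two.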
Intel's Core 2 architecture, as used in the Xeon Harpertown and Dunnington CPUs, narrowly takes the lead when running arithmetic applications that make few demands on system memory (see graphs below). Intel's lead over AMD in these tests is nowhere near as clear as it was against the Barcelona processors.
In the Lavalys AES benchmark, which tests integer performance, the Shanghai system delivers a score of 36,908 points, some 11 percent behind Harpertown, which is clocked 100MHz faster. The Dunnington system comes out well ahead, with a 67 percent advantage. But this comparison is not strictly fair, since the Dunnington system has not only four more cores but also 32MB of Level 3 cache in total, which allows it to cope without making many demands on system memory.
In the single-precision arithmetic Lavalys Julia benchmark, the Shanghai system with SSE scores 12,813, which is 35 percent below Harpertown and 46 percent below Dunnington. With SSE instructions, Intel usually delivers far higher performance than AMD. In the more relevant double-precision arithmetic tests of the Lavalys Mandelbrot benchmark, Shanghai with SSE2 is only 12 percent behind Harpertown and 62 percent behind Dunnington.
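Because the Harpertown system runs 100MHz faster than the Shanghai system, part of each deficit comes from clock speed alone. The sketch below divides out the clock ratio to estimate what remains per clock cycle. It assumes that benchmark scores scale linearly with clock speed, which is a simplification, and the function name `per_clock_deficit` is hypothetical.

```python
def per_clock_deficit(deficit, clock_ghz, rival_clock_ghz):
    """Estimate the deficit left after dividing out the clock-speed ratio.

    deficit: fraction by which the score trails the rival (0.11 = 11 percent).
    Assumption: scores scale linearly with clock speed.
    """
    relative_score = 1.0 - deficit
    clock_ratio = clock_ghz / rival_clock_ghz
    return 1.0 - relative_score / clock_ratio


# Shanghai (2.7GHz) versus Harpertown (2.8GHz), deficits as quoted in the article.
for bench, deficit in [("AES", 0.11), ("Julia", 0.35), ("Mandelbrot", 0.12)]:
    remaining = per_clock_deficit(deficit, 2.7, 2.8)
    print(f"{bench}: {remaining:.1%} behind per clock")
```

Under that assumption, the 11 percent AES deficit shrinks to just under 8 percent per clock, while the Julia gap remains large. The clock difference explains only a small part of Intel's SSE advantage.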





Memory throughput: no chance for Intel
As soon as main memory performance comes into play, AMD regains the advantage over Intel.
In rendering tests, the Shanghai (Opteron 2384) and Harpertown (Xeon E5462) systems deliver nearly identical results. The Shanghai system processed the Persistence of Vision (PovRay) benchmark 6.6 percent slower than Harpertown, while running at a clock speed about 3.5 percent lower. The AMD machine took 5.3 percent longer to complete the Cinebench R10 benchmark than the Intel quad-core system. Meanwhile, with 50 percent more cores and an almost identical clock speed to the Shanghai test bed, the Dunnington (Xeon X7460) system ran the PovRay benchmark 54 percent faster and the Cinebench R10 test 41 percent faster.
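One way to read the Dunnington rendering results is to compare the observed speed-up with the extra cores available. The sketch below is a rough illustration only. It treats "54 percent faster" as a throughput ratio of 1.54, ignores the small clock difference (2.66GHz versus 2.7GHz) and the architectural differences between the two platforms, and uses a hypothetical function name, `scaling_efficiency`.

```python
def scaling_efficiency(gain, cores, baseline_cores):
    """Ratio of the observed speed-up to the speed-up that linear core scaling would give.

    gain: fraction by which the larger system is faster (0.54 = 54 percent).
    A result of 1.0 means the speed-up exactly matches the core-count ratio.
    """
    return (1.0 + gain) / (cores / baseline_cores)


# Dunnington system (12 cores) versus Shanghai system (8 cores).
for bench, gain in [("PovRay", 0.54), ("Cinebench R10", 0.41)]:
    print(f"{bench}: {scaling_efficiency(gain, 12, 8):.2f}")
```

A value near 1.0, as with PovRay, suggests the result tracks the core count closely. The lower Cinebench R10 value suggests that benchmark gains less from the additional cores.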
Although CPU performance is the dominant factor in ray-tracing, memory throughput also plays its part. The Shanghai processors handled ZIP compression (Lavalys ZLib) 2.2 percent faster than their Harpertown counterparts. That difference in performance becomes clearer with the 7Zip benchmark, in which the AMD system is 6.9 percent faster than the Intel quad-core machine. The 12-core Dunnington system outstrips Shanghai in the Lavalys ZLib benchmark by some 44 percent, but only by about 12 percent in the 7Zip test.
The Lavalys Photoworxx benchmark makes only small demands on arithmetic performance, but stresses memory throughput. Here AMD has the advantage, with the Shanghai system beating Harpertown by 10.5 percent. Dunnington's complex memory architecture and its snoop filter between the L3 cache and main memory get in the way. Intel's six-core system brings up the rear, 27 percent slower than AMD's quad-core Shanghai machine.









