Benchmarks: AMD's 45nm 'Shanghai' Opteron - Part 3
Summary
Topics
Back to Part 2
Full virtualization without VMware
Like its Barcelona predecessor, Shanghai supports nested page tables, which AMD calls Rapid Virtualization Indexing (RVI). Intel does not support this technology on its current server platforms but is expected to offer this feature with its Nehalem architecture.
Users who employ virtualization software from market leader VMware are unlikely to see performance improvements from RVI because VMware's binary translation software is at least the equal of RVI. Technically, binary translation means nothing more than code patching, with the program code altered to match the virtual address space.
That technology only functions with supported operating systems, and is a possible source of errors — particularly when patching — if executables do not distinguish clearly between programming code and data. A hardware feature such as RVI is primarily a safe alternative to binary translation. Support for it has been implemented, or is planned, by all virtualization vendors.AMD's memory architecture is also useful for virtualization. If a four- or eight-processor system is divided up so that each VM gets a physical processor with four cores, then main memory can be used optimally for each one. Memory throughput can cause problems in virtualized environments, particularly with Dunnington systems where bottlenecks can be caused by up to 24 cores depending on a single memory controller.
Conclusion
In Shanghai, AMD has a processor that can take on Intel's Core 2 architecture in all areas. For pure arithmetic performance, Shanghai still falls short of Intel's chips. But once clock speeds are taken into account, the gap is mostly reduced to a single-figure percentage. Only in single-precision arithmetic using SSE instructions do Shanghai systems show a 35 percent performance shortfall against a comparable Core 2 architecture Xeon system.
Arithmetic performance is highly significant if a server is used for rendering, video encoding or as part of a compute cluster. Typical server tasks, such as file, print, web and database serving, put a special load on memory throughput and memory latency. In these areas Shanghai systems are clearly ahead of their Intel Core 2 counterparts.

Thanks to the memory controller integrated into each Shanghai CPU, memory bandwidth scales with each additional processor in the system — an advantage that becomes particularly evident in four- and eight-processor machines. Intel's four-processor Dunnington system tries to counteract the memory bottleneck by adding large quantities of L3 cache, which succeeds in improving synthetic benchmarks. Buyers considering acquiring a 24-core system usually have to cope with large datasets that don't fit in cache and put a much greater load on the external memory subsystem.
Shanghai systems are efficient in terms of power consumption, but Intel also did its green homework with its 45nm processors, so it can no longer be assumed that AMD systems use less power than comparable Intel systems.
Buyers looking for a balance between power consumption, computational performance and memory throughput will find Shanghai processors extremely attractive. But in two-processor systems, Intel can offer clock speeds up to 3.40GHz, with commensurate increases in arithmetic performance. AMD's top Shanghai processor currently runs at 2.70GHz, but from February 2009 a more power-hungry 2.80GHz version will be available.Processor development is forging ahead. Intel has already demonstrated what Nehalem can do on the desktop. As well as Nehalem's substantial increases in arithmetic performance, its integrated memory controller is particularly significant. Intel offers a triple-channel DDR3 controller that supports memory up to 1,066MHz, which corresponds roughly to the bandwidth of a hexa-channel DDR2 controller running at up to 533MHz.
If Intel releases the first two-processor Nehalem systems in the spring, it's reasonable to assume they will outstrip the present Shanghai CPUs in most areas. AMD will have to counter that threat with HyperTransport 3.0 and DDR3 support.
AMD's strength in four- and eight-processor systems seems set to continue. Intel is unlikely to offer Nehalem variants for these machines before the end of 2009. Intel's Dunnington chips cannot keep up with AMD's Shanghai because of their restricted main memory bandwidth. Consequently, the two chip-makers will continue to give each other good reason to improve performance and feature sets, and the power user will continue to reap the rewards.
Additional contributions by ZDNet.co.uk's Rupert Goodwins.Translation by Toby Wolpe.
Additional contributions by ZDNet.co.uk's Rupert Goodwins.Translation by Toby Wolpe.
Talkback - Tell Us What You Think
The best of ZDNet, delivered
ZDNet Newsletters
Get the best of ZDNet delivered straight to your inbox




