Intel's new Nehalem architecture features an integrated memory controller and runs two threads per CPU core. Our extensive benchmark tests reveal how well the new quad-core processors perform in practice.
Five years after AMD, Intel has produced its first CPU with an integrated memory controller. The AMD design was ahead of the game in a number of areas, and market leader Intel has integrated ideas from its competitor into the new Nehalem architecture. Until now, Intel has built its quad-core processors from two dual-core dies. AMD always maintained that it was the only company building "real" quad-cores, a distinction Intel pooh-poohed. Now AMD has lost that distinction too: Nehalem (Core i7) quad-cores are built on a single die.
But that's not the end of the story. AMD processors communicate with each other and with peripherals over AMD's Hypertransport, a point-to-point interconnect that maintains high bandwidth by giving each connection its own dedicated link. Intel, by contrast, has its chips use the frontside bus not only to address memory but also to talk to other system components, so all of those devices share a single channel. That's no real disadvantage in single-core systems, and Intel has kept its dual-core and quad-core systems fast by fitting them with large caches.
However, this old-fashioned way of communicating becomes a bottleneck in servers with multiple sockets. In the long run, not even the 64MB snoop filter in Intel's 7300 chipset or the 16MB Level 3 cache recently introduced with the six-core Dunnington would have kept the chip giant competitive with AMD in the server market.
Intel's answer is to give the Nehalem architecture a technology called QuickPath Interconnect (QPI), which is comparable to Hypertransport. QPI features in the Nehalem desktop variants, codenamed Bloomfield, which go on sale later this month. According to Intel boss Paul Otellini, the server variant for two-socket systems, Gainestown, is to follow in the first quarter of 2009. Intel plans to introduce Nehalem chips for multi-processor systems in the second half of 2009, and QPI will also be part of Tukwila, the next-generation Itanium processor, due at the end of this year.
Intel has also cribbed a few virtualisation ideas from AMD for the Nehalem architecture. With the Barcelona processor, AMD introduced Rapid Virtualisation Indexing (RVI), which lets the processor translate a virtual machine's memory addresses in hardware instead of having the hypervisor maintain shadow page tables in software. Virtualisation specialist VMware enthusiastically backed the AMD technology. The equivalent technology in Intel's Nehalem is called Extended Page Tables (EPT).
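Conceptually, nested paging (RVI or EPT) means every guest memory access passes through two translations: the guest operating system's own page table maps a guest-virtual page to a guest-physical page, and a second table controlled by the hypervisor maps that to a real host-physical page. The Python sketch below models this with flat dictionaries and 4KB pages. It is a simplification, assuming single-level tables where real hardware walks multi-level tables and caches the results in the TLB, and the names `translate`, `GuestPageFault` and `EPTViolation` are illustrative, not an Intel or AMD interface.

```python
PAGE_SHIFT = 12                      # 4KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1    # low 12 bits: offset within the page


class GuestPageFault(Exception):
    """Guest page table has no mapping: handled by the guest OS."""


class EPTViolation(Exception):
    """Hypervisor table has no mapping: traps to the hypervisor."""


def translate(guest_virtual, guest_page_table, ept):
    """Two-stage translation: guest virtual -> guest physical -> host physical.

    Both tables are modelled as flat dicts from page number to page number
    (an assumption; real page tables are multi-level radix trees).
    """
    offset = guest_virtual & PAGE_MASK
    gv_page = guest_virtual >> PAGE_SHIFT

    # Stage 1: the guest OS's own mapping, which the guest may change freely.
    if gv_page not in guest_page_table:
        raise GuestPageFault(hex(guest_virtual))
    gp_page = guest_page_table[gv_page]

    # Stage 2: the hypervisor's mapping from guest-physical to host-physical memory.
    if gp_page not in ept:
        raise EPTViolation(hex(gp_page << PAGE_SHIFT))
    hp_page = ept[gp_page]

    return (hp_page << PAGE_SHIFT) | offset


if __name__ == "__main__":
    guest_pt = {0x10: 0x2}   # guest maps its virtual page 0x10 to "physical" page 0x2
    ept = {0x2: 0x7F}        # hypervisor backs guest page 0x2 with host page 0x7F
    print(hex(translate(0x10ABC, guest_pt, ept)))  # 0x7fabc
```

Without the second stage in hardware, the hypervisor has to intercept the guest's page-table updates and keep shadow tables in sync, which is the overhead RVI and EPT are designed to remove.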
On top of the ideas borrowed from AMD, Nehalem chips offer a number of additional features. For example, each of the four processor cores can work on two threads at the same time, a refinement of the Pentium 4's well-known Hyper-Threading technology. In addition to the four physical cores, the operating system therefore sees four further logical processors, eight in total.
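On Linux, the difference between physical cores and logical processors shows up in `/proc/cpuinfo`: every logical processor gets its own `processor` entry, while the two Hyper-Threading siblings on one core share the same `physical id` and `core id` pair. The sketch below counts both. `count_cpus` is a hypothetical helper, and the parsing assumes the usual x86 layout in which `physical id` precedes `core id` in each entry (some virtual machines omit these fields). On a Core i7 with Hyper-Threading enabled it should report four cores and eight logical processors; Python's standard `os.cpu_count()` returns only the logical count.

```python
import os


def count_cpus(cpuinfo_text):
    """Return (physical_cores, logical_processors) from /proc/cpuinfo text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            logical += 1                              # one entry per logical processor
        elif key == "physical id":
            physical_id = value.strip()               # socket this entry belongs to
        elif key == "core id":
            cores.add((physical_id, value.strip()))   # HT siblings share this pair
    return len(cores), logical


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("cores, logical processors:", count_cpus(f.read()))
    print("os.cpu_count():", os.cpu_count())
```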
Unlike comparable AMD chips, which only support dual-channel DDR2/1066 memory, the Core i7 processors, officially available from 17 November, offer three DDR3/1066 channels. That gives them a theoretical peak memory bandwidth of about 25.6GB/s, compared with about 17GB/s for the AMD chips. The individual Nehalem models also differ in the speed of their QPI interface: on the top model, the Core i7 Extreme 965, QPI runs at 3.2GHz, while the smaller models only reach 2.4GHz.
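The theoretical peak figures come from simple arithmetic: each DDR memory channel is 64 bits (8 bytes) wide, so peak bandwidth is channels × 8 bytes × transfers per second. The small helper below (a hypothetical name) applies this to both memory configurations; it uses the nominal 1066 MT/s, and measured throughput in practice is always lower than this ceiling.

```python
def peak_bandwidth_gb_s(channels, megatransfers_per_s, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8   # a DDR channel is 64 bits wide
    return channels * bytes_per_transfer * megatransfers_per_s / 1000


core_i7 = peak_bandwidth_gb_s(3, 1066)   # three DDR3/1066 channels
amd = peak_bandwidth_gb_s(2, 1066)       # two DDR2/1066 channels
print(f"Core i7: {core_i7:.1f} GB/s, AMD: {amd:.1f} GB/s")
# Core i7: 25.6 GB/s, AMD: 17.1 GB/s
```

The third channel alone accounts for the 50 per cent advantage; the move from DDR2 to DDR3 at the same 1066 rating does not change the peak figure.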
According to Intel, the new Nehalem processors are specified for memory speeds up to DDR3/1066, while the current Core 2 platform can run DDR3/1600 memory. According to the benchmark tool Everest 4.60, however, the integrated memory controller supports up to DDR3/1333. Possibly the system does not run stably at that speed in every situation, which would explain why Intel opted for the more conservative specification. For optimal performance, no more than three memory modules (one per channel) should be used. With four DIMMs, memory performance falls because the Command Rate, an important memory timing parameter, can then only be set to 2T (two clock cycles per command) instead of 1T.
Nehalem processors offer a built-in overclocking feature called Turbo Mode. If software does not fully load all the cores, the chip's internal logic runs the cores that are in use at a higher clock speed. Last but not least, the Nehalem processors come equipped with SSE4.2, an instruction set extension whose new string-comparison instructions might be particularly useful for speeding up text processing in search engines. Programs such as browsers, email clients and word processors could also benefit from SSE4.2.
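SSE4.2's string instructions, such as PCMPESTRI, compare a 16-byte block of text against a set of up to 16 characters in a single instruction, so a scanner looking for delimiters can advance one block per step instead of one byte at a time. The pure-Python model below only mimics the "equal any" comparison mode to show that block-wise structure; it is not fast, and `equal_any_block` and `find_first_of` are illustrative names. C compilers expose the real instruction through intrinsics such as `_mm_cmpestri`.

```python
BLOCK = 16  # an SSE register holds 128 bits, i.e. 16 bytes


def equal_any_block(charset, block):
    """Model of PCMPESTRI's 'equal any' mode on bytes: index of the first
    byte in block that occurs in charset, or BLOCK if there is none.
    (The real instruction limits charset to 16 bytes; not enforced here.)"""
    for i, byte in enumerate(block[:BLOCK]):
        if byte in charset:          # hardware tests all 16 positions in parallel
            return i
    return BLOCK


def find_first_of(text, charset):
    """Return the index of the first byte of text found in charset, or -1."""
    for start in range(0, len(text), BLOCK):
        block = text[start:start + BLOCK]
        idx = equal_any_block(charset, block)
        if idx < len(block):         # a match inside this block
            return start + idx
    return -1


line = b"search engine tokenizer; splits on delimiters"
print(find_first_of(line, b";,"))   # 23
```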
In terms of power consumption, the system with the Nehalem Core i7 Extreme 965 ranks about the same as one with Intel's previous best-performing chip, the Core 2 Extreme QX9775, even though the Nehalem processor, with 731 million transistors, has clearly fewer than the QX9775 with 820 million. Because Hyper-Threading keeps the execution units busier than in single-threaded cores, the new chip draws about as much power overall as the more complex earlier design despite its lower transistor count.
Power consumption (Watts): shorter bars are better.