Inside Intel's Atom

Intel has launched its Atom processor, which will be seen in a wide range of computers, from Mobile Internet Devices to low-power notebooks (Netbooks) and desktops (Nettops). Here's a tour of the underlying technology.

The Atom processor has had a long gestation. It started in 2004, when work got going on the Bonnell core design project. This was a ground-up design of a brand-new x86 processor, intended as far as possible to put low power consumption first.

At the time, Intel had three main processor architectures: the Itanium, for high-performance computing; the x86 for everything mainstream; and ARM, for embedded products. Those included smartphones and handheld computers — large and growing markets, but ones where Intel wasn't competing especially effectively. The question Intel asked itself was whether it could extend the x86 into that area, bringing with it the huge advantage of compatibility with existing PC software.

It was a hard question. ARM has always been an exceptionally power-efficient architecture, with an instruction set explicitly designed for simple, fast decoding. This RISC (Reduced Instruction Set Computing) approach leads to simple, fast, low-complexity and thus low-power hardware. The x86, on the other hand, has an instruction set designed to provide a lot of powerful instructions with many options — CISC (Complex Instruction Set Computing). That makes the programmers' job easier (at least, it did in the days when programmers worried about instruction sets), but requires a large, complex and power-hungry processor.

The difference in approach becomes more pronounced when power considerations make it impossible to simply increase the clock speed for more performance: an already complex design becomes even more complicated as it adds more features like speculative and out-of-order execution. The Pentium chips of 2004 were heavily mutated from the original designs, and most of those mutations were aimed at speed, not efficiency.

Starting over
So the Bonnell team threw everything away and started from the simplest possible x86 design, adding a feature only if its incremental performance benefit at least matched the increase in power consumption it caused. In particular, they concentrated on the idea that within every CISC is a RISC struggling to get out: while the x86 instruction set is lopsided and baroque, most of the instructions are at heart simple ones. Moreover, those are the ones that are most commonly used. x86 designers have long exploited this, but here it assumed new importance.

There are other advantages to simplicity. Testing becomes easier. More excitingly, the size of the chip goes down. Chip company profits depend on a simple equation: it costs the same to process a wafer whether it carries one enormous processor or several thousand tiny ones, but you can make a lot more money by selling several thousand tiny processors. Plus, one defect on that wafer will result in one processor not working. If that's one out of a hundred, that's a 1 per cent failure rate; if it's one out of ten thousand, it's 0.01 per cent.
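
The defect arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a real yield model: the function name `loss_fraction` is made up for this example, and it assumes, as the paragraph does, that each defect kills exactly one die.

```python
def loss_fraction(dies_per_wafer: int, defects: int = 1) -> float:
    """Fraction of dies lost, assuming each defect kills exactly one distinct die."""
    if dies_per_wafer <= 0:
        raise ValueError("dies_per_wafer must be positive")
    # A wafer cannot lose more dies than it carries.
    return min(defects, dies_per_wafer) / dies_per_wafer


# The two cases quoted in the text: 100 large dies versus 10,000 small ones.
for dies in (100, 10_000):
    print(f"{dies} dies per wafer: {loss_fraction(dies):.2%} lost to one defect")
```

Run as written, this prints 1.00% for the hundred-die wafer and 0.01% for the ten-thousand-die wafer, matching the figures in the text.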

The early stages of the Bonnell design took place in Intel's traditional cloak of extreme secrecy. However, two public events signalled some success in low-power, high-performance thinking within the company.

In August 2005, Intel formally announced a new transistor design process called P1264. This dramatically improved leakage current (the single most important parameter that defines how much power a modern chip takes), reducing it by up to a thousand times over other contemporary designs. Even more importantly, Intel had learned how to tune that figure, and was able to trade off power consumption against performance to a fine degree. This opened up the way for common architectures that could span extremely low-power portable chips up to server-grade processors.

A year later, Intel sold its ARM-based XScale processor division, a move widely interpreted as signalling the end of the company's involvement in the mainstream embedded processor market. Instead, it marked a growing confidence that x86 could after all become a contender.

 

Enter the Atom
And now the Atom, Intel's smallest processor. At under 25 square millimetres, it's a tenth the area of a Pentium 4 chip while having 47 million transistors compared to the P4's 42 million. A lot of that comes from the Atom being a 45nm chip, as opposed to the P4's 180nm, but there are also a number of design efficiencies.
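
To put those figures side by side, here is a rough transistor-density comparison in Python. The `Die` class is a hypothetical helper for this example. The Pentium 4 area of 250 square millimetres is not a quoted figure: it is inferred from "a tenth the area", and the Atom is rounded up to 25 square millimetres.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    name: str
    transistors_millions: float
    area_mm2: float

    @property
    def density(self) -> float:
        """Millions of transistors per square millimetre."""
        return self.transistors_millions / self.area_mm2


atom = Die("Atom", 47, 25)        # article: "under 25 square millimetres"
p4 = Die("Pentium 4", 42, 250)    # assumption: ten times the Atom's area

ratio = atom.density / p4.density
for die in (atom, p4):
    print(f"{die.name}: {die.density:.2f}M transistors per mm^2")
print(f"Atom packs roughly {ratio:.1f}x as many transistors per mm^2")
```

On these rounded inputs the Atom comes out at about 11 times the density of the Pentium 4. The ratio says nothing about how much of that is down to the process shrink and how much to design.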

Intel's 45nm Atom processor occupies just 25 square millimetres — a tenth the size of the 180nm Pentium 4.

The Atom is a two-issue processor, meaning it can cope with two instructions simultaneously, as opposed to the more common three- and four-issue designs. Furthermore, it fully inspects each instruction before deciding what to do with it; other processors make assumptions and take the hit of redoing things if the guess was wrong. That turns out to be power-inefficient, especially with long pipeline architectures where the processor has lots of different instructions going through different stages of digestion: the need to go back and start again means an awful lot of joules that should have been spent on getting an answer are merely thrown away when the pipeline reloads.

The Atom architecture is a two-issue design with a 16-stage pipeline.

The Atom processor die.

Although the Atom's pipeline is longer than the Core 2 Duo's — 16 instead of 13 stages — each stage is rather simple. This has allowed the designers to concentrate on optimising for power consumption in a way that's far more difficult when dealing with significantly more complex circuitry as a single block, while letting each stage run efficiently at a high clock frequency.

One of the ways it keeps that pipeline efficiently fed is by treating the three phases of many x86 instructions — which typically get data, operate on it then store it away again — as a single entity to be passed down the pipeline, instead of splitting them into three separate micro-operations. Although there are plenty of complex operations that can't be handled by this model, Intel says that around 96 per cent can be passed through the pipeline as single chunks, with a good increase in efficiency.

The instruction decoders can also pair up instructions, either from the same thread or from two different threads, and dispatch them simultaneously. The latter marks the return of HyperThreading (HT), which Intel claims can provide a 30 per cent speed boost for a 15 per cent increase in power usage.
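
Those HyperThreading figures can be checked against the Bonnell team's rule of adding a feature only if it buys at least as much performance as it costs in power. The Python below is a back-of-envelope sketch: both function names are invented for this example, and the inputs are Intel's claimed figures as quoted above, not measurements.

```python
def perf_per_watt_gain(speedup: float, power_increase: float) -> float:
    """Relative change in performance per watt; inputs are fractions (0.30 = 30%)."""
    return (1 + speedup) / (1 + power_increase) - 1


def passes_power_rule(speedup: float, power_increase: float) -> bool:
    """The rule as described in this article: performance gain >= power cost."""
    return speedup >= power_increase


ht_gain = perf_per_watt_gain(0.30, 0.15)
print(f"HT performance per watt: {ht_gain:+.1%}")
print(f"Passes the design rule: {passes_power_rule(0.30, 0.15)}")
```

On Intel's numbers, HyperThreading improves performance per watt by about 13 per cent and passes the rule comfortably.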

Intel's very low quoted idle power figures of a hundred or so milliwatts depend heavily on the chip's C6 sleep function. This is similar to that implemented in Penryn: the chip has a split power supply with one dedicated to keeping a special area of memory alive. This holds the complete chip state while the rest of the circuitry is effectively turned off — in reality, held at a very low voltage, to minimise the switch-on surge — and restores that state when the chip needs to wake up again. This is fine when the chip is idling, waiting for keyboard input or some other rare event, but the mechanism is unlikely to be invoked often when the chip is busy.
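
A toy two-state model shows how heavily a low average figure leans on idle time. The Python below assumes the chip is either running at its full TDP or sitting in C6, which is a deliberate simplification: TDP is a thermal ceiling rather than a typical running power, and real chips pass through intermediate states. The function names are made up for this example; the three constants are the Atom Z530 figures from the Silverthorne table at the end of this article.

```python
def average_power(active_w: float, idle_w: float, active_fraction: float) -> float:
    """Time-weighted average power for a chip that is either active or idle."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_w + (1 - active_fraction) * idle_w


def active_fraction_for(average_w: float, active_w: float, idle_w: float) -> float:
    """Invert the model: what share of time must be active to hit average_w?"""
    return (average_w - idle_w) / (active_w - idle_w)


# Atom Z530 figures from the table: TDP 2W, idle (C6) 100mW, average 220mW.
TDP_W, IDLE_C6_W, AVERAGE_W = 2.0, 0.100, 0.220

f = active_fraction_for(AVERAGE_W, TDP_W, IDLE_C6_W)
print(f"Active share implied by the model: {f:.1%}")
```

Under these assumptions the quoted 220mW average implies the chip spends only about 6 per cent of its time running flat out.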

Other aspects of the design also have more effect on power consumption than might be apparent. The Atom is built out of a collection of over 200 predefined circuit modules (called Functional Unit Blocks, or FUBs) that are designed and tested independently of the whole. One of the attributes of a FUB is that its clock and power can be independently controlled, giving the chip a very high degree of finesse when balancing performance versus power consumption.
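
The idea of independently gated blocks can be illustrated with a small Python model. Everything here is hypothetical: the block names, the milliwatt figures and the `Fub` class are invented for illustration and do not describe the Atom's real FUBs. The model also treats a gated block as drawing nothing, which ignores leakage.

```python
from dataclasses import dataclass


@dataclass
class Fub:
    name: str                 # hypothetical block name
    active_mw: float          # hypothetical draw while clocked
    clock_enabled: bool = True

    def power_mw(self) -> float:
        # Simplification: a gated block is treated as drawing no power.
        return self.active_mw if self.clock_enabled else 0.0


def chip_power_mw(fubs: list[Fub]) -> float:
    """Total draw is the sum over blocks that are currently clocked."""
    return sum(fub.power_mw() for fub in fubs)


fubs = [Fub("integer_alu", 120.0), Fub("fp_simd", 200.0), Fub("l2_cache", 80.0)]
busy = chip_power_mw(fubs)

fubs[1].clock_enabled = False   # gate the floating-point block when unused
integer_only = chip_power_mw(fubs)

print(f"All blocks clocked: {busy:.0f}mW, FP block gated: {integer_only:.0f}mW")
```

The point of the sketch is only that each block's contribution can be switched independently, so the chip need not pay for circuitry that the current workload is not using.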

The frontside bus (FSB) on the Atom can be configured at manufacture in one of two modes: GTL, the standard signalling technology used by existing chipsets, or a new CMOS mode that uses 2.5 times less power.

The I/O chip legacy is stronger in Poulsbo, the support chip that together with the Atom makes up the Menlow platform (now called Centrino Atom). Poulsbo is a collection of different operational units on a 130nm process. It brings together Intel's own 2D video controller, memory controller, PCIe, USB and so on, but the largest unit on the chip is a 3D GPU from Imagination Technologies, which owns the PowerVR architecture. The chip is very much in the old school of I/O controllers, and it won't be until the Moorestown platform in 2010 that a tenfold improvement in power savings is expected through further integration.

Moorestown is expected to add a memory controller, video encode/decode and graphics to the main CPU — Lincroft — with disk and other I/O in the Langwell chipset, and a dedicated power management circuit looking after everything.

Until then, the Atom chip will continue to be considerably more energy-hungry than its ARM competition, which now includes the Tegra chip from Nvidia, albeit less so than devices like Via's C7 and Nano. The question is whether x86 compatibility, which gives mobile devices access to many more options in operating systems, applications and drivers, is more important than battery life or performance. Intel is betting strongly that it is — and that has a power all its own.

 

Silverthorne (Z series) Atom CPUs

| SKU | Clock speed | TDP power | Average power | Idle power (C6) | FSB | L2 cache size | Die size | Price (per 1000 units) |
|---|---|---|---|---|---|---|---|---|
| Atom Z500 | 800MHz | 0.65W | 160mW | 80mW | 400MHz | 512KB | 7.8mm x 3.1mm | $45 |
| Atom Z510 | 1.1GHz | 2W | 220mW | 100mW | 400MHz | 512KB | 7.8mm x 3.1mm | $45 |
| Atom Z520 | 1.33GHz | 2W | 220mW | 100mW | 533MHz HT | 512KB | 7.8mm x 3.1mm | $65 |
| Atom Z530 | 1.6GHz | 2W | 220mW | 100mW | 533MHz HT | 512KB | 7.8mm x 3.1mm | $95 |
| Atom Z540 | 1.86GHz | 2.4W | 220mW | 100mW | 533MHz HT | 512KB | 7.8mm x 3.1mm | $160 |

 

Diamondville (N series) Atom CPUs

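| SKU | Clock speed | Power management | TDP power | Voltage regulator | Average power | L1 cache | L2 cache | FSB |
|---|---|---|---|---|---|---|---|---|
| Atom N270 | 1.6GHz | C0-C4 with enhanced C-state | 2.5W | IMVP6 | 0.6W | 32KB + 24KB | 512KB | 533MHz |
| Atom N230 | 1.6GHz | C0, C1 | 4W | VRD11 | n/s | 32KB + 24KB | 512KB | 533MHz |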