ARM already owns smartphones and tablets, but the company has even bigger things in mind. At its annual developer conference last week, ARM and its customers talked up plans to push the low-power architecture into everything from smart watches to servers.
The conference, known as ARM TechCon, began only a week after the company announced its newest low-power design, the Cortex-A7. The company claims the A7, code-named Kingfisher, is the most efficient processor it has ever produced. Although it is a low-power core with a design similar to the Cortex-A5, at speeds of 1GHz or greater it should deliver performance that rivals the Cortex-A8 currently used in many high-end smartphones. The A7 can be used in clusters of one to four cores, and ARM says a dual-core A7 manufactured on a 28nm process will be 20 percent of the size of a dual-core A9 on 40nm, enabling less expensive chips and devices. The primary target is sub-$100 Android smartphones, starting in 2013, but the A7 can also be used in enterprise storage, MP3 players, digital cameras, TVs and set-top boxes, broadband modems and smart meters.
At the opposite extreme, ARM has another new design in the works, the Cortex-A15, which is a high-performance follow-on to the A8 and A9. ARM first announced the A15, code-named Eagle, a year ago, but the first chips won't arrive until 2012. Like the A7, the A15 can be used in clusters of one to four cores, but it is designed to run at speeds up to 2.5GHz. Texas Instruments' OMAP 5, with a 2.0GHz dual-core A15 manufactured on 28nm, is likely to be the first A15-based SoC when it ships in the first half of 2012. The A15 is designed for high-end smartphones and tablets, but it should deliver sufficient performance to extend into laptops, all-in-ones and even servers. It has several new features--hardware virtualization, support for more than 4GB of memory (with 40-bit memory address extensions) and memory error correction--geared specifically to servers.
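To put the 40-bit address extensions in perspective, the arithmetic is simple: each extra address bit doubles the amount of memory a chip can reach. The short Python sketch below works it out. The function name is purely illustrative and is not taken from any ARM documentation.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a byte-granular address of the given width."""
    return 2 ** address_bits


# A plain 32-bit physical address tops out at 4 GiB.
limit_32_gib = addressable_bytes(32) // GIB

# The 40-bit extension adds 8 bits, i.e. 2**8 = 256 times more address space.
limit_40_gib = addressable_bytes(40) // GIB

print(f"32-bit: {limit_32_gib} GiB, 40-bit: {limit_40_gib} GiB")
```

In other words, a 40-bit physical address covers 1,024 GiB (1 TiB), which is why the feature matters far more for servers than for phones.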
With the A7, ARM also introduced a new architecture called big.LITTLE that combines clusters of low-power A7 cores, for simple tasks, and high-performance A15 cores, for more demanding chores, on a single chip. This concept, known as heterogeneous multiprocessing, has been around for a while. Cray has tried it in supercomputers; IBM's Cell processor, used in the Sony PlayStation 3 console, is a heterogeneous processor with a main processing element and eight co-processors; and TI's OMAP uses a mix of Cortex-A and Cortex-M cores and DSPs (TI's Brian Carlson calls it "the best core for the chore"). Nvidia's Tegra 3 (Kal-El), which will reportedly be announced next week, pairs a 40nm high-performance quad-core A9 with a fifth "companion core"--also an A9 but using 40nm low-power transistors.
With traditional physical scaling slowing down, ARM's chief technology officer, Mike Muller, predicted that most of the future gains in performance and energy efficiency would come from this sort of heterogeneous multiprocessing. The combination of low-power A7 and high-performance A15 cores should allow ARM's customers to design chips with a wide range of operating frequencies and power on a 28nm HPM (High-Performance Mobile) process. Both the A7 and A15 use the same ARMv7 instruction set, including the virtualization and 40-bit address extensions, so any software that runs on one cluster will run on the other. The clusters are connected to one another over a bus that maintains cache coherency, and firmware takes care of task switching, so the process is invisible to the operating system and applications. ARM says that in typical mobile use 90 percent of tasks will run fine on the A7, resulting in power savings, compared with a dual-core A9 alone, of 60 percent on MP3 playback, video streaming and casual gaming; greater than 40 percent on HD games; and 10 percent for browsing Flash-heavy sites.
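ARM has not published the switching policy its firmware uses, so the following is only a toy model of the general idea: stay on the A7 cluster while the load is light, move to the A15 cluster when it gets heavy, and move back once things calm down. The function names and the two thresholds (0.85 and 0.30) are assumptions made for illustration, not ARM's figures.

```python
LITTLE = "A7"   # low-power cluster
BIG = "A15"     # high-performance cluster


def next_cluster(current: str, load: float,
                 up: float = 0.85, down: float = 0.30) -> str:
    """Pick the cluster for the next interval.

    load is the utilization of the active cluster (0.0 to 1.0).
    The up/down thresholds are hypothetical; the gap between them
    keeps the model from bouncing between clusters on every sample.
    """
    if current == LITTLE and load > up:
        return BIG
    if current == BIG and load < down:
        return LITTLE
    return current


def simulate(loads, start: str = LITTLE):
    """Return the cluster chosen after each load sample."""
    trace = []
    cluster = start
    for load in loads:
        cluster = next_cluster(cluster, load)
        trace.append(cluster)
    return trace


# Idle, light use, a burst of heavy work, then back to idle.
print(simulate([0.1, 0.5, 0.9, 0.6, 0.2]))
```

Because both clusters run the same instruction set and share coherent caches, a switch like this can in principle happen underneath the operating system, which only ever sees one set of cores.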
Many of the A15 and big.LITTLE SoCs will also use the upcoming Mali-T604, ARM's first GPU based on the next-generation Midgard architecture. Midgard is designed not only for 3D graphics but also for GPU computing. ARM says the Mali-T604, which can be used in configurations of one to four cores, will deliver 5x better graphics performance than the current Mali-400MP used in Samsung's Exynos 4210, among other processors. The Mali-T604 also supports Khronos OpenCL, Microsoft's DirectCompute and Google Android's RenderScript APIs for hardware acceleration of other applications. And it uses the same big.LITTLE cache-coherent interconnect, which means the GPU can read data directly out of the caches in the A7 and A15 CPU clusters--rather than having to move the data around the system--increasing performance and reducing system power. (ARM's Jem Davies wrapped up all the Mali-T604 updates from TechCon in this post.)
On the final day of its conference, ARM announced a true 64-bit version of its instruction set. The purpose of the new ARMv8 instruction set is to allow ARM-based systems to address more than 4GB of system memory, in particular for use in desktops and servers. It is a superset of the existing ARMv7 instruction set, so systems will still be able to run all of the legacy 32-bit applications. AppliedMicro demonstrated a prototype of its X-Gene processor, a 3.0GHz multi-core chip based on the 64-bit ARMv8 instruction set. This week Calxeda, a start-up backed by ARM among others, announced its EnergyCore ARM-based server SoC--and HP said it would test the chip in its Redstone Server Development Platform--but this is based on a quad-core A9 running the 32-bit ARMv7 instruction set. Other companies such as SeaMicro are likely to follow with their own ARM-based server processors.
Put it all together and ARM is clearly laying the groundwork to expand beyond mobile phones and challenge x86 in netbooks and laptops, desktops and servers. The introduction of versions of Windows 8 and Microsoft Office for ARM-based SoCs from Qualcomm, TI and Nvidia should also be a big boost for the ARM architecture.
Intel will defend its turf in two ways. First, it will leverage its lead in process technology, speeding up development of Atom SoCs with 32nm Medfield next year, 22nm Silvermont in 2013 and 14nm Airmont in 2014. Second, Intel is pushing Ultrabooks to make laptops using the Core architecture look and feel more like tablets. I also expect to see AMD shift its focus more to its Brazos family of low-power processors, based on the Bobcat core, next year. This will include the Wichita and Krishna 28nm APUs, part of the Deccan platform for netbooks and ultra-thins, and reportedly an update to the 40nm Desna APU, code-named Hondo, for Windows 8 tablets.