Where will the next big leaps in performance and power efficiency come from? Increasingly, the industry is looking toward a concept known as heterogeneous computing to save the day. This week, AMD revealed new features of its Heterogeneous System Architecture.
Your current PC uses a dual- or quad-core processor. And chances are very good that your next PC will, too. Smartphones and tablets are getting more cores, but they will soon hit the same ceiling. So where will the next big leaps in performance and power efficiency come from?
New process technology and microarchitectural enhancements will, as always, play an important role. But increasingly, the industry is looking toward a concept known as heterogeneous computing to save the day.
It turns out that personal computers and mobile gadgets already have lots of specialized cores for dedicated tasks. That's because while CPUs are great at general-purpose, single-threaded jobs, other types of cores can handle different tasks more efficiently. The most obvious is the graphics processor (GPU), which was designed to play games at high resolutions and quality settings, but is also very good at parallel number crunching. Other hardware engines handle tasks such as cryptography, video encoding and decoding, image processing, and audio. The idea behind heterogeneous computing is to harness the power in these cores to do other things.
Nearly everyone agrees that this is a great idea. It was one of the big themes at a recent chip conference, the Linley Group's Mobile Processor Conference, which I attended. But it quickly became apparent that there's still a lot of work to be done. The hardware isn't designed to do this efficiently, it is difficult to write heterogeneous applications, and there are numerous overlapping efforts to make programming easier. The Khronos Group, an industry consortium, promotes the OpenCL standard; Nvidia has its CUDA APIs; Microsoft has DirectCompute extensions to DirectX for GPGPU computing on Windows; and Google has the RenderScript API for heterogeneous computing on Android.
AMD is pushing a different approach, known as Heterogeneous System Architecture (HSA), which involves changes to the hardware platform, as well as an intermediate language (known as HSAIL), a software runtime, and a set of interfaces for HSA-accelerated applications. This week the company shed a little light on exactly how HSA will work.
One of the biggest challenges in heterogeneous computing has to do with memory. In the traditional system architecture, the CPU and GPU are separate, and each has its own pool of memory. To do computation on the GPU, the data has to be copied from the system memory to the GPU's memory, and when the work is completed, the results have to be copied back to system memory. All of this data shuffling takes time, and for many workloads it can cancel out much of the advantage of doing the computation on the GPU.
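To see why the copies matter, here is a back-of-the-envelope cost model in Python. It is only a sketch: the function names and every number in it (1 GB of data, an 8 GB/s link between the two memory pools, a 0.20-second CPU run, and a 0.02-second GPU run) are made up for illustration and are not measurements of any real system. It also assumes the results are the same size as the input.

```python
def offload_time(bytes_moved, bus_bytes_per_s, gpu_compute_s):
    """Seconds for the copy-in, compute, copy-out pattern.

    Assumes the output is the same size as the input, so the
    copy cost is paid twice (hypothetical simplification).
    """
    copy_s = bytes_moved / bus_bytes_per_s
    return copy_s + gpu_compute_s + copy_s


def worth_offloading(cpu_compute_s, bytes_moved, bus_bytes_per_s, gpu_compute_s):
    """True only if the GPU path, copies included, beats the CPU."""
    return offload_time(bytes_moved, bus_bytes_per_s, gpu_compute_s) < cpu_compute_s


# Illustrative numbers only: the GPU is 10x faster at the math,
# but moving 1 GB each way over an 8 GB/s link costs 0.125 s per copy.
total = offload_time(1e9, 8e9, 0.02)
print(f"GPU path with copies: {total:.3f} s")    # prints 0.270 s
print(worth_offloading(0.20, 1e9, 8e9, 0.02))    # prints False
print(worth_offloading(0.20, 0, 8e9, 0.02))      # prints True (no copies)
```

With these invented figures the GPU finishes the math ten times faster than the CPU, yet the whole job takes longer because the two copies dominate. Remove the copies and the GPU wins easily.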
AMD's first mainstream APU (Accelerated Processing Unit), known as Llano, combined the CPU and a capable GPU — each with a separate slice of system memory — on the same chip. With the current Trinity APU, AMD introduced its first HSA features (a memory management unit that allowed the GPU to see all of the physical system memory, shared power management, and support for OpenCL C++ and Microsoft C++ AMP). But the basic software model has remained the same; the CPU and GPU can't work together on the same data.
The next step for HSA, heterogeneous Uniform Memory Access (hUMA), promises to solve this problem with three features: the CPU and GPU use the same pointers (addresses) to access the entire memory space to read and write data; they are cache coherent, so they can work on data at the same time without issues; and, like the CPU, the GPU supports paged virtual memory, which makes it possible to work with larger datasets. The net result is that the CPU and GPU can work together much more efficiently, and it should be easier to write applications that take advantage of both. AMD said developers will be able to write HSA-accelerated applications using standard programming languages such as Python, C++, and Java.
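The difference between the two models is easiest to see with a data structure that is full of pointers, such as a linked list. The Python sketch below is a simulation only: it does not run anything on a GPU and does not use any HSA interface. The "GPU" is an ordinary function, and names such as `gpu_double`, `run_with_copies`, and `run_shared` are hypothetical. In the copy model the "GPU" works on a private duplicate that has to be copied back; in the shared model it follows the very same pointers the CPU built.

```python
import copy


class Node:
    """One element of a pointer-linked list in 'system memory'."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a linked list on the 'CPU' and return its head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_values(head):
    """Walk the list and collect its values."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def gpu_double(head):
    """Stand-in for a GPU kernel: doubles each value by following pointers."""
    node = head
    while node is not None:
        node.value *= 2
        node = node.next


def run_with_copies(head):
    """Copy model: duplicate the data for the 'GPU', then copy results back."""
    device_copy = copy.deepcopy(head)   # system memory -> GPU memory
    gpu_double(device_copy)
    return copy.deepcopy(device_copy)   # GPU memory -> system memory


def run_shared(head):
    """Shared-pointer model: the 'GPU' updates the CPU's data in place."""
    gpu_double(head)
    return head


data = build_list([1, 2, 3])
result = run_with_copies(data)
print(to_values(data), to_values(result))   # prints [1, 2, 3] [2, 4, 6]

result = run_shared(data)
print(result is data, to_values(data))      # prints True [2, 4, 6]
```

In the first call the original list is untouched and the answer lives in a separate structure, which is what the copy-in, copy-out pattern forces on a programmer. In the second call there is one list and one set of pointers, which is the programming model that shared addresses and cache coherency are meant to make possible.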
AMD's next mainstream APU, known as Kaveri and slated to ship by the end of this year, will be its first processor to support hUMA. The PlayStation 4 will also use an AMD APU, and based on some of the comments from the console's lead architect, it is possible it will use these HSA features. The next version of the Xbox, which will be announced on May 21, is also rumored to use an AMD processor. Since hUMA is a part of the HSA Foundation's public specifications, other members could also use it in future processor designs.
The HSA Foundation has attracted some big names, including ARM, Qualcomm, Samsung, Texas Instruments, MediaTek, and Imagination. But there are also some notable omissions, namely Intel and Nvidia. The question is whether AMD has the clout to get the industry to adopt this architecture and to get developers to build HSA-accelerated applications. Hopefully, the industry will eventually move toward hardware and software standards for heterogeneous computing so that applications will work across different platforms, automatically taking advantage of the best core for the chore.