A closer look at AMD's heterogeneous computing

Summary: Where will the next big leaps in performance and power efficiency come from? Increasingly, the industry is looking toward a concept known as heterogeneous computing to save the day. This week, AMD revealed new features of its Heterogeneous Systems Architecture.

TOPICS: Processors

Your current PC uses a dual- or quad-core processor. And chances are very good that your next PC will, too. Smartphones and tablets are getting more cores, but they will soon hit the same ceiling. So where will the next big leaps in performance and power efficiency come from?

(Image: AMD)

New process technology and microarchitectural enhancements will, as always, play an important role. But increasingly, the industry is looking toward a concept known as heterogeneous computing to save the day.

It turns out that personal computers and mobile gadgets already have lots of specialized cores for dedicated tasks. That's because while CPUs are great at general-purpose, single-threaded jobs, other types of cores can handle different tasks more efficiently. The most obvious is the graphics processor (GPU), which was designed to play games at high resolutions and quality settings, but is also very good at parallel number crunching. Other hardware engines handle tasks such as cryptography, video encoding and decoding, image processing, and audio. The idea behind heterogeneous computing is to harness the power in these cores to do other things.

Nearly everyone agrees that this is a great idea. It was one of the big themes at a recent chip conference, the Linley Group's Mobile Processor Conference, which I attended. But it quickly became apparent that there's still a lot of work to be done. The hardware isn't designed to do this efficiently, it is difficult to write heterogeneous applications, and there are numerous overlapping efforts to make programming easier. The Khronos Group, an industry consortium, promotes the OpenCL standard; Nvidia has its CUDA APIs; Microsoft has DirectCompute extensions to DirectX for GP-GPU computing on Windows; and Google has the Renderscript API for heterogeneous computing on Android.

AMD is pushing a different approach, known as Heterogeneous Systems Architecture (HSA), which involves changes to the hardware platform, as well as an intermediate language (known as HSAIL, for HSA Intermediate Language), a software runtime, and a set of interfaces for HSA-accelerated applications. This week the company shed a little light on exactly how HSA will work.

One of the biggest challenges in heterogeneous computing has to do with memory. In the traditional system architecture, the CPU and GPU are separate, and each has its own pool of memory. To do computation on the GPU, the data has to be copied from system memory to the GPU's memory, and when the work is completed, copied back to system memory. All of this shuffling of data back and forth can negate the advantage of doing the computation on the GPU in the first place.
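To make that overhead concrete, here is a minimal sketch in plain Python of the traditional offload pattern. The function names are hypothetical stand-ins, not a real GPU API: every piece of offloaded work pays for a copy to device memory and a copy back.

```python
def copy_to_device(data):
    """Simulate copying from system memory to the GPU's separate memory."""
    return list(data)  # a distinct copy, like a buffer in discrete VRAM

def copy_to_host(device_data):
    """Simulate copying results back to system memory."""
    return list(device_data)

def gpu_square(device_data):
    """Stand-in for a data-parallel kernel that squares each element."""
    return [x * x for x in device_data]

def offload(data):
    device_copy = copy_to_device(data)  # copy #1: host -> device
    result = gpu_square(device_copy)    # the useful work
    return copy_to_host(result)         # copy #2: device -> host

print(offload([1, 2, 3]))  # prints [1, 4, 9]
```

For small workloads the two copies can easily cost more than the kernel itself, which is exactly the overhead a shared-memory design aims to eliminate.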

AMD's first mainstream APU (Accelerated Processing Unit), known as Llano, combined the CPU and a capable GPU — each with a separate slice of system memory — on the same chip. With the current Trinity APU, AMD introduced its first HSA features (a memory management unit that allowed the GPU to see all of the physical system memory, shared power management, and support for OpenCL C++ and Microsoft C++ AMP). But the basic software model has remained the same; the CPU and GPU can't work together on the same data.

The next step for HSA, heterogeneous Uniform Memory Access (hUMA), promises to solve this problem with three features: the CPU and GPU use the same pointers (addresses) to access the entire memory space to read and write data; they are cache coherent, so they can work on data at the same time without issues; and, like the CPU, the GPU supports paged virtual memory, which makes it possible to work with larger datasets. The net result is that the CPU and GPU can work together much more efficiently, and it should be easier to write applications that take advantage of both. AMD said developers will be able to write HSA-accelerated applications using standard programming languages such as Python, C++, and Java.
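As a rough illustration of the shared-pointer model described above (again with hypothetical names, and a Python thread standing in for the GPU), both processors work on the same buffer in place, with no copies in either direction:

```python
import threading

def gpu_square_in_place(buffer, lo, hi):
    """Stand-in "GPU kernel": squares buffer[lo:hi] directly in the shared buffer."""
    for i in range(lo, hi):
        buffer[i] *= buffer[i]

def cpu_increment_in_place(buffer, lo, hi):
    """The CPU works on a different slice of the SAME buffer, concurrently."""
    for i in range(lo, hi):
        buffer[i] += 1

data = [1, 2, 3, 4]  # one allocation in one shared address space

# The "GPU" and the CPU each take half of the buffer; nothing is copied.
t = threading.Thread(target=gpu_square_in_place, args=(data, 0, 2))
t.start()
cpu_increment_in_place(data, 2, 4)
t.join()

print(data)  # prints [1, 4, 4, 5]
```

In real hardware it is the cache coherence between the CPU and GPU that makes this kind of concurrent access safe; here the Python threads simply share the interpreter's memory.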

AMD's next mainstream APU, known as Kaveri and slated to ship by the end of this year, will be its first processor to support hUMA. The PlayStation 4 will also use an AMD APU, and based on some of the comments from the console's lead architect, it is possible it will use these HSA features. The next version of the Xbox, which will be announced on May 21, is also rumored to use an AMD processor. Since hUMA is a part of the HSA Foundation's public specifications, other members could also use it in future processor designs.

The HSA Foundation has attracted some big names, including ARM, Qualcomm, Samsung, Texas Instruments, MediaTek, and Imagination. But there are also some notable omissions, namely Intel and Nvidia. The question is whether AMD has the clout to get the industry to adopt this architecture and to get developers to build HSA-accelerated applications. Hopefully, the industry will eventually move toward hardware and software standards for heterogeneous computing so that applications will work across different platforms, automatically taking advantage of the best core for the chore.

  • Sounds like the next logival step.....

    But only if it is adopted by all hardware companies. And only if the hardware and software standards are set soon. But by the sound of it, AMD can do this by being the first; eventually other companies will follow suit if it is successful.
    • Oops

      Meant Logical step
      • The biggest obstacle

        is software... this changes how software is created in a major way! If MS, Apple, and some Linux distros get on board, then this will happen!
        • It comes to Standards.....

          Once these are set, and they are reasonable enough that there are a lot of early adopters, then we will see the software evolve and conform to this new approach to computing. It would do wonders for console computing.
      • "other companies will follow suit if it is successful."

        I predict that Microsoft et al will wait until Intel has included this technology in their new chips before they start supporting it. If I recall correctly, that is more or less what happened with 64-bit procs.
    • Not all

      For example, it only takes Apple to adopt it for iOS devices (ARM and Imagination are already members), and all mobile manufacturers will quickly follow.

      For the PC, it's enough for AMD to do it. Intel will follow. They will likely rename the technology, as they always do. There are already rumours about similar development.
  • But the question remains, to what end?

    Let's say, for argument's sake, a PC based on this architecture is 25% faster. So what? Seriously, it's going to require a stopwatch to see any real-world difference to the end user. If you look at the tasks done on PCs, a speed boost is "ok" but it doesn't really ring the bell for anyone.

    The truth is that if you have a quad-core PC, the CPUs are idle 95% of the time (or more). No, we don't need to make them faster, we need better software that takes on tasks not thought of or implemented before. A basic A.I. in the operating system would be a GREAT place to start.
    • Speed isn't the only factor......

      This can lead to smaller, more powerful PCs that are energy efficient. It can lead to an all-new PC revolution which can change the way we look at PCs. We can have a small, efficient machine that is a gamer's dream, but can also be used for work and other tasks with relative ease. I would love to see in the next 5 years a small-form-factor PC that is powerful enough to do graphics like today's monster PCs but at half the heat and power usage.
      • Ummm...

        "This can lead to smaller, more powerful PCs that are energy efficient."

        You don't know that to be the case and are grasping at straws. Secondly, all the players are reducing power consumption with their chips across the board. I just don't see this making anyone's eyes light up.
    • A basic A.I. in the operating system......

      Would not work because CPUs are still not able to process multi-thread efficiently; this is just adding more ability to the brain and allowing for better processing. And the software will become more efficient at using this new architecture.
      • Not true...

        "Would not work because CPUs are still not able to process multi-thread efficiently."

        The CPUs can do it just fine; software that does it well is rare. Fact is, writing threaded code takes a lot more work and most coders aren't willing to do it... yet.
  • one more thing

    This continued drive by AMD should result in the compiler being able to do a lot automatically, as with other optimizations such as MMX and SSE, where it becomes transparent to the software developer. Of course, still being able to target it specifically and explicitly means the developer could do better at times.

    If they take it as far as I suspect, the GPU cores will basically become a "floating-point processor". Those used to be separate from the CPU.
  • Cloud systems with hUMA will outperform those without it!

    One factor that was not mentioned in the article is the Cloud and virtualization of hUMA. If we assume that hUMA processors can virtualize efficiently, then hUMA will be great! Cloud providers will soon realize that they can charge a bit more for Cloud sessions with hUMA.

    I think hUMA is a disruptive technology in Cloud systems.
    • A big assumption

      "If we assume that hUMA processors can virtualize efficiently..." — that is a very big assumption, and even if it is right, we have no idea how it will compare. I think you are looking for a problem to fit the solution.
  • Software Limits

    Threads are still difficult to program and coordinate. I work mostly in C# and WPF and try to break things into threads when I feel energetic. Often I am just lazy. Someday there will be a really big breakthrough in programming languages that will make it easy. It is probably around now, but I have not found it yet and I read several software web sites.
    • Threads are still evil.....

      Just not as evil as 5 years ago
    • Someday there will be a really big breakthrough in programming languages

      I think you're being overly optimistic. Having worked with compilers and languages for much of my career, I can say that predictions that a parallel programming breakthrough is just around the corner have been made for 50 years. While there have been improvements, they are small incremental changes, not fundamental breakthroughs.
  • History repeats itself.

    This is nothing more than what happened back in the 80s...

    Specialized attached processor capabilities migrated into the host CPU.

    The result was first seen in the Cray 1, though the process started much earlier.

    Now the RISC foundations of the GPU migrate into a CISC CPU.

    Just think how much faster a RISC CPU will become when the process moves to those architectures.