AMD aims to simplify GPGPU programming with HSA and hUMA

Summary: Combining the power of the CPU and GPU in GPGPU solutions offers great performance, but these are cumbersome for developers, since the CPU and GPU use different memory pools. AMD plans to eliminate this burden with another acronym: hUMA.

TOPICS: Processors

Chip maker AMD is looking to aid those wishing to make use of GPGPU by throwing more acronyms at them in the form of HSA and hUMA.

While CPUs are great when it comes to processing single-threaded code with branches, they're not so good when it comes to parallel operations. However, it just so happens that GPUs are great at crunching through parallel operations, but weak when it comes to processing single-threaded code. This has given rise to general-purpose computing on the GPU (GPGPU), which aims to offer the best of both worlds.

While GPGPU offers the best of both worlds in terms of processing power, it does have a drawback – it's not easy to leverage. Specifically, addressing memory is cumbersome, because while the CPU and GPU may share the same physical memory chips, they have their own pools of memory. This means that data has to be copied back and forth between the CPU and GPU, which is not only wasteful in terms of processing power, but also adds significant code overhead.

AMD wants to eliminate this burden with a new system architecture called Heterogeneous Systems Architecture (HSA), and at the core of that is "heterogeneous Uniform Memory Access", also known as hUMA (as if we didn't have enough acronyms already).

(Image: AMD)

Boiled down to its simplest terms, hUMA allows both the CPU and GPU to share the same chunk of memory, and this in turn makes the hardware simpler, which makes it easier for developers to leverage GPGPU.

The first AMD hardware to support hUMA will be the upcoming Kaveri APUs. These will feature Steamroller processing cores, and are expected to make an appearance during the second half of 2013.

(Image: AMD)

Even better for developers is the news that hUMA will be supported by mainstream programming languages such as C++ and Java.

hUMA is expected to find its way into a broad range of hardware, from servers to games consoles. In fact, an interview with PlayStation 4 lead architect Mark Cerny suggests that Sony's upcoming console may make use of this technology.

(Image: AMD)

  • The memory for GPUs is different . . .

    The memory for GPUs is different . . .

    Generally speaking, graphics cards use dedicated memory for two reasons:

    1) To take the load off of main memory. When using a GPU intensive app, it may use a lot of memory to store all of the vertices and textures. With a very realistic game, that means there's little room in memory for much else.

    2) Memory on graphics cards is generally GDDR, not standard DDR. GDDR is generally more expensive, but faster, as it needs to keep up with realtime graphics involving millions of polygons and large textures.

    Normally for graphics, having separate memory isn't a problem: The next step for graphics after the GPU is done with them is generally the monitor, not the CPU. The CPU generally cares very little about what's done after it hands the information to the GPU.

    However, there are some cases where things may be handed back to the CPU: Physics and scientific computing come to mind. In these cases, shared memory may make some sense.

    To me, "hUMA" sounds like a variation of shared memory - not necessarily anything new. Intel has been doing this for years. And to be honest, Intel has not impressed anybody with its results - it constantly underperforms even the slowest dedicated video cards.

    "In fact, an interview with the PlayStation 4 lead architect Mark Cerny suggests that Sony's upcoming console may make use of this technology."

    What's interesting is that the PlayStation 4 is not using standard DDR, as Intel generally does with its dedicated graphics. The PlayStation 4 is using GDDR, and such a thing has never, to my knowledge, been done with a CPU.

    This IMO makes the PlayStation 4's performance a big wildcard. Nobody knows how this will turn out. It's never been tried before.

    It could go one of two ways:

    1) The GDDR boost makes it a screaming fast machine, and makes it a great gaming console.

    2) The fact that it's a type of shared memory inherits the drawbacks of shared memory, and it underperforms.

    Which way it'll go, nobody can tell you: This is literally something that hasn't been tried before. I hope it succeeds, but only time will tell.
    • Re: nobody can tell you

      You might not, but some of us do have a clue. The thing will scream.

      Unifying CPU and GPU memory access and cache coherency is a good idea. It's already done in ARM's big.LITTLE architecture and planned for Intel CPUs. It's the current thinking on how to improve performance.

      The next step will be an external GPU interconnect different from PCI-Express, for example something like HyperTransport. Not trivial, but doable. Technologies like hUMA will only make this task more challenging.
      • I'd need to see a benchmark.

        "You might not, but some of us do have a clue."


        I'd need to see a benchmark.

        "Unifying CPU and GPU memory access and cache coherency is good idea."

        It's a good idea for many applications. But being that memory is now shared, things are competing for it. Intel does something similar with integrated graphics (uses system memory for graphics), and nobody has been impressed with the performance.

        Hopefully the hUMA way is better than the integrated graphics way.

        "The next step will be an external GPU interconnect different from PCI-Express, for example something like HyperTransport."

        Dunno what people have against PCI-E. v4.0 will actually be faster than HyperTransport.

        I've never had issues with bus speed being a bottleneck when gaming. I imagine for scientific research it could be an issue, but not gaming.
    • Re: The memory for GPUs is different . . .

      That's less of an issue. What's different here is the unified virtual address space between CPU and GPU, so code running on both can exchange pointer values and have them point to the same thing. That should simplify programming enormously.

      With that complexity problem out of the way, THEN we can deal with the performance issues of working with different kinds of memory.
      • The problem isn't hardware

        What many people overlook is that the hardware architecture is not the whole story. Utilizing GPU hardware for anything other than what it is designed to optimize is an expensive process fraught with trial-and-error coding and testing against a single device at a time.

        CPU/GPU hybrids which consume DDR and GDDR memory simultaneously and share pointers are a nice approach to leveraging transistor shrinking to reduce communication across the PCIe bus. This communication overhead is the biggest impediment for using GPUs generally for science - you must get your data there and back again over a "slow" bus. The devices are bandwidth limited and the large cache structure existing on the CPU doesn't exist on the GPU to make memory accesses O(1) amortized. However, AMD has failed to deliver a viable software ecosystem for GPGPU, and has reset its efforts multiple times now. First there was Brook+ and then CAL/IL and OpenCL (what I spent 2+ years working with), and now HSA (renamed from FSA).

        AMD's GPGPU problem is that its management keeps laying off teams of engineers who depart for Intel and NVIDIA. Don't get me wrong - the fusion chips are a good x86 CPU and a good Radeon GPU on one chip priced to sell. My conversations with some loose-lipped AMD employees have indicated that the big 3 console manufacturers are sold on the Fusion CPUs for the next generation of game consoles, and they should be. Reducing CoGS and delivering a previous generation high-end-PC worth of power with all the bells and whistles of a high end 3D graphics in a fan-less space for a mass-market price is a tough requirement to fill. But it's *not* GPGPU. AMD doesn't play in that space.

        NVIDIA CUDA is GPGPU. Where the AMD HSA falls down is that there are no 2+ socket motherboards with fusion devices. You can't link 8 of the Fusion chips over HTX (HyperTransport). Whereas you can put up to 8 PCIe GPUs (AMD or NVIDIA) on a PCIe bus with 2 CPUs, or up to 4 PCIe GPUs with 4 to 8 CPUs (the CPU bus networks vary in bandwidth, and different motherboard vendors have different amounts of practicality). GPGPU is about throughput per unit of server rack space. You have PCIe-PCIe DMA between NVIDIA CUDA GPUs starting with the 2+ year old Fermi generation (CUDA CC 2.0). With the CC 3.5 Kepler you also have kernel launches from kernels, paving the way to generate dynamic workloads from these devices in really interesting ways, and solving graphics problems as well: how do I delegate dynamic tessellation to the GPU on a large scale?

        AMD is still *talking* about multiple devices sharing pointers. NVIDIA delivered this 2 generations of product ago with "CUDA GPUDirect 2.0". As for density of GPUs, the TITAN supercomputer using Kepler GPUs has them mounted directly on the motherboards in a special form factor. NVIDIA CUDA focused on getting the most out of GPUs by giving software developers more access to the underlying hardware - more access, more tools, more pre-built solutions, and huge investments early on. They paid off big time. Tesla GPUs (the server GPGPU line) are priced at over 2x the equivalent AMD server GPU, and people can't get enough of them.

        If AMD wants to win or even compete at GPGPU, at this point it has to clone the instruction set / high level interface of the NVIDIA CUDA stack. They have just as good hardware at a cheaper price, but no software legacy to run. NVIDIA has the GPGPU software ecosystem. The gamers don't care about GPGPU. Scientists using GPGPU only care about getting results. NVIDIA's OpenCL efforts were still-born because it was an internal conflict of interest to waste resources to make a half-baked competitor to their CUDA offerings that would never do anything but dilute their market.
        Mike Fried
  • New vector for attack

    Does this mean that we (but not me, I don't program drivers!) will need to make sure our video drivers are extra-hardened, as the GPU will have direct access to the same memory as the CPU?
    I'm tc
    • Re: New vector for attack

      Don't worry. Nothing new will happen. For ages, any peripheral that can do DMA has had unrestricted access to the "same" memory as the CPU. Did you trust those drivers? A driver has unrestricted access to the system anyway, especially in Windows, so even a keyboard driver can do bad things.
  • Education To Help Those...

    ...with no sense of hUMA.