Nvidia looks to the future of GPU computing

Summary: Nvidia chief executive Jen-Hsun Huang talks about his firm's role in the rise of parallel GPU computing and where the technology is heading

... not so well. You want to use the right tools for the job. The GPU is like the hammer. It's not a very elegant tool. It's not architecturally sophisticated, but it's like a jackhammer — it really powers through stuff.

You don't have to make the operating system or Excel run on the GPU, you just run it on the CPU. If you are Wall Street running this massive Monte Carlo simulation on a spreadsheet, you can get a plug-in that can run the simulation on our GPU and the spreadsheet still runs on the CPU — so you get the best of both worlds.
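As a rough illustration of why a Monte Carlo run splits off so cleanly from the rest of an application, here is a minimal sketch that is not taken from the interview. The function names, the market figures and the choice of a European call option are all hypothetical. Each chunk of paths uses its own random generator and shares no state with the others, which is the property that lets the work be handed to many parallel workers while the host program only collects one number per chunk. Python threads stand in for GPU workers here and give no real speed-up.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(seed, n_paths, spot, strike, rate, vol, years):
    """Sum discounted call payoffs over n_paths independent price paths."""
    rng = random.Random(seed)  # private generator, so chunks share no state
    drift = (rate - 0.5 * vol * vol) * years
    diffusion = vol * math.sqrt(years)
    total = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * years) * total


def price_call(n_chunks=8, paths_per_chunk=20_000, **market):
    """Host side: farm the chunks out to workers, then reduce to one estimate."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(simulate_chunk, seed, paths_per_chunk, **market)
            for seed in range(n_chunks)
        ]
        total = sum(f.result() for f in futures)  # summed in a fixed order
    return total / (n_chunks * paths_per_chunk)


# Hypothetical market inputs, chosen only for illustration.
market = dict(spot=100.0, strike=100.0, rate=0.05, vol=0.2, years=1.0)
estimate = price_call(**market)
print(f"estimated option price: {estimate:.2f}")
```

The host code never touches an individual path. That is the shape of the plug-in arrangement described above: the spreadsheet supplies the inputs and receives the result, and the simulation runs elsewhere.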

I believe heterogeneous computing is the way to go. You have a CPU that's becoming more and more vectorised. You have a GPU that's very, very parallel and is able to deal with more and more complex types of parallel tasks. They'll meet in the middle, and some day all apps will simply run incredibly fast.

Virtual memory lets us make it easier for software developers to do their jobs. That's an ease-of-programming issue. We will add more and more features for GPUs to address ease of programmability. Memory coherence is another example. It would be nice for all apps if the first version just works. Not very fast but it just works. Then you can tear it apart and get more performance.

Right now Cuda is interesting in the sense that the app might not work at all. It doesn't work and then 'boom' — it's infinitely fast. It would be better if it works but is only three times as fast. Then you can work towards very fast.

How do you move more of the power of GPUs from workstations and supercomputers and make it more generally available, through the cloud or in the datacentre?
The GPU is better at one application. One reason is because we are so stateful. Our pipeline, the amount of data streaming through our GPUs, doesn't compare with a CPU. We just have so much state inside our processors. We are running a million threads today — there's a lot of threads inside these processors that have to be kept coherent.

On our roadmap we have pre-emption and virtualisation of memory. Those techniques are vital to the era where you have multiple applications on one GPU. Today we have one large app on many GPUs. In the future we'll go the other way. You'll be able to do both — you'll be able to mix and match.

You should be able to have an enterprise server with Tesla inside and you could have that one Tesla serve up simultaneously a GeForce session for a gamer, a Quadro session for a car designer and a Tesla session for someone who's doing high-performance computing, and any combination of that mixture.
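The interview does not say how Nvidia's pre-emption will work, so the following is only a toy sketch of the general idea, with hypothetical session names and work units. It shows round-robin time slicing: each session runs for at most a fixed quantum, is then pre-empted and rejoins the back of the queue, so a long compute job cannot starve an interactive one.

```python
from collections import deque


def time_slice(sessions, quantum):
    """Round-robin: run each session for at most `quantum` units, then pre-empt it."""
    queue = deque(sessions.items())  # (name, remaining work units)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Pre-empted before finishing: go to the back of the queue.
            queue.append((name, remaining - ran))
    return timeline


# Hypothetical workloads, in arbitrary units of GPU time.
sessions = {"gamer": 3, "designer": 5, "hpc": 9}
timeline = time_slice(sessions, quantum=4)
for name, ran in timeline:
    print(f"{name:>8} ran for {ran}")
```

Without pre-emption the long job would hold the processor until it finished. With it, the short sessions get a turn within one quantum of each other.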

That's the future server architecture we imagine; something that's not only able to do computing but able to do visualisation and parallel computing, all in the private cloud, and serve up a compressed high-quality image to your desktop or your tablet or your phone.

Whether it's in the cloud or inside the computer, how are you going to deal with the bandwidth issues to keep GPU computing efficient as you scale up?
There is an enormous challenge in computing, which is just moving data around. It's an enormous challenge for us because we are crunching through data so fast. This is a classic computer graphics problem. Moving the data is just evil — the answer is: don't. So you need to figure out a way to move data as little as possible.

In computer graphics, the traditional APIs of the past, the ones that all failed, are the ones that moved data back and forth. They're all dead. We want the parallel computing environment that streams the data to the right place so that the processors can all access that large memory space, and move it around as little as we can. Conceptually that's what we need to do.

Some of the things that we are already working on, say, with InfiniBand: we want to feed directly into our GPU, or we want to DMA into our GPU, so that you don't copy into system memory and then copy back out from system memory. So you want to figure out a way to move data as little as possible and, now that you've moved it as little as you can, you just need to move it as fast as you can. There is just no replacement for terabytes per second.
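A back-of-the-envelope calculation makes the cost of the extra copy concrete. The payload size and both bandwidth figures below are hypothetical, not Nvidia numbers, and the model assumes the two hops run one after the other rather than overlapping.

```python
def transfer_seconds(n_bytes, bytes_per_second):
    """Time to move n_bytes over a link of the given bandwidth."""
    return n_bytes / bytes_per_second


GB = 10**9
payload = 4 * GB        # hypothetical data set arriving over the network
network_bw = 4 * GB     # hypothetical interconnect bandwidth, bytes per second
host_copy_bw = 8 * GB   # hypothetical system-memory-to-GPU bandwidth

# Staged path: network -> system memory, then system memory -> GPU.
staged = (transfer_seconds(payload, network_bw)
          + transfer_seconds(payload, host_copy_bw))

# Direct path: the network adapter writes straight into GPU memory.
direct = transfer_seconds(payload, network_bw)

print(f"staged: {staged:.2f}s, direct: {direct:.2f}s")
```

Under these assumptions the staged route moves every byte twice and takes half as long again as the direct one. Once the copy has been removed, the only lever left is raw bandwidth.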

Mary Branscombe

About Mary Branscombe

Mary Branscombe is a freelance tech journalist. Mary has been a technology writer for nearly two decades, covering everything from early versions of Windows and Office to the first smartphones, the arrival of the web and most things in between.

Talkback

2 comments
  • Just put a GPU socket straight onto the mainboard, and up the main system RAM to GDDR5, and then eventually move the GPUs into the main CPUs, and do away with the wasted space for separate PCIe x16 slots.

    It would also be far cheaper to produce just GPUs than it currently is to produce GPUs on separate PCBs requiring separate memory types & components, not to mention the environmental benefits overall.

    Of course, a standard socket type would have to be agreed among the GPU & mainboard manufacturers alike for it to gain accredited status, whilst still allowing end users sustained and varied upgrade paths rather than the current complete component replacement methods, which again is far more environmentally friendly.

    Overall this should also allow for better working collaborations within the fields of mainboard/memory design & architectural boundaries.
    CA-aba1d
  • The GPU/CPU interface is certainly evolving; there are interesting systems like Optimus that use PCIe directly between the CPU and GPU. But I'm not sure we'll see that much agreement on sockets especially as the technology is developing so fast...
    M
    Simon Bisson and Mary Branscombe