Nvidia’s annual GPU Technology Conference began as a technical gathering on using graphics processors for highly specialized computing chores such as oil and gas exploration, options pricing and simulations. This remains a big focus for the company, but GTC, which kicked off today, has grown into a full-fledged convention on the use of Nvidia’s visual computing technology in everything from smartphones to supercomputers.
This was illustrated by CEO Jen-Hsun Huang’s wide-ranging opening keynote, which featured a slew of new product announcements. Highlights included an updated roadmap for the company’s GPUs and Tegra mobile processors, a new “tiny, little computer” called Kayla, and Nvidia’s first end-to-end server system, designed to replace dedicated workstations. Overall it was an impressive show with only two notable omissions: there was no announcement of any smartphone or tablet design wins and barely a mention of the upcoming Shield Android-based portable gaming device.
Huang began with the GeForce GTX Titan, which he described as “the largest semiconductor device, the most complex semiconductor device ever made.” He noted that it was hard to find in stores because Nvidia was having a hard time keeping up with demand for the $1,000 graphics card.
To demonstrate Titan’s performance, Huang showed two new Nvidia technologies designed to make simulations more realistic. Waveworks is a real-time ocean simulation that takes wind speed (on the Beaufort scale) as an input to make the scene more realistic. While simulating the ocean is hard, simulating a face is even harder, he said. Faceworks takes a massive video library of facial expressions (created in partnership with the Institute for Creative Technologies at USC) and boils it down to a smaller set of 3D “meshes” that can be rendered in real-time on a GPU. To illustrate it, Huang engaged in a “conversation” with a 3D model, named Ira, and the facial expressions were impressively natural-looking. But it also requires a lot of horsepower: about 40,000 operations per pixel at a rate of 60Hz. “You get something along the lines of about two teraflops, which is one of the reasons we built Titan so you can do something like this,” Huang said.
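As a rough check on that figure, the arithmetic can be sketched as follows. The 40,000 operations per pixel and the 60Hz rate come from the keynote; the frame resolutions are assumptions, since Huang did not say how many pixels the rendered face covers.

```python
# Back-of-the-envelope check of the Faceworks compute figure.
OPS_PER_PIXEL = 40_000   # from the keynote
FRAME_RATE_HZ = 60       # from the keynote


def ops_per_second(width: int, height: int) -> int:
    """Operations per second needed to shade every pixel of a width x height frame."""
    return width * height * OPS_PER_PIXEL * FRAME_RATE_HZ


# The resolutions below are assumptions, not figures from the keynote.
for width, height in [(1280, 720), (1920, 1080)]:
    tera_ops = ops_per_second(width, height) / 1e12
    print(f"{width}x{height}: {tera_ops:.2f} trillion operations per second")
```

Under these assumptions, a 1280x720 frame works out to roughly 2.2 trillion operations per second, in line with Huang’s “about two teraflops,” while a full 1920x1080 frame would need about 5 trillion.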
More realistic games are great, but there are lots of other potential applications for Faceworks. Huang joked that all famous people ought to have these Faceworks models. “Can you imagine if Lincoln had done this and we could sit here and talk with him?” he mused. The technology could, in theory, also be used for teleconferencing, animating life-like avatars based on the words spoken.
From graphics, Huang shifted to GPU computing. He said that about a decade ago Nvidia came to the conclusion that computer graphics was the perfect vehicle for GPU computing because it is inherently parallel and because there was already a large installed base of PCs with discrete GPUs (“it already has a day job,” as he likes to put it). Since then, GPU computing has grown rapidly. As he does each year, Huang rattled off the latest CUDA stats:
- 430 million CUDA-capable GPUs shipped
- 1.6 million CUDA programming kit downloads
- 640 universities teaching CUDA courses
- 37,000 academic papers
“It is very clear that we are close to the tipping point,” he said.
Nvidia GPUs are now used in 50 supercomputers around the world. The Oak Ridge National Laboratory’s Titan supercomputer recently ran the world’s largest solid mechanics simulation, using 40 million CUDA processors to deliver 10 petaflops of sustained performance. This week the Swiss supercomputing center announced it will adopt Nvidia GPUs to build Europe’s fastest supercomputer for weather simulation. Huang talked about the scientific uses of GPU computing, including physics, genomics, gigapixel photo arrays, materials simulations, Alzheimer’s research and centrifuge analysis. He said there were more than 400 papers at GTC this year, including some in emerging areas such as image processing and manufacturing.
GPUs are also being used to solve so-called big data problems. Twitter receives 500 million tweets per day and “smart companies such as Salesforce.com” are scanning this social data for companies like Cisco, Dell or Gatorade that want to know how their brands are perceived. Salesforce.com has about a million keywords or expressions that it looks for continuously, Huang said, and by porting this to the GPU the company made the process much faster. “What used to take minutes now takes seconds,” he said. “As a result they can really scale out this service.” Jason Titus, CTO of Shazam, talked about how the service uses GPUs to handle 10 million searches per day, match them against its database of 27 million songs, and deliver an answer quickly. Though he provided few details, Titus said GPUs allowed the company to handle queries faster at a third of the cost.
One of the more interesting demos was visual search. A company called Cortexica has developed a service that uses computer vision algorithms and cloud-based servers, powered by GPUs, to recognize images and return results for similar items. Huang and Mike Houston, also from Nvidia, demonstrated the service by using a tablet to snap a photo of a page from InStyle magazine (an image of Kate Hudson), search through 800,000 clothing items on eBay, and return tops with similar colors and patterns. Noting that 72 hours of video are uploaded to YouTube every minute, Huang said that GPUs will enable companies to sift through all of this data for certain images to protect their copyrights, for example.
The most eagerly anticipated sections of the keynote were the roadmap updates. The Kepler GPU architecture, which Nvidia introduced in 2012, will be replaced next year by Maxwell, the company’s first GPU with a unified memory architecture, meaning the CPU and GPU will be able to see all of the system’s memory. “All memory is visible to all the processors. It will just make it a lot easier for you to program,” he said. Maxwell will be followed, in 2015, by Volta. All current GPUs have their own memory placed alongside the GPU on a circuit board, but Volta will have multiple layers of memory stacked directly on top of the GPU. This means graphics cards can be smaller and more energy-efficient, but more importantly it will increase the memory bandwidth to a whopping one terabyte per second. Huang said this is the equivalent of moving the entire contents of a Blu-ray disc from memory to the GPU in about 1/50th of a second.
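The Blu-ray comparison is easy to sanity-check. The sketch below takes “one terabyte per second” to mean 10^12 bytes per second and uses the standard 25GB and 50GB disc capacities; both are assumptions, since Huang did not specify either.

```python
# Time to move a Blu-ray disc's worth of data at Volta's projected bandwidth.
BANDWIDTH_BYTES_PER_SEC = 1e12  # "one terabyte per second", taken as 10**12 bytes (assumption)


def transfer_seconds(size_gb: float) -> float:
    """Seconds to move size_gb gigabytes (10**9 bytes each) at the stated bandwidth."""
    return size_gb * 1e9 / BANDWIDTH_BYTES_PER_SEC


# Disc capacities are standard Blu-ray sizes, not figures from the keynote.
for label, size_gb in [("single-layer (25GB)", 25), ("dual-layer (50GB)", 50)]:
    seconds = transfer_seconds(size_gb)
    print(f"{label}: {seconds:.3f} s, about 1/{round(1 / seconds)} of a second")
```

With these numbers a single-layer disc takes about 1/40th of a second and a dual-layer disc about 1/20th. Huang’s 1/50th corresponds to roughly 20GB of data, so the comparison is best read as an order-of-magnitude illustration.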
Chipmakers aren’t already using 3D stacking because it is difficult to wire the layers together (using what are called through-silicon vias) and because it is very tough to dissipate all of the heat generated between the layers. It will be interesting to see if Nvidia and its manufacturing partners, such as foundry TSMC (Taiwan Semiconductor Manufacturing Company), can overcome these issues in only a couple more years.
Nvidia also had some news on its mobile roadmap. Tegra 4, a quad-core Cortex-A15, is shipping now and the Tegra 4i, with a quad-core A9 and integrated 4G LTE, was announced at Mobile World Congress and will be in production by the end of this year. This will be followed by Logan, Nvidia’s first mobile processor with its most advanced GPU architecture, a true Kepler GPU that supports CUDA and OpenGL 4.3. “We should see Logan this year and we should see it in production very easily next year,” he said. The next generation, Parker, will be the first to offer the company’s Project Denver 64-bit ARM processor and a Maxwell GPU, and it will be manufactured using 3D FinFET transistors. Intel is already using FinFETs in its 22nm Ivy Bridge processors, but the semiconductor foundries that manufacture mobile processors for Nvidia and others do not yet offer this technology. Parker is scheduled for 2015. “In five years’ time, we will increase the performance of Tegra by 100 times,” Huang promised.
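To put the 100-times promise in perspective, the sketch below works out the yearly improvement it implies. It assumes the gain compounds evenly from year to year, which Huang did not claim; real product generations are likely to arrive in uneven steps.

```python
# Annual speedup implied by a 100x gain over five years.
TOTAL_GAIN = 100   # from Huang's promise
YEARS = 5          # from Huang's promise


def annual_factor(total_gain: float, years: int) -> float:
    """Constant yearly multiplier that compounds to total_gain after the given number of years."""
    return total_gain ** (1 / years)


factor = annual_factor(TOTAL_GAIN, YEARS)
print(f"Implied gain: about {factor:.2f}x per year")
```

Under that even-compounding assumption, Tegra performance would have to improve by roughly 2.5 times every year.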
Huang also announced Kayla, a small computer on a board with a Tegra 3 processor and a new Kepler GPU. He demonstrated Kayla handling intensive tasks such as ray-tracing and smoke and water simulations, but gave no details on availability or pricing.
Last year Nvidia introduced its GRID GPU server virtualization technology for cloud-based gaming services. The company followed this up with a version for enterprise servers, which, Nvidia said, is now in production for Citrix, Microsoft and VMware; certified for use with servers from all of the big names (Cisco, Dell, HP and IBM); and currently in 75 trial deployments. For example, Applied Materials, one of the world’s largest suppliers of chip-making equipment, is using Nvidia’s GRID servers to allow engineers to use its CAD applications from any device, at any location.
There are lots of small and medium-size businesses around the world that can’t work this way, Huang said. They don’t have an IT department or racks of servers; instead they buy their computer equipment from the Apple store. But like large organizations they have multiple users who need to perform computationally intensive work on large data files. To reach that audience, Nvidia announced an appliance server, its first end-to-end system, called the GRID Virtual Computing Appliance (VCA). The VCA is a 4U chassis that contains Xeon server processors, multiple GRID Kepler-based GPUs, lots of memory and a hypervisor that can support at least 16 virtual machines. The client device connected to it needs only a downloadable GRID client. “It doesn’t matter if it’s a Mac, if it’s a PC or even if it’s an Android thin client,” Huang said. “The VCA in the back does all of the work and the users all think they have their own personal supercomputer.”
The idea is to replace physical workstations, each of which costs thousands of dollars, with a single appliance that can be accessed from any device, anywhere. Nvidia demonstrated this on a MacBook Pro connected to a VCA simultaneously running Autodesk 3D Studio Max, Adobe Premiere (editing a 4K movie in real-time) and SolidWorks, a highly popular CAD program. Two of these programs don’t even run on the Mac, but of course that doesn’t matter since it is a virtualized environment.
The base VCA appliance, with a Xeon processor with eight cores (16 threads), eight Kepler GPUs and 192GB of memory, will cost $24,900 plus a $2,400 software license fee for unlimited clients. The Max model, with two Xeons (32 threads), 16 Kepler GPUs and 384GB of memory, will be $39,900 plus an annual license fee of $4,800.
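For a sense of how this compares with buying workstations, the sketch below totals the first-year cost of each model and divides it across users. The prices are the ones quoted above; the user count of 16 is hypothetical (borrowed from the “at least 16 virtual machines” figure, since per-model user limits were not given), and treating both license fees as first-year charges is also an assumption.

```python
# First-year cost of each VCA model, using the prices quoted in the article.
from dataclasses import dataclass


@dataclass
class VcaModel:
    name: str
    hardware_usd: int
    license_usd: int  # treated as a first-year charge for both models (assumption)

    def first_year_usd(self) -> int:
        """Hardware price plus one license fee."""
        return self.hardware_usd + self.license_usd

    def per_user_usd(self, users: int) -> float:
        """First-year cost split evenly across the given number of users."""
        return self.first_year_usd() / users


MODELS = [VcaModel("Base", 24_900, 2_400), VcaModel("Max", 39_900, 4_800)]
ASSUMED_USERS = 16  # hypothetical; the article does not give per-model user limits

for model in MODELS:
    print(
        f"{model.name}: ${model.first_year_usd():,} first year, "
        f"${model.per_user_usd(ASSUMED_USERS):,.2f} per user at {ASSUMED_USERS} users"
    )
```

With 16 users sharing one appliance, the first-year cost comes to about $1,700 per user for the base model and about $2,800 for the Max model; fewer users per appliance would raise those figures proportionally.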
Huang predicted that, thanks to virtualization, in the future the company computer will become a thing of the past, much like the company car has. Instead employees will bring their own computing devices with them and much of the processing will be done in the cloud.
The keynote concluded with a little Hollywood sizzle. Huang talked about the animation in the movie Life of Pi, where some 80 percent of the shots of the tiger are computer generated. Each frame took 30 hours to render, he said, and in all it took several hundred million hours of CPU time to render the entire movie. Josh Trank, a director (The Fantastic Four, Chronicle), and Jules Urbach, the founder and CEO of Otoy, a company that provides cloud-based rendering for Hollywood, showed how an ordinary laptop connected to remote GRID servers in Los Angeles could be used to make edits and render “film-quality” scenes in real-time. Urbach said that by using GPUs, rather than CPUs, filmmakers can render computer animation 40 to 100 times faster.