AI industry watchers and ZDNet readers know Nvidia has become a juggernaut in the world of GPU-based deep learning and artificial intelligence (AI). And today, Nvidia CEO Jensen Huang is using his keynote at the company's GPU Technology Conference (GTC) to announce significant improvements to Nvidia's GPU hardware and platform performance.
Nvidia also has news in the areas of autonomous vehicles and professional visualization (read: graphics); my ZDNet colleague Asha McLean has all the details on those two fronts.
This post, meanwhile, will focus on Nvidia's AI/deep learning-related announcements, which fall into three areas:
- Upgrades to the Tesla V100 data center GPU platform (and hardware products based on it)
- Software improvements that boost performance of GPU-accelerated deep learning-based inference
- A new partnership with ARM that will integrate Nvidia Deep Learning Accelerator technology into ARM-based IoT chips
Double your pleasure
The really big news is in the core technology itself: Tesla V100 GPUs will now be equipped with 32GB of memory, double the previous 16GB capacity. More memory means larger, deeper deep learning models can be accommodated, which in turn means greater accuracy in the models' predictions.
The new GPU technology is available immediately on Nvidia's own DGX systems. Additionally, OEMs Cray, Hewlett Packard Enterprise, IBM, Lenovo, Supermicro and Tyan will be rolling it out in their own products during the second quarter of the 2018 calendar year.
New GPU switching and a 2 PF server
A second major plank in the improved platform is the introduction of NVSwitch, a scaling up of Nvidia's NVLink interconnect technology, which Nvidia says offers 5x the bandwidth of even the best PCIe-based switches, allowing more GPUs to be interconnected. As a result, bigger data sets can be accommodated and neural nets can be trained in a parallelized fashion.
The 32GB GPU and NVSwitch announcements are impressive individually. But the technologies are being combined in an upgrade to Nvidia's DGX-1 server. The new server, dubbed (reasonably enough) DGX-2, sports 16 of the new V100 GPUs for a total of 512GB of memory, which all 16 GPUs share as a single memory space, thanks to NVSwitch. Nvidia says the DGX-2 is the world's first 2 PetaFLOP (2 x 10^15 floating point operations per second) system and "is purpose-built for data scientists pushing the outer limits of deep learning research and computing."
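Those headline figures check out with simple arithmetic. As a back-of-the-envelope sketch (assuming the 125 teraFLOPS of Tensor Core deep learning performance Nvidia quotes per V100, a figure not stated in this post):

```python
# Back-of-the-envelope check on the DGX-2 headline numbers.
# Assumes ~125 TFLOPS of Tensor Core (deep learning) performance
# per Tesla V100 -- Nvidia's commonly quoted spec for that GPU.

NUM_GPUS = 16
MEM_PER_GPU_GB = 32      # the new, doubled V100 memory capacity
TFLOPS_PER_GPU = 125     # assumed Tensor Core performance per V100

total_memory_gb = NUM_GPUS * MEM_PER_GPU_GB
total_pflops = NUM_GPUS * TFLOPS_PER_GPU / 1000  # 1 PFLOP = 1,000 TFLOPS

print(f"Unified GPU memory: {total_memory_gb} GB")   # 512 GB
print(f"Aggregate compute:  {total_pflops} PFLOPS")  # 2.0 PFLOPS
```

The 512GB figure matches the announcement directly; the 2 PFLOPS figure holds only for the Tensor Cores' mixed-precision deep learning workloads, which is why Nvidia frames the DGX-2 as a deep-learning machine rather than a general 2 PF supercomputer.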
Beyond deep learning applications, the new 32GB GPUs work well in a variety of high-performance computing (HPC) scenarios. To that end, Nvidia is updating its CUDA, TensorRT, NCCL and cuDNN software, as well as its Isaac robotics SDK.
TensorRT, mentioned above, is a deep learning inference optimizer and runtime. TensorRT 4, the updated version, provides up to 190x faster performance than CPU-based inference, according to Nvidia.
The company has also been working to integrate TensorRT with the major deep learning frameworks, including Google's TensorFlow, Microsoft's CNTK, Facebook's Caffe2 and other ONNX-compatible frameworks such as Chainer, Apache MXNet and PyTorch. Nvidia and Google are announcing that they have integrated TensorRT into TensorFlow 1.7, delivering 8x higher inference throughput compared to regular GPU execution.
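Much of TensorRT's inference speedup comes from graph-level optimizations, notably fusing adjacent operations (say, a bias-add followed by a ReLU activation) into a single kernel so the data is traversed once instead of twice. Here's a toy, library-free Python sketch of that fusion idea; the function names are purely illustrative, not TensorRT's API:

```python
# Toy illustration of operator fusion, one of the graph rewrites an
# inference optimizer like TensorRT performs. Names are hypothetical.

def bias_add(xs, b):
    return [x + b for x in xs]

def relu(xs):
    return [max(0.0, x) for x in xs]

# Unfused: two passes over the data (think: two kernel launches).
def unfused(xs, b):
    return relu(bias_add(xs, b))

# Fused: one pass -- the rewrite the optimizer applies automatically
# when it spots a bias-add feeding directly into a ReLU.
def fused_bias_relu(xs, b):
    return [max(0.0, x + b) for x in xs]

xs = [-2.0, -0.5, 1.0, 3.0]
assert unfused(xs, 1.0) == fused_bias_relu(xs, 1.0)  # same result, fewer passes
print(fused_bias_relu(xs, 1.0))  # [0.0, 0.5, 2.0, 4.0]
```

On a GPU the payoff is larger than this sketch suggests: each eliminated pass also eliminates a kernel launch and a round trip through device memory.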
A leg up, with ARM
The last piece of Nvidia AI-related news concerns the company's new partnership with another chip industry heavyweight: ARM. Nvidia's Deep Learning Accelerator (NVDLA), which the company open-sourced in October of last year, is being integrated into ARM's Project Trillium machine learning platform and, in turn, into ARM's chip designs for edge devices.
ARM doesn't make chips; rather, it licenses chip designs to chip manufacturers. Integration of NVDLA into Trillium means that a vast array of Internet of Things (IoT) devices stand to benefit from NVDLA's performance optimizations, which will facilitate high-performance AI in IoT and other edge devices. AI at the edge vastly reduces data movement and makes IoT devices more intelligent.
Bill of Materials
This post, as with previous ones I've written on Nvidia, is quite the laundry list of new features and technologies. I prefer to imbue news-driven posts with a little more background and analysis than I've had time and room to provide here. That's a reflection, though, of what's happening in GPU land: hardware is improving rapidly and pioneering ecosystem partnerships keep forming, and it takes a good deal of effort just to understand what all these new developments are, let alone what they mean.
Nvidia is wisely taking as much territory and market share as it can. Eventually, its primacy will be institutionalized and we'll be able to slow down and watch all this cool stuff get applied in enterprise scenarios. Until then, we just have to keep up with the news.