2024-03-19

Nvidia co-founder and CEO Jensen Huang held up the new Blackwell GPU chip, left, to compare it with its predecessor, the H100 "Hopper." Nvidia

Nvidia CEO Jensen Huang on Monday presided over the AI chipmaker's first technology conference held in person since the COVID-19 pandemic, the GPU Technology Conference, or GTC, in San Jose, California, and unveiled the company's new design for its chips, code-named "Blackwell."

Many consider GTC to be the "Woodstock of AI" or the "Lollapalooza of AI." "I hope you realize, this is not a concert," Huang said after a big round of applause at the outset. He then called out the many partners and customers in attendance.

"Michael Dell is sitting right there," Huang said, noting the Dell founder and CEO was in the audience.

Huang emphasized the scale of computing required to train the large language models of generative AI, or GenAI. A model with trillions of parameters, trained on trillions of "tokens," or word-parts, would require "30 billion quadrillion floating point operations," or 30 billion petaFLOPs of computation, Huang noted. "If you had a petaFLOP GPU, you would need 30 billion seconds to go compute, to go train that model — 30 billion seconds is approximately 1,000 years."

"I'd like to do it sooner, but it's worth it — that's usually my answer," Huang quipped.

Huang opened his presentation with an overview of the growing size of AI workloads, noting that a GPU capable of one petaFLOP per second would need 30 billion seconds, or roughly 1,000 years, to train such a model. Nvidia
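Huang's arithmetic can be checked with a short back-of-the-envelope calculation. This is a hedged illustration, not Nvidia's methodology: it assumes a hypothetical GPU that sustains exactly one petaFLOPS for the entire run, which real training jobs never achieve, and the helper name training_time_seconds is our own.

```python
# Back-of-the-envelope check of Huang's training-time figure.
# Assumption: a hypothetical GPU sustains its full peak rate for the whole run,
# which real training jobs never achieve.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
PETA = 1e15


def training_time_seconds(total_flops: float, flops_per_second: float) -> float:
    """Seconds needed to perform total_flops at a constant flops_per_second."""
    return total_flops / flops_per_second


total_work = 30e9 * PETA       # "30 billion quadrillion" floating-point operations
one_petaflop_gpu = 1 * PETA    # hypothetical GPU running at 1 petaFLOPS

seconds = training_time_seconds(total_work, one_petaflop_gpu)
years = seconds / SECONDS_PER_YEAR
print(f"{seconds:.3g} seconds, about {years:,.0f} years")
```

The result, a little over 950 years, is what Huang rounded to 1,000.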

Nvidia's H100 GPU, the current state-of-the-art chip, delivers on the order of 2,000 trillion floating-point operations per second, or 2,000 TFLOPS. A thousand TFLOPS equals one petaFLOPS, so the H100 and its sibling, the H200, can manage only a couple of petaFLOPS, which would still leave the training run Huang described taking hundreds of years on a single chip.
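The unit conversion in that paragraph can be spelled out in a few lines. This sketch treats "on the order of 2,000 TFLOPS" as exactly 2,000 TFLOPS sustained for the whole job, an idealization, and reuses Huang's figure of 30 billion quadrillion operations as the total work.

```python
# Unit check on the H100 comparison.
# Assumption: the chip sustains exactly 2,000 TFLOPS for the entire job.
TERA = 1e12
PETA = 1e15
SECONDS_PER_YEAR = 365.25 * 24 * 3600

h100_flops_per_second = 2_000 * TERA
h100_petaflops = h100_flops_per_second / PETA   # 1,000 TFLOPS = 1 petaFLOPS

total_work = 30e9 * PETA                        # Huang's 30 billion quadrillion operations
years_on_one_h100 = total_work / h100_flops_per_second / SECONDS_PER_YEAR
print(f"H100: {h100_petaflops:g} petaFLOPS, about {years_on_one_h100:,.0f} years for the job")
```

Even at that rate, a single chip would need roughly 475 years, which is the gap Huang's call for "much, much bigger GPUs" addresses.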

"What we need are bigger GPUs — we need much, much bigger GPUs," he said.

Blackwell, known in the industry as "HopperNext," can perform 20 petaFLOPS per GPU. It is meant to be delivered in an eight-way system, an "HGX" circuit board holding eight of the chips.

Using "quantization," a kind of compressed math in which each value in a neural network is represented with fewer bits, here a four-bit floating-point format called "FP4," an HGX system of the chips can run as many as 144 petaFLOPS.
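To make "fewer bits" concrete, here is a minimal sketch of block-scaled four-bit quantization. It assumes the E2M1 layout (one sign bit, two exponent bits, one mantissa bit) from the Open Compute Project's Microscaling (MX) specification, whose only representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6. Nvidia's exact Blackwell scheme, including how it picks scale factors and rounds ties, may differ; the function names are hypothetical, and the per-block scale here is a plain float rather than any specific hardware format.

```python
# A minimal sketch of block-scaled FP4 quantization.
# Assumption: E2M1 element format (OCP Microscaling spec), simple
# round-to-nearest, and a plain float scale per block.
from typing import List, Tuple

# Every non-negative magnitude an E2M1 value can represent.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_E2M1_GRID[-1]


def quantize_block(values: List[float]) -> Tuple[float, List[float]]:
    """Scale the block so its largest magnitude maps to 6.0, then round each
    value to the nearest FP4 grid point. Returns (scale, quantized values)."""
    amax = max(abs(v) for v in values)
    scale = amax / FP4_MAX if amax > 0 else 1.0
    quantized = []
    for v in values:
        magnitude = abs(v) / scale
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - magnitude))
        quantized.append(nearest if v >= 0 else -nearest)
    return scale, quantized


def dequantize_block(scale: float, quantized: List[float]) -> List[float]:
    """Recover approximate original values by multiplying the scale back in."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.12, -0.40, 0.05, 0.33, -0.61, 0.27]
    scale, codes = quantize_block(weights)
    print("scale:", scale)
    print("fp4 values:", codes)
    print("restored:", [round(x, 3) for x in dequantize_block(scale, codes)])
```

Each four-bit value takes a quarter of the storage of a 16-bit one, at the cost of the rounding error that shows up when the values are restored.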

The chip has 208 billion transistors, Huang said, and is built with a custom semiconductor manufacturing process at Taiwan Semiconductor Manufacturing Co. (TSMC) known as "4NP." That is more than double the 80 billion transistors in Hopper GPUs.

The Nvidia Blackwell GPU delivers ten times the floating-point operations per second of its predecessor, the "Hopper" series, and more than doubles the transistor count. Nvidia says the chip can run large language models 25 times faster. Nvidia

Blackwell can run large language models of generative AI with a trillion parameters 25 times faster than prior chips, Huang said.

The chip is named after David Harold Blackwell, who, Nvidia relates, was "a mathematician who specialized in game theory and statistics, and the first Black scholar inducted into the National Academy of Sciences."

The Blackwell chip uses a new version of Nvidia's high-speed interconnect, NVLink, which delivers 1.8 terabytes per second to each GPU. A dedicated portion of the chip, which Nvidia calls a "RAS engine," maintains the "reliability, availability and serviceability" of the chip. Dedicated decompression circuitry speeds up workloads such as database queries.
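One way to get a feel for 1.8 terabytes per second is to ask how long it would take to move the weights of a trillion-parameter model across a single GPU's NVLink connection at different numeric precisions. This sketch assumes exactly 1e12 parameters and treats the full 1.8 TB/s as available to one transfer, an optimistic idealization; transfer_seconds is a hypothetical helper, not an Nvidia API.

```python
# Idealized time to move a trillion-parameter model's weights over NVLink.
# Assumptions: exactly 1e12 parameters, and the full 1.8 TB/s is usable by a
# single transfer with no overhead.
NVLINK_BYTES_PER_SECOND = 1.8e12
PARAMETERS = 1e12

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def transfer_seconds(num_params: float, bytes_per_param: float,
                     link_bytes_per_second: float = NVLINK_BYTES_PER_SECOND) -> float:
    """Seconds to send num_params values of the given size over one link."""
    return num_params * bytes_per_param / link_bytes_per_second


for fmt, nbytes in BYTES_PER_PARAM.items():
    model_terabytes = PARAMETERS * nbytes / 1e12
    print(f"{fmt}: {model_terabytes:.1f} TB of weights, "
          f"{transfer_seconds(PARAMETERS, nbytes):.2f} s over one link")
```

Halving the bits per value halves the bytes on the wire, which is one reason lower-precision formats such as FP4 matter for interconnect-bound work as well as for math throughput.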

Amazon Web Services, Dell, Google, Meta, Microsoft, OpenAI, Oracle, Tesla, and xAI are among Blackwell's early adopters.

As with the prior generation, Blackwell GPUs can be paired with Nvidia's "Grace" microprocessor: two Blackwell GPUs and one Grace CPU combine into a single part called the "GB200 Grace Blackwell Superchip."

Thirty-six Grace CPUs and 72 Blackwell GPUs can be combined in a rack-based computer Nvidia calls the "GB200 NVL72," which can perform 1,440 petaFLOPS, a big step toward the scale of computing Huang described.
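The rack-level figures can be cross-checked the same way. This sketch assumes the 1,440-petaFLOPS rate is sustained for the entire job and applies to whatever numeric precision the training run uses, both optimistic simplifications. Dividing the rack's throughput across its 72 GPUs recovers the 20 petaFLOPS per GPU cited for Blackwell, and applying it to Huang's 30 billion quadrillion operations gives roughly 241 days instead of about 1,000 years.

```python
# Cross-checking the GB200 NVL72 numbers against Huang's example.
# Assumptions: the quoted 1,440 petaFLOPS is sustained for the whole job and
# applies to the precision the training run uses.
PETA = 1e15
SECONDS_PER_DAY = 24 * 3600

superchips = 36
gpus = superchips * 2                     # each GB200 pairs two Blackwell GPUs with one Grace CPU
rack_flops_per_second = 1_440 * PETA
per_gpu_petaflops = rack_flops_per_second / gpus / PETA

total_work = 30e9 * PETA                  # Huang's 30 billion quadrillion operations
days = total_work / rack_flops_per_second / SECONDS_PER_DAY
print(f"{gpus} GPUs at {per_gpu_petaflops:g} petaFLOPS each, about {days:,.0f} days for the job")
```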

A new system for the chips, the DGX SuperPOD, links "tens of thousands" of the Grace Blackwell Superchips, raising total operations per second further still.

Alongside Blackwell, Nvidia made several additional announcements:

New generative AI algorithms that enhance "cuLitho," Nvidia's existing library of computational lithography software, named for the photolithography process used to pattern chips during manufacturing. The GenAI code generates an initial "photomask" for lithography, which can then be refined by traditional methods, and doubles the speed of designing such photomasks. TSMC and chip-design software maker Synopsys are building cuLitho and the new GenAI functions into their technologies.

A new line of network switches and network interface cards: the "Quantum-X800 InfiniBand," based on the InfiniBand technology from Nvidia's Mellanox operation, and the "Spectrum-X800 Ethernet," based on the Ethernet networking standard. Both deliver 800 billion bits per second, or 800Gbps. Nvidia says the switches and NICs are "optimized for trillion-parameter GPU computing," so the network can keep pace with the chips' floating-point throughput.

A catalog of 25 "microservices": pre-built, cloud-based application containers for individual applications, including custom AI models, built on top of Nvidia's "NIM" container software suite, which is in turn part of the company's AI Enterprise software offering. The company describes the programs as a "standardized path to run custom AI models optimized for Nvidia's CUDA installed base of hundreds of millions of GPUs across clouds, data centers, workstations and PCs." They include a bundle of life-sciences microservices, some dedicated to "generative biology," chemistry, and "molecular prediction" tasks, that perform "inference," the generation of predictions, "for a growing collection of models across imaging, medtech, drug discovery, and digital health." The microservices are available through Dell and other vendors' systems and through public cloud services including AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, and they can be trialed on Nvidia's own cloud service.

Earth-2, a separate microservice designed as a "digital twin" simulation of extreme weather conditions, intended to "deliver warnings and updated forecasts in seconds compared to the minutes or hours in traditional CPU-driven modeling." The technology is based on a generative AI model built by Nvidia called "CorrDiff," which can generate "12.5x higher resolution images" of weather patterns "than current numerical models 1,000x faster and 3,000x more energy efficiently." The Weather Company is an initial user of the technology.

A high-res Earth image from Earth-2, Nvidia's "digital twin" simulation of extreme weather conditions. Nvidia
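The new switches' speeds are quoted in bits per second, while NVLink's is quoted in bytes, so a unit conversion helps compare them. This sketch assumes the 800Gbps figure applies per port and per direction, with no protocol overhead. Under those assumptions, each port moves about 100 gigabytes per second, roughly one-eighteenth of the 1.8 terabytes per second NVLink provides to each GPU.

```python
# Converting the Quantum-X800 / Spectrum-X800 line rate into bytes.
# Assumptions: 800 Gbps per port, per direction, no protocol overhead.
BITS_PER_BYTE = 8

port_gbps = 800
port_gigabytes_per_second = port_gbps / BITS_PER_BYTE   # 800 Gbps -> 100 GB/s

nvlink_gigabytes_per_second = 1_800                     # NVLink's 1.8 TB/s per GPU
ratio = nvlink_gigabytes_per_second / port_gigabytes_per_second
print(f"{port_gigabytes_per_second:g} GB/s per port; NVLink offers {ratio:g}x that per GPU")
```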

In addition to its own product and technology announcements, Nvidia announced a number of initiatives with partners:

A collaboration with Oracle for "sovereign AI" to run AI programs locally, "within a country's or organization's secure premises."

A new supercomputer for AWS, called "Ceiba," built from DGX systems running the Blackwell chips.

A partnership with Google Cloud to extend the JAX programming framework to the Nvidia chips, "widening access to large-scale LLM training among the broader ML community."

You can catch the entire keynote address on replay on YouTube.