Nvidia announces launch of TensorRT 8, designed for chatbots, recommendations, and search

The eighth generation of Nvidia's AI software is able to cut inference time in half for language queries.
Written by Jonathan Greig, Contributor

Nvidia unveiled the eighth generation of its widely used TensorRT on Tuesday, announcing that the AI software is twice as powerful and accurate as its predecessor while cutting inference time in half for language queries.

TensorRT is used by hundreds of companies for things like search engines, ad recommendations, and chatbots. Siddharth Sharma, head of the product marketing team for Nvidia's AI software, told reporters on Monday that it has been downloaded more than 2.5 million times and is in use by companies like American Express, Verizon, LG, Ford, SK Telecom, KLA, Naver, GE Healthcare and USPS.

"TensorRT 8 is twice as powerful as 7, twice as accurate as TensorRT 7, and it supports sparsity which can dramatically reduce the amount of compute and memory needed for running applications," Sharma said.

"With this achievement, you can now deploy the entire Bert-Large within a millisecond. That is huge and I believe that is going to lead to a completely new generation of conversational AI applications. A level of smartness, a level of latency that was unheard of before."

Sharma explained that TensorRT 8's optimizations also allow for "record-setting speed for language applications, running BERT-Large, one of the world's most widely used transformer-based models, in 1.2 milliseconds."

"In the past, companies had to reduce their model size which resulted in significantly less accurate results. Now, with TensorRT 8, companies can double or triple their model size to achieve dramatic improvements in accuracy," Sharma added. 

TensorRT 8 is now available and free of charge to Nvidia Developer program members. The TensorRT GitHub repository also has the latest versions of plug-ins, parsers, and samples.

Greg Estes, vice president of developer programs at Nvidia, said AI models are growing exponentially more complex, and worldwide demand is surging for real-time applications that use AI. 

The latest version of TensorRT, Estes said, introduces new capabilities that enable companies to deliver conversational AI applications to their customers "with a level of quality and responsiveness that was never before possible."

Over the last five years, Nvidia said that more than 350,000 developers across 27,500 companies have used TensorRT, and Estes noted that TensorRT applications "can be deployed in hyperscale data centers, embedded or automotive product platforms."

Sharma told reporters that TensorRT 8's inference gains were made possible by sparsity and quantization, two key features that increase efficiency and allow developers to use "trained models to run inference in INT8 precision without losing accuracy."
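INT8 inference means storing and computing on 8-bit integers instead of 32-bit floats, with a scale factor mapping between the two ranges. A minimal sketch of symmetric per-tensor quantization, the general scheme that calibration tools such as TensorRT's approximate (the helper names here are illustrative, not part of any Nvidia API):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: one scale factor maps
    the float range [-max|x|, max|x|] onto the integers [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from INT8 codes."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)  # close to x, but stored in 1/4 the memory
```

The accuracy question Sharma raises is whether the rounding error introduced by this mapping is small enough not to change model outputs; choosing scales carefully (calibration) is what keeps INT8 inference near full-precision accuracy.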

GE Healthcare uses TensorRT in computer vision applications for ultrasounds, and Erik Steen, chief engineer of Cardiovascular Ultrasound at GE Healthcare, said the tool was vital in helping clinicians move faster. 

"When it comes to ultrasound, clinicians spend valuable time selecting and measuring images. During the R&D project leading up to the Vivid Patient Care Elevated Release, we wanted to make the process more efficient by implementing automated cardiac view detection on our Vivid E95 scanner," Steen said.

"The cardiac view recognition algorithm selects appropriate images for analysis of cardiac wall motion. TensorRT, with its real-time inference capabilities, improves the performance of the view detection algorithm and it also shortened our time to market during the R&D project."
