Nvidia has taken the wraps off its newest accelerator aimed at deep learning, the Tesla V100.
Developed at a cost of $3 billion, the V100 packs 21 billion transistors laid down with TSMC's 12 nanometre FinFET manufacturing process. The GPU has 5,120 CUDA cores and is claimed to deliver 7.5 teraflops at 64-bit precision and 15 teraflops at 32-bit precision. On the memory front, the GPU has 16GB of HBM2 RAM with bandwidth of 900GB per second.
At the heart of the V100 is the new Tensor core, which has a 4x4 main processing array that completes matrix multiplications in parallel, giving the V100 12 times the throughput of Nvidia's Pascal architecture at certain precisions, Nvidia founder and CEO Jen-Hsun Huang said at GTC on Wednesday.
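To give a sense of why a 4x4 matrix multiplication lends itself to parallel hardware, here is a minimal Python sketch. It is only an illustration of the arithmetic involved, not a description of how the Tensor core works internally, which Nvidia's announcement does not detail here. The function name `matmul_4x4` and the sample matrices are hypothetical.

```python
# Illustrative only: a plain-Python 4x4 matrix multiply.
# This does not model Nvidia's hardware; it just shows the arithmetic.
N = 4


def matmul_4x4(a, b):
    """Return the 4x4 product of a and b, given as lists of lists."""
    # Each of the 16 output elements is a dot product of one row of `a`
    # with one column of `b`. No output element depends on another, so
    # all 16 dot products could in principle be computed at the same time.
    return [
        [sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
        for i in range(N)
    ]


# Hypothetical sample data: a matrix holding 0..15 and the identity matrix.
m = [[i * N + j for j in range(N)] for i in range(N)]
identity = [[1 if i == j else 0 for j in range(N)] for i in range(N)]

# Multiplying by the identity leaves the matrix unchanged.
product = matmul_4x4(m, identity)

# A 4x4 by 4x4 product needs 4 * 4 * 4 multiply-add operations.
multiply_adds = N * N * N
```

A sequential processor would step through those 64 multiply-adds one after another; hardware built around a 4x4 array can instead work on many of them simultaneously, which is the kind of parallelism the throughput claim refers to.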
Huang said the V100 has 1.5 times the general-purpose FLOPS of Pascal, a 12-fold improvement in deep learning training, and 6 times the performance in deep learning inference.
"What was possible on Titan X in a few minutes is possible in a few seconds," Huang said.
Alongside the new GPU, the company is updating its DGX-1 box to pack eight Tesla V100s. The updated box will cost $149,000 and is due to be delivered in the third quarter. The company also announced a new DGX workstation priced at $69,000.
General availability of the V100 is set for the fourth quarter of the year.
Disclosure: Chris Duckett travelled to GTC as a guest of Nvidia.