New University of Edinburgh supercomputer powered by Nvidia

Nvidia announces it is powering a new system for a Scottish university, along with a handful of updates to its HPC portfolio, as part of MWC 2021.

The University of Edinburgh has received a new high-performance computing platform, dubbed Tursa, which has been optimised for computational particle physics.

The new system is powered by the Nvidia HGX high-performance computing platform and is the third of four DiRAC "next-generation" supercomputers to be announced.

DiRAC is the United Kingdom's integrated supercomputing facility for theoretical modelling and HPC-based research in astronomy, cosmology, particle physics, and nuclear physics. It will run the Tursa system.

Tursa will allow researchers to carry out the ultra-high-precision calculations of the properties of subatomic particles needed to interpret data from massive particle physics experiments, such as the Large Hadron Collider.

"Tursa is designed to tackle unique research challenges to unlock new possibilities for scientific modelling and simulation," professor of theoretical physics at the University of Edinburgh and project lead for the DiRAC-3 deployment Luigi Del Debbio said.

Tursa is built with Atos and will feature 448 Nvidia A100 Tensor Core GPUs and include four Nvidia HDR 200Gb/s InfiniBand networking adapters per node.

In announcing the latest from its partnership with DiRAC, Nvidia also used Mobile World Congress to say it was "turbocharging" the Nvidia HGX AI supercomputing platform, banking on the fusion of AI with HPC to break into further industries.

"HPC is going everywhere, AI is going everywhere, every enterprise in the world will use supercomputing to accelerate their businesses," Gilad Shainer, Nvidia senior vice president of networking, told media.

"Supercomputing [is] serving more and more applications … managing the supercomputer therefore becomes much more complicated. You need to bring security into supercomputing because you need to isolate the users, isolate between the applications, protect between the users, you need to protect data."

Nvidia has added three technologies to its HGX platform: the Nvidia A100 80GB PCIe GPU, Nvidia NDR 400G InfiniBand networking, and Nvidia Magnum IO GPUDirect Storage software.

Nvidia A100 80GB PCIe

Image: Nvidia

The Nvidia A100 Tensor Core GPUs, the company said, deliver "unprecedented HPC acceleration" to solve complex AI, data analytics, model training, and simulation challenges relevant to industrial HPC. A100 80GB PCIe GPUs increase GPU memory bandwidth 25% compared with the A100 40GB, to 2TB/s, and provide 80GB of HBM2e high-bandwidth memory.

"When we build a supercomputer, it's always about performance … but this is where we start hitting a major problem," Shainer said. "The way to solve it is to use the GPU … bring the GPU into the supercomputer and use the GPU to run all the infrastructure management … from the CPU."

Nvidia partner support for the A100 80GB PCIe includes Atos, Cisco, Dell Technologies, Fujitsu, H3C, HPE, Inspur, Lenovo, Penguin Computing, QCT, and Supermicro. The HGX platform featuring A100-based GPUs interconnected via NVLink is also available via cloud services from Amazon Web Services, Microsoft Azure, and Oracle Cloud Infrastructure.

Nvidia NDR 400G InfiniBand networking, meanwhile, is touted as scaling performance to tackle the massive challenges in industrial and scientific HPC systems.

"Those systems drive our bandwidth to the next level. We're moving the data centre from running on 200Gb/s to 400Gb/s to be able to move data quicker, to be able to feed the GPUs in order to increase what we can do," Shainer said.

Nvidia Quantum-2 fixed-configuration switch systems deliver 64 ports of NDR InfiniBand at 400Gb/s per port, or 128 ports of NDR200, providing three times higher port density versus HDR InfiniBand, he explained.

The Nvidia Quantum-2 modular switches, he continued, provide scalable port configurations up to 2,048 ports of NDR 400Gb/s InfiniBand -- or 4,096 ports of NDR200 -- with a total bidirectional throughput of 1.64 petabits per second. The 2,048-port switch provides 6.5x greater scalability over the previous generation, with the ability to connect more than a million nodes.
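The quoted throughput figure can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes "bidirectional" means each port is counted once in each direction, that NDR200 runs at 200Gb/s per port, and that decimal units apply (1 petabit = 1,000,000 gigabits). The function name is hypothetical and not part of any Nvidia tool.

```python
# Rough check of the aggregate throughput quoted for the modular switch.
# Assumptions (not from Nvidia): decimal units, and "bidirectional" means
# every port's line rate is counted once per direction.

GBPS_PER_PBPS = 1_000_000  # 1 petabit/s = 1,000,000 gigabit/s


def bidirectional_pbps(ports: int, gbps_per_port: int) -> float:
    """Return aggregate bidirectional throughput in petabits per second."""
    return ports * gbps_per_port * 2 / GBPS_PER_PBPS


# 2,048 ports of NDR at 400Gb/s per port
ndr = bidirectional_pbps(2048, 400)

# 4,096 ports of NDR200, assumed to be 200Gb/s per port
ndr200 = bidirectional_pbps(4096, 200)

print(f"NDR:    {ndr:.2f} Pb/s")
print(f"NDR200: {ndr200:.2f} Pb/s")
```

Under those assumptions both port configurations come out to the same aggregate figure, which rounds to the 1.64 petabits per second cited.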

The switches are expected to sample by year-end. Infrastructure manufacturers such as Atos, DDN, Dell Technologies, HPE, and Lenovo are also expected to integrate the Quantum-2 NDR 400Gb/s InfiniBand switches into their enterprise and HPC offerings.

Described as providing "unrivalled performance for complex workloads", Magnum IO GPUDirect Storage, Nvidia said, enables direct memory access between GPU memory and storage.

"The direct path enables applications to benefit from lower I/O latency and use the full bandwidth of the network adapters while decreasing the utilisation load on the CPU and managing the impact of increased data consumption," Nvidia said.

Nvidia and Google Cloud also announced plans at Mobile World Congress to establish an AI-on-5G innovation lab.

The pair are touting it as an opportunity for network infrastructure players and AI software partners to develop, test, and adopt solutions that will "help accelerate the creation of smart cities, smart factories, and other advanced 5G and AI applications".

Nvidia also announced that its "next-generation" Aerial A100 AI-on-5G computing platform will incorporate 16 Arm-based CPU cores into the Nvidia BlueField-3 A100.
