AMD rolls out Instinct MI200 GPUs for HPC and AI workloads

The chipmaker is also taking the wraps off Milan-X, its first server CPU with 3D Chiplet technology.

AMD on Monday unveiled the Instinct MI200 accelerator, the latest generation of its data center GPU. The chipmaker says it's the fastest HPC and AI accelerator, surpassing records set by the MI100, rolled out last year.  

The Instinct MI200 delivers up to a 4.9x boost in high-performance computing performance over existing data center GPUs, AMD says. The company also claims it's the fastest for AI training, delivering up to 1.2x higher peak flops for mixed-precision performance.

The accelerator contains 58 billion transistors produced with 6nm technology. This allows for up to 220 compute units, which increases compute density by over 80% compared to the MI100. It's also the world's first GPU with 128GB of HBM2E memory.
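The "over 80%" figure can be sanity-checked with simple arithmetic. The sketch below assumes that compute density here refers to the compute unit count, and that the MI100 has 120 compute units; neither point is stated in this article.

```python
# Rough check of the compute-density claim.
# Assumption (not from the article): the MI100 has 120 compute units,
# and "compute density" is measured by compute unit count.
MI100_COMPUTE_UNITS = 120   # assumed baseline
MI200_COMPUTE_UNITS = 220   # "up to 220 compute units", per the article


def relative_increase(new: float, old: float) -> float:
    """Return the fractional increase of `new` over `old`."""
    return new / old - 1.0


increase = relative_increase(MI200_COMPUTE_UNITS, MI100_COMPUTE_UNITS)
print(f"{increase:.1%}")  # 83.3%
```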



It's the world's first multi-die GPU, featuring the second generation of AMD's CDNA architecture. AMD unveiled the CDNA architecture last year when it bifurcated its data center and gaming GPU designs. The CDNA architecture is designed expressly to optimize data center compute workloads. 

"These workloads, of course, run on very different systems, so separating them into two products and two chip families was an easy way for us to design better products," Brad McCreadie, AMD VP for data center GPU accelerators, told reporters last week.

The new MI200 accelerator is about 5x faster than Nvidia's A100 GPU in peak FP64 performance. This is key for HPC workloads requiring high precision, like weather forecasting. Its peak FP32 vector performance is about 2.5x faster. This is important for the types of math operations used in vaccine simulations, AMD pointed out.

AMD is also taking the wraps off Milan-X, its first server CPU with 3D Chiplet technology. It's officially launching in Q1 2022. 

These processors have 3x the L3 cache compared to standard Milan processors. In Milan, each CCD had 32MB of cache. In Milan-X, AMD brings 96MB per CCD. The CPU has a total of 804MB of cache per socket at the top of the stack, relieving memory bandwidth pressure and reducing latency. That in turn speeds up application performance dramatically.
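As a rough check on the 804MB figure, the arithmetic can be sketched as follows. The CCD count, core count, and per-core L1 and L2 sizes are assumptions based on the top-end Milan configuration (64 cores across 8 CCDs, with 512KB of L2 and 64KB of L1 per core). The article itself gives only the per-CCD L3 sizes and the per-socket total.

```python
# Back-of-the-envelope cache total for a top-of-stack socket.
# Assumed (not stated in the article): 8 CCDs, 64 cores,
# 512 KB of L2 per core, 64 KB of L1 (instruction + data) per core.
CCDS_PER_SOCKET = 8
CORES_PER_SOCKET = 64
L2_KB_PER_CORE = 512
L1_KB_PER_CORE = 64

MILAN_L3_MB_PER_CCD = 32      # from the article
MILAN_X_L3_MB_PER_CCD = 96    # from the article


def total_cache_mb(l3_mb_per_ccd: int) -> float:
    """Sum L3, L2, and L1 capacity for one socket, in MB."""
    l3_mb = l3_mb_per_ccd * CCDS_PER_SOCKET
    l2_mb = L2_KB_PER_CORE * CORES_PER_SOCKET / 1024
    l1_mb = L1_KB_PER_CORE * CORES_PER_SOCKET / 1024
    return l3_mb + l2_mb + l1_mb


print(MILAN_X_L3_MB_PER_CCD / MILAN_L3_MB_PER_CCD)  # 3.0
print(total_cache_mb(MILAN_X_L3_MB_PER_CCD))        # 804.0
```

Under these assumptions, L3 accounts for 768MB of the total, with L2 and L1 contributing the remaining 36MB.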

At the socket level, Milan-X is the fastest server processor for technical computing workloads, AMD says, with an uplift of more than 50% over Milan on targeted workloads.

AMD zeroed in on workloads that enable product design, such as those run with electronic design automation (EDA) tools, which are used to simulate and optimize chip designs. A large cache is critical to obtaining better performance for these workloads.

In chip design, verification is one of the most important tasks. It helps catch defects early before a chip is baked into silicon. Compared to Milan, Milan-X completes 66% more jobs in a given amount of time. This should help customers using EDA tools finish verification and go to market faster, or add more tests in the same amount of time to further improve the quality or robustness of their design.
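To put the 66% figure in terms of schedule, a minimal sketch of the arithmetic follows. It assumes the throughput gain applies uniformly across a verification run, which real workloads may not match; the function names and the batch of 100 jobs are illustrative only.

```python
# Translate a throughput gain into the two outcomes described above:
# finishing the same work sooner, or doing more work in the same time.
THROUGHPUT_GAIN = 0.66  # "66% more jobs in a given amount of time"


def time_fraction_for_same_jobs(gain: float) -> float:
    """Fraction of the original wall-clock time needed for the same jobs."""
    return 1.0 / (1.0 + gain)


def jobs_in_same_time(baseline_jobs: int, gain: float) -> float:
    """Jobs completed in the time the baseline needs for `baseline_jobs`."""
    return baseline_jobs * (1.0 + gain)


fraction = time_fraction_for_same_jobs(THROUGHPUT_GAIN)
print(f"{fraction:.3f}")                                  # 0.602
print(f"{jobs_in_same_time(100, THROUGHPUT_GAIN):.0f}")   # 166
```

In other words, a fixed set of verification jobs would take roughly 60% of the original time, or about 40% less.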