GPU killer: Google reveals just how powerful its TPU2 chip really is

Google's second-generation Tensor Processing Unit Pods can deliver 11.5 petaflops of calculations.
Written by Liam Tung, Contributing Writer

So far, Google has only provided a few images of its second-generation Tensor Processing Unit, or TPU2, since announcing the AI chip in May at Google I/O.

The company has now revealed a little more about the processor, the souped-up successor to Google's first custom AI chip.

As spotted by The Register, Jeff Dean from the Google Brain team delivered a TPU2 presentation to scientists at last week's Neural Information Processing Systems (NIPS) conference in Long Beach, California.

Earlier this year, Dean said that the first TPU focused on efficiently running machine-learning models for tasks like language translation, AlphaGo's Go strategy, and search and image recognition. The TPUs were good for inference, that is, running already-trained models.

However, the more intensive task of training these models was done separately on top-end GPUs and CPUs. Training time on this equipment still took days or weeks, blocking researchers from cracking bigger machine-learning problems.

TPU2 is intended to both train and run machine-learning models and cut out this GPU/CPU bottleneck.


A custom high-speed network in TPU2s means they can be coupled together to become TPU Pod supercomputers.

Image: Google

A custom high-speed network in TPU2s, each of which delivers 180 teraflops of floating-point calculations, means they can be coupled together to become TPU Pod supercomputers. The TPU Pods are available only through Google Compute Engine as 'Cloud TPUs' that can be programmed with TensorFlow.

Dean's NIPS presentation offers more details on the design of the TPU Pods, the TPU2 devices, and the TPU2 chips.

Each TPU Pod will consist of 64 TPU2s, delivering a massive 11.5 petaflops with four terabytes of high-bandwidth memory.

Meanwhile, each TPU2 consists of four TPU chips, offering 180 teraflops of computation, 64GB of high-bandwidth memory, and 2,400GB/s memory bandwidth.

As for the TPU2 chips themselves, each features two cores with 8GB of high-bandwidth memory apiece, giving 16GB of memory per chip. Each chip has 600GB/s of memory bandwidth and delivers 45 teraflops of calculations.
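The per-chip, per-device, and per-pod figures above all hang together arithmetically. A minimal sketch, using only the numbers quoted in the article (the roll-up calculations, not the constants, are our own):

```python
# Sanity-check the TPU2 figures from Dean's NIPS presentation.
# Per-chip specs as reported: two cores with 8GB HBM each, 600GB/s, 45 TFLOPS.
CHIP_TFLOPS = 45
CHIP_MEM_GB = 2 * 8          # 16GB per chip
CHIP_BW_GBS = 600

# A TPU2 device packages four chips.
CHIPS_PER_TPU2 = 4
tpu2_tflops = CHIPS_PER_TPU2 * CHIP_TFLOPS    # 4 x 45  = 180 teraflops
tpu2_mem_gb = CHIPS_PER_TPU2 * CHIP_MEM_GB    # 4 x 16  = 64GB
tpu2_bw_gbs = CHIPS_PER_TPU2 * CHIP_BW_GBS    # 4 x 600 = 2,400GB/s

# A TPU Pod couples 64 TPU2 devices over the custom high-speed network.
TPU2_PER_POD = 64
pod_pflops = TPU2_PER_POD * tpu2_tflops / 1000   # 11.52 petaflops
pod_mem_tb = TPU2_PER_POD * tpu2_mem_gb / 1024   # 4TB

print(tpu2_tflops, tpu2_mem_gb, tpu2_bw_gbs)   # 180 64 2400
print(pod_pflops, pod_mem_tb)                  # 11.52 4.0
```

So the headline "11.5 petaflops" is the rounded-down product of 64 devices at 180 teraflops each, and the "four terabytes" of pod memory is exactly 64 devices at 64GB apiece.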

As Dean notes, TPU1 was great for inference, but the next breakthroughs in machine learning will need the power of its TPU2-based TPU Pods. He offered 1,000 free TPUs to top researchers accepted into Google's selective TensorFlow Research Cloud.

Previous and related coverage

TPU is 15x to 30x faster than GPUs and CPUs, Google says

Google shared details about the performance of the custom-built Tensor Processing Unit (TPU) chip, designed for machine learning.

Google I/O: Custom TPU chip amplifies machine learning performance

The Internet giant revealed it's built its own hardware that's been powering data centers for the past year with significant results.

