This week Intel revealed a few new details of its second-generation Xeon Phi processor and gave the first public demonstration. The chip, known as Knights Landing, is big in every way. Manufactured on Intel's most advanced 14nm process, it has a whopping 8 billion transistors, comes with loads of memory in the package, and promises to deliver three times the performance of the current Xeon Phi.
Based on the MIC (Many Integrated Core) architecture, the Xeon Phi is part of Intel's high-performance computing platform. Unlike the Xeon E7, IBM Power or Oracle SPARC, which use a smaller number of big, fast cores to run databases and analytics, Xeon Phi uses lots of little cores to tackle highly parallel jobs in fields such as finance, seismic imaging for oil and gas, life sciences, weather simulation and video coding.
The current 22nm version, known as Knights Corner, is a co-processor with up to 61 small cores (244 total threads) used in conjunction with Xeon E5 server processors. It competes with GPU accelerators from Nvidia and AMD that are used in supercomputers and workstations.
Knights Landing is a major overhaul. At Supercomputing 2013, Intel revealed that Knights Landing will work as a host processor, capable of running an OS and applications on its own, as well as functioning as a co-processor. Last year, at the International Supercomputing Conference, it provided more details on the design of the chip, including a new form of memory, and its expected performance. This week we found out a bit more about how Intel plans to deliver such a big boost in performance.
The heart of Knights Landing is a heavily modified version of the Silvermont core used in Atom C2000 series processors for micro-servers and storage (Avoton) and networking (Rangeley). The out-of-order pipeline in the Knights version is twice as deep (in general, a deeper pipeline, meaning more stages in the execution of an instruction, lets a core work on more instructions at once and so complete more in a given amount of time), and each core has more L1 cache memory and a 512-bit AVX vector unit for floating-point operations.
The cores are arranged into tiles, each with two cores, a shared L2 cache and a hub that connects the tiles over a coherent mesh network. The chip Intel demonstrated this week had 30 tiles for a total of 60 cores and 240 threads (four threads per physical core), but the company says the final version will have more than 60 cores.
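The tile arithmetic is easy to check. The short Python sketch below assumes only the figures Intel gave (two cores per tile, four threads per core); the function name chip_totals is made up for illustration and is not part of any Intel tool.

```python
# Figures quoted by Intel for Knights Landing tiles.
CORES_PER_TILE = 2
THREADS_PER_CORE = 4


def chip_totals(tiles):
    """Return (cores, hardware threads) for a chip with the given tile count."""
    cores = tiles * CORES_PER_TILE
    threads = cores * THREADS_PER_CORE
    return cores, threads


# The demonstration chip had 30 tiles.
cores, threads = chip_totals(30)
print(f"{cores} cores, {threads} threads")  # 60 cores, 240 threads
```

Since Intel says the final version will have more than 60 cores, the shipping part would need more than 30 active tiles under the same layout.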
The other big news is the memory architecture. In addition to more cache, the chip package includes eight 2GB stacks of DRAM, or a total of 16GB, of what Intel calls "near memory." This looks like a version of Micron's Hybrid Memory Cube, which uses stacks of memory chips and an embedded logic chip, all connected with through-silicon vias (TSVs), to deliver higher density and greater bandwidth at lower power. Intel said the on-package memory delivers five times the performance of DDR4 memory on a standard memory bandwidth test for high-performance computing. Knights Landing also has six conventional memory channels that can connect to up to 384GB of DDR4 "far memory."
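To put the two memory tiers in proportion, here is a rough Python sketch using only the capacities quoted above. The variable names are illustrative, and the even split of far memory across the six channels is an assumption, not something Intel stated.

```python
# Capacities as quoted for Knights Landing.
NEAR_STACKS = 8      # on-package DRAM stacks
GB_PER_STACK = 2
FAR_CHANNELS = 6     # conventional DDR4 channels
FAR_MAX_GB = 384     # maximum DDR4 "far memory"

near_gb = NEAR_STACKS * GB_PER_STACK

# Assumption: the maximum DDR4 capacity is spread evenly over the channels.
far_gb_per_channel = FAR_MAX_GB // FAR_CHANNELS

# Share of a fully populated system's memory that sits in the package.
near_fraction = near_gb / (near_gb + FAR_MAX_GB)

print(f"near memory: {near_gb}GB")                          # near memory: 16GB
print(f"far memory per channel: {far_gb_per_channel}GB")    # far memory per channel: 64GB
print(f"near share of total: {near_fraction:.0%}")          # near share of total: 4%
```

In a fully populated system, the fast on-package memory is a small slice of total capacity, which is why how software uses it matters.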
The result of all this--a more advanced process, a new core design, more cache and a novel memory architecture--is a big boost in performance. Intel says the CPU core itself will deliver three times the performance of the one in the current Knights Corner Xeon Phi on fixed-point integer operations. At the chip level, Knights Landing will also deliver about three times the performance. Knights Corner is capable of a little more than one teraflop (a trillion floating-point operations per second) double-precision and two teraflops single-precision; Knights Landing will deliver 3 teraflops double-precision and 6 teraflops single-precision.
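The generational gain follows directly from the quoted peak figures, as this small Python sketch shows. Knights Corner's "a little more than one teraflop" is rounded down to 1.0 here, so the computed ratio is approximate; the dictionary and function names are illustrative only.

```python
# Peak teraflops as quoted in the article.
# Assumption: Knights Corner double precision is rounded to 1.0.
PEAK_TFLOPS = {
    "Knights Corner": {"double": 1.0, "single": 2.0},
    "Knights Landing": {"double": 3.0, "single": 6.0},
}


def speedup(precision):
    """Ratio of Knights Landing to Knights Corner peak throughput."""
    new = PEAK_TFLOPS["Knights Landing"][precision]
    old = PEAK_TFLOPS["Knights Corner"][precision]
    return new / old


for precision in ("double", "single"):
    print(f"{precision}-precision speedup: {speedup(precision):.1f}x")
```

Both precisions come out at roughly three times, matching Intel's chip-level claim, and single-precision throughput is twice double-precision in each generation.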
Last week, at its annual GPU Technology Conference, Nvidia announced the Titan X and Quadro M6000, both of which use a Maxwell GPU that delivers up to 7 teraflops of single-precision performance. (Here are CNET's test results for the Titan X.) But that GPU is not designed for double-precision operations. For those applications, the Tesla K80, which uses two of the older Kepler GPUs, delivers up to 2.9 teraflops double-precision, while AMD's fastest FirePro server GPUs deliver up to 2.1 teraflops of double-precision performance.
Unlike these GPUs, Knights Landing can be used on its own as a host CPU. It is binary compatible with the mainstream Xeon v3 Haswell server chips, supports the same instructions (with the exception of transactional memory), and can run Linux or Windows Server. Knights Landing will be available in three different versions: a PCI Express accelerator card, a host processor for InfiniBand, and a host processor with Intel's new Omni-Path fabric.
Earlier this month, at the Open Compute Summit, Intel showed a 1U half-width server board, code-named Adams Pass, with a socket for a Knights Landing host CPU, six channels of DDR4 memory and 36 lanes of PCI Express 3.0. These Knights Landing servers could broaden the potential market for Xeon Phi. In particular, Intel mentioned applications such as machine learning, data analytics, and certain virtualized workloads.
The current Xeon Phi has already found its way into some of the world's fastest computers. The most recent Top500 list includes 25 systems that use Knights Corner (most of the rest use Nvidia Tesla GPUs while three use AMD GPUs).
Knights Landing will be available in the second half of this year. A few supercomputing centers have already announced plans to use it. The National Energy Research Scientific Computing Center (NERSC) plans to build a new system, called Cori, with 9,300 Knights Landing host CPUs, which will go online at its Berkeley facility in 2016. The National Nuclear Security Administration (NNSA) will use both Xeon Haswell server chips and Knights Landing co-processors in a new supercomputer, dubbed Trinity, that the agency says will deliver at least eight times the performance of its current supercomputer at Los Alamos. Intel says some 50 companies will offer server systems using Knights Landing as a host CPU, while many others will offer servers with the option to use it as a co-processor.
High-performance computing is growing as more companies look for a competitive edge by adopting technology that was once largely the province of government and academia. Cloud companies are also starting to offer high-performance computing as a service. But this bleeding-edge technology is also interesting because it trickles down over time into enterprise data centers and eventually PCs. Intel has already said that its Omni-Path fabric will be available in 14nm Xeon server chips, and as the technology evolves, 3D-stacked memory is almost certain to be used more widely. In this sense, Knights Landing provides a peek at the future of computing.