Modern FPGAs can speed up a wide range of applications, but they still require a lot of expertise. Intel aims to make it easier for the rest of the world to use programmable logic for server acceleration.
These programmable logic devices, which can be reconfigured "in the field" for different tasks after manufacturing, have long been used in telecom gear, industrial systems, automotive, and military and aerospace applications. But modern FPGAs, with their large gate arrays, memory blocks, and fast I/O, are suitable for a much wider range of tasks.
Microsoft has been using Altera FPGAs in its servers to run many of the neural networks behind services such as Bing searches, Cortana speech recognition, and natural-language translation. At the Hot Chips conference in August, Microsoft announced Project Brainwave, which will make FPGAs available as an Azure service for inferencing. Baidu is also working on FPGAs in its data center and AWS already offers EC2 F1 instances with Xilinx Virtex UltraScale+ FPGAs.
Most customers buy FPGAs as chips, then design their own hardware and program them in a hardware description language such as VHDL or Verilog. Over time, some FPGAs have morphed into SoCs with ARM CPUs, hard blocks for memory and I/O, and more (this week Xilinx announced a family of Zynq UltraScale+ FPGAs with a quad-core Cortex-A53 and RF data converters for 5G wireless and cable). But the fact remains that FPGAs require considerable hardware and software engineering resources.
"One of the strengths of FPGAs is that they are infinitely flexible, but it is also one of their biggest challenges," said Nicola Tan, senior marketing manager for data center solutions in Intel's Programmable Solutions Group.
Now Intel is aiming to make it easier for other businesses to use FPGAs as server accelerators. This week the chipmaker announced the first of a new family of standard Programmable Acceleration Cards (PACs) for Xeon servers as well as software that makes them easier to program. In addition, Intel and partners are building functions for a wide variety of applications including encryption, compression, network packet processing, database acceleration, video streaming analytics, genomics, finance, and, of course, machine learning.
The PAC is a standard PCI Express Gen3 expansion card that can be plugged into any server. The first card combines the Arria 10 GX, a mid-range FPGA manufactured on TSMC's 20nm process, with 8GB of DDR4 memory and 128MB of flash. It is currently sampling and will ship in the first half of 2018. Intel said it will also offer a PAC with the high-end Stratix 10, manufactured on its own 14nm process, but it hasn't said when that version will be available.
At Hot Chips in August, Microsoft provided a sneak preview of the kind of performance that the Stratix 10 can deliver in the data center and said it expects a production-level chip running at 500MHz with tuned software will deliver a whopping 90 teraops (trillions of operations per second) for AI inferencing using its custom data format.
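The 90-teraops figure implies a staggering amount of parallelism. The numbers below come from Microsoft's Hot Chips projection; the arithmetic is just an illustrative sanity check:

```python
# Back-of-envelope check on Microsoft's Stratix 10 projection:
# dividing the projected throughput by the clock rate gives the
# number of operations the fabric must complete every cycle.
ops_per_second = 90e12   # 90 teraops (projected, with tuned software)
clock_hz = 500e6         # 500MHz target clock

ops_per_cycle = ops_per_second / clock_hz
print(f"{ops_per_cycle:,.0f} operations per clock cycle")  # 180,000
```

In other words, hitting that number means keeping on the order of 180,000 operations in flight on every clock tick, which is why the custom data format and tuned software matter so much.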
In addition to the PACs, Intel will also offer an MCP (multi-chip package) that combines a Skylake Xeon Scalable Processor and an FPGA. This is something Intel has been talking up since the $16.7 billion acquisition of Altera, and it has previously shown test chips with Broadwell Xeons and FPGAs, but the first commercial chip will arrive in the second half of 2018.
Conceptually, this isn't really all that different from the Altera and Xilinx SoCs that already include ARM CPUs, but x86 processors should deliver higher performance and Intel can leverage the proprietary interconnect and 2.5D packaging technologies it has been developing.
The Acceleration Stack is a set of APIs, frameworks, libraries and tools for developing applications that run on Xeon servers using either the acceleration cards or MCP. Tan said the software abstracts away a lot of the low-level interfaces used to communicate with the outside world such as the memory bus and PCIe bus. It allows application developers to work in standard OpenCL.
Intel has been working with partners and customers to build some of these applications. For example, Tan said Intel has been doing a lot of work with Swarm64, which accelerates relational databases to enable real-time analytics in MariaDB, MySQL and PostgreSQL, as well as with The Broad Institute on using FPGAs to accelerate the computationally intensive PairHMM algorithm used to compare two gene sequences. Other partners include:
Accelize FPGA Accelerators as a Service (FAaaS) for cloud infrastructure
Algo-Logic solutions for high-frequency trading, packet processing, and data acquisition and processing
B<>com's algorithm for converting SDR (Standard Dynamic Range) content into an HDR (High Dynamic Range) format on a CPU, Nvidia GPU, or FPGA
Bigstream acceleration for Big Data platforms such as Kafka, Spark, and MySQL
Cast Inc. IP for file compression and encryption on FPGAs and ASICs
DRC Computer co-processors for streaming analytics, graph databases, data deduplication, database queries and character-matching, cryptography, Monte Carlo simulations, biometrics, and DNA pattern matching
The introduction of standard hardware and software is no guarantee of success in data center acceleration (Intel just canceled its PCIe-based Knights Landing co-processor). But it should make FPGAs accessible to a much wider audience. By next year, FPGA acceleration will be available in standalone chips from Intel and Xilinx, in Xeon Scalable processors, on PCIe cards, in complete Dell EMC servers, and as a service from AWS or Microsoft Azure.
The real battle seems to be in software, where frameworks and algorithms are rapidly evolving and there are few standards for heterogeneous compute (or perhaps too many). Nvidia says it has spent billions over the past decade developing its CUDA platform, and Intel will need to match that intensity for FPGAs to make the leap from a handful of the very largest cloud players to the wider data center market.