How Facebook scales AI

Facebook's products and services are powered by machine learning. Powerful GPUs have been one of the key enablers, but it takes a lot more hardware and software to serve billions of users.

Most of Facebook's two billion users have little idea how much the service leans on artificial intelligence to operate at such a vast scale. Facebook products such as the News Feed, Search and Ads use machine learning, and behind the scenes it powers services such as facial recognition and tagging, language translation, speech recognition, content understanding and anomaly detection to spot fake accounts and objectionable content.

The numbers are staggering. In all, Facebook's machine learning systems handle more than 200 trillion predictions and five billion translations per day. Facebook's algorithms automatically remove millions of fake accounts every day.

In a keynote at this year's International Symposium on Computer Architecture (ISCA), Dr. Kim Hazelwood, the head of Facebook's AI Infrastructure group, explained how the service designs hardware and software to handle machine learning at this scale. And she urged hardware and software architects to look beyond the hype and develop "full-stack solutions" for machine learning. "It is really important that we are solving the right problems and not just doing what everyone else is doing," Hazelwood said.

Facebook's AI infrastructure needs to handle a diverse range of workloads. Some models take minutes to train, while others take days or even weeks. The News Feed and Ads, for example, use up to 100 times more compute resources than other algorithms. As a result, Facebook uses "traditional, old-school machine learning" whenever possible, and resorts to deep learning, namely Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs/LSTMs), only when absolutely necessary.

The company's AI ecosystem includes three major components: the infrastructure, workflow management software running on top, and the core machine learning frameworks such as PyTorch.

Facebook has been designing its own datacenters and servers since 2010. Today it operates 13 massive datacenters, 10 in the U.S. and three overseas. They are not identical: they were built at different times, and they do not all house the same data, since "the worst thing you can do is replicate all data in every data center." Despite this, every quarter the company "unplugs an entire Facebook datacenter," Hazelwood said, to ensure continuity. The datacenters are provisioned for peak loads, which leaves about 50% of the fleet idle at certain times of the day; that idle capacity is "free compute" that can be harnessed for machine learning.

Rather than standardizing on a single server design, Facebook sorted hundreds of production workloads into buckets and designed a custom server for each type. Data is stored on Bryce Canyon and Lightning storage servers, training takes place on Big Basin servers with Nvidia Tesla GPUs, and models run on Twin Lakes single-socket and Tioga Pass dual-socket Xeon servers. Facebook continues to evaluate specialized hardware such as Google's TPU and Microsoft's BrainWave FPGAs, but Hazelwood suggested that too much investment is focused on compute and not enough on storage and especially networking, which, in keeping with Amdahl's Law, can become a bottleneck for many workloads. She added that AI chip startups weren't putting enough focus on the software stack, leaving a big opportunity in machine learning tools and compilers.
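
Amdahl's Law makes Hazelwood's point concrete: if only part of a job gets faster, the untouched part caps the overall gain. The sketch below computes overall speedup as 1 / ((1 - p) + p / s), where p is the fraction of runtime that is accelerated and s is the speedup on that fraction. The 60/40 split between compute and storage/network time is a hypothetical figure chosen for illustration, not a Facebook measurement.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the runtime is
    made `factor` times faster; the remaining fraction is unchanged."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Hypothetical workload: 60% of each step is compute, 40% is storage/network I/O.
COMPUTE_FRACTION = 0.6

for factor in (2, 10, 100, 1_000_000):
    overall = amdahl_speedup(COMPUTE_FRACTION, factor)
    print(f"compute {factor:>9}x faster -> overall {overall:.2f}x")
```

With 40% of the time spent on storage and networking, even an effectively infinite compute speedup cannot push the overall gain past 1 / 0.4 = 2.5x, which is why balanced investment in storage and networking matters.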

Facebook's own software stack includes FBLearner, a set of three management and deployment tools that focus on different parts of the machine learning pipeline. FBLearner Store is for data manipulation and feature extraction, FBLearner Flow is for managing the steps involved in training, and FBLearner Prediction is for deploying models in production. The goal is to free up Facebook engineers to be more productive and focus on algorithm design.
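
The division of labor among the three FBLearner tools can be pictured as a three-stage pipeline. The following is a minimal sketch under stated assumptions, not FBLearner's API, which the article does not describe: the function names `extract_features`, `train` and `predict`, the record fields `x` and `y`, and the choice of a one-variable least-squares linear model are all hypothetical, picked to stand in for the "traditional" machine learning Facebook prefers where it suffices.

```python
from dataclasses import dataclass
from statistics import mean


# Stage 1 (loosely analogous to FBLearner Store): raw records -> features.
# Hypothetical: each record carries one numeric feature "x" and a label "y".
def extract_features(raw_rows):
    return [(float(row["x"]), float(row["y"])) for row in raw_rows]


@dataclass
class LinearModel:
    slope: float
    intercept: float


# Stage 2 (loosely analogous to FBLearner Flow): fit y = slope * x + intercept
# by ordinary least squares.
def train(examples):
    xs = [x for x, _ in examples]
    ys = [y for _, y in examples]
    x_bar, y_bar = mean(xs), mean(ys)
    var_x = sum((x - x_bar) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("need at least two distinct feature values")
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in examples)
    slope = cov_xy / var_x
    return LinearModel(slope=slope, intercept=y_bar - slope * x_bar)


# Stage 3 (loosely analogous to FBLearner Prediction): serve predictions.
def predict(model, x):
    return model.slope * x + model.intercept


raw_rows = [{"x": 1, "y": 3}, {"x": 2, "y": 5}, {"x": 3, "y": 7}]  # toy data
model = train(extract_features(raw_rows))
print(model, predict(model, 10.0))
```

Keeping the stages separate lets feature extraction, training and serving each be changed or scaled on its own, which mirrors the way the three FBLearner tools each focus on one part of the pipeline.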

Facebook has historically used two machine learning frameworks: PyTorch for research and Caffe2 for production. The Python-based PyTorch is easier to work with, but Caffe2 delivers better performance. The problem is that moving models from PyTorch to Caffe2 for production is a time-consuming and buggy process. Last month, at its F8 developer conference, Facebook announced that with PyTorch 1.0 it had "merged them internally so you get the look and feel of PyTorch and the performance of Caffe2," Hazelwood said.

This was a logical first step for ONNX (Open Neural Network Exchange), an effort by Facebook, Amazon and Microsoft to create an open format for optimizing deep learning models built in different frameworks to run on a variety of hardware. The challenge is that there are lots of frameworks, including Google's TensorFlow, Microsoft's Cognitive Toolkit and Apache MXNet (favored by Amazon), and the models need to run on a variety of platforms such as Apple's Core ML, Nvidia, Intel/Nervana and Qualcomm's Snapdragon Neural Processing Engine.

There are many good reasons to run models on edge devices, but phones are especially challenging. Many parts of the world still have little or no connectivity, more than half of the world is using phones dating from 2012 or earlier, and those phones use a wide variety of hardware and software. Hazelwood said there is about a 10X performance difference between today's flagship phone and the median handset. "You can't assume that everyone you are designing your mobile neural net for is using an iPhone X," she said. "We are very anomalous here in the U.S." Facebook's Caffe2Go framework is designed to compress models to address some of these issues.
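
The article does not say which compression techniques Caffe2Go uses. As an illustration of one common technique, assumed here rather than taken from the source, the sketch below applies post-training 8-bit affine quantization: each 32-bit float weight becomes an integer code in 0..255 plus a shared scale and zero point, cutting weight storage roughly fourfold at the cost of a small, bounded rounding error. The function names and example weights are hypothetical.

```python
def quantize_uint8(weights):
    """Affine 8-bit quantization: w is approximated by scale * (code - zero_point)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # width of one quantization step
    zero_point = round(-lo / scale)               # the code that represents 0.0
    # Round each weight to the nearest step and clamp into the uint8 range.
    codes = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Map uint8 codes back to approximate float weights."""
    return [scale * (c - zero_point) for c in codes]


weights = [-0.5, 0.0, 0.25, 1.0]  # arbitrary example values
codes, scale, zero_point = quantize_uint8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error {max_error:.4f} <= scale/2 = {scale / 2:.4f}")
```

Each restored weight lands within half a quantization step (scale / 2) of the original, and 0.0 round-trips exactly because the range is widened to include it.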

The deep learning era has arrived, and Hazelwood said there are lots of hardware and software problems to solve. The industry is spending lots of time and money building faster silicon, but, she said, we need equal investment in software. She cited Proebsting's Law, which holds that compiler advances only double compute performance every 18 years. "Please keep that in mind so we don't end up with another Itanium situation," Hazelwood joked, referring to Intel's now-defunct IA-64 architecture. The real opportunity, Hazelwood said, is in solving problems that no one is working on and building end-to-end solutions with balanced hardware and better software, tools and compilers.
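
Converting doubling times into annual rates shows how lopsided the comparison in Proebsting's Law is. The sketch assumes an 18-month doubling period for hardware (the popular statement of Moore's Law, which the article does not cite) alongside Proebsting's 18-year doubling for compilers; the helper names are illustrative.

```python
def annual_rate(doubling_years: float) -> float:
    """Compound annual improvement implied by doubling every `doubling_years` years."""
    if doubling_years <= 0:
        raise ValueError("doubling_years must be positive")
    return 2 ** (1 / doubling_years) - 1


def gain_over(years: float, doubling_years: float) -> float:
    """Total improvement factor after `years` of steady doubling."""
    return 2 ** (years / doubling_years)


COMPILER_DOUBLING = 18.0  # Proebsting's Law
HARDWARE_DOUBLING = 1.5   # assumption: 18-month hardware doubling

print(f"compilers: {annual_rate(COMPILER_DOUBLING):.1%} per year, "
      f"{gain_over(10, COMPILER_DOUBLING):.2f}x over 10 years")
print(f"hardware:  {annual_rate(HARDWARE_DOUBLING):.1%} per year, "
      f"{gain_over(10, HARDWARE_DOUBLING):.0f}x over 10 years")
```

Under these assumptions, a decade of compiler progress alone yields roughly 1.5x, while hardware yields roughly 100x, so software gains have to come from more than compiler tuning.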