Facebook AI’s friendly battle of researchers versus devs finds PyTorch triumphant

The homely Python-and-C library the AI scientists loved won over the production engineers.

Facebook's AI team is moving all of its machine learning model deployment to the PyTorch framework and standardizing on it for development, the company said Wednesday during its annual F8 developer event, held virtually this year.

"We're now at the point where we're standardizing all of our workloads on PyTorch," said Facebook's CTO, Mike Schroepfer, in a press conference.

In a kind of battle of researchers versus devs, the decision came only after Schroepfer and his team came around to the thinking of Facebook AI's head of research, Yann LeCun, according to Schroepfer.

"I had this knock-out, drag-out debate with Yann LeCun for a long time," explained Schroepfer. 

"Maybe AI is like, I prototype it with a 3-D printer, and then once we figure out what we want, we cast it in concrete, and so we have a totally separate tool chain for production" for performance reasons, he said, describing his side of the argument. LeCun, on the other hand, had insisted that one tool could work for both research and production.

The disagreement relates back to the history of PyTorch, which was introduced by Facebook in 2016. 

PyTorch is a library of machine learning functions written partly in the Python programming language, but with a lot of C++. It was adapted from an existing framework, Torch, as an alternative to frameworks such as Theano, Caffe, and Google's TensorFlow. The software is currently on a stable build of 1.8.1.

PyTorch's primary value has traditionally been its user experience, noted Schroepfer: it makes models easy to build and debug. That won it loyalty from researchers such as LeCun, who want to rapidly revise neural networks, test, and revise again. It was slower, however, than many programming tools built expressly for the highest performance.

Thus, despite PyTorch's value for research, Facebook had been using a mix of frameworks for research and production, said Schroepfer, with Caffe2 having been the second-most prevalent inside the company. A variety of "domain-specific frameworks" have also been in use, he said.

It turned out, said Schroepfer, that despite PyTorch not always being the fastest library, "production engineers want to use all the latest research." More and more problems were being solved by the researchers first, he explained.

Large AI models for speech recognition or natural language processing are "typically built on top of PyTorch," noted Schroepfer, not just at Facebook but at other companies and in academia. "Often they have GitHub repositories in PyTorch." 

Currently, 93% of Facebook's AI models are deployed with PyTorch.

The point of standardizing, according to Schroepfer, is to speed up Facebook's AI development and deployment across a variety of applications by consolidating the bulk of the company's effort into one library.

Engineers at Facebook "now have at my fingertips this toolbox of state-of-the-art models that I can put into production much more quickly because they are built on the same toolchain that I use in order to build and ship Web sites at a billion-user scale," he said.

"What you saw when you get this sort of consolidation into a single tool is just a massive inflection in the ability for people to get their jobs done and get work out."

Added Schroepfer, "As the industry further standardizes on tools like this, it will just further accelerate our ability to not just write research papers about this stuff, but get these tools into the world to help real people."

Under the hood, Schroepfer and team had to do some engineering to improve PyTorch's performance.

For example, they use TorchScript, a PyTorch feature that compiles annotated Python code so that it can be optimized at run time. "Let me write code however I want in Python, but if I label it correctly, we can run this really cool optimizer in the background to make it more efficient," Schroepfer said.
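For readers who have not seen it, the "label" Schroepfer describes can be as small as a decorator. The snippet below is a minimal sketch using PyTorch's public `torch.jit.script` API, not Facebook's production code; the function name `scaled_relu` and its inputs are invented for illustration.

```python
import torch


# The decorator is the "label": it asks TorchScript to compile this
# function instead of running it through the regular Python interpreter.
# scaled_relu is a hypothetical example function, not from the article.
@torch.jit.script
def scaled_relu(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Plain Python control flow is allowed inside a scripted function.
    if scale < 0.0:
        return torch.zeros_like(x)
    return torch.relu(x) * scale


x = torch.tensor([-1.0, 0.5, 2.0])
y = scaled_relu(x, 2.0)
print(y)

# The compiled form can be inspected as TorchScript source.
print(scaled_relu.code)
```

The body is still ordinary Python, which is the ease-of-use half of the bargain. The compiled function is what the runtime gets to optimize, and how much speedup results depends on the model and hardware.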

"We slowly closed the gap here," he said of PyTorch being amped-up. 

"I'm surprised at the outcome, that this tool that started at ease-of-use, experimentation, and agility, found its way into production," Schroepfer said. 

"Yann was right, I was wrong," he said.

Asked if Facebook AI would lose the benefits of diversity by standardizing, Schroepfer clarified that research still gets to play with all the new things it wants. 

"Our research teams have a lot of freedom to experiment, and they are always experimenting with novel technologies, which is how PyTorch came about in the first place," he said.