Opinionated and open machine learning: The nuances of using Facebook's PyTorch

Soumith Chintala from Facebook AI Research, PyTorch project lead, talks about the thinking behind its creation, and the design and usability choices made. Facebook is now unifying machine learning frameworks for research and production in PyTorch, and Chintala explains how and why.
Written by George Anadiotis, Contributor

The release of PyTorch 1.0 beta was part of the big news in last week's machine learning (ML) October fest, along with fast.ai, Neuton, and MLFlow. With AI being what it is today, and machine learning powering a good deal of what is going on there, such news causes ripples beyond the ML community.

At last week's Spark AI Summit Europe, we had the chance to talk with some of the rock stars of this community. MLFlow's new version was presented in Databricks Chief Technologist Matei Zaharia's keynote, and PyTorch 1.0 was presented in Facebook AI Research Engineer Soumith Chintala's keynote.

Chintala is the creator and project lead for PyTorch, one of the top machine learning frameworks. After his keynote, ZDNet caught up with Chintala on a number of topics, ranging from Facebook's motivation and strategy for PyTorch, to the specifics of using it.

How many machine learning frameworks does the world need?

It may sound like a trivial question to ask, but we felt we had to get it out of the way: What was the thinking behind Facebook getting involved and investing resources in its own ML framework with PyTorch? Especially considering that there is another ML framework supported by Facebook, Caffe2.

For cloud vendors like AWS, Google, or Microsoft, there is a very clear incentive: The more people use their ML frameworks, the more compute and storage resources will eventually gravitate toward their respective clouds. But what stakes does Facebook have in this game? Why dedicate the resources -- human and otherwise -- needed to develop and maintain not one, but two ML frameworks, and where is this going?

The thinking behind this was not as forward-looking as you may think, said Chintala. He and the PyTorch team set out to build it simply because they are opinionated and wanted something tailored to their needs:

"Google's TensorFlow was released in 2015. We tried using it, but were not super happy with it. Before this, we tried Caffe1, Theano, and Torch. At the time, we were using Torch and Caffe1 for research and production. The field has changed a lot, and we felt a new tool was needed. Looked like nobody else was building it, not the way we thought will be needed in the future.

So, we felt we should build it. We showed it to some people, and they liked it. There is a strong open source culture in Facebook, so we open sourced it, and it took off. But the goal was mostly to make ourselves happy. It was not because we did not want to rely on Google or Microsoft."


Soumith Chintala is a Research Engineer at Facebook AI Research and the creator of PyTorch. His motivation for creating it? Having something that works according to his needs.

Chintala now works full time on PyTorch, and his team includes between 10 and 15 people. Even though intellectual curiosity and opinions may account for taking the step to create PyTorch, they do not explain why Facebook would assign these people to work on PyTorch in the long run. Or do they?

Chintala's take is that some people would have to be assigned to something like this anyway. If PyTorch had not been created, the other option would be to tweak some existing framework, which would end up requiring the same resources. But then, what about Caffe2? Why maintain two ML frameworks?

Machine Learning in Facebook AI Research and in production

First off, PyTorch is now officially the one Facebook ML framework to rule them all. PyTorch 1.0 marks the unification of PyTorch and Caffe2. Going forward, Chintala explained, the choice made was to use PyTorch for the front end, and Caffe2 for the back end. This means that nothing changes for users of previous versions of PyTorch, but the Caffe2 front end will be deprecated and its users will have to switch to PyTorch.

This has to do with the philosophy and goal of each framework. PyTorch was aimed at researchers who need flexibility. Caffe2 was aimed at running in production at extreme scale -- something like 300 trillion inferences per day, as Chintala noted. As you can imagine, merging the two was no easy feat:

"It's not just merging two codebases, but two projects and philosophies. The hardest part was still the technical one, but culture was a close second. It was mostly about how the frameworks evolve, and what needs they are addressing.

At production scale, code has different needs and properties than when you build for research, where you want everyone to be able to express their ideas and be as creative as possible," Chintala said.


Facebook AI Research has more than 100 researchers working on various projects. Open source and publishing is key to its philosophy, according to Chintala.

This distinction was also pronounced when discussing the infrastructure people at Facebook AI Research (FAIR) use. FAIR employs more than 100 people. Chintala noted that you can see people doing research on their laptops, and he personally uses local disk for storage a lot, as it makes it easier for him to work with files. But it really depends on the project.

"We also use Hive, Presto, and Spark. Some projects really push the limits of scale, and for those we use organizational infrastructure," Chintala said.

Another thing that FAIR does a lot, according to Chintala, is publish. Not just code, but also research papers and data:

"In fundamental research, you have to work with peers in the community, otherwise you get siloed. You may think you are working on something awesome, and then publish it five years later and realize it is horrible. We focus on open datasets, and publish right away. And we also leverage graph structures, to some extent.

For example, we use Wikipedia for question answering. Its structure is graph-like, and there also is the structured version, DBpedia. We do a lot of research on dialog and question answering. For these we use text datasets, and we also synthesize our own. In vision, we use large scale vision datasets.

Another example is how we use hashtags to enhance translation. We may have images with many hashtags describing the same thing in different languages, and we embed images and hashtags in a graph structure and then work on the translation. Although we do such things a lot, I don't remember having worked with Facebook's social graph."

Low level, high level, and opinionated openness

On Spark AI Summit's stage, Chintala showed many of the specifics of working with PyTorch. What impressed us was that, to the untrained eye, many of the code fragments that Chintala used seemed quite raw and low-level. This, however, is intentional, and there are higher level shortcuts as well, as Chintala explained.

Let's take building a neural network, for example. In PyTorch, this is done via a function that looks a bit messy. But according to Chintala, this is what people want:

"This is one of the biggest reasons why people use PyTorch. This function describes a neural network. It may not be well-structured, but it's where people get their expressiveness from. Our target audience is well-acquainted with this, and they want to use it this way.

Let's say you want to build a recurrent neural network, and you have some time series you need to use. In other frameworks you need to use an API, construct a time series, etc. In PyTorch, you just use a for loop. Users find this more intuitive, because there is no extra step needed - you just write code."
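To make the "just use a for loop" point concrete, here is a minimal sketch of what such code might look like. It is not the code Chintala showed on stage: the class name TinyRNN, the layer sizes, and the random input are all hypothetical, chosen only for illustration. The recurrence is an ordinary Python loop over the time steps of the series.

```python
import torch
import torch.nn as nn


class TinyRNN(nn.Module):
    """Hypothetical minimal recurrent network; names and sizes are illustrative."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.cell = nn.RNNCell(input_size, hidden_size)
        self.readout = nn.Linear(hidden_size, 1)

    def forward(self, series):
        # series has shape (time_steps, batch, input_size)
        hidden = torch.zeros(series.size(1), self.hidden_size)
        # The recurrence is a plain Python loop: one cell update per time step.
        for step in series:
            hidden = self.cell(step, hidden)
        # Predict one value per sequence from the final hidden state.
        return self.readout(hidden)


torch.manual_seed(0)
model = TinyRNN(input_size=3, hidden_size=8)
series = torch.randn(5, 2, 3)  # 5 time steps, batch of 2, 3 features each
prediction = model(series)
print(prediction.shape)  # torch.Size([2, 1])
```

Because the loop is regular Python, its length can differ from one call to the next, which is the kind of flexibility the quote refers to.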

Not everything has to be low-level, however. Chintala pointed out that for state of the art models, such as ResNet50, there are one-liners that encapsulate them and can be used in the code. PyTorch also comes with an array of pre-trained models ("model zoo"), out-of-the-box distributed capabilities, and integration with probabilistic programming, machine translation, natural language processing, and more.

Occasionally, these can look deceptively simple. Could this be a problem? For example, when showcasing PyTorch's abstraction for distributed deep learning, it was hard to believe all the nitty-gritty details can be taken care of by one line of code: Where does the dataset come from, which node gets each part, and so on.

In this case, Chintala explained, users can intervene at a lower level and fine-tune data loaders, for example. But the idea here was that in 90 percent of cases there is a structured, well-formed pattern of how most people do distributed deep learning, and the one-liner is built to leverage this. And it seems to work well, considering the near-perfect linear scaling in the graph Chintala shared.
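The article does not name the API Chintala demonstrated, so the following is only a sketch of one lower-level piece that addresses the "which node gets each part" question: PyTorch's DistributedSampler, plugged into a regular DataLoader. The toy eight-sample dataset and the two-worker setup are assumptions for illustration. Rank and world size are passed explicitly, so the snippet runs in a single process without any cluster.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset (illustrative): eight samples holding the values 0.0 .. 7.0.
dataset = TensorDataset(torch.arange(8).float().unsqueeze(1))


def indices_for(rank, world_size=2):
    """Return the dataset indices assigned to one worker."""
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=False
    )
    return list(sampler)


shard0 = indices_for(0)
shard1 = indices_for(1)
print(shard0)  # [0, 2, 4, 6]
print(shard1)  # [1, 3, 5, 7]

# The same sampler plugs into a DataLoader, so worker 0 only sees its shard.
loader = DataLoader(
    dataset,
    batch_size=2,
    sampler=DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=False),
)
batches = [batch[0].squeeze(1).tolist() for batch in loader]
print(batches)  # [[0.0, 2.0], [4.0, 6.0]]
```

In a real distributed job, rank and world size would normally come from the process group rather than being hard-coded.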


This is how you create a neural network in PyTorch. It may seem low-level to you, but that's exactly how its creators wanted it to be.

So, despite the fact that PyTorch is opinionated, it looks like its creators tried to strike a balance with the ability to accommodate different usage patterns, at least to some extent. As Chintala noted, one of the goals was to make PyTorch a package anyone in the Python ecosystem can use, regardless of what they may be using currently.

PyTorch's promise is that the entire Python ecosystem can be used at will. In fact, a zero-copy memory-sharing mechanism is in place to facilitate this:

"We don't have the 'not invented here' syndrome. You can take a PyTorch tensor and connect it to NumPy. It works by creating a NumPy struct in C, and then directly using a pointer on it. All you have to do is a very cheap, free almost, operation - query the C struct. In NumPy many many things are done in C, and we use it, too."
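A short sketch of what this looks like from the user's side, using the torch.from_numpy function and the Tensor.numpy method. The array contents are arbitrary. Since the tensor and the array share one buffer, a write through either object is visible through the other. This applies to CPU tensors.

```python
import numpy as np
import torch

array = np.zeros(3, dtype=np.float32)

# Wrap the NumPy buffer in a tensor; no data is copied.
tensor = torch.from_numpy(array)

tensor[0] = 1.0  # write through the tensor...
array[1] = 2.0   # ...and through the array

print(array.tolist())   # [1.0, 2.0, 0.0]
print(tensor.tolist())  # [1.0, 2.0, 0.0]

# Going back to NumPy is also a view over the same memory.
back = tensor.numpy()
shares = np.shares_memory(array, back)
print(shares)  # True
```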

This may seem like a triviality, but it goes to show the thinking behind PyTorch: high level and low level, opinions and openness, intertwined in a balancing act. In all fairness, much of open source development, research, and Facebook itself for that matter, walk similar fine lines. This may help the world benefit from cutting edge research at FAIR, and help FAIR researchers keep up to date with the broader community.
