Opinionated and open machine learning: The nuances of using Facebook's PyTorch
Soumith Chintala from Facebook AI Research, PyTorch project lead, talks about the thinking behind its creation, and the design and usability choices made. Facebook is now unifying machine learning frameworks for research and production in PyTorch, and Chintala explains how and why.
The release of PyTorch 1.0 beta was part of the big news in last week's machine learning (ML) October fest, along with fast.ai, Neuton, and MLFlow. With AI being what it is today, and machine learning powering much of it, such news causes ripples well beyond the ML community.
Chintala is the creator and project lead for PyTorch, one of the top machine learning frameworks. After his keynote, ZDNet caught up with Chintala on a number of topics, ranging from Facebook's motivation and strategy for PyTorch, to the specifics of using it.
How many machine learning frameworks does the world need?
It may sound like a trivial question to ask, but we felt we had to get it out of the way: What was the thinking behind Facebook getting involved and investing resources in its own ML framework with PyTorch? Especially considering that there is another ML framework supported by Facebook, Caffe2.
For cloud vendors like AWS, Google, or Microsoft, there is a very clear incentive: The more people use their ML frameworks, the more compute and storage resources will eventually gravitate toward their respective clouds. But what stakes does Facebook have in this game? Why dedicate the resources -- human and otherwise -- needed to develop and maintain not one, but two ML frameworks, and where is this going?
The thinking behind this was not as calculated as you may think, said Chintala. He and the PyTorch team set out to build it simply because they are opinionated and wanted something tailored to their needs:
"Google's TensorFlow was released in 2015. We tried using it, but were not super happy with it. Before this, we tried Caffe1, Theano, and Torch. At the time, we were using Torch and Caffe1 for research and production. The field has changed a lot, and we felt a new tool was needed. It looked like nobody else was building it, not the way we thought it would be needed in the future.
So, we felt we should build it. We showed it to some people, and they liked it. There is a strong open source culture in Facebook, so we open sourced it, and it took off. But the goal was mostly to make ourselves happy. It was not because we did not want to rely on Google or Microsoft."
Chintala now works full time on PyTorch, and his team includes something between 10 and 15 people. Even though intellectual curiosity and opinions may account for taking the step to create PyTorch, it does not explain why Facebook would assign these people to work on PyTorch in the long run. Or does it?
Chintala's take is that some people would have to be assigned to something like this anyway. If PyTorch had not been created, the other option would have been to tweak some existing framework, which would end up requiring the same resources. But then, what about Caffe2? Why maintain two ML frameworks?
Machine Learning in Facebook AI Research and in production
First off, PyTorch is now officially the one Facebook ML framework to rule them all. PyTorch 1.0 marks the unification of PyTorch and Caffe2. Going forward, Chintala explained, the choice made was to use PyTorch for the front end and Caffe2 for the back end. This means that nothing changes for users of previous versions of PyTorch, but the Caffe2 front end will be deprecated, and its users will have to switch to PyTorch.
This has to do with the philosophy and goal of each framework. PyTorch was aimed at researchers who need flexibility. Caffe2 was built to run in production at extreme scale -- something like 300 trillion inferences per day, as Chintala noted. As you can imagine, merging the two was no easy feat:
"It's not just merging two codebases, but two projects and philosophies. The hardest part was still the technical one, but culture was a close second. It was mostly about how the frameworks evolve, and what needs they are addressing.
At production scale, code has different needs and properties than when you build for research, where you want everyone to be able to express their ideas and be as creative as possible," Chintala said.
This distinction was also pronounced when discussing the infrastructure people at Facebook AI Research (FAIR) use. FAIR employs more than 100 people. Chintala noted that you can see people doing research on their laptops, and he personally uses local disk for storage a lot, as it makes it easier for him to work with files. But it really depends on the project.
"We also use Hive, Presto, and Spark. Some projects really push the limits of scale, and for those we use organizational infrastructure," Chintala said.
Another thing that FAIR does a lot, according to Chintala, is publish. Not just code, but also research papers and data:
"In fundamental research, you have to work with peers in the community, otherwise you get siloed. You may think you are working on something awesome, and then publish it five years later and realize it is horrible. We focus on open datasets, and publish right away. And we also leverage graph structures, to some extent.
For example, we use Wikipedia for question answering. Its structure is graph-like, and there also is the structured version, DBpedia. We do a lot of research on dialog and question answering. For these we use text datasets, and we also synthesize our own. In vision, we use large scale vision datasets.
Another example is how we use hashtags to enhance translation. We may have images with many hashtags describing the same thing in different languages, and we embed images and hashtags in a graph structure and then work on the translation. Although we do such things a lot, I don't remember having worked with Facebook's social graph."
Low level, high level, and opinionated openness
On Spark AI Summit's stage, Chintala showed many of the specifics of working with PyTorch. What impressed us was that, to the untrained eye, many of the code fragments that Chintala used seemed quite raw and low-level. This, however, is intentional, and there are higher level shortcuts as well, as Chintala explained.
Let's take building a neural network, for example. In PyTorch, this is done via a function that looks a bit messy. But according to Chintala, this is what people want:
"This is one of the biggest reasons why people use PyTorch. This function describes a neural network. It may not be well-structured, but it's where people get their expressiveness from. Our target audience is well-acquainted with this, and they want to use it this way.
Let's say you want to build a recurrent neural network, and you have some time series you need to use. In other frameworks you need to use an API, construct a time series, etc. In PyTorch, you just use a for loop. Users find this more intuitive, because there is no extra step needed - you just write code."
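The "just write a for loop" style Chintala describes can be sketched roughly as follows. This is a minimal illustration, not code from the talk: an `nn.RNNCell` applied step by step over a sequence in plain Python, with no special time-series API; the sizes are arbitrary.

```python
import torch
import torch.nn as nn

# A single recurrent cell; PyTorch's eager execution lets us unroll it
# over time with an ordinary Python for loop.
rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)

sequence = torch.randn(5, 3, 8)   # (time steps, batch, features)
hidden = torch.zeros(3, 16)       # initial hidden state

for step in sequence:             # plain for loop over time steps
    hidden = rnn_cell(step, hidden)

print(hidden.shape)               # torch.Size([3, 16])
```

Because the loop is ordinary Python, you can add conditionals, early exits, or print statements inside it, which is exactly the expressiveness Chintala refers to.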
Not everything has to be low-level, however. Chintala pointed out that for state of the art models, such as ResNet50, there are one-liners that encapsulate them and can be used in the code. PyTorch also comes with an array of pre-trained models ("model zoo"), out-of-the-box distributed capabilities, and integration with probabilistic programming, machine translation, natural language processing, and more.
Occasionally, these can look deceptively simple. Could this be a problem? For example, when showcasing PyTorch's abstraction for distributed deep learning, it was hard to believe all the nitty-gritty details could be taken care of by one line of code: Where does the dataset come from, which node gets each part, and so on.
In this case, Chintala explained, users can intervene at a lower level and fine-tune data loaders, for example. But the idea here was that in 90 percent of cases there is a structured, well-formed pattern of how most people do distributed deep learning, and the one-liner is built to leverage this. And it seems to work well, considering the near-perfect linear scaling in the graph Chintala shared.
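The essence of that one-liner is PyTorch's `DistributedDataParallel` wrapper, which handles gradient synchronization across workers. A minimal sketch follows; to make it self-contained it starts a single-process "gloo" group on CPU, whereas real jobs would launch one process per GPU with a larger world size (the address and port are arbitrary placeholders):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Set up a trivial single-process group so the example runs standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 2)
ddp_model = DDP(model)   # the "one-liner": gradients sync across workers

out = ddp_model(torch.randn(4, 10))
print(out.shape)         # torch.Size([4, 2])

dist.destroy_process_group()
```

The lower-level knobs Chintala mentions -- custom data loaders, samplers that shard the dataset per node -- remain available underneath this abstraction.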
So, despite the fact that PyTorch is opinionated, it looks like its creators tried to strike a balance with the ability to accommodate different usage patterns, at least to some extent. As Chintala noted, one of the goals was to make PyTorch a package anyone in the Python ecosystem can use, regardless of what they may be using currently.
PyTorch's promise is that the entire Python ecosystem can be used at will. In fact, there is a mechanism called zero-memory copy in place to facilitate this:
"We don't have the 'not invented here' syndrome. You can take a PyTorch tensor and connect it to NumPy. It works by creating a NumPy struct in C, and then directly using a pointer on it. All you have to do is a very cheap, almost free, operation - query the C struct. In NumPy, many things are done in C, and we use that, too."
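The zero-copy behavior is easy to demonstrate: a tensor and the NumPy array obtained from it share the same underlying buffer, so mutating one is visible through the other. A short sketch:

```python
import torch

# .numpy() returns a NumPy view over the tensor's storage -- no copy is made.
t = torch.ones(3)
a = t.numpy()

t[0] = 42.0      # mutate through the tensor...
print(a[0])      # ...and the array sees it: 42.0
```

The reverse direction, `torch.from_numpy(array)`, shares memory the same way.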
This may seem like a triviality, but it goes to show the thinking behind PyTorch: high level and low level, opinions and openness intertwined in a balancing act. In all fairness, much of open source development, research, and Facebook itself, for that matter, walk similar fine lines. This may help the world benefit from FAIR's cutting-edge research, and help FAIR researchers keep up to date with the broader community.