Programming languages: How Instagram's taming a multimillion-line Python monster

Instagram proposes "strict modules" to help manage massive and fast-moving Python projects.
Written by Liam Tung, Contributing Writer

Facebook-owned Instagram has detailed how it is improving its app in a way that builds on Python's advantages while mitigating the language's obstacles to productivity in large-scale projects.

Behind all the too-perfect selfies, Facebook-owned Instagram is built on the popular programming language Python.

As Instagram engineers detailed recently, Instagram Server – the backend of the smartphone app that millions of millennials use to share photos of friends and themselves – is almost entirely powered by Python, the third most popular language behind Java and JavaScript.

To boost developer productivity, Facebook engineers have developed open-source tools like the Python static type checker, Pyre, which Instagram uses to analyze its server codebase. Instagram engineers have also created developer tools such as LibCST for analyzing Python code.

Instagram engineers' key goal is to be as productive as possible when using Python on a massive scale, and they have revealed how they're getting around "a few pain points" when using the language.

The company also wants to attract new talent, even if coders aren't necessarily Python experts, which is why it wants to explain to coding candidates how it's creating tools to help them be more productive. 

It also wants to keep developers who like using Python happy, so the company is pushing Python beyond its limits. That's why Instagram engineers have developed 'strict modules' for Python. 

"One reasonable take might be that we're stretching Python beyond what it was intended for. It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language," writes Instagram engineer Carl Meyer.

"But we're past the point of codebase size where a rewrite is even feasible. And more importantly, despite these pain points, there's a lot more that we like about Python, and overall our developers enjoy working in Python. So it's up to us to figure out how we can make Python work at this scale, and continue to work as we grow," Meyer writes.

He explains that Python is brilliant for fast iteration because you can make changes and see the result without needing to compile the code. 

"But with a few million lines of code (and a messy dependency graph), that advantage starts to turn sour," he writes. 

The Instagram Server takes up to one minute to start, which translates to a critical delay when testing new features and is long enough for a developer to be sidetracked by another job and not complete the task. 

Despite Python enabling faster development times, there are challenges with using it at the scale of Instagram. 

"Because imports can have arbitrary side effects, there is no safe way to incrementally reload our server [in Python]. No matter how small the change, we have to start from scratch every time, importing all those modules, re-creating all those classes and functions, recompiling all of those regular expressions, etc," Meyer writes. 

"Usually, 99% of the code hasn't changed since last time we reloaded the server, but we have to redo all that slow work anyway."
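The problem Meyer describes can be reproduced on a small scale. The following is a minimal sketch, not Instagram's code: the module names `demo_registry` and `demo_side_effect` are hypothetical and are written to a temporary directory purely for illustration. It shows a module whose top-level code mutates shared state, so reloading that one module repeats the side effect instead of cleanly replacing the old definitions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical shared module holding process-wide state.
REGISTRY_SOURCE = "HANDLERS = []\n"

# Hypothetical module with an import-time side effect: its top-level
# code appends to the shared list every time the module is executed.
SIDE_EFFECT_SOURCE = (
    "import demo_registry\n"
    "demo_registry.HANDLERS.append('photo_upload')\n"
)


def import_and_reload():
    """Import the module, reload it, and report the registry size each time."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_registry.py").write_text(REGISTRY_SOURCE)
        Path(tmp, "demo_side_effect.py").write_text(SIDE_EFFECT_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            registry = importlib.import_module("demo_registry")
            module = importlib.import_module("demo_side_effect")
            after_import = len(registry.HANDLERS)
            # Reloading only the changed module re-runs its top-level code,
            # so the handler is registered a second time.
            importlib.reload(module)
            after_reload = len(registry.HANDLERS)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_side_effect", None)
            sys.modules.pop("demo_registry", None)
    return after_import, after_reload


if __name__ == "__main__":
    print(import_and_reload())  # (1, 2): the side effect ran twice
```

With one module the duplicate registration is easy to spot. Across millions of lines, there is no general way to know which import-time effects are safe to repeat, which is why restarting from scratch is the only safe option.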

Instagram's challenges with using Python at scale mirror some of the difficulties Salesforce had with it for its Einstein Analytics product. Salesforce recently dialed down Python code in favor of Google-backed Go, or Golang.       

"Python is great for quickly writing higher-level applications but doesn't always deliver the high performance needed at an enterprise level," a Salesforce architect explained. 

In Instagram's case, Meyer adds that it not only harms developer productivity, but causes a "significant amount of wasted compute in production", because Instagram continuously deploys and reloads the site on production servers.

Meyer describes 'strict modules' as Instagram's answer to Python's weaknesses when used in large-scale production environments.

"Strict modules place some limitations on what can happen at module top level. All module-level code, including decorators and functions/initializers called at module level, must be pure (side effect-free, no I/O). This is verified statically at compile time via the abstract interpreter," he says.

"This means that strict modules are side effect-free on import: bad interactions of import-time side effects are no longer possible."
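To give a rough feel for what checking module-level code statically can look like, here is a toy sketch built on Python's standard `ast` module. It is not Instagram's abstract interpreter and is far cruder: it simply flags any call that would run at import time unless the callee is on a small allowlist. The allowlist `ASSUMED_PURE` and the function names are assumptions made for illustration, and the check needs Python 3.9 or later for `ast.unparse`.

```python
import ast

# Assumption: a tiny allowlist of builtins treated as pure in this toy check.
ASSUMED_PURE = {"len", "tuple", "frozenset", "range"}


def _import_time_nodes(stmt):
    """Yield the parts of a top-level statement that execute on import."""
    if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
        # A function body runs only when called, but its decorators and
        # default argument values are evaluated when the module is imported.
        yield from stmt.decorator_list
        yield from stmt.args.defaults
        yield from (d for d in stmt.args.kw_defaults if d is not None)
    elif isinstance(stmt, ast.ClassDef):
        # A class body does run on import; the method bodies inside it do not.
        yield from stmt.decorator_list
        yield from stmt.bases
        for inner in stmt.body:
            yield from _import_time_nodes(inner)
    else:
        yield stmt


def find_impure_candidates(source):
    """Return sorted (line, callee) pairs for import-time calls not allowlisted."""
    flagged = []
    for stmt in ast.parse(source).body:
        for root in _import_time_nodes(stmt):
            for node in ast.walk(root):
                if isinstance(node, ast.Call):
                    callee = ast.unparse(node.func)
                    if callee not in ASSUMED_PURE:
                        flagged.append((node.lineno, callee))
    return sorted(flagged)


EXAMPLE = '''\
LIMITS = tuple(range(3))
LOG = open("app.log", "a")

def handler():
    print("runs only when called")
'''

if __name__ == "__main__":
    print(find_impure_candidates(EXAMPLE))  # [(2, 'open')]
```

The sketch is deliberately conservative and has obvious gaps: it knows nothing about what a called function actually does, and it will over-report calls inside functions nested in top-level `if` or `try` blocks. Proving that arbitrary module-level code is side effect-free is the harder problem that the abstract interpreter Meyer mentions is meant to solve.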
