Programming languages: How Instagram's taming a multimillion-line Python monster

Facebook-owned Instagram has detailed its approach to improving the app in a way that builds on Python's advantages and mitigates the language's obstacles to productivity in large-scale projects.
Behind all the too-perfect selfies on Instagram, the Facebook-owned social network is built on the popular programming language Python.
As Instagram engineers detailed recently, Instagram Server – the backend of the smartphone app that millions of millennials use to share photos of friends and themselves – is almost entirely powered by Python, the third most popular language behind Java and JavaScript.
To boost developer productivity, Facebook engineers have developed open-source tools like the Python static type checker, Pyre, which Instagram uses to analyze its server codebase. Instagram engineers have also created developer tools such as LibCST for analyzing Python code.
Instagram engineers' key goal is to be as productive as possible when using Python on a massive scale, and they have revealed how they're getting around "a few pain points" when using the language.
The company also wants to attract new talent, even if coders aren't necessarily Python experts, which is why it wants to explain to coding candidates how it's creating tools to help them be more productive.
It also wants to keep developers who like using Python happy, so the company is pushing Python beyond its limits. That's why Instagram engineers have developed 'strict modules' for Python.
"One reasonable take might be that we're stretching Python beyond what it was intended for. It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language," writes Instagram engineer Carl Meyer.
"But we're past the point of codebase size where a rewrite is even feasible. And more importantly, despite these pain points, there's a lot more that we like about Python, and overall our developers enjoy working in Python. So it's up to us to figure out how we can make Python work at this scale, and continue to work as we grow," Meyer writes.
He explains that Python is brilliant for fast iteration because you can make changes and see the result without needing to compile the code.
"But with a few million lines of code (and a messy dependency graph), that advantage starts to turn sour," he writes.
The Instagram Server takes up to one minute to start, which translates to a critical delay when testing new features and is long enough for a developer to be sidetracked by another job and not complete the task.
Despite Python enabling faster development times, there are challenges with using it at the scale of Instagram.
"Because imports can have arbitrary side effects, there is no safe way to incrementally reload our server [in Python]. No matter how small the change, we have to start from scratch every time, importing all those modules, re-creating all those classes and functions, recompiling all of those regular expressions, etc," Meyer writes.
"Usually, 99% of the code hasn't changed since last time we reloaded the server, but we have to redo all that slow work anyway."
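To make Meyer's point concrete, here is a minimal sketch of an import-time side effect. The module name `side_effect_demo`, its contents, and the helper `import_fresh` are hypothetical and exist only for illustration; they are not taken from Instagram's codebase. The sketch shows that a fresh import re-runs every top-level statement, which is why work such as recompiling regular expressions is repeated on each restart.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module whose top level does work as a side effect of import.
MODULE_SOURCE = textwrap.dedent(
    """
    import re

    REGISTRY = []                    # module-level mutable state
    PATTERN = re.compile("[a-z]+")   # recompiled on every fresh import
    REGISTRY.append("handler registered at import time")
    """
)


def import_fresh(name, directory):
    """Import `name` from `directory` as if the process had just started."""
    sys.modules.pop(name, None)      # forget any previously loaded copy
    importlib.invalidate_caches()    # make sure the new file is visible
    sys.path.insert(0, directory)
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(directory)


def demo():
    """Import the same module twice and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "side_effect_demo.py").write_text(MODULE_SOURCE)
        first = import_fresh("side_effect_demo", tmp)
        second = import_fresh("side_effect_demo", tmp)
    sys.modules.pop("side_effect_demo", None)
    # Each fresh import executed the top-level code again and built new objects.
    return first.REGISTRY, second.REGISTRY, first is second


if __name__ == "__main__":
    print(demo())
```

Because the top-level code can mutate state anywhere in the process, a reloader cannot know in general which of these effects are safe to skip or repeat, which is the problem Meyer describes.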
Instagram's challenges with using Python at scale mirror some of the difficulties Salesforce had with it for its Einstein Analytics product. Salesforce recently dialed down Python code in favor of Google-backed Go, or Golang.
"Python is great for quickly writing higher-level applications but doesn't always deliver the high performance needed at an enterprise level," a Salesforce architect explained.
In Instagram's case, Meyer adds that it not only harms developer productivity, but causes a "significant amount of wasted compute in production", because Instagram continuously deploys and reloads the site on production servers.
Meyer describes 'strict modules' as Instagram's answer to Python's weaknesses when used in large-scale production environments.
"Strict modules place some limitations on what can happen at module top level. All module-level code, including decorators and functions/initializers called at module level, must be pure (side effect-free, no I/O). This is verified statically at compile time via the abstract interpreter," he says.
"This means that strict modules are side effect-free on import: bad interactions of import-time side effects are no longer possible."
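The idea of checking module top-level code statically can be illustrated with a toy example. The sketch below is not Instagram's abstract interpreter, which the article does not show; it is a deliberately crude approximation built on Python's standard `ast` module (it needs Python 3.9 or later for `ast.unparse`). The allowlist `ASSUMED_PURE_CALLS` and the function names are assumptions made for this illustration, and the check errs on the side of flagging too much.

```python
import ast

# Assumption: a tiny allowlist of calls treated as pure for this illustration.
ASSUMED_PURE_CALLS = {"frozenset", "tuple", "len", "int", "str"}


def _import_time_nodes(stmt):
    """Yield AST nodes of a top-level statement that execute during import."""
    if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
        # A function body does not run on import, but its decorators do.
        for decorator in stmt.decorator_list:
            yield from ast.walk(decorator)
    elif isinstance(stmt, ast.ClassDef):
        # Decorators, base classes and the class body all run on import.
        for expr in stmt.decorator_list + stmt.bases:
            yield from ast.walk(expr)
        for inner in stmt.body:
            yield from _import_time_nodes(inner)
    else:
        # Conservative: everything inside any other statement is inspected.
        yield from ast.walk(stmt)


def top_level_violations(source):
    """Return (line, call name) pairs for calls not known to be pure."""
    violations = []
    for stmt in ast.parse(source).body:
        for node in _import_time_nodes(stmt):
            if isinstance(node, ast.Call):
                func = node.func
                name = func.id if isinstance(func, ast.Name) else ast.unparse(func)
                if name not in ASSUMED_PURE_CALLS:
                    violations.append((node.lineno, name))
    return violations


EXAMPLE = '''
import re
LIMIT = int("10")
PATTERN = re.compile("[a-z]+")
print("loading")

def handler():
    open("log.txt", "w")
'''

if __name__ == "__main__":
    for line, name in top_level_violations(EXAMPLE):
        print(f"line {line}: call to {name} is not known to be pure")
```

Here the call to `print` is flagged because it performs I/O at import time, while the `open` call inside `handler` is ignored because it only runs when the function is called. The toy also flags `re.compile`, simply because it is not on the allowlist; an analyzer that actually evaluates the code, as the abstract interpreter Meyer mentions presumably does, could reason about such calls rather than rejecting them by name.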