Netflix: BPF is a new type of software we use to run Linux apps securely in the kernel

A Netflix performance architect says BPF promises a fundamental change to a 50-year-old kernel model.

There's growing interest in BPF, a new type of software for Linux machines that lets users run programs in the kernel and enjoy "observability super powers", according to Brendan Gregg, a senior performance architect at Netflix.

BPF isn't something an average computer user would know about or even use, but for network and software engineers it promises value. At Facebook, for example, engineers use BPF as part of a network load balancer.

Facebook software engineer Alexei Starovoitov is credited with creating Extended BPF, which is now used in Android for collecting statistics from the kernel, monitoring, or debugging. And Google is using it as part of its Kernel Runtime Security Instrumentation to improve detection of security threat signals, such as a kernel module that loads and hides itself.

According to Gregg, BPF promises to make "a fundamental change to a 50-year-old kernel model by introducing a new interface for applications to make kernel requests, alongside syscalls".

And it lets him create abstract but cool, performance-related tools like his BPF theremin, which he used to figure out and visualize things like Wi-Fi signal strength.

The term BPF is derived from Berkeley Packet Filter, which in the 1990s was a virtual machine for efficient packet filters. However, as Gregg explained this week at an Ubuntu Masters talk, BPF in 2019 is now a "generic kernel execution engine". With this broader functionality, he argues BPF is now the name of a technology and not just an acronym.  
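
For readers wondering what a "virtual machine for efficient packet filters" looks like in practice, the following is a minimal sketch in Python of an interpreter for a tiny subset of the classic BPF instruction set. It is an illustration only, not how the Linux kernel implements BPF: the opcode values match those in the kernel's linux/filter.h header, but the function name run_filter, the example program ipv4_only, and the choice to drop a packet when the program runs off its end are assumptions made for this sketch.

```python
# Toy interpreter for a small subset of classic BPF (illustration only).
# Opcode values match <linux/filter.h>; all other names are hypothetical.
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06   # instruction classes
BPF_B, BPF_ABS = 0x10, 0x20                   # byte-sized load, absolute offset
BPF_JEQ, BPF_K = 0x10, 0x00                   # jump-if-equal, constant operand


def run_filter(program, packet):
    """Return how many bytes of `packet` to accept (0 means drop)."""
    acc = 0   # the accumulator register "A"
    pc = 0    # program counter
    while pc < len(program):
        code, jt, jf, k = program[pc]
        pc += 1
        if code == BPF_LD | BPF_B | BPF_ABS:
            if k >= len(packet):
                return 0          # out-of-bounds load rejects the packet
            acc = packet[k]
        elif code == BPF_JMP | BPF_JEQ | BPF_K:
            # jt/jf are forward-only offsets, so a filter can never loop.
            pc += jt if acc == k else jf
        elif code == BPF_RET | BPF_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")
    return 0  # running off the end is treated as "drop" in this sketch


# Accept packets whose first byte is 0x45 (an IPv4 header with no options).
ipv4_only = [
    (BPF_LD | BPF_B | BPF_ABS, 0, 0, 0),       # A = packet[0]
    (BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0x45),   # equal: continue; else skip one
    (BPF_RET | BPF_K, 0, 0, 0xFFFF),           # accept up to 65535 bytes
    (BPF_RET | BPF_K, 0, 0, 0),                # drop
]

print(run_filter(ipv4_only, bytes([0x45, 0x00, 0x00, 0x54])))
print(run_filter(ipv4_only, bytes([0x60, 0x00, 0x00, 0x00])))
```

The point of the sketch is that a filter is just a short list of simple instructions run against each packet, and that jumps only move forward, so a filter always terminates. Today's BPF has a much larger instruction set and its own in-kernel checks, which this toy does not attempt to model.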

"BPF is the biggest operating systems change I've seen in my career, and it's thrilling to be a part of it," wrote Gregg. 

That's quite a statement from someone who's been Netflix's primary on-call engineer, leading the worldwide response to Netflix outages, and who led performance testing at Sun Microsystems for its first ZFS-based storage appliance.

Gregg said he's been using BPF at Netflix to understand, in ways that weren't previously possible on production systems, why software is blocking performance.

The big change BPF allows for is achieved through BPF Helper Calls, which give BPF programs running in the kernel the equivalent of the system API calls that user-mode applications make to the kernel.

"This allows you to write kernel mode applications that can access resources and run with high performance and efficiency with guarantees of security," he said.

Gregg argues this is different to writing a kernel module because while these modules do have access to hardware, there's no fixed API, which creates security risks.

"You can panic the machine, you can introduce security issues. If you came to me and said, 'Brendan, I've written this awesome kernel module, can you run it at Netflix?', I'd be very hesitant to do that," he said. 

"It introduces a lot of risk. If you said you've introduced this awesome BPF program, can you run it at Netflix, that's completely different."

Gregg notes that most BPF applications today are written in C and assembly, but he predicts that new languages dedicated to BPF will be developed in the future.

More details about Gregg's BPF talk can be found in the paper he presented at the Ubuntu Masters talk.