Thanks to Docker, containers are everywhere now. But, while containers have revolutionized how we develop, package, and deploy applications, we've not done a great job of securing them. That's where Google comes in with a new answer for locking down containers: gVisor.
With gVisor, Google has introduced a new way to sandbox containers. These are containers that provide a secure isolation boundary between the host operating system and the application running within the container.
It does this by providing a Linux user-space kernel, written in Go. This kernel implements a substantial portion of the Linux system call surface and intercepts application system calls from containerized programs.
gVisor includes an Open Container Initiative (OCI) runtime called runsc that provides an isolation boundary between the application and the host kernel. This runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers in production.
Applications that run in traditional Linux containers, such as Docker and CoreOS rkt, access system resources just like regular applications do -- that is, by making system calls directly to the host kernel. The kernel runs in a privileged mode that allows it to interact with the necessary hardware and return results to the application.
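To make that concrete, here's a minimal Python sketch (mine, not from Google's announcement) of two ordinary calls that end up as system calls. The function name kernel_view is made up for illustration. In a traditional container, the host kernel answers both of them directly.

```python
import os


def kernel_view():
    """Ask the kernel two questions that only it can answer."""
    return {
        "pid": os.getpid(),             # backed by the getpid(2) system call
        "kernel": os.uname().release,   # backed by the uname(2) system call
    }


if __name__ == "__main__":
    print(kernel_view())
```

Run inside a regular Docker container, the kernel release reported here is the host's, because the container shares the host kernel.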
True, in Linux, the kernel imposes limits on which resources a containerized application can access. It does this using Linux cgroups and namespaces, but not all resources are controlled via these mechanisms. Besides, even with these limits, the kernel still exposes a large surface area for attackers.
You can improve container security by using kernel features, such as seccomp filters, which can provide better isolation between the application and host kernel. But, to use those, you must create a predefined whitelist of system calls. Few people want to go to that much trouble, since it's often difficult to know which system calls will be required by a given application.
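What does such a whitelist look like? Here's a rough Python sketch that builds a profile in the JSON layout Docker accepts for its seccomp security option. The function name and the short list of system calls are illustrative assumptions. A real application would almost certainly need many more calls, which is exactly the problem.

```python
import json


def build_seccomp_profile(allowed_syscalls):
    """Build a deny-by-default seccomp profile that allows only the listed calls."""
    return {
        # Any system call not listed below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


if __name__ == "__main__":
    # Illustrative guess at a whitelist, far too short for a real program.
    guessed_calls = ["read", "write", "openat", "close", "exit_group"]
    print(json.dumps(build_seccomp_profile(guessed_calls), indent=2))
```

Saved to a file (say, a hypothetical profile.json), a profile like this is handed to Docker with docker run --security-opt seccomp=profile.json.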
You can also improve container isolation by running each container in its own VM, but that defeats one of the main reasons to use containers: their smaller size and faster spin-up speeds.
Kata Containers is an open-source project that takes this approach to container isolation. Like gVisor, Kata implements an OCI runtime that's compatible with Docker and Kubernetes. Kata uses stripped-down VMs to keep the resource footprint as small as possible while attempting to maximize performance.
Another approach is to use Canonical's open-source LXD. This is a pure-container hypervisor, which runs unmodified Linux guest operating systems with VM-style operations.
gVisor's approach is more lightweight than a VM while maintaining a similar level of isolation.
The core of gVisor is a kernel that runs as a normal, unprivileged process that supports most Linux system calls. This kernel, like LXD, is written in Go, which was chosen for its memory- and type-safety.
gVisor provides a strong isolation boundary by intercepting application system calls and acting as the guest kernel, all while running entirely in user space. This architecture allows it to provide a flexible resource footprint, unlike a VM, and lowers the fixed costs of virtualization.
However, Google admits this comes at the price of higher per-system-call overhead and reduced application compatibility.
It also doesn't implement all of Linux's application programming interfaces (APIs). It currently supports over 200 system calls, but some system calls and arguments aren't yet supported. In addition, some parts of the /proc and /sys filesystems aren't supported. As a result, not all applications will run inside gVisor. Google claims many will run just fine. These include Node.js, Java 8, MySQL, Jenkins, Apache, Redis, MongoDB, and many more.
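The idea of a user-space kernel that answers some system calls and turns others away can be sketched as a toy model. To be clear, this is not gVisor's code or design. The class name, the two "supported" calls, the PID value, and the choice of ENOSYS (the conventional Linux error for an unimplemented call) are all assumptions made for illustration.

```python
import errno


class ToyUserSpaceKernel:
    """Toy stand-in for a user-space kernel that services calls itself."""

    def __init__(self):
        self._files = {}  # in-memory "files", keyed by descriptor number
        self._handlers = {
            "getpid": self._getpid,
            "write": self._write,
        }

    def _getpid(self):
        # Assumption: the sandboxed program sees itself as PID 1.
        return 1

    def _write(self, fd, data):
        # Data stays inside the sandbox; the host kernel is never involved.
        self._files.setdefault(fd, bytearray()).extend(data)
        return len(data)

    def syscall(self, name, *args):
        """Dispatch an intercepted call, or refuse it if unimplemented."""
        handler = self._handlers.get(name)
        if handler is None:
            raise OSError(errno.ENOSYS, f"{name}: not implemented in this sandbox")
        return handler(*args)


if __name__ == "__main__":
    kernel = ToyUserSpaceKernel()
    print(kernel.syscall("getpid"))
    print(kernel.syscall("write", 1, b"hello"))
    try:
        kernel.syscall("ptrace")
    except OSError as err:
        print("refused:", err)
```

An application that depends on a call missing from the handler table simply fails, which is the compatibility trade-off in miniature.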
On the plus side, the gVisor runtime integrates seamlessly with Docker and Kubernetes through runsc (short for "run Sandboxed Container"), which conforms to the OCI runtime API. Its runsc runtime is also interchangeable with runc, Docker's default container runtime.
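In practice, swapping runtimes comes down to a small Docker configuration entry and one command-line flag. Here's a hedged Python sketch that only builds those pieces; it doesn't touch Docker. The helper names are hypothetical and the /usr/local/bin/runsc path is an assumption, so point it at wherever runsc is actually installed.

```python
import json


def docker_runtime_config(runsc_path="/usr/local/bin/runsc"):
    """Return a daemon.json fragment that registers runsc as an extra runtime.

    The default path is an assumption, not a requirement.
    """
    return {"runtimes": {"runsc": {"path": runsc_path}}}


def docker_run_command(image, runtime="runsc"):
    """Build (but do not execute) a docker run command for the chosen runtime."""
    return ["docker", "run", "--rm", f"--runtime={runtime}", image]


if __name__ == "__main__":
    print(json.dumps(docker_runtime_config(), indent=2))
    print(" ".join(docker_run_command("hello-world")))
    # Same image, Docker's default runtime: only the flag value changes.
    print(" ".join(docker_run_command("hello-world", runtime="runc")))
```

The fragment would typically go into Docker's daemon configuration file, followed by a restart of the Docker daemon, before the --runtime=runsc flag takes effect.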
So, if you want to try a new approach and secure your containers without tears, I'd give gVisor a try.