Are virtual machines (VMs) more secure than containers? You may think you know the answer, but IBM Research has found that containers can be as secure as, or more secure than, VMs.
James Bottomley, an IBM Research Distinguished Engineer and top Linux kernel developer, writes:
One of the biggest problems with the current debate about Container vs Hypervisor security is that no-one has actually developed a way of measuring security, so the debate is all in qualitative terms (hypervisors 'feel' more secure than containers because of the interface breadth) but no-one actually has done a quantitative comparison.
To meet this need, Bottomley created the Horizontal Attack Profile (HAP), a metric designed to describe system security in a way that can be objectively measured. Bottomley found that "a Docker container with a well crafted seccomp profile (which blocks unexpected system calls) provides roughly equivalent security to a hypervisor."
Bottomley starts by defining the Vertical Attack Profile (VAP). This is all the code that is traversed to provide a service, all the way from input to database update to output. This code, like all programs, contains bugs. The bug density varies, but the more code you traverse, the greater your chance of exposure to a security hole. Exploits of stack security holes -- which can jump into either the physical server host or VMs -- are HAPs.
HAPs are the worst kind of security holes. Bottomley calls them "potentially business destroying events." So, how do you measure a system for HAPs? Bottomley explains:
The Quantitative approach to measuring the HAP says that we take the bug density of the Linux Kernel code and multiply it by the amount of unique code traversed by the running system after it has reached a steady state (meaning that it doesn't appear to be traversing any new kernel paths). For the sake of this method, we assume the bug density to be uniform and thus the HAP is approximated by the amount of code traversed in the steady state. Measuring this for a running system is another matter entirely, but, fortunately, the kernel has a mechanism called ftrace which can be used to provide a trace of all of the functions called by a given userspace process and thus gives a reasonable approximation of the number of lines of code traversed. (Note this is an approximation because we measure the total number of lines in the function taking no account of internal code flow, primarily because ftrace doesn't give that much detail.) Additionally, this methodology works very well for containers where all of the control flow emanates from a well known group of processes via the system call information, but it works less well for hypervisors where, in addition to the direct hypercall interface, you also have to add traces from the back end daemons (like the kvm vhost kernel threads or dom0 in the case of Xen).
In short, you measure how many lines of code a system -- be it bare metal, VM, or container -- uses to run a given application. The more code it runs, the more likely it is to have a HAP-level security hole.
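The approximation Bottomley describes can be sketched in a few lines of Python. This is a minimal illustration, not his tooling: it assumes trace text in the format of ftrace's "function" tracer, and it assumes a per-function line count table (`lines_per_function`, a hypothetical input here) has already been produced from the kernel source by some other means. The function names `traced_functions` and `approximate_hap` are made up for this example.

```python
import re
from typing import Dict, Iterable, Set

# Matches the traced function name in a line of ftrace "function" tracer
# output, for example:
#   "redis-server-1234  [001] ....  100.000001: vfs_read <-ksys_read"
TRACE_LINE = re.compile(r":\s+([A-Za-z_][\w.]*)\s+<-")


def traced_functions(trace_lines: Iterable[str]) -> Set[str]:
    """Collect the unique kernel functions seen in the trace."""
    funcs: Set[str] = set()
    for line in trace_lines:
        match = TRACE_LINE.search(line)
        if match:  # header and comment lines do not match and are skipped
            funcs.add(match.group(1))
    return funcs


def approximate_hap(
    funcs: Set[str],
    lines_per_function: Dict[str, int],
    bug_density: float = 1.0,
) -> float:
    """Bug density times the total lines of every unique function traversed.

    With a uniform bug density (the assumption stated in the quote), the
    result is proportional to the amount of code traversed. Functions
    missing from the table count as zero lines.
    """
    total_lines = sum(lines_per_function.get(name, 0) for name in funcs)
    return bug_density * total_lines


# Illustrative input only: the trace lines and line counts are invented.
sample_trace = [
    "# tracer: function",
    "redis-server-1234  [001] ....  100.000001: ksys_read <-do_syscall_64",
    "redis-server-1234  [001] ....  100.000002: vfs_read <-ksys_read",
    "redis-server-1234  [001] ....  100.000003: vfs_read <-ksys_read",
]
sample_line_counts = {"ksys_read": 20, "vfs_read": 30}

sample_funcs = traced_functions(sample_trace)
sample_hap = approximate_hap(sample_funcs, sample_line_counts)
```

Each function is counted once no matter how often it is called, which mirrors the "unique code traversed" wording. As the quote notes, counting whole functions overstates the lines actually executed, because the trace carries no information about control flow inside a function.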
Having defined HAPs and how to measure them, Bottomley then ran several standard benchmarks -- redis-bench-set, redis-bench-get, python-tornado and node-express -- with the latter two also running the web servers with simple external transactional clients. He performed these tests with Docker; Google's gVisor, a container runtime sandbox; gVisor-kvm, the same container sandbox using KVM, Linux's built-in VM hypervisor; Kata Containers, an open-source lightweight VM; and Nabla, IBM's just-released container type, which is designed for strong server isolation.
Bottomley found that the Nabla runtime had a better "HAP than the hypervisor contained Kata technology, meaning that we've achieved a container system with better HAP (i.e. more secure) than hypervisors."
It wasn't just IBM's project that fared well against hypervisors, though. He also found that a "Docker container with a well crafted seccomp profile (which blocks unexpected system calls) provides roughly equivalent security to a hypervisor."
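For readers unfamiliar with seccomp profiles, the sketch below builds a small allowlist profile in the JSON layout Docker accepts: a default action that rejects every system call, plus one rule allowing a named set. It is only an illustration of the idea of blocking unexpected system calls. The helper name `build_seccomp_profile` is made up, the four system calls listed are placeholders, and a real application would need a much longer list derived from observing what it actually calls.

```python
import json
from typing import Iterable


def build_seccomp_profile(allowed_syscalls: Iterable[str]) -> dict:
    """Return an allowlist profile: deny by default, allow only named calls."""
    return {
        # Any system call not listed below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


# Placeholder list for illustration; not enough to run a real workload.
profile = build_seccomp_profile(["read", "write", "epoll_wait", "read"])
profile_json = json.dumps(profile, indent=2)
print(profile_json)
```

Saved to a file such as profile.json, a profile in this layout can be passed to Docker with `docker run --security-opt seccomp=profile.json`. How tightly the list can be drawn for a given workload is exactly the "well crafted" part of Bottomley's remark.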
gVisor was another story. At best, gVisor's results were about even with the Docker use case, but in one case they were significantly worse. Bottomley speculates that's because "gVisor tries to improve containment by rewriting the Linux system call interface in Go. However, no-one has paid any attention to the amount of system calls the Go runtime is actually using, which is what these results are really showing." If that's the case, Bottomley thinks a future version of gVisor could be rewritten to be much more secure.
The real point, though, isn't which technology is more secure per se. It's that, for the most severe security problems, containers and VMs have about the same level of security. Indeed, Bottomley thinks, "it is perfectly possible to have containers that are more secure than hypervisors and lays to rest, finally, the arguments about which is the more secure technology."
"The next step," he continued, "is establishing the full extent of exposure to a malicious application and to do that, some type of fuzz testing needs to be employed."
Bottomley's work is only the start. He's shown it's possible to objectively measure an application's security. As he said, "I don't expect this will be the final word in the debate, but by describing how we did it I hope others can develop quantitative measurements as well."