Thinking inside and outside the Bochs with Kevin Lawton

Meet the man who pioneered x86 virtualization, Kevin Lawton, in an in-depth interview about Bochs, virtualization and the future of computing.

As some of you know, I co-authored a book with Amy Newman titled Practical Virtualization Solutions: Virtualization from the Trenches (Prentice Hall, 2009), and in it I mentioned the Bochs project. Bochs is an Intel architecture PC emulator written, and formerly maintained, by Kevin Lawton. Much has happened in the virtualization space since Kevin began the project in 1994. His original work on Bochs and the x86 architecture prompted an entirely new way of computing: x86 virtualization.

I had the rare opportunity to interview Kevin and ask him questions about Bochs, virtualization and his vision of the future of computing.

The following is the bulk of that interview in Kevin's own words.

Hess: When did you start the Bochs and plex86 projects?

Lawton: I started the Bochs project in 1994, after leaving MIT Lincoln Lab, and plex86 in the late 1990s.

Hess: Was there a precursor to Bochs or did you start it from scratch?

Lawton: At the time, I had little knowledge of x86 or the PC architecture; I just picked up some books on it and started. Wow, I had no idea how complicated x86 was, how much baggage it came with, and how poorly documented, or entirely undocumented, much of the system architecture was, including the BIOS. But it's probably a good thing that I didn't know ahead of time. Who knows what virtualization would look like today if I hadn't slogged through and made Bochs.

Hess: Would it be fair to call you the Father of x86 virtualization?

Lawton: I'm not the father of it; I was more like the catalyst of x86 virtualization. Before me there was a commercial solution, Connectix. But because it was closed source, of course, it couldn't be used to R&D the next generation of ideas.

I even implemented a mode in Bochs that could execute x86 code natively under the right circumstances, before similar functionality was added to QEMU. This was based on some of the work I did in plex86. The openness of my R&D and implementation work catalyzed a lot of downstream innovation.

Hess: Let's talk about what's going on now in the virtualization space. The buzzword for this year is VDI. What do you think of VDI?

Lawton: Sure, VDI is useful. It's a step in the evolution of compute and virtualization. But even better is something we might call "liquid computing," where you don't have to care where the compute happens, and the code is so migratable that it moves around without you knowing. For example, one minute it's running on the server and sending a window to an end-point, and the next instant it's running on the end-point. But you don't have to care. That way, execution happens wherever the resources are, or wherever it makes the most sense at that moment. Imagine an iPad game app that realizes it needs heavy 3D rendering and migrates itself to server-side rendering resources, or an app or environment that migrates from your notebook to your car's in-dash computer.

Hess: Behind the obvious leader in the field, VMware, who do you think has the next biggest shot at corporate adoption for server virtualization?

Lawton: Being a trend-forecaster of sorts, I'll take the liberty of staying away from the uber-political short-term answer and focusing on the long term. Nobody said the future of virtualization will look anything like it does today. If Google plays its cards right, it could be Google. Ditto for Amazon. If things go where I believe they are going, having infrastructure built and super-optimized for the way computing needs to be done tomorrow will offer huge advantages. We will look back and think how caveman-like running VMware was. Look to the companies that are building datacenters for tomorrow's computing paradigm.

Hess: Same question for cloud computing.

Lawton: See answer above. I think ultimately these will be the same question.

Hess: Why is security such a concern for cloud computing?

Lawton: It should be; your data is on someone else's hardware. And that hardware is potentially shared or adjacent to the hardware storing someone else's data. I think that's a very rational reason for concern.

Hess: How can we make cloud computing safer?

Lawton: It's incredibly difficult. Having a trusted, secure path right from power-on through boot is a start. But everything the VM touches, from the network equipment to the storage, would have to be 100% secure for it to be really safe. Personally, I think one of the most exciting unsolved problems out there is figuring out how to run workloads without the hosting hardware or software having any clue what the workload is. Then you could run on completely untrusted crap and still have secure computing. It's kind of a paradox: the less you trust it, the safer it becomes. Until then, it's really important that every piece has safety built in and that solutions are thought out end-to-end. If you don't have a package deal, you don't have safety.

Hess: What are your ideas about the next generation of virtualization and data center computing?

Lawton:
1) Virtualization for GPGPU. What's the VMware-like solution for OpenCL? Why not have a virtualized OpenCL that lets people write GPGPU apps but late-binds them to whatever hardware is available at the time? Virtualization at the layer between the "host program" and the OpenCL device cracks the door open for live migration of GPGPU workloads, just as for CPU workloads (à la vMotion), and thus for derivative technologies such as workload balancing (DRS) and power management (DPM). Otherwise, this kind of parallel computing app gets stuck on a hardware node once it starts, and for some HPC apps, that's days or weeks.

2) Higher VM densities and big power savings using a software VM migration acceleration trick. Imagine applying the concept of memory de-dup within a server to identifying equivalent memory across servers, and then using that to migrate a lot less VM memory, because much of the memory you would need to transfer is already on the destination server. With shared storage (SAN or NAS), you're mostly transferring memory state. It turns out this technique is very effective, especially when there are lots of similar VMs. Now let's think about why we run nominal loads at such low levels: because of load spikes. If we could migrate VMs in zero seconds, we could push the load to near 100% all the time. So accelerating migration equates to higher load capacity. A lot of the CPU-based power consumption in today's datacenters is wasted because we don't employ this technique.

3) Entire software distributions, all in LLVM IR format (i.e., not native x86 or ARM). I've worked for a Linux distro before. Generally, you end up compiling for some kind of lowest common denominator, by way of compiler flags and the like. That A) leaves a lot of performance on the table, and B) means the code will not run on other architectures. With today's maturity of LLVM, there's no reason for any of this. In the Linux world, there's adoption of multiarch support, which allows binaries for multiple architectures to be installed on the same filesystem. That's a great opportunity for Linux to feather in support for packages in LLVM IR format, with either a compile-to-native-binary step during install (optimizing for the target platform) or later binding, such as at execution time. Or possibly a combination (cached translations). Ultimately, people may forget about architectures at the app level, even for "native" (e.g., C/C++) code.

4) Actually, if you push the above idea one step further and add more memory indirection within the LLVM-emitted code (at the expense of performance), you get not only much better memory sanity checks but also the ability to migrate code live across different architectures. Imagine vMotion between x86 and ARM. It's possible at the app level.

5) I haven't checked the latest LLVM status, but with IR extensions, it could be used to compile OS kernels too. Expanding yet one step further, entire VMs could then be migrated across architectures and re-optimized for the target when needed.

6) Mass-scale VM fault tolerance. Setting aside execution-based fault tolerance (where a clone executes the same code), which is resource-intensive and generally better suited to single-threaded VMs, state-based fault tolerance (copying dirty pages and synchronizing to checkpoints) offers massive opportunities for commoditizing fault tolerance. Why not have some machines with beefy amounts of RAM act as FT proxies, serving as the clone VM for large numbers of VMs? Essentially, such a proxy machine is nothing but a memory state synchronizer. And note that it has massive memory de-dup opportunities, since it handles many VMs. I think the price of fault tolerance can be brought down low enough that it becomes a no-brainer for many VMs, and the standard. Like, at the Amazon EC2 level.

7) CPU re-design/re-thinking. If interposing layers like LLVM handle memory accesses, what's the point of virtual memory facilities baked into the processor (like page tables)? All that silicon space dedicated to page table storage and page-walking logic might be better spent assisting the new paradigm, and on more cores and execution logic. By the same reasoning, the whole notion of a virtualization "container," or boundary around an OS instance (or an application, for that matter), breaks down. Some of virtualization moves out of the processor and into a software layer.
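The second idea above, reusing memory that already exists on the destination server to shrink a live migration, can be illustrated with a small sketch. This is a hypothetical model, not how VMware, KVM, or any other hypervisor actually implements migration: pages are plain byte strings of an assumed 4 KiB size, each page is identified by a SHA-256 hash of its contents, and the names `PageStore`, `plan_migration`, and `apply_migration` are invented for illustration. Pages whose contents the destination already holds are sent as 32-byte hash references instead of full pages.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical for this sketch)


def page_hash(page: bytes) -> bytes:
    """Identify a page by the SHA-256 digest of its contents.
    A real system would likely also compare bytes before trusting a match."""
    return hashlib.sha256(page).digest()


class PageStore:
    """Hypothetical per-server index of resident page contents, keyed by hash."""

    def __init__(self) -> None:
        self.pages: dict[bytes, bytes] = {}

    def add(self, page: bytes) -> bytes:
        h = page_hash(page)
        self.pages.setdefault(h, page)
        return h

    def has(self, h: bytes) -> bool:
        return h in self.pages


def plan_migration(vm_pages: list[bytes], dest: PageStore):
    """Split a VM's memory into pages that must be copied in full and pages
    the destination can rebuild from content it already holds.

    Returns (to_send, refs): to_send maps page index -> full page bytes,
    refs maps page index -> content hash already present on dest."""
    to_send: dict[int, bytes] = {}
    refs: dict[int, bytes] = {}
    for i, page in enumerate(vm_pages):
        h = page_hash(page)
        if dest.has(h):
            refs[i] = h        # only the 32-byte hash crosses the network
        else:
            to_send[i] = page  # the whole page crosses the network
    return to_send, refs


def apply_migration(to_send, refs, dest: PageStore, n_pages: int) -> list[bytes]:
    """Reconstruct the VM's memory image on the destination."""
    image = []
    for i in range(n_pages):
        if i in to_send:
            page = to_send[i]
            dest.add(page)     # newly received content becomes shareable too
        else:
            page = dest.pages[refs[i]]
        image.append(page)
    return image


if __name__ == "__main__":
    zero = bytes(PAGE_SIZE)
    kernel = b"\x90" * PAGE_SIZE
    app = b"\x01" * PAGE_SIZE

    dest = PageStore()
    dest.add(zero)
    dest.add(kernel)  # destination already runs a similar VM

    vm = [zero, kernel, app, zero]
    to_send, refs = plan_migration(vm, dest)
    print(len(to_send), len(refs))  # 1 3
    assert apply_migration(to_send, refs, dest, len(vm)) == vm
```

In this toy run only one of four pages is copied in full. The more alike the VMs on the source and destination are, the larger that fraction of hash-only pages becomes, which is the effect the idea relies on; the same content-keyed store is also the kind of structure an FT proxy serving many VMs could use to de-dup their checkpointed pages.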

Hess: What are your aspirations for Bochs going forward?

Lawton: I haven't done anything with it for years. It was a high-instrumentation, lower-performance tool. I let many projects use it in their R&D efforts, including VMware when they were at Stanford, and QEMU (used in KVM). Being the first real open x86 PC emulation tool, it was extremely difficult to build in the days when x86, the system BIOS, etc. were largely undocumented. But it paved the way for many others to take their efforts to new levels. I planted a flag on the moon; that's good enough.

See Also:

The Bochs Project

The Plex86 Project

Practical Virtualization Solutions (bound)

Practical Virtualization Solutions (ebook)