It has been four months since Microsoft took the official wraps off its cloud-computing initiative. Yet relatively little is still known about the Azure platform and plans.
The part of Azure which intrigued me the most was the cloud operating system, code-named “Red Dog,” that is at its heart. Late last month, Microsoft allowed me access to many of the principals behind Red Dog — everyone from the infamous father of VMS and NT, David Cutler, to the handful of top-dog engineers who helped design and develop the various Red Dog core components. Over the course of this week, I’m going to be publishing a post a day about Red Dog.
Dave Cutler, the father of the VAX VMS and NT operating systems, is a legend inside and outside Microsoft -- and not just because of his coding skills. He's quite the character, according to those who know him, and an incredibly demanding taskmaster who doesn't spare anyone his pointed criticism.
Nonetheless, the engineers on the Red Dog team with whom I spoke cited the attraction of getting to work with Cutler -- who has been at Microsoft since 1988 -- as one of the main reasons they joined the effort. Cutler was the first person that Azure chief Amitabh Srivastava recruited for Red Dog, knowing Cutler's interest and expertise in virtualization would be key to the team's work. Once he had Cutler on board, other engineers wanted in, too, Srivastava said.
"The first system I ever worked on in college was VMS," said Yousef Khalidi, the Microsoft Distinguished Engineer working on the Red Dog fabric controller. A chance to work with the guy who wrote that operating system was a huge opportunity, he said.
"Cutler can pick any project at this company he wants to work on," said Todd Proebsting, Director of Technical Strategy for Azure. "He's not here to mess around. I'm always clear about where he stands. He's all about the success of the project, but he wants everyone to pull his own weight."
(To see more on Cutler and other core members of the Red Dog engineering team, check out this slide show.)
During the decades I've written about Microsoft, Cutler has been one of the very few execs I've repeatedly requested to interview but have been unable to get. Unfortunately, Cutler wasn't at Microsoft headquarters when I met with the rest of the team, but I still had a chance to ask him five questions via e-mail.
MJF: What finally convinced you that it was worth your time/effort to join the Azure OS/Red Dog team? Was there something about it that you really wanted to do/try/learn?
Cutler: One of the major premises of Red Dog (RD) is being able to share a single compute node across several properties. This enables better utilization of compute resources and the flexibility to move capacity as properties are added, deleted, and need more or less compute power. This in turn drives down capital and operational expenses. The principal enabler for this type of sharing and the required security and isolation between properties is virtualization. At the time I was not a large proponent of virtualization because of the high overhead it extracted from the base hardware system. I spent a considerable amount of time studying Microsoft's virtualization efforts and after about three months became convinced we could build an efficient hypervisor for RD if we predicated it on second generation virtualization hardware and ran a single OS that was modified to run in the hypervisor environment as efficiently as possible. I never had any doubt that cloud computing would become an important part of Microsoft's product offering and getting over the virtualization hurdle convinced me I should join the team.
MJF: How was working on the Azure OS/Red Dog different from/similar to working on NT? on VMS?
Cutler: RD is very similar to the early days of NT and VMS. It is a small team of dedicated, energetic, smart people working toward a common goal with aspirations of producing a complete and very high quality competitive product. It is different in the sense that we are going after a new business for which we have no installed base or extensive knowledge set and there are significant competitors in the marketplace.
MJF: Some say it's impossible to teach old dogs new tricks.... Is Red Dog proof that it isn't, given that the founding team was primarily comprised of Windows guys like you?
Cutler: In the computer software business old dogs better learn new tricks or they will perish! There was a time when there was almost no competition in the PC space and one could follow the premise "if we build it they will come". Now there is all kinds of competition with smart people starting new companies and offering competitive products that solve customer problems. RD is not a completely new cloud OS from scratch and it does leverage the Windows NT software base (all Windows systems are derivatives of the Windows NT code base) and the other virtualization efforts at Microsoft. The architecture of the RD system, however, is predicated on looking at what Microsoft and others are doing in large data centers and formulating a design that addresses major problems with respect to sharing of compute resources within and between properties, providing durable, efficient, and persistent storage, provisioning compute nodes with software, automatic and timely application of OS and application patches, and automatic scale out for added capacity while at the same time reducing capital and operational expenses.
MJF: What do you think makes Red Dog better than competitive cloud OS solutions from Amazon and Google? Is there anything in either of their approaches you think Microsoft would do well to emulate and/or build on (concept or feature-wise)?
Cutler: There are four main components of the RD system: 1) the fabric controller, 2) storage, 3) the integrated development tools and emulated execution environment, and 4) the OS and hypervisor. The one component that we think provides RD with a significant advantage is the fabric controller. The fabric controller owns all the resources in the entire cloud and runs on a subset of nodes in a durable cluster. It manages the placement, provisioning, updating, patching, capacity, load balancing, and scale out of nodes in the cloud all without any operational intervention.
MJF: What was, in your opinion, the biggest challenge you and your team have encountered in building a cloud OS? Did anything about the Red Dog process/project catch you by surprise (so far)?
Cutler: There have been lots of challenges and surprises, but none of them are technical!
(I asked Cutler if they weren't technical, what were they. His response: "I think you will have to infer the full meaning of that statement.")
Bonus: Cutler asked and answered his own sixth question:
Cutler: One of the things you did not ask is why aren't we saying more about Azure and in the process filling the marketplace with sterling promises for the future. The answer to this is simply that the RD group is very conservative and we are not anywhere close to being done. We believe that cloud computing will be very important to Microsoft's future and we certainly don't want to do anything that would compromise the future of the product. We are hypersensitive about losing people's data. We are hypersensitive about the OS or hypervisor crashing and having properties experience service outages. So we are taking each step slowly and attempting to have features 100% operational and solidly debugged before talking about them. The opposite is what Microsoft has been criticized for in the past and the RD dogs hopefully have learned a new trick.
(The Red Dog series continues with Thursday's installment: How do Live Mesh and .Net Services fit into the Red Dog picture?)