It has been four months since Microsoft took the official wraps off its cloud-computing initiative. Yet relatively little is still known about the Azure platform and Microsoft's plans for it.
The part of Azure that intrigued me most was the cloud operating system at its heart, code-named “Red Dog.” Late last month, Microsoft gave me access to many of the principals behind Red Dog: everyone from the infamous father of VMS and NT, David Cutler, to the handful of top-dog engineers who helped design and develop the various Red Dog core components. Over the course of this week, I’m going to publish a post a day about Red Dog.
Before the Red Dog operating system or the larger Azure stack was even a gleam in anyone's eye, Corporate Vice President Amitabh Srivastava had the opportunity to do almost anything he wanted. He could hand-pick a team of the best and brightest to develop a new Microsoft platform for the cloud.
Srivastava, who admitted he is "very anti-process," assembled a handful of engineers he knew from various Windows and Research assignments at Microsoft. He knew he wanted to keep the core group small and well-knit.
"If you only have 20 people, you don't need as much process. It's not like trying to make sure 5,000 people are all on the same page." (Only recently did the Red Dog team expand, with new services-specific hires from Ask, Yahoo and other non-Windows-centric companies. The current headcount for the Red Dog team is about 150, Srivastava said.)
His first intended recruit was Dave Cutler, the father of NT and VMS. Cutler "didn't need to write another OS," Srivastava acknowledged, but his "weakness is that he loves coding" and solving hard problems. Srivastava convinced Cutler to join the team. He also consulted with Todd Proebsting, a former Microsoft Researcher and director of the company's Center for Software Excellence. And he called a few other former colleagues: storage expert Brad Calder; former Sun utility-computing expert turned Microsoft Distinguished Engineer Yousef Khalidi; programming-tool and OS specialist Hoi Vo; engineering whiz G.S. Rana; datacenter-provisioning expert Hunter Hudson; and developer evangelist Manuvir Das.
"The quality of the communication (within the team) affected the agility and the quality," said Rana, the General Manager of Engineering for Red Dog. "A lot of us had worked together for a long time."
(For a Red Dog core-team "Who's Who list," check out this slide show.)
After an initial two-plus-month fact-finding mission, in which the core team met with various Microsoft services teams in Redmond and Silicon Valley, the Red Dog team had some ideas of what it did and didn't want to do.
"We said, let's not try to copy Google or Amazon," Srivastava recalled. "We said we'd run things very differently."
The team decided to keep its approach and its mission a secret, even from Microsoft management. CEO Steve Ballmer knew Srivastava and his core group were working on something for the cloud, but that was about all he knew.
"Steve (Ballmer) asked me, 'Why are you hiring all our best people?'" Srivastava joked. But beyond his overall vision statement, he didn't share much with the sometimes loose-lipped CEO.
Keeping it simple
Srivastava and his team decided to make use of assets the company already had -- specifically Windows Server 2008 -- to power Microsoft's datacenters.
"Our biggest learning was if it's not simple, it's not going to work," Srivastava said. "There was a lot of infrastructure that already existed" -- the Windows operating system, tools, debuggers. The idea was to harness all of these things and then "force" a programming model on top of them from Day 1.
The team decided to build a layer -- with pieces akin to what is inside a modern-day operating system -- to manage thousands of Windows Server machines. A "fabric controller" would manage the cloud; a storage subsystem would act like a traditional "file system" for all of the servers; and a virtualization layer, derived from Microsoft's Hyper-V hypervisor, would sit at the lowest level, between the servers and the rest of the datacenter "operating system."
(Calling Red Dog an "operating system" is an oversimplification, as team members are quick to point out. But each of its components has a parallel in the modern-day operating system world. Red Dog handles switches, load balancers and servers the way a client OS handles device drivers.)
The process of "how you architect software was transferrable from our previous knowledge," said Khalidi, a Distinguished Engineer focusing on Enterprise Strategy who spearheaded the fabric-controller piece of Red Dog. "From Day 1, you just have to think about how to deploy in very large scale."
Knowledge also flows between the existing Windows teams and the Red Dog team. For example, new features that the Red Dog team builds for its kernel and hypervisor are slated, where applicable, to be folded back into the next version of Windows.
"We touch every component of Windows and tools. We know we want to push the hardware as much as we can," said Vo, Director of the Red Dog (Azure) Operating System.
The Red Dog team, even now that the cat is out of the bag (so to speak), is still big on secrecy. Its members belong to the "under-promise and over-deliver" school that is growing inside Microsoft. But more and more teams at the company are being moved to the Red Dog platform, starting with Live Mesh, HealthVault and Live Meeting. External beta testers of Microsoft's .Net Services/Live Services platform also are starting to test Red Dog's limits.
(What's Dave Cutler been up to? Tune in to tomorrow's installment for a Q&A with the father of Windows NT on his role in the Red Dog team.)