Azure founders reflect on Microsoft's first decade as a public cloud vendor

Microsoft's Yousef Khalidi and Hoi Vo, key members of the original Azure 'dream team,' look back on the ups, downs and lessons learned since Microsoft began working on its Azure public cloud.

Microsoft's Azure operating system, codenamed "Red Dog," was designed by a team of Microsoft OS experts, including Dave Cutler, the father of VMS and Windows NT, starting in the latter half of the 2000s. Azure launched in February 2010 and has become the foundation of the "new" Microsoft over the past decade.

In 2008, I had a chance to interview the Azure founding-team members. This month, in working on my look back at Microsoft's 2010s, I discovered that three of those original team members were still at Microsoft. Two of the three agreed to answer a few questions and reflect on lessons they've learned over the past decade. (Unfortunately, Dave Cutler is still not granting press interviews -- at least to this reporter. I did get to do a brief email interview with him in 2008 about Azure, which you can read here. Last I heard, he's still at Microsoft and working on gaming-related things.)

When Microsoft began working on Azure, Distinguished Engineer Yousef Khalidi was working on its initial architecture design, compute infrastructure, network and operations. "It was a startup, so we did a little bit of everything, including managing code and meeting with customers," Khalidi told me, via email. Today Khalidi is Corporate Vice President of Product Management for Azure Networking.

Hoi Vo was Architect, Windows Core, in Azure's formative years. He was in charge of driving the OS and Virtual Machine capabilities, and determining what server hardware to use, which ended up being AMD. Today, Vo is a Distinguished Engineer within Xbox, working on Project xCloud, Microsoft's game-streaming service, he said, via email.

Here are my questions and Khalidi's and Vo's answers:

MJF: What are the top couple of things that you feel like the Azure team really got right? (by coincidence and/or plan)

Khalidi: When we started, we had a few key principles that guided our efforts, which still serve Azure well today. The first was simplicity and uniformity. In practice this meant that we were not going to build the system with gold-plated hardware, as many enterprise systems were built at the time. Instead, we would use simple, off-the-shelf hardware built in a uniform fashion so we could scale horizontally. We would also use basic networking and Ethernet equipment.

Second, we viewed every node and server in Azure as stateless. State must be maintained in a distributed fashion, and everything needed to be replicable. Of course, those are now basic tenets of the cloud, but this was a decision we needed to make consciously.

Vo: We also leveraged the hardware as much as possible, rather than doing everything through software. Problems with hardware were much easier to locate and fix. In software, there are more variables to consider. By not starting our efforts from scratch, and using off-the-shelf components, we were able to get started quickly.

MJF: What are a couple things you wish you could have done over/done better?

Vo: Even early in the life of Azure, we could have done more to help customers migrate to the cloud from on-premises enterprise systems. We started early and had some resource constraints. Yet compared to where Azure is in this effort today, and the wide variety of tools we offer customers, the benefits that additional tools and support would have provided to developers even in the earliest days are clear.

Khalidi: The goal of the project initially was to build a platform to host Microsoft's internal services, like Bing (MSN Search at the time) and Hotmail, first. At the time, these services were written in a very manual way. We wanted to automate them, and we were willing to re-build them from scratch as cloud native. When we determined later that Azure would be an offering for third parties, we also wanted to make it easier for customers to create similarly reliable and scalable systems.

In some ways, we were ahead of our time. We talked about serverless computing before the term was invented. Executing on our initial vision meant a focus on Platform-as-a-Service and greenfield development through simplified programming models that made it easier to write cloud-native applications. All of that has been in our DNA since day one and it continues to pay dividends for Azure today. This meant we began our push into the Infrastructure-as-a-Service (IaaS) space later than we could have. Customers wanted then, and still want today, help migrating their applications and data to the cloud, where they can be surrounded by applications that add additional value. There are always companies born in the cloud that do scalable computing from the beginning, but enterprises want to lift and shift what they have. In the early 2000s, this represented almost all enterprise computing. Once we were able to establish our IaaS offering, we moved forward.

MJF: What's your biggest overall surprise about Azure as it stands right now?

Vo: From where we started, to now building Project xCloud on Azure, the scale of the service is beyond what we could have imagined at the time. This includes the number of data center regions, but also the network that connects them and the number of services we offer. We knew building services meant infrastructure and a platform, which only Microsoft and a few others could afford to build out, but we didn't have the full picture yet.

AI has also taken off since 2009 in ways we never projected. Without the scale and compute power Azure wields, it's likely AI would not have taken off by leaps and bounds so quickly. We also wouldn't be able to launch a service like Project xCloud without the ability to launch worldwide, with low latency. We now add more capacity to Azure every day than the entirety of the service when it first started.

Khalidi: Today, we have a responsibility to support people and maintain and deploy mission-critical applications around the world. It isn't uncommon to re-write code for systems that grow 10x. The code in Azure and most services has been re-written twice already because of our growth rate, so our initial projections were quickly surpassed.

Even in the earliest days, our assumptions also changed in other ways. Before we had a codename - eventually Red Dog - for the service, we started the process of building Azure by doing a road tour at Microsoft's Silicon Valley offices, learning about the needs of the teams that ran some of Microsoft's biggest internal services. That was the initial customer base. Not long after, Hoi and I met with Bill Gates to discuss the project, and he directed us to turn it into a service. Accordingly, we expanded our thinking to widen the focus to third parties and helping developers build big, scalable solutions.

MJF: What do you think is the most underrated/underappreciated thing about Azure right now? (underrated/underappreciated by customers and company watchers)

Vo: Stepping away from Azure to work in Xbox for several years has allowed me to view the service with a new perspective. The most underrated aspect of Azure has been the work to make it compliant for use in a variety of regulated industries and government agencies. This is a multi-step process that is not just technically complex, it is also legally complex. Azure's maturity, breadth and depth allow it to work for customers who have a variety of needs, whether it's AI, data analysis, or streaming video games for Xbox customers.

Additionally, the existence of Azure allowed us to reach customers we never would have been able to reach before. In the world of shrink-wrapped software, only a certain level of people would go out and purchase it, due to the complexity involved in deploying and running business software. By shipping, deploying and running software for customers, we widened the customer base.

Khalidi: We saw the need to serve regulated industries very early on. Immediately after launching the beta in 2009, I took time away from the core team to speak with customers to figure out what it would take for enterprises to move to Azure. Compliance was one of the two most common themes that came up in those conversations.

Another of the most underrated aspects is the cultural change that happened because of Azure. When selling shrink-wrapped software, engineers would spend a year designing, writing code, testing and shipping the CDs. Running the service and running and maintaining the code were handled by the customer and partners. Other than patching and fixing bugs, our energy was focused towards the next release. With Azure, we are focused on what customers are going through. We have a shared responsibility for keeping our customers' businesses running. It has become the difference between being a tech vendor and being a partner with our customers.

A key example of how this changes what we ship is the big investment we have made in monitoring services, and the requisite debugging and analytics tools. In the old model, the customer would buy the hardware and build the network, and if something went wrong, they could walk up to a machine to see what's wrong and capture packets running through it. In the cloud - a term that wasn't used as often when we started - where everything is virtual, we needed something to carry out the same functions without physical access. The responsibility of running customers' services in a scalable, virtual, and secure environment means providing them the ability to manage their services. With requirements like compliance that used to be a customer responsibility, we need to ensure that the service stays compliant going forward.

MJF: Any lessons you've learned over the past decade helping to build Azure that you feel could be applied by other companies/partners/customers?

Khalidi: It may be trite, but simplicity wins. Not only should any service or product be simple to use, but also simple to discover. Some benefits are easy to imagine, but others are less obvious: simplicity allows for services to be maintained, scaled and run at rates that more complex offerings cannot. Our decision to not use gold-plated hardware is tied to this, since such systems often are elegantly designed and have a lot of moving parts. Simplicity allows you to be customer-driven, rather than being feature-driven.

Several years ago, we open-sourced SONiC, which is our switch software, to great industry reception. Switch software from big vendors is feature-full, with the ability to handle multiple protocols and management systems. This was added complexity we didn't need. So, we wrote our own modular, container-based software (based on Linux) to give us minimal functionality and bring diversity to the community. Even Cisco's latest router line will now support SONiC.

Vo: Inefficiencies and complexities will be exposed, without exception, because scale will bring these issues to light right away.