Microsoft details the causes of its recent multi-factor authentication meltdown

Microsoft has posted a root cause analysis of the multi-factor authentication issue that hit a number of its customers worldwide last week. Here's what happened.
Written by Mary Jo Foley, Senior Contributing Editor

Microsoft's Azure team has gone public with the root cause it discovered when investigating the November 19 worldwide multi-factor authentication outage that plagued a number of its customers. Actually, Microsoft unearthed three independent root causes, along with monitoring gaps, which together left Azure, Office 365, Dynamics and other Microsoft users unable to authenticate for much of that day.

For 14 hours on November 19, Microsoft's Azure Active Directory Multi-Factor Authentication (MFA) services were down for many. Because Office 365 and Dynamics users authenticate via this service, they also were affected.

The first root cause showed up as a latency issue in the MFA front-end's communication to its cache services. The second was a race condition in processing responses from the MFA back-end server. These two causes were introduced in a code update rollout which began in some datacenters on Tuesday November 13 and completed in all datacenters by Friday November 16, Microsoft officials said.

A third identified root cause, which was triggered by the second, resulted in the MFA back-end being unable to process any further requests from the front-end, even though it seemed to be working fine based on Microsoft's monitoring.

European, Middle Eastern and African (EMEA) and Asia-Pacific (APAC) customers were hit first by these cascading issues. As the day went on, Western European and then American datacenters were hit. Even after engineers applied a hotfix that allowed front-end servers to bypass the cache, the issues persisted. On top of all this, telemetry and monitoring weren't working as expected, officials acknowledged.

Microsoft identified a number of intended next steps to improve the MFA service, including a review of its update-deployment procedures (target completion date: December 2018); a review of monitoring services (target completion date: December 2018); a review of the containment process, which will help avoid propagating an issue to other datacenters (target completion date: January 2019); and an update to the communications process for the Service Health Dashboard and monitoring tools (target completion date: December 2018).

Microsoft officials apologized to affected customers, but made no mention of any planned financial compensation. Microsoft's November 19 Azure status history post has more details about the trail of events leading to the MFA meltdown.
