Microsoft's Azure team has gone public with the root cause it discovered when investigating the November 19 worldwide multi-factor-authentication outage that plagued a number of its customers. Actually, Microsoft unearthed three independent root causes, along with monitoring gaps that resulted in Azure, Office 365, Dynamics and other Microsoft users not being able to authenticate for much of that day.
Also: Here's why the public cloud is growing rapidly TechRepublic
For 14 hours on November 19, Microsoft's Azure Active Directory Multi-Factor Authentication (MFA) services were down for many. Because Office 365 and Dynamics users authenticate via this service, they also were affected.
The first root cause showed up as a latency issue in the MFA front-end's communication to its cache services. The second was a race condition in processing responses from the MFA back-end server. These two causes were introduced in a code update rollout which began in some datacenters on Tuesday November 13 and completed in all datacenters by Friday November 16, Microsoft officials said.
Also: Best cloud services for small businesses CNET
A third identified root cause, which was triggered by the second, resulted in the MFA back-end being unable to process any further requests from the front-end, even though it seemed to be working fine based on Microsoft's monitoring.
European, Middle Eastern and African (EMEA) and Asian Pacific (APAC) customers were hit first by these cascading issues. As the day went on, Western European and then American datacenters were hit. Even after engineers applied a hotfix which allowed front-end servers to bypass the cache, the issues persisted. On top of all this, telemetry and monitoring wasn't working as expected, officials acknowledged.
Microsoft identified a number of intended next steps to improve the MFA service, including a review of its update-deployment procedures (target completion date: December 2018); a review of montioring services (target completion date: December 2018); a review of the containment process which will help avoid propagating an issue to other datacenters (target completion date: January 2019); and an update to the communications process for the Service Health Dashboard and monitoring tools (target completion date: December 2018).
Microsoft officials apologized to affected customers, but made no mention of any planned financial compensation. Microsoft's November 19 Azure status history post has more details about the trail of events leading to the MFA meltdown.
Previous and related coverage:
Microsoft is planning to bring its Cosmos DB NoSQL database to its hybrid-computing and secure microcontroller platforms as part of its 'intelligent cloud, intelligent edge' strategy.
Microsoft launched Azure in October 2008. In the ensuing decade, Microsoft's cloud platform has come a long way from its 'Red Dog' beginnings.
The companies are proving out a Microsoft Azure cloud-based solution for payments transfers conducted on the SWIFT network.
Grab will adopt Microsoft Azure as its preferred cloud platform and Microsoft will make a strategic investment in Grab.