Global DNS outage hits Microsoft Azure customers

Update: An Azure DNS outage, which affected Microsoft customers and services across the globe for several hours, now seems to be mostly mitigated.
Written by Mary Jo Foley, Senior Contributing Editor

It's been a rough morning for Microsoft Azure customers worldwide.


As the Azure status page makes clear, a global DNS outage is impacting users in all regions on Sept. 15, 2016. The page says the outage started at 11:48 UTC (7:48 am ET) and is affecting "a subset of customers using DNS."

Among the Azure services affected are SQL Database, Virtual Machines, Visual Studio Team Services, Service Bus, API Management, and App Service/Web Apps. The page notes that "engineers are aware of the issue and actively investigating."

Just before 9 am ET, the status page updated, noting that "engineers have identified a possible underlying cause and are working to determine mitigation options." Just before 10 am ET, the status page updated again to reflect "degraded service availability," and added that engineers were working on mitigation options.

The DownDetector page also highlighted a spike in outage reports for Azure beginning this morning.

Azure user Eric Renken (@KD8FWV on Twitter) tweeted "All our saas (software as a service) products were offline. Fun thing to wake up to."

"Second Azure outage in a week (Europe), again multi region. Back to basics @Microsoft," tweeted Bartlomeij Owczarek (@bowczarek on Twitter). Owczarek noted that the outage is "annoying for our startup surely, but for our corporate clients, it would be entirely different thing if they put eg. ERP on it."

Last week, on Sept. 9, a number of European Azure customers were hit by an escalating, multi-hour outage.

Update (11 am ET/3 pm UTC): Microsoft's Azure status history page is reporting that most of the downed services are back or in the process of coming back. (Some customers in Central US are still experiencing "residual impact" with SQL Database.)

Other services that ended up affected include Azure Media Services, Azure Search, HDInsight, Application Insights, IoT Hub, Azure Log Analytics, Azure Automation, and Data Movement.

Microsoft is identifying the preliminary root cause as a "spike in networking traffic ... which caused service-level drops for the DNS service." That problem caused connectivity issues for the services reliant on DNS. Microsoft SQL Azure also suffered a secondary impact due to a misconfiguration, according to Microsoft's page.

The DNS issues were "self-healed by the Azure platform," Microsoft officials said. "The additional impact to SQL was mitigated by a configuration change in networking."

A detailed public postmortem will be on the Azure Status Dashboard site in approximately 48 hours, officials said.

Update (11:15 am ET): I'm now seeing users reporting that OneDrive is having problems. The DownDetector site has confirmed this. I would think this could be Azure-related. I have a question in to Microsoft as to what's going on.

Update (4 pm ET): Microsoft still seems to be having OneDrive connectivity issues that are affecting some, but not all, users of its consumer service. This from a spokesperson:

"We can share that some customers may be experiencing connectivity issues and we're working to restore service."

The spokesperson hasn't replied to my questions as to what's happening behind the scenes or how many are affected.
