Microsoft's Windows Azure storage service went down worldwide just before 4 p.m. ET/9 p.m. UTC, apparently due to an expired HTTPS certificate.
Here's what the Windows Azure status dashboard is currently showing:
The dashboard messages indicate that Microsoft is aware of the service issue and "actively working on resolving it." The message also notes that all services dependent on Azure storage are affected, as one might expect.
A Microsoft spokesperson sent the following statement:
"At approximately 8:44pm UTC Microsoft became aware of an issue affecting storage worldwide. We are actively investigating this issue and working to resolve it as soon as possible. Updates will be published to the Windows Azure dashboard to keep customers apprised of the situation. s and data in cloud storage."
On the Windows Azure forums, Brian Reischl posted "So is it just me, or did the HTTPS certificate for Azure Storage just expire?" He added a screenshot that seems to indicate this is, indeed, what happened.
There are also reports of Xbox Live service problems happening at the same time, specifically problems "accessing saved games and data in cloud storage." This may be connected to the Azure Storage issue.
I have a message in to the Azure team for further updates.
Update No. 2: As a few readers have pointed out, if this is a security-certificate issue, this won't be the first time this kind of bug has bitten Microsoft. When Windows Azure went down at the end of February 2012, the so-called Leap Day bug was linked to a security certificate problem triggered by the date.
Update No. 3: Former Azure technical evangelist and current Aditi Technologies CTO Wade Wegner suggested a temporary workaround for those who don't need to ensure data is accessed securely: "Try switching to HTTP instead of HTTPS and redeploy (via upload on portal)."
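For readers unsure where that switch lives: Azure storage connection strings carry a `DefaultEndpointsProtocol` setting, and the workaround amounts to changing its value from `https` to `http` before redeploying. Here is a minimal Python sketch of that edit. The function name and the sample account values are hypothetical, and it assumes the deployment reads its storage endpoint from a connection string of this form.

```python
def switch_endpoints_protocol(connection_string: str, protocol: str = "http") -> str:
    """Return a copy of the connection string that uses the given protocol.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    # Split into "Key=Value" segments, ignoring empty ones from trailing ';'.
    parts = [p for p in connection_string.split(";") if p]
    updated = []
    found = False
    for part in parts:
        # Partition on the first '=' only: account keys may end in '=' padding.
        key, _, _value = part.partition("=")
        if key.strip().lower() == "defaultendpointsprotocol":
            updated.append(f"DefaultEndpointsProtocol={protocol}")
            found = True
        else:
            updated.append(part)
    if not found:
        # No protocol was specified, so state it explicitly up front.
        updated.insert(0, f"DefaultEndpointsProtocol={protocol}")
    return ";".join(updated)


# Sample values below are placeholders, not real credentials.
example = "DefaultEndpointsProtocol=https;AccountName=example;AccountKey=abc123=="
print(switch_endpoints_protocol(example))
```

Keep in mind that HTTP traffic is unencrypted, so this only suits data that does not need protection in transit, and the setting should be switched back once HTTPS is healthy again.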
Update No. 4 (6:15 p.m. ET): Here's the new message at the top of the Azure Dashboard status page:
"Storage is currently experiencing a Worldwide outage impacting HTTPS operations (SSL traffic). Status of affected services will be updated in the table below. We have identified the root cause and are validating the recovery options before implementing them. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers."
Update No. 5 (7:10 p.m. ET): The Azure dashboard is now showing worldwide Compute service-management performance problems, which Microsoft is attributing to the Storage outage.
Compute service availability is unaffected, according to the error message. But the ongoing impact on Storage SSL traffic means users are unable to create new virtual machines or deploy new or existing hosted services, the dashboard message noted.
Update No. 6 (7:20 p.m. ET): It looks like the Office 365 Twitter account is confirming an expired certificate is the culprit behind the Storage outage.
Update No. 7 (8:10 p.m. ET): The Azure team also is confirming an expired certificate is at fault. The team is "validating the recovery options" at this point. The full status update from the dashboard:
"Storage is currently experiencing a worldwide outage impacting HTTPS operations (SSL traffic) due to an expired certificate. HTTP traffic is not impacted. We are validating the recovery options before implementing them. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers."
Update No. 8 (February 23, 8 a.m. ET): Sometime overnight, Microsoft made the necessary repairs. As of Saturday morning, the dashboard claimed more than 99 percent availability across all sub-regions. The Compute service is fully back to normal operations, as well, according to the dashboard. Here's the latest message:
"On Friday, February 22 at 12:44 PM PST, Storage experienced a worldwide outage impacting HTTPS traffic due to an expired SSL certificate. This did not impact HTTP traffic. We have executed repair steps to update SSL certificate on the impacted clusters and have recovered to over 99% availability across all sub-regions. We will continue monitoring the health of the Storage service and SSL traffic for the next 24 hrs. Customers may experience intermittent failures during this period. We apologize for any inconvenience this causes our customers."