Windows Azure storage issue: Expired HTTPS certificate possibly at fault

Summary: Microsoft's Windows Azure storage service went down just before 4 p.m. ET. An expired HTTPS security certificate could be at fault.

Microsoft's Windows Azure storage service went down worldwide just before 4 p.m. ET/9 p.m. UTC, apparently due to an expired HTTPS certificate.

Here's what the Windows Azure status dashboard is currently showing:

[Screenshot: Windows Azure status dashboard showing Storage service degradation]

The dashboard messages indicate that Microsoft is aware of the service issue and is "actively working on resolving it." They also note that all services dependent on Azure Storage are affected, as one might expect.

A Microsoft spokesperson sent the following statement:

"At approximately 8:44pm UTC Microsoft became aware of an issue affecting storage worldwide. We are actively investigating this issue and working to resolve it as soon as possible. Updates will be published to the Windows Azure dashboard to keep customers apprised of the situation."

On the Windows Azure forums, Brian Reischl posted "So is it just me, or did the HTTPS certificate for Azure Storage just expire?" He added a screenshot that seems to indicate that this is, indeed, what happened.

[Screenshot: the Azure Storage HTTPS certificate, as posted on the Windows Azure forums]
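An expired certificate breaks HTTPS because TLS clients that verify certificates reject any certificate whose "notAfter" date has passed, while plain HTTP traffic involves no certificate at all. As a rough illustration of how such an expiry can be caught ahead of time, here is a minimal Python sketch using only the standard `ssl` and `socket` modules. The function names and the 90-day default lead time are illustrative assumptions, not anything Microsoft has described using.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone


def parse_not_after(value: str) -> datetime:
    """Convert a certificate notAfter string such as 'Mar 15 12:00:00 2030 GMT'
    into a timezone-aware UTC datetime (hypothetical helper)."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)


def cert_not_after(host: str, port: int = 443, timeout: float = 10.0) -> datetime:
    """Fetch the expiry time of the certificate served by host:port.

    The context verifies certificates, so a certificate that has already
    expired makes the handshake raise ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return parse_not_after(cert["notAfter"])


def needs_renewal(not_after: datetime, now: datetime,
                  lead_time: timedelta = timedelta(days=90)) -> bool:
    """True when the certificate expires within lead_time of now, or already has."""
    return not_after - now <= lead_time
```

A scheduled job could call `needs_renewal(cert_not_after(host), datetime.now(timezone.utc))` for each public HTTPS endpoint, with `host` standing in for a real hostname, and raise an alert whenever it returns `True`. Because the connection uses a verifying context, a certificate that has already expired surfaces as an `ssl.SSLCertVerificationError` rather than a date, which a monitor should also treat as an alert.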

There are also reports of Xbox Live service problems at the same time, with users having trouble "accessing saved games and data in cloud storage." This may be connected to the Azure Storage issue.

I have a message in to the Azure team for further updates.

Update: Among the Microsoft customers affected by the Storage problem is the .Net package management service NuGet. Microsoft's Team Foundation Service (TFS) on Azure also seems affected.

Update No. 2: As a few readers have pointed out, if this is a security-certificate issue, this won't be the first time this kind of bug bit Microsoft. When Windows Azure went down at the end of February 2012, the so-called Leap Day bug was linked to a security certificate problem triggered by the date.

Update No. 3: Former Azure technical evangelist and current Aditi Technologies CTO Wade Wegner suggested a temporary workaround for those who don't need to ensure data is accessed securely: "Try switching to HTTP instead of HTTPS and redeploy (via upload on portal)."
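Wegner's tip amounts to changing the protocol a deployment uses to reach Storage. A minimal Python sketch, assuming the application reads its Storage endpoint from a standard Azure storage connection string of the form `DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...` (an assumption; the tip itself does not say where the protocol is configured). The helper name `use_http` is hypothetical. Anything sent with the resulting setting travels unencrypted, which is why the workaround only suits data that doesn't need secure access.

```python
def use_http(conn_str: str) -> str:
    """Return a copy of an Azure storage connection string with
    DefaultEndpointsProtocol forced to http (hypothetical helper).

    Traffic sent using the result is NOT encrypted in transit.
    """
    settings = []
    for part in conn_str.split(";"):
        if not part:
            continue  # tolerate empty segments such as a trailing ';'
        # Split on the first '=' only: base64 AccountKey values can end in '='.
        key, sep, value = part.partition("=")
        if key == "DefaultEndpointsProtocol":
            value = "http"
        settings.append(f"{key}{sep}{value}")
    return ";".join(settings)
```

For example, `use_http("DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=abc==")` returns `"DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=abc=="`; the account name and key are placeholders. The updated configuration would still need to be redeployed, as the tip notes.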

Update No. 4 (6:15 p.m. ET): Here's the new message at the top of the Azure Dashboard status page:

"Storage is currently experiencing a Worldwide outage impacting HTTPS operations (SSL traffic). Status of affected services will be updated in the table below. We have identified the root cause and are validating the recovery options before implementing them. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers."

Update No. 5 (7:10 p.m. ET): The Azure dashboard is now showing worldwide Compute service-management performance problems, which Microsoft is attributing to the Storage outage.

[Screenshot: Windows Azure dashboard showing Compute service-management issues caused by the Storage outage]

Compute service availability is unaffected, according to the dashboard. However, because Storage SSL traffic is still affected, users are unable to create new virtual machines or deploy new or existing hosted service deployments, the dashboard message noted.

Update No. 6 (7:20 p.m. ET): The Office 365 Twitter account appears to confirm that an expired certificate is the culprit behind the Storage outage.

[Screenshot: Office 365 Twitter message about the Storage outage]

Update No. 7 (8:10 p.m. ET): The Azure team is also confirming that an expired certificate is at fault. The team is "validating the recovery options" at this point. The full status update from the dashboard:

"Storage is currently experiencing a worldwide outage impacting HTTPS operations (SSL traffic) due to an expired certificate. HTTP traffic is not impacted. We are validating the recovery options before implementing them. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers." 

Update No. 8 (February 23, 8 a.m. ET): Sometime overnight, Microsoft made the necessary repairs. As of Saturday morning, the dashboard claimed more than 99 percent availability across all sub-regions. The Compute service is also fully back to normal operations, according to the dashboard. Here's the latest message:

"On Friday, February 22 at 12:44 PM PST, Storage experienced a worldwide outage impacting HTTPS traffic due to an expired SSL certificate. This did not impact HTTP traffic. We have executed repair steps to update SSL certificate on the impacted clusters and have recovered to over 99% availability across all sub-regions. We will continue monitoring the health of the Storage service and SSL traffic for the next 24 hrs. Customers may experience intermittent failures during this period. We apologize for any inconvenience this causes our customers."

Topics: Storage, Cloud, Microsoft, Windows

About

Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

Talkback

16 comments
  • New Azure Job Opening coming soon!

    To allow something as mission-critical as this to expire is beyond EPIC FAIL. It certainly qualifies as a Job Terminating Event. I'm sure this downtime is costing MS thousands of dollars per MINUTE, and likely per second. It was completely preventable: it is trivial for an IT person to be notified of upcoming certificate expirations with enough lead time to renew, and to have a plan in place to install the new certificate.
    ZStoner
    • Maybe not. Perhaps this new person doesn't need to be hired until Feb/2013

      Early Feb to be safe. I see a pattern here :) And also a facepalm.
      Johnny Vegas
      • but of course

        Microsoft are known to be ignorant in software development and definitely in API design; now they prove once more they have no clue about service provision. And best of all, they are "all in" on hardware manufacturing. Pathetic.

        Any kid who runs Internet services knows how to monitor SSL certificates, and these certificates are probably issued by Microsoft itself, so as the issuer they, too, are supposed to know when the certificates expire. If someone else issued the certificates, those companies' standard practice is to nag you well in advance (at least 90 days) that your certificate is about to expire.

        Apparently more than one "manager" at the greatest software company of all time goofed badly, so there might be plenty of new job openings. For those who dream of working for Microsoft, that is. :)
        danbi
        • When did your brain death occur?

          "Ignorant about software development and definitely in API design?"

          Tell us, who designs better APIs? This is going to be good for a laugh. I'd be somewhat surprised if you could even accurately describe what a for loop is, and here you are giving your opinions on API designs.
          jackbond
          • how much you bet

            I bet you won't bet much to be proved wrong. :)

            So, how much do you bet that I don't know what a "for" loop is?
            danbi
        • I'm intrigued about your comment on API design as well

          Having gone through Brad Abrams' and Krzysztof Cwalina's "Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries" book, I'm curious who you think has done a better job at API design.

          I'm not going to argue that letting a TLS certificate on a major site like that expire isn't, uh, stupid. But I am curious about your views on APIs.
          Flydog57
    • Microsoft and Competence,

      Microsoft and Competence, do they even know each other?
      eulampius
  • TFS is also down

    We recently started using the TFS service from Visual Studio. I was attempting to check in my final code when visualstudio.com started giving SSL errors. It turned out the real cause was Azure Storage. I hope they fix it soon; we had to cancel our deployments scheduled for today.
    fakher@...
  • Oh dear how shoddy

    A complete world wide failure.

    ASOD

    Blue wave bye bye.
    Alan Smithie
  • What else could you expect from MS.

    They can't even keep systems running for 30 days...

    Much less a world wide system.

    I do believe that cloud is evaporating...
    jessepollard
  • Microsoft should shoot a video...

    of the brain dead filth that allowed this to happen cleaning out his desk and being escorted from the building by security. Seriously, an expired certificate?
    jackbond
    • Excellent Idea

      Love to see Ballmer fired.
      Alan Smithie
      • Absolutely...

        And be replaced by Scott Guthrie.
        jackbond
  • Just Unacceptable

    We can all understand that sh1t happens, but this was a totally predictable event. No excuses.
    TheCyberKnight
  • That's pretty embarrassing.

    To help make sure Microsoft's silly mistake doesn't happen to anybody else, we just launched a free SSL certificate monitor: http://www.stackify.com/stackify-launches-free-certalert-me-service-to-monitor-ssl-certificates/
    travisfoster