Preventing your own Azure networking flop

Microsoft had the embarrassment of seeing its Azure flagship cloud storage system crash for 12 hours on Friday because it forgot to renew an SSL certificate. Before laughing yourself silly, are you sure that a similar disaster couldn't happen to your internet presence?
Written by Steven Vaughan-Nichols, Senior Contributing Editor

Oh, the shame of it all! Microsoft's worldwide cloud service Azure had a critical failure, and for about 12 hours the service was down. The cause? An expired Secure Sockets Layer (SSL) certificate.

See what can happen if you forget to renew one lousy SSL certificate? A global cloud failure.
Image: ZDNet

First, this was an incredibly stupid mistake. Sure, everyone can make blunders, but it's hard to take Microsoft's cloud offerings seriously after a misstep like this one. What makes this fiasco especially hard to take is that this is the second time Microsoft Azure has tripped over an SSL certificate problem. Last year, Microsoft had an even worse SSL-certificate-related Azure meltdown. That one was traced back to an SSL certificate that expired at the end of February — which Microsoft had renewed as of February 28, even though 2012 was a leap year and thus, February's last day was February 29.

This one, though? It wasn't some programmer forgetting about leap day. It was just a failure to make sure that a vital SSL certificate had been kept up to date. Idiots. The certificate expired and suddenly, every SSL connection to Azure storage was blocked. That in turn led to one Azure service after another failing in a cascading avalanche of disaster.

This was more than just a Microsoft foul-up. It's also a painful reminder that, for all our talk about how resilient the internet and clouds can be, there are still single points of failure that can take down a world-spanning cloud service.

So what can you do about it? Well, for starters, you shouldn't put all your IT eggs in one basket. In particular, after two major foul-ups caused by trivial technology administration mistakes, I don't see how you can trust Microsoft with any mission-critical cloud work.

That said, all cloud services fail from time to time. Amazon's and Google's track records may be better, but they've had their share of failures as well. The day may come when you can have perfect trust in the cloud, but that day isn't here yet. If you have a business of any size, you still need the belt and suspenders of at least local backups of your critical data.

Moving from the cloud to your business, do you know when your SSL certificates are due for renewal? You can't simply renew them automatically, or to be more exact, you can, but it's a security risk. You need to have someone in your organization be responsible for tracking your SSL certificates. If you don't, well, don't blame me if your on-site shopping cart starts failing one day.
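Whoever gets that job doesn't have to do it from memory. Here's a minimal sketch in Python, using only the standard library, of how a certificate's expiration date could be checked. The function names, the default port, and the timeout are my own assumptions for illustration, not anything Microsoft or a certificate vendor prescribes.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Return whole days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by getpeercert(), for example
    "Mar  1 00:00:00 2013 GMT". A negative result means it already expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=10.0):
    """Connect to host:port over TLS and return the server certificate's notAfter.

    Assumption: the certificate is still valid. With the default context,
    an expired certificate makes the handshake raise an ssl.SSLError,
    which is itself a signal that something is overdue.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling days_until_expiry(fetch_not_after("www.example.com")) from a daily scheduled job, and sending an alert when the result drops below, say, 30 days, would give the responsible person plenty of warning. The hostname and the 30-day threshold are placeholders; pick whatever suits your renewal process.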

Another blunder that keeps getting made time after time is that companies forget to renew their domain name registration. So it is that major companies, such as Australian business telecommunication firm AAPT, can suffer the major business embarrassment of having their website and email suddenly fail.

Just as with SSL certificates, the fix is simply to make sure that someone, in some branch of your organization, is responsible for maintaining your domain registration, and that the bill is paid. That's all.
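The bookkeeping itself can be trivial. As a rough sketch, assuming you keep a simple inventory that maps each domain or certificate to an owner and an expiration date (the inventory layout, the names in it, and the 30-day warning window are all hypothetical), a few lines of Python can list what's coming due:

```python
from datetime import date


def due_for_renewal(assets, today, warn_days=30):
    """Return (name, owner, days_left) for assets expiring within warn_days.

    `assets` maps an asset name to an (owner, expiry_date) tuple.
    Results are sorted so the most urgent item comes first; items that
    have already lapsed show up with a negative days_left.
    """
    due = []
    for name, (owner, expires) in assets.items():
        days_left = (expires - today).days
        if days_left <= warn_days:
            due.append((name, owner, days_left))
    return sorted(due, key=lambda item: item[2])


# Hypothetical inventory: the names, owners, and dates are made up.
inventory = {
    "example.com domain": ("ops team", date(2013, 3, 15)),
    "shop.example.com certificate": ("web team", date(2013, 12, 1)),
}

for name, owner, days_left in due_for_renewal(inventory, today=date(2013, 3, 1)):
    print(f"{name}: {days_left} days left, owner: {owner}")
```

The point of the owner column is the same one made above: every entry has a name next to it, so there's never a question of whose job the renewal is.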

It's not that hard and it doesn't cost much. Domain names can be had for under $5 a year and SSL certificates from a reputable SSL certificate provider can be had for as little as $31 annually.

There is no reason why Microsoft, or your company, should fall prey to an SSL or domain failure. All you need to do is keep track of both and pay their bills out of petty cash. It's not that hard.
