Comments on: 99.999% Reliable? Don't Hold Your Breath

A better question that we should all be asking is: Can we afford the consequences of downtime?

I can't believe that I missed an article Randall Stross published in the NY Times on January 9th titled "99.999% Reliable? Don't Hold Your Breath." It should have turned up in my morning news scan. Instead, a friend had to bring the article to my attention and ask what I thought of the concepts behind it.

When I posted "Levels of availability and disaster recovery" back in January 2009, I explored just how much "availability" is enough. I'll reproduce a snippet from that post here:

How much is enough - levels of availability
Availability is a good thing. After all, automated work only gets accomplished if the IT infrastructure is working and available. (As my grandson would say, “Well, duh.”)

As is always the case with IT-based solutions, there are several ways to look at this concept. One is based upon what percentage of uptime is required to fulfill an organization’s needs. Let’s look at percent uptime and then calculate how much downtime per year each level allows.

Levels of Availability

Uptime / Downtime per year (365-day year)

90% / 36.5 days
95% / 18.25 days
99% ("2 Nines") / 3.65 days
99.9% ("3 Nines") / 8.76 hours
99.99% ("4 Nines") / 52.56 minutes
99.999% ("5 Nines") / 5.26 minutes
99.9999% ("6 Nines") / 31.5 seconds
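
The downtime for any uptime level is simple arithmetic: take the unavailable fraction, (100 - uptime percent) / 100, and multiply it by the length of a year. Here is a minimal Python sketch of that calculation. It assumes a 365-day year, and the names downtime_seconds_per_year and describe are made up for this illustration rather than taken from any tool.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_seconds_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime, in seconds, implied by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * SECONDS_PER_YEAR


def describe(seconds: float) -> str:
    """Format a duration in the largest unit that keeps the value at 1 or more."""
    if seconds >= 86400:
        return f"{seconds / 86400:.2f} days"
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.2f} minutes"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    # Print the downtime allowed per year at each level of availability.
    for level in (90, 95, 99, 99.9, 99.99, 99.999, 99.9999):
        print(f"{level}% uptime: {describe(downtime_seconds_per_year(level))}")
```

The same function works for any other target, such as 99.95%, so a figure quoted in a service agreement can be checked directly.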

Snapshot analysis

Mr. Stross posed a really good question when he asked, "Can we realistically expect that such availability will ever come to Internet services?" The answer is that the technology exists to offer continuous availability, non-stop computing, or whatever catchphrase you'd prefer. Some applications simply must be available at all times; that is, no level of perceived downtime is acceptable.

A better question that we should all be asking is: Can we afford the consequences of downtime?

The truth is that some applications would be useful even if they only offered 2 Nines of uptime, meaning they could be unavailable for up to 3.65 days a year.

Other applications or workloads become significant liabilities if they don't deliver 6 Nines of uptime, meaning they are unavailable for more than about half a minute a year. Examples would be funds transfer solutions or software controlling expensive manufacturing processes. Some eCommerce applications fall into this category as well.

Some applications can tolerate absolutely no downtime; if they go down, the company is out of business.

As organizations increasingly rely on internet-based services and cloud computing, they should expect the levels of reliability and availability offered by service providers to match the best of their own on-premises solutions. The technology to support this has existed for ages (on the order of 30 years!). It is simply a matter of will, and of accepting that the costs of providing that availability are far outweighed by the costs of downtime.