Time for a Bezos trustworthy cloud initiative?

It was inevitable that, sooner or later, a server outage would expose Amazon's lack of preparedness for failure. It's ironic that this should have happened within hours of my posting an item here arguing that SaaS vendors should all rely on cloud providers for their infrastructure. Those that do rely on Amazon will be looking for far better outage management and service level reporting in the future than they've tolerated to date.

What I can't understand is, why do providers only understand this after they've suffered a major outage? Salesforce.com learnt its lesson two years ago, and as a result its partial outage on Tuesday aroused little reaction. Why on earth Amazon couldn't have invested in a similar system to keep customers informed is beyond me.

At least I can say, 'I told you so'. This is from Time for Web 2.0 to get real, posted in June 2006:

... a complete disregard for accountability to their users among service providers ... is nothing new. An article I commissioned from my Loosely Coupled colleague David Longworth in October 2004 reported on Web services without warranties. Here's what [program manager for Amazon Web Services] Jeff Barr had to say about service level guarantees back then:

"We have not found it necessary to offer any kind of formal guarantee in this regard. What works best is to realize that our interests are aligned with the interests of our developers — if the service is not running then their sites are not running, and no transactions are occurring. Clearly, this is bad, and we do all that we can to make sure that it doesn't happen."

In other words, 'If you're down, we’re down, so trust us to stay up — after all, if you can't trust Amazon, who can you trust?' As I pointed out at the time in a blog posting entitled Rips in the Web 2.0 fabric, such breathtaking arrogance is characteristic of a vendor in the grip of what Geoffrey Moore called 'the tornado'. We all know where this kind of thing leads, as I wrote back then:

"Sure, there are going to be a lot of headaches when everyone has standardized on Web 2.0 services in a decade's time. Gartner will come out with a damning report on the unrecognized TCO of on-demand services, and [Amazon CEO] Jeff Bezos will suddenly launch a 'Trustworthy Services' initiative in response to corporate concerns over alleged performance glitches ..."

After all this, do I really think SaaS providers are going to trust cloud infrastructure? It keeps the debate open, but what today's outage shows is not that the model is broken but that the execution needs fixing.