S3 outage: time to double up

Amazon's S3 outage is proof that relying on a single cloud isn't enough, especially if, like RSS analytics provider Mediafed, your enterprise customers demand continuous service. You have to run on at least two.

Probably the best presentation at London Cloud Camp last Wednesday was the one the organizers saved until last: Alan Williamson spoke about Mediafed's experiences as a company that relies on cloud providers. Mediafed specializes in providing RSS traffic analytics to European media companies, with a blue-chip client roster that includes BBC Worldwide, Le Monde, The Guardian, IDG, Axel Springer and others. Williamson's advice will be heeded by many wondering what to do after Amazon S3's six-hour outage yesterday:

"We've come to realize we cannot rely on putting all our eggs in one basket," he said, explaining that Mediafed uses two cloud providers side-by-side: Amazon and UK-based Flexiscale. "We run both at the same time." Some customers are on one and some on the other, but each is backed up to the other cloud, so that if one provider fails the service can switch across to the other. That presumably means its customers can still access all their stats from yesterday's RSS traffic this morning, including those covering the hours that S3 was down.
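
Williamson did not go into implementation detail, so the following is only a minimal sketch of the general idea of switching across when one provider fails, not a description of Mediafed's system. The function names, the key, and the two stand-in "providers" are all hypothetical; a real version would wrap each cloud's own storage client.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when none of the configured providers could serve the request."""


def fetch_with_failover(
    key: str,
    providers: Sequence[Tuple[str, Callable[[str], bytes]]],
) -> Tuple[str, bytes]:
    """Try each (name, fetch) pair in order; return the first successful result."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch(key)
        except Exception as exc:  # any failure means: move on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for two cloud storage back ends.
def primary_down(key: str) -> bytes:
    raise ConnectionError("primary store unavailable")


REPLICA = {"stats/yesterday": b"feed stats"}


def replica_up(key: str) -> bytes:
    return REPLICA[key]


served_by, data = fetch_with_failover(
    "stats/yesterday",
    [("primary", primary_down), ("secondary", replica_up)],
)
print(served_by, data)
```

The read path is the easy half. The sketch assumes the data has already been copied to the second provider, which is the part that takes ongoing effort.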

Anyone concerned about what Om Malik is calling the fragility of cloud services after yesterday's outage needs to consider putting a similar set-up into place. Frankly, I think Malik has it completely wrong — it's not the cloud that's fragile, it's computers, and anyone who expects perfect uptime when relying on a single point of failure has their head, not just their infrastructure, in the clouds. Either you stay cool, like SmugMug, and accept occasional glitches as part of the value proposition you pay for; or you do what Mediafed has done and put some redundancy and a failover plan in place.

As Williamson said in his presentation, there are plenty more risks to worry about besides systems outages: "My heart is fearful of the credit card stopping at Amazon. It scares the bejesus out of me that's going to happen at Amazon." For all its benefits, the cloud is still no silver bullet. Working with the cloud means getting savvy about a whole new set of issues, such as becoming expert in building server infrastructure that monitors your cloud resources, or solving hairy back-up and archiving challenges (Mediafed recently calculated that it is now storing so much data in the cloud that it would take three weeks to download a back-up of its S3 data). Williamson concluded: "We appreciate that cloud computing has moved us on but [now] we've got a whole set of new problems."
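
To see how a back-up download can stretch into weeks, here is a back-of-the-envelope calculation. The data volume and link speed below are purely illustrative assumptions; Williamson did not give Mediafed's actual figures.

```python
def download_days(size_bytes: float, bandwidth_bits_per_sec: float) -> float:
    """Days needed to transfer size_bytes over a link with the given sustained speed."""
    seconds = size_bytes * 8 / bandwidth_bits_per_sec  # bytes -> bits, then divide by rate
    return seconds / 86_400  # 86,400 seconds in a day


# Assumed figures for illustration only: 10 TB of data over a sustained 50 Mbit/s link.
days = download_days(10e12, 50e6)
print(f"{days:.1f} days")  # about 18.5 days
```

At those assumed numbers the transfer takes roughly two and a half weeks, and that is before allowing for protocol overhead or a link that is shared with production traffic.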

See also these posts about previous outages at Amazon Web Services:

Amazon Web Services gets serious about enterprise

Time for a Bezos trustworthy cloud initiative?