AWS outage reveals backup cheapskates

Was Amazon to blame for the Instagram, Netflix, Pinterest and Pocket outages? According to analysts, they were just being too cheap.
Written by Michael Lee, Contributor

Amazon Web Services' (AWS) hiccup over the weekend saw a number of web services suffer outages, but according to analysts, the problem had less to do with Amazon and more to do with individual companies not using cloud services to their full potential.

Intelligent Business Research Services advisor Jorn Bettin blamed the outage on providers failing to utilise cloud services as they should.

He said that the real issue wasn't that a cloud-services giant such as Amazon had stumbled over a storm, but that the affected customers (Instagram, Pinterest, Pocket and Netflix, all of which went down during the weekend outage) hadn't used the cloud's ability to create geographically redundant links.

"They could operate at a higher level of redundancy, so that these sort of outages would only have a minimal impact on them. It's a matter of cost," Bettin said.

However, Bettin said that creating redundant connections, so that a natural disaster in one part of the world won't affect services in another, could double cloud costs. Despite calls for Amazon to pick up the bill and shield customers from this risk, Bettin said that this isn't how cloud services should be treated.

"It doesn't really make sense on a global scale that everyone relies on Amazon as, let's say, the ultimate risk manager for everything. That would be a dangerous proposition."

Instead, he said that Amazon's current hands-off model, which leaves customers to choose whether they want to pay to mitigate the risk, is more logical.

"Amazon's doing the right thing here of giving the customer the ability to do these switches from one [geographic] system to another."

What this means, though, is that several companies have looked at their bottom line and decided that maintaining 100 per cent uptime isn't worth the cost of mitigating the risk. Bettin said that these organisations tend to be small and, to make any sort of profit, have to be cutthroat with their costs. The cloud has made this kind of cost-cutting possible, but it also exposes them to significant risk.

"They're effectively putting all their eggs in one basket. This whole topic is about managing levels of redundancy."

Gartner research vice president for IT services Jim Longwood, however, sees the issue as a problem for Amazon. He said that after the two most recent incidents, the psychological "third-strike" rule is in effect, and Amazon will do its utmost to prevent a repeat.

"You can bet your bottom dollar that Amazon will respond very strongly to this, and take remedial action, because if it does happen too often, it will affect their brand and their potential market penetration."

According to Longwood, the outage will also be an opportunity for competitors.

"All the competitors are making mileage from [this incident] already," he said, adding that this competition would in turn raise customer expectations of reliability and availability.

"This is going to drive competition for improved services, particularly [for] reliability and availability. In two years' time, the tolerance will be much less, because they should be much more reliable. We're going to see more of [these incidents], but hopefully less frequently. Certainly, you'll see it amongst the smaller, newer service providers coming into this environment."
