Amazon's Web Services outage: End of cloud innocence?
Summary: Cloud computing is learning the harsh reality of resiliency as Amazon Web Services' outage has crossed its second day. Meanwhile, startups and a host of other AWS customers are in uncharted waters. What have we learned?
On Wednesday, the common belief was that startups could build their infrastructure on AWS completely. Set the servers up and forget them. Things like availability zones---for an extra fee---would mean you'd get no single point of failure. Some startups took advantage of that and others didn't.
Given that AWS' Northern Virginia data center has been out of whack for more than 24 hours, it's clear you need to procure more than one cloud. You need a backup for your cloud provider's backup.
Also: Amazon's N. Virginia EC2 cluster down, 'networking event' triggered problems
The good news for AWS customers is that the service appears to be coming online again. Amazon said in its most recent update:
2:41 AM PDT We continue to make progress in restoring volumes but don't yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available.
6:18 AM PDT We're starting to see more meaningful progress in restoring volumes (many have been restored in the last few hours) and expect this progress to continue over the next few hours. We expect that we'll reach a point where a minority of these stuck volumes will need to be restored with a more time-consuming process, using backups made to S3 yesterday (these will have longer recovery times for the affected volumes). When we get to that point, we'll let folks know. As volumes are restored, they become available to running instances, however they will not be able to be detached until we enable the API commands in the affected Availability Zone.
The AWS fallout is going to be far and wide. Here's a look at some of the key issues:
The blame game only goes so far. First, it's clear that Amazon's communication could be better. But data centers do fail and it's up to customers to make sure their supply chain---in the Web's case Amazon---is backed up. Amazon failed. So did some of its customers for not planning better. Startups will have to plan better. Customers aren't going to give startups a free pass completely.
Amazon will get better. To say this debacle is a learning lesson is going to be an understatement. Communication will improve. And availability zones are likely to become availability regions. Service level agreements (SLAs) will matter more. Gartner's Lydia Leong has a great overview of what went wrong. Here's what she said about SLAs and Amazon.
Amazon’s SLA for EC2 is 99.95% for multi-AZ deployments. That means that you should expect that you can have about 4.5 hours of total region downtime each year without Amazon violating their SLA. Note, by the way, that this outage does not actually violate their SLA. Their SLA defines unavailability as a lack of external connectivity to EC2 instances, coupled with the inability to provision working instances. In this case, EC2 was just fine by that definition. It was Elastic Block Store (EBS) and Relational Database Service (RDS) which weren’t, and neither of those services have SLAs.
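The downtime figure in that quote follows from simple arithmetic. Here is a minimal Python sketch of the calculation, assuming a 365-day year; the function name `allowed_downtime_hours` is made up for illustration and is not part of any AWS tool or SLA document.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(sla_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime a provider can accrue in the period
    without dropping below the stated availability percentage."""
    return period_hours * (1 - sla_percent / 100)


if __name__ == "__main__":
    # Compare a few common availability targets side by side.
    for sla in (99.0, 99.9, 99.95, 99.99):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% availability allows {hours:.2f} hours of downtime per year")
```

At 99.95%, the result is roughly 4.4 hours a year, in line with the "about 4.5 hours" Leong cites. An outage that has crossed its second day would blow through that budget many times over if EBS and RDS were covered by the SLA.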
Architecture will garner more attention. Bob Warfield noted:
Most SaaS companies have to get huge before they can afford multiple physical data centers if they own the data centers. But if you’re using a Cloud that offers multiple physical locations, you have the ability to have the extra security of multiple physical data centers very cheaply. The trick is, you have to make use of it, but it’s just software. A service like Heroku could’ve decided to spread the applications it’s hosting evenly over the two regions or gone even further afield to offshore regions.
This is one of the dark sides of multitenancy, and an unnecessary one at that. Architects should be designing not for one single super apartment for all tenants, but for a relatively few apartments, and the operational flexibility to make it easy via dashboard to automatically allocate their tenants to whatever apartments they like, and then change their minds and seamlessly migrate them to new accommodations as needed. This is a powerful tool that ultimately will make it easier to scale the software too, assuming its usage is decomposable to minimize communication between the apartments. Some apps (Twitter!) are not so easily decomposed.
This then, is a pretty basic question to ask of your infrastructure provider: “How easy do you make it for me to access multiple physical data centers with attendant failover and backups?”
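Warfield's "apartments" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `TenantPlacement`, its methods, and the region labels in the usage note are all hypothetical. It tracks placement only and leaves out the hard part, which is actually moving tenant data between locations.

```python
class TenantPlacement:
    """Tracks which 'apartment' (region or data center) hosts each tenant."""

    def __init__(self, apartments):
        if not apartments:
            raise ValueError("need at least one apartment")
        self.apartments = list(apartments)
        self.location = {}  # tenant name -> apartment name

    def load(self):
        """Return the number of tenants currently in each apartment."""
        counts = {a: 0 for a in self.apartments}
        for apartment in self.location.values():
            counts[apartment] += 1
        return counts

    def assign(self, tenant):
        """Place a tenant in the least-loaded apartment (first one wins ties)."""
        counts = self.load()
        target = min(self.apartments, key=counts.__getitem__)
        self.location[tenant] = target
        return target

    def migrate(self, tenant, apartment):
        """Move one tenant to a specific apartment on request."""
        if apartment not in self.apartments:
            raise ValueError(f"unknown apartment: {apartment}")
        if tenant not in self.location:
            raise KeyError(f"unknown tenant: {tenant}")
        self.location[tenant] = apartment

    def evacuate(self, failed):
        """Take an apartment out of service and rehome its tenants."""
        survivors = [a for a in self.apartments if a != failed]
        if not survivors:
            raise RuntimeError("no surviving apartment to fail over to")
        displaced = [t for t, a in self.location.items() if a == failed]
        # Drop displaced tenants first so load() only sees live apartments.
        for tenant in displaced:
            del self.location[tenant]
        self.apartments = survivors
        for tenant in displaced:
            self.assign(tenant)
        return displaced
```

A provider might create `TenantPlacement(["us-east", "us-west"])`, call `assign()` as customers sign up, and call `evacuate("us-east")` when that location goes dark. The design choice that matters is the one Warfield names: tenants are spread over a few apartments from the start, so a failure displaces only part of the customer base.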
Welcome to the new world of cloud computing. You'll need multiple cloud providers. Resiliency still matters whether the infrastructure is real or virtual. You wouldn't have one supplier for steel, would you? Going forward you'll use AWS, Rackspace and maybe a few others.
Talkback
RE: Amazon's Web Services outage: End of cloud innocence?
Just as I'd mentioned hundreds of times before... Cloud computing is a nice addition but not a solution. Anyone ignorant enough to think that the web is safe is sorely mistaken. They have been proving my point for the past couple of months with certificates being stolen and break-ins to all sorts of email systems and so on. Also, if someone is clever enough to hack into a cloud server, they would also be smart enough to know how to make a virus spread and hit other servers as well as redundant backups. I sure hope they will be on top of these things. Also, good luck streaming anything offline that you'd paid for. Sorry, but I like to keep what I buy in hand, not in the "clouds" they are trying to make so appealing. Also, how would gamers ever expect to play real games OTA? They have gaming computers for a reason. They sell hardware to power the OS for a reason. If we scale back and dumb down, then what is left? I laugh to think of all the people turning into sheep, while those who know computers, systems and how to manipulate things will be the wolves among the sheep. As China sits on 10-year-old OSes, and some moving to newer OSes are overtaking parts of the web... I highly doubt they are dumb enough to push everything to the cloud... What army would you have left? Too many things to think about, but the cloud is a nice addition or option, never a solution.
RE: Amazon's Web Services outage: End of cloud innocence?
Businesses will try to cut corners to save money. Maybe Amazon will not add redundant servers because it would be too expensive.
There was once a time when people would place their money into several banks, in case one or two closed down. (Those were the people who survived the depression.)
All I can say is wake up, cloud users. These things are going to happen, even at a critical time in your business. You must analyze the risk (the worst possible event and how often it is likely to occur) and weigh it against the benefits. Is it worth it? I will not bet my life (or life savings) on the cloud.
was it some sabotage?
from a company that starts with M has 'soft' in its name and resides in Redmond, WA?
RE: Amazon's Web Services outage: End of cloud innocence?
More likely a company that begins with G.
The frakking toasters did it?
Amazon is A Technology Company?
RE: Amazon's Web Services outage: End of cloud innocence?
Are you serious?
RE: Amazon's Web Services outage: End of cloud innocence?
Amazon is Internet Walmart
Wake up call for the providers.
RE: Amazon's Web Services outage: End of cloud innocence?
I can't believe a place like Amazon doesn't run redundant servers and have backups of their own backups. It's no solution to expect the customer to double-cloud, cloud being a misnomer at best, when Amazon should be doing it already, along with colocations and gobs of verification data.
I'm not happy to see anyone lose money, but I am glad to see we're finally starting to get reality checks on these (mis-named) clouds. It's a dumb concept and Amazon could have done a lot better. But they didn't. Neither will others, even in the face of more events like this one.
RE: Amazon's Web Services outage: End of cloud innocence?
Good theory, welcome to reality!
RE: Amazon's Web Services outage: End of cloud innocence?
The outage does not have to be in Amazon's neighborhood.
If some idiot cuts the main cable to the company's building, the company will lose all connection to its data. If Comcast decides to cut off access to your company's internet provider for some reason, you are out of luck.
The Amazon outage is just a little drop in the bucket of water that will eventually drown the "cloud customers".
RE: Amazon's Web Services outage: End of cloud innocence?
Redundancy. There was a news story some years ago about a California company that had contracted two service providers because they wanted a good deal of failover built into their service operation. After a storm and flash flood somewhere out in the eastern side of the state, their service to the East went offline. Turns out that BOTH ISPs had their backbone fiber strung over the same bridge, which was washed out in the flood.
Yeah, the cloud is really great idea... NOT.
Would you rather have your business or compensation?
And the bigger that service provider is, the bigger his legal team when your push meets his shove.
No going back...