Update below: Cloud services sound great. A company can host its infrastructure with a large player like Amazon or Google, spend little and grow the business. Data center investment? Why would you do something like that?
Those theories are being tested today as Michael Krigsman is on the case of a major Amazon Web Services outage. Amazon recently introduced an SLA promising 99 percent uptime, so any financial hit will be determined later. For now, customers are getting a lesson in backup options (Techmeme discussion).
As Michael notes, this outage could have big implications since Amazon is increasingly hosting enterprise-class software such as Red Hat Enterprise Linux. Amazon is even rumored to be in the sweepstakes to host SAP's Business ByDesign.
For now your best bet is to monitor Amazon's message board to see what happens when the cloud goes awry. A few choice excerpts:
- Hi, what is the deadline to fix this inssue, because i have many clients using the S3 service.
- And this is why you have to setup a fail-safe. My new sites hosts over 25,000 images on Amazon and I wake up to notice major issues this morning. I switched over to using my local server and everything is back up...I really need to set something up so it does this automatically. The s3 service is great but this just proves you can't rely on it, this is a major issue especially since it's been down for so long. Way to go Amazon.
- This is really a severe blow to confidence in trusting AWS services.
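The automatic switch the second poster wishes for could look roughly like the sketch below. It is only an illustration, not anything Amazon or the poster published: the function names `http_probe` and `pick_origin` and the example URLs are hypothetical, and it assumes a plain HTTP HEAD request is a good enough health signal.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if a HEAD request to url succeeds (assumed health signal)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts, and malformed URLs.
        return False


def pick_origin(
    origins: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first origin that passes the probe, or None if all fail."""
    for origin in origins:
        if probe(origin):
            return origin
    return None


if __name__ == "__main__":
    # Hypothetical origins: the hosted bucket first, the local server second.
    candidates = [
        "https://example-bucket.s3.amazonaws.com/health.txt",
        "https://images.example.com/health.txt",
    ]
    print(pick_origin(candidates))
```

Listing the hosted bucket first and the local server second means the site only falls back when the primary check fails. A real setup would also need to keep the fallback copy in sync.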
Update: Amazon has resolved the issue, adding in a post:
We’ve resolved this issue, and performance is returning to normal levels for all Amazon Web Services that were impacted. We apologize for the inconvenience. Please stay tuned to this thread for more information about this issue.
The question now is whether folks view this spell as mere growing pains or something larger to worry about.
Update 2: Suggestion of the day from an Amazon customer:
A health monitor would be useful -- something to show what amazon thinks the status of the services are and to post official information. Maybe even proactive alerts or something I could tie our other infrastructure notifications into so I could be proactive in alerting our downstream affected users.
That idea isn't original, but it is pretty handy. After a series of outages, Salesforce.com created a similar dashboard.
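For a sense of what the customer is asking for, here is a minimal sketch of a status board that other notification systems could hook into. It is an assumption-laden toy, not how Amazon or Salesforce.com built theirs: the `StatusBoard` class and its methods are hypothetical names, and the health reports are fed in by hand rather than gathered from real checks.

```python
from typing import Callable, Dict, List, Optional


class StatusBoard:
    """Tracks up/down state per service and notifies subscribers on changes."""

    def __init__(self) -> None:
        self._state: Dict[str, bool] = {}
        self._subscribers: List[Callable[[str, bool], None]] = []

    def subscribe(self, callback: Callable[[str, bool], None]) -> None:
        """Register a callback that receives (service, healthy) on each change."""
        self._subscribers.append(callback)

    def report(self, service: str, healthy: bool) -> None:
        """Record the latest check result and alert subscribers if it changed."""
        previous: Optional[bool] = self._state.get(service)
        self._state[service] = healthy
        changed = previous != healthy
        # A first report that says "healthy" is not news, so stay quiet.
        first_and_fine = previous is None and healthy
        if changed and not first_and_fine:
            for callback in self._subscribers:
                callback(service, healthy)

    def snapshot(self) -> Dict[str, bool]:
        """Return a copy of the current status, e.g. for a dashboard page."""
        return dict(self._state)


if __name__ == "__main__":
    board = StatusBoard()
    board.subscribe(lambda name, ok: print(name, "is", "up" if ok else "DOWN"))
    board.report("s3", True)   # silent: first report, healthy
    board.report("s3", False)  # alerts subscribers
    board.report("s3", False)  # silent: no change
    board.report("s3", True)   # alerts subscribers
```

Alerting only on a change of state keeps downstream systems from being flooded with repeat notices during a long outage.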