AWS down again? Nobody seems to care

Summary: No, AWS didn't break again today. But reliability just doesn't seem to matter.


There was a time when businesses were afraid to use cloud services. After all, they would have little control over problems with the service, and if it went down, so did their business. And while that could still be the majority opinion, many large web-based enterprises have bet the farm on cloud service providers like Amazon Web Services. So you would think that the repeated failure of that service would cause some consternation. Apparently that’s not the case.

When Azure has problems it affects Microsoft users; it might impact Office 365 or another of Microsoft’s service offerings. And the knives quickly come out, with the Internet ablaze with messages belittling Microsoft’s efforts. But when AWS fails it seems to have become another story; even the media coverage has been minimized. When their failure actually took down in the US for half an hour in the middle of a business day recently, that drew some attention. But when yet another failure of Elastic Block Store (EBS) at the Amazon US East datacenter took out a number of high-profile businesses for an hour last Sunday, it seems to have just been business as usual.

Amazon continues to say it will fix this problem, but the problem has cropped up repeatedly, and yet its customers remain. Our own Zack Whitaker mentioned this last Monday in his coverage of the Sunday outage. But even he had a somewhat dismissive tone, wondering why customers continued to use the specific datacenter that has the problems, but not why they continue to use AWS when Amazon has clearly been unable to fix a recurring problem with this datacenter. The latest problem started at 12:50 in the afternoon, but Amazon was unable to provide full details on what was happening until over two and a half hours later.

When an Amazon US-East failure took out Netflix last Christmas, people really seemed to notice that. The outcry when went down last week was much more muted, and last Sunday’s event really seems to have been barely a blip on most people’s radar.

Despite the consistent pattern of failure with AWS and US-East, customers don’t seem to care. Consumers seem to be becoming acclimated to the semi-regular failures, developing an “It’s down, I’ll get back to it later” attitude during these outages. But that isn’t a good thing.

Cloud services, whether for business or personal use, need to have bulletproof reliability. There are too many links in the chain between the end-user devices and the service backend to guarantee any given consumer will be able to access any service at any given point in time, but the back-end should be the most reliable piece of the puzzle and not where we come to expect failure.

Topics: Amazon, Cloud, Data Centers

  • The choir responds with a hallelujah

    The crickets remain silent.
  • Companies

    Companies are so cheap these days, they are willing to sacrifice the quality of the product they deliver to their employees or customers to save a few bucks.
    Susan Antony
  • Well, Duh!

    I am happy to see someone who writes these articles finally wake up and realize that the Cloud is problematic, at best, due to these points of failure.

    Yes, we use certain services that are based in the "cloud", such as online file sharing, but we won't use any to, say, replace our onsite file server. This highlights the reason that we wouldn't do so.
    • Cloud myths

      Your onsite file server can take full advantage of public cloud storage while at the same time presenting on-premises data.

      Take a look at the StorSimple solution, which has several appliances with multi-tiering and hybrid storage capabilities: primary data is stored on an SSD tier, recent data on a SATA tier where deduplication is also performed, and older data is sent to Windows Azure. Before the data is sent to Azure it is encrypted with AES-256, so all data on Azure is already encrypted, and the private key is kept only by you on-premises, with absolutely no privacy risks involved.

      Cloud is problematic as long as people trust the public cloud in a way that makes them think it solves all IT problems. Just as when you build solutions for the private cloud, you need to build the solutions carefully in the design process and do an appropriate risk management analysis to understand the impact of being offline.

      The cloud on its own terms has several fundamentals (such as elasticity, self-service, automation, etc.) that can be built in the private cloud or in the public cloud; what separates the two and brings some advantage to the public cloud is the fact that it can be cheaper to build the right solution.
  • What's a 9 worth

    Uptime is measured in 9s while downtime is measured in dollars, but it's not a direct relationship to revenue per hour. That is, if you make 1 million per hour of uptime, that doesn't necessarily mean you lose 1 million per hour of downtime... It may just be deferred an hour and you may end the week with the same amount. In which case, the who-cares attitude will be tough to avoid. There are few online scenarios that are so time dependent that they cannot recoup outage time, particularly if you adhere to the cloud principles of design.

    Use any website nowadays and you'll likely find it tuned to maximize data harvesting and not tuned to user experience which tells me they really don't care about your time or experience.
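
The arithmetic in the comment above can be sketched roughly as follows. This is only an illustration: the function names and the `deferred_fraction` parameter (the share of sales that simply happen later rather than being lost) are assumptions introduced here, not figures from the article or from any provider.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year for an availability such as 0.999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def net_revenue_lost(revenue_per_hour: float,
                     outage_hours: float,
                     deferred_fraction: float) -> float:
    """Revenue actually lost to an outage.

    deferred_fraction is a hypothetical estimate of how much of the
    missed revenue is only postponed and recovered after the outage.
    """
    if not 0.0 <= deferred_fraction <= 1.0:
        raise ValueError("deferred_fraction must be between 0 and 1")
    return revenue_per_hour * outage_hours * (1.0 - deferred_fraction)


if __name__ == "__main__":
    # "Three nines" works out to a bit under nine hours of downtime a year.
    print(downtime_hours_per_year(0.999))
    # A one-hour outage at 1 million per hour, with most sales deferred.
    print(net_revenue_lost(1_000_000, 1.0, 0.9))
```

With a high deferred fraction the real loss is a small slice of the headline revenue-per-hour figure, which is the commenter's point about why the "who cares" attitude is hard to avoid.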
  • inconvenience vs. cost

    When AWS goes down, visitors to websites affected are inconvenienced. When M$ cloud service goes down, critical applications are suddenly unavailable and any work relying on those tools is stopped dead.

    Since AWS is far and away the cheapest alternative for web services, it's also something of a cost vs. reliability issue. Running your own web servers offers only slight marginal improvement in uptime at markedly higher cost. Interruptions to web service also don't typically delay completion of key business tasks on deadline.

    Comparison of these two services is apples-to-oranges.
    • Except that there are critical apps running on AWS

      Both Oracle and SAP have AWS-based services that are sold to customers. I would presume that users of those applications are not simply "inconvenienced" and are just as annoyed as Azure customers when line of business apps are unavailable.
      David Chernicoff
    • SLA

      No matter how big your customers are, all customers are customers, and every single cloud provider needs to take care of its operations with that level of abstraction. AWS does not have reimbursement SLAs while Azure does, and that is a major difference in the two companies' commitments to provide a reliable service. AWS is not transparent in its post mortems; Microsoft is obligated to be transparent, as it has a reimbursement process and has to clarify in detail what happened. The customer has monitoring capabilities for the cloud services, and the process also involves a deep analysis and mitigation.

      Will that avoid outages? No! But at the end we see that AWS outages appear much more often.
  • Not a good thing?

    What's so bad about an “It’s down, I’ll get back to it later” attitude? Sounds like a perfectly sane and healthy response to the reality that sometimes things do not go as planned and you can't do or buy something right this very minute. I wish it was a more common skill (and that I was better at it myself!)

    I always shake my head at the wails over business "lost" to downtime that was likely just deferred a bit, or went to a competitor whose site will surely go down and send it back soon enough.
    • Not all AWS computing is aimed directly at retail web sales.....

  • Failure-proof in-house systems!!!

    System outages are not unique to cloud servers; in-house systems are actually as vulnerable as the cloud. This misperception about cloud outages occurs due to the "Availability Effect": strong media coverage of cloud outages leads people to perceive the likelihood of the event as higher than it really is. If only we could hear in the media about the outages occurring in companies' private data centers!
  • High Availability, Disaster Recovery & Backup

    Customers using cloud services like AWS need to understand they are not immune to malfunctions and outages. To achieve a more resilient setup in a traditional data center will incur very high costs.
    Users need to apply high availability solutions to their cloud applications, as well as backup & DR. In AWS's case this can be between availability zones and even between regions. Some of these solutions are built into AWS's services. Furthermore, users need to back up their data and be ready to fail over to another zone/region whenever a significant outage occurs.
    With all these measures, cloud-based applications can become (almost) bullet proof.
    Uri Wolloch
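
The fail-over idea in the comment above can be sketched in a few lines. This is a minimal illustration, not an AWS API: `pick_healthy_region` and the `is_healthy` callback are hypothetical names, and a real health check would be whatever probe the application already trusts.

```python
from typing import Callable, Iterable, Optional


def pick_healthy_region(regions: Iterable[str],
                        is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first region, in preference order, whose health check passes.

    is_healthy is a caller-supplied (hypothetical) probe. A probe that
    raises is treated the same as one that reports the region unhealthy.
    """
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A failing probe should not stop us from trying the next region.
            continue
    return None


if __name__ == "__main__":
    # Pretend the primary region is down and the secondary is fine.
    preferred = ["us-east-1", "us-west-2"]
    print(pick_healthy_region(preferred, lambda r: r != "us-east-1"))
```

The order of the list encodes the preference, so the primary region is used whenever it is healthy and traffic moves elsewhere only during an outage.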