Amazon Web Services suffers partial outage

Summary: Parts of Amazon Web Services were hit by an outage in a Northern Virginia datacenter, knocking many popular websites off the Web for the evening.

Parts of Amazon Web Services (AWS) suffered an outage on Thursday that knocked a string of high-profile sites off the Web for the evening.

Amazon was quick to update its cloud status page, posting its first update at 8:50 p.m. PDT and attributing the problems to a power outage in a Virginia datacenter.

It's the same datacenter that brought down Quora, Foursquare and other major websites in April 2011, as the cloud infrastructure began to fall from the sky. After last year's outage, Amazon's detailed post-mortem noted the need for greater transparency and better communication with its customers.

As of this morning, Amazon said "almost all affected EBS volumes have been brought back online," though some customers are still reporting problems. It may take a few more hours for the service to fully recover.

Amazon's RDS database service also went down, but has since recovered from a multi-availability zone failure. However, a "small number of [database] instances remain unavailable," according to Amazon's 1:09 a.m. PDT update.

Customers were quick to vent their frustration on Twitter, which, thankfully, isn't hosted on AWS.

Quora (hit again, bless it), HipChat, Salesforce-owned Heroku, social phenomenon Pinterest and file-hosting site Dropbox were among the sites that stumbled as a result of the outage.

It's a case of putting all of the Web's eggs in one basket. Or, in at least one case, all the tofu in one food truck. (I think he was kidding.)

When Amazon Web Services works (and give it credit, that's the vast majority of the time), it works well. Amazon says it is "committed" to 99.95 percent uptime, while some smaller, nimbler companies, as you might expect, offer 99.99 percent.

That doesn't mean Amazon's cloud will actually go down for roughly 22 minutes every month, which is about what a 99.95 percent commitment allows over a 30-day month, but it doesn't help when customers call to ask why their service is down, only to be reassured that "most of the time it's up."
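For a rough sense of what those percentages mean, here is a minimal sketch that converts an uptime figure into a downtime allowance. It assumes a 30-day month and ignores how any real SLA defines its measurement period, exclusions or service credits; the function name is our own illustration, not anything Amazon publishes.

```python
# Downtime budget implied by an uptime percentage.
# Assumption: a 30-day month; real SLAs define their own measurement
# period and exclusions, which this sketch ignores.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(uptime_percent: float,
                            period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the minutes of downtime permitted by uptime_percent over period_minutes."""
    return period_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for uptime in (99.95, 99.99):
        print(f"{uptime}% uptime -> {downtime_budget_minutes(uptime):.1f} minutes per 30-day month")
```

Run as a script, it prints 21.6 minutes for 99.95 percent and 4.3 minutes for 99.99 percent. Passing a 365-day period instead (365 * 24 * 60 minutes) shows the 99.95 percent figure works out to roughly 263 minutes, or about four and a half hours, a year.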

Earlier this week, Amazon announced that its S3 online storage service had passed the 1 trillion object milestone, roughly 140 objects for every person on the planet.
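The per-person figure is simple division. This back-of-the-envelope check assumes a 2012 world population of about 7 billion, which is our approximation rather than a number from Amazon's announcement.

```python
# Back-of-the-envelope check of "roughly 140 objects per person".
S3_OBJECTS = 1_000_000_000_000      # 1 trillion objects, per Amazon's announcement
WORLD_POPULATION = 7_000_000_000    # assumption: approximate 2012 world population

objects_per_person = S3_OBJECTS / WORLD_POPULATION
print(f"about {objects_per_person:.0f} objects per person")
```

With a 7 billion population it prints "about 143 objects per person", in line with the rough figure of 140.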

Image source: Twitter.

Talkback

  • "single Availability Zone(s)" are scary.

    Wonder if customers knew this going in?
    droidfromsd
    • For the affected technology companies, I would say yes.

      Two opposing views at planning meetings normally take the form of:

      a) to do this cloud thing "right" it will cost about 1/3 more
      b) we can tolerate an SLA outage, so why spend the extra money

      Choice "a" is a tough sell when everyone in the room is focused on upfront and ongoing costs. You get what you pay for.
      Tired Tech
  • The words "Single," "Availability" and "Zone" sum it up

    Customers of AWS know this...

    It's simply poor system design to have any meaningful application or service dependent on one data center (in AWS, your own data centers or other cloud providers). AWS makes it easy to develop multi-zone and even multi-region applications - I'm really surprised Heroku and others who were affected haven't invested the effort to operate with a zone failure.

    My company has plenty of AWS hosted apps, many using RDS, and we sailed right past the failure (with no intervention required).
    TroyOO
  • Reliability

    And this is why you shouldn't rely solely on the cloud. It seems to me that cloud services would be a good part of your disaster recovery solution. Not too many small to medium size companies can afford to run 2 server rooms and 2 sets of servers. If you duplicated your non-sensitive data with your cloud vendor, that would probably be a cost-effective solution.

    There are just too many single points of failure for me to be comfortable using the cloud as my sole back end.
    GSG
  • Live by the cloud, die by the cloud

    Yesterday I went to a local bank and found that they could not do any processing because "the Internet" was down. They had to do my transactions manually and promise to post them later, when processing returned. A second bank and a credit union also had the same problem.

    I don't know how widespread it was, but it certainly affected more than one business here (San Diego area). Really makes me feel comfortable about services and data storage in the cloud. (Did you detect my sarcasm?)
    Shara8
    • @Shara8

      No bank (that I know of) houses its systems in "the cloud". All the banks I've worked with operate and maintain their own infrastructure.

      Most of these systems are highly fragile and depend on lots of complex legacy back-end infrastructure which is VERY hard to move to the cloud.
      bitcrazed
      • Sad, yet true

        Internet down can mean, "my JRE 1.3 app running on IE 6 connected to the Access 97 database which is mapped to the P: drive on my manager's computer has locked up my workstation. Please call back in 20 minutes when Windows 2000 reboots."
        Tired Tech
  • Fail to plan....and you indeed plan to fail

    I run my business on Amazon Web Services and have my infrastructure spread across two availability zones for situations exactly like this. I find it humorous that companies think that because it's in the cloud, they would have to architect any differently than if on premises. I would imagine these companies that don't plan for failure are the same IT departments that ran everything on one or two boxes to service 100+ people with no redundancy or disaster recovery. It amazes me how many people still work in IT who have no business being there.

    Everyone on Twitter, blogs, and in this forum who rattles their saber, saying this is the reason they don't trust the cloud, should really consider other careers. Kind of reminds me of the guy who still owned a horse whip & buggy store saying cars will never work out.
    seattlereign67
  • Beware the Cloud

    As we have seen, Amazon, Google and Microsoft have all had their share of cloud outages. It is the nature of the beast. Too many variables come into play with the cloud that the consumer and cloud provider customers have no control over. It is disconcerting to see so many jumping on the cloud bandwagon, and as more do, we will see more companies/users impacted by the inevitable outages. So, beware the cloud!
    jpr75_z
  • Zones?

    I'm not sure I quite get it. The customer has to set up and maintain "zones" to ensure availability? Isn't the whole point of moving things to the "cloud" to make sure this is all done automatically and transparently behind-the-scenes? Otherwise, it's not a "cloud." You're basically just outsourcing your server. So, let's just call it "Outsourced Server" instead of a "cloud."
    cmoya
    • Bait and Switch

      The à la carte pricing for services inevitably pushes firms to essentially go with "Cloud Lite" environments. As you infer, IaaS does not a cloud make. Reminds me of the way the original Chevy Aveo was priced at $9,999: no A/C, roll-up windows and golf cart tires. Most everyone walked away spending $12-15k for something bearable to own.

      The cloud providers will endure more bad PR until they price their services appropriately. If they wish to call it "cloud", then include one additional zone as part of the base costs to differentiate between what they provide versus pure IAAS from large ISPs.
      Tired Tech
  • Outage overshadows security threat...

    The outage is a real pain, but frankly AWS provides reasonable reliability and you can distribute your infrastructure to accommodate such events. But I thought I'd mention a security alert that AWS also announced, which got lost in the noise, about a new Microsoft vulnerability affecting Windows instances. The vulnerability lets hackers remotely access and gain control of AWS EC2 instances that are not properly secured. You can check out the blog post at http://www.dome9.com/blog/vulnerabilities-plague-microsoft-windows-servers-remote-desktop-protocol-rdp-port-3389 to learn more, and get advice on how to secure your EC2 instances.
    meizlik