Outages beyond the recent Amazon event

Summary: The Amazon outage wasn't the first, nor will it be the last, outage of a major company's IT environment. Few internal outages ever get publicized. Should the focus of this event change?

My good friend Vinnie Mirchandani (see Deal Architect) posted a piece last week on the Amazon outage.

In his short piece, he suggested that the cries for more transparency from Amazon might be better directed at many other firms, not just cloud providers. Vinnie stated:

"The other thing that may surprise folks is the last time many on-premise data centers ran a full disaster recovery drill. They have their own disasters and plenty of down time - they just are not that public or reported in blogs, newspapers or Twitter."

The timing of his post was eerie for me, as I had lunch with the top IT executive of a manufacturer the previous day. This individual told me about the four days he had lost working with a security firm to get numerous desktop computers operational again. Apparently, a nasty virus had snuck into several parts of their enterprise, and the IT group spent days rebuilding many machines at several locations. In short, they suffered downtime and lost worker productivity. The good news was that none of their ERP systems were affected.

Vinnie's right: few outages actually get press coverage. The outages that are made public are usually on shared systems, like cloud or outsourcing sites. Few companies see any upside in going public when their internal systems fail.

Moreover, from the anecdotal accounts I'm aware of, many (not all) failures involve equipment problems rather than some sort of IT malfeasance or neglect. If that's true, then failures can happen to anyone. It's not a phenomenon that affects only a cloud provider like Amazon.

What is worth publicizing is how all companies should handle these outages. Let's see a frank assessment of how well each company worked its way through the downtime. Let's see more sharing of best practices and more discussion of the remedies that successfully minimized downtime. And, finally, let's have more discussion of how to eliminate these outages altogether.

Topics: Amazon, Outage

About

Brian is currently CEO of TechVentive, a strategy consultancy serving technology providers and other firms. He is also a research analyst with Vital Analysis.


Talkback

42 comments
  • RE: Outages beyond the recent Amazon event

    Failures can happen to anyone. Bigger systems, bigger failures. Reliable until it fails.
    ...Remember the little power outage the East Coast had not so long ago? That was a power cloud failing :)

    What did they used to say about putting all your eggs in one basket?
    Now everybody's eggs are in it.
    Tell me how that is a good thing again.
    Chris S
    • RE: Outages beyond the recent Amazon event

      @Chris S I mean, this is just plain wrong. Yes, the AWS event means people will think long and hard about their architecture. Yes, some enterprises that were toying with the idea of public cloud might pull back for a while. Yes, private cloud providers will use the event ad infinitum to justify private versus public. But let's be a little realistic: it doesn't spell the end of the cloud.
      alasiri8
      • RE: Outages beyond the recent Amazon event

        I think the first important thing to do is to secure the cloud data. A big cloud provider like Amazon can analyze the data collected during the outages. This will help them identify any weak processes and find the root cause of the outages: human error, hardware, or internal documentation. Finally, as a quality control person, I think Amazon and other cloud service providers need to implement new best practices, systems, controls, and safeguards to improve reliability and limit outages.
        berna1
      • RE: Outages beyond the recent Amazon event

        I really like your articles because they are informative and intriguing at the same time. After reading this one, I came to like this part: "Vinnie's right - few outages actually get press coverage. Usually, when outages are made public, they often are on shared systems, like cloud or outsourcing sites. Few companies really see any upside with going public when their internal systems fail.

        Moreover, many (not all) failures, from the anecdotal accounts I'm aware of, often involve equipment failures and not some sort of IT malfeasance or neglect. If that's true, then failures can happen to anyone. It's not a phenomenon that only affects a cloud provider like Amazon." More power!

        apollosan
      • RE: Outages beyond the recent Amazon event

        What is worth publicizing is the way that all companies should handle these outages. Let's see a frank assessment of how well each company worked its way through the downtime. Let's see more sharing of best practices and more discussion around the successful remedies that minimized downtime. And, finally, let's have more discussion around how to eliminate these outages all together.
        richard8990
      • RE: Outages beyond the recent Amazon event

        Without a doubt this is an enterprise-level bit of kit, and while it has features that will appeal to consumers, I think the price is likely to put it out of their reach.
        Jlambert011
    • RE: Outages beyond the recent Amazon event

      @Chris S Very true. Sometimes, no matter what safeguards are put in place, power failures will happen and cause major disruptions. Such is the way of the world!
      krtinberg
      • RE: Outages beyond the recent Amazon event

        Very true. There needs to be more transparency, for sure. I agree with all of the above.

        Best regards, MadClive
        madclive
      • RE: Outages beyond the recent Amazon event

        @krtinberg
        Yeah, I agree. But at least something could be done to prevent it, though not everything can be prevented. Hopefully, when it reaches its peak, it will be more secure, or perhaps a more solid version of it will come along...
        tobiastracen
    • RE: Outages beyond the recent Amazon event

      @Chris S I agree with you that failures can happen to anyone at any time. Should these outages be publicized more frequently, or are these companies trying to hide them?
      esm2012
    • RE: Outages beyond the recent Amazon event

      @Chris S Good point, mate. This is so true: bigger systems, bigger failures.
      ripslyme00
    • RE: Outages beyond the recent Amazon event

      @Chris S This comment seems a bit off to me. As much as I would like to think the above reasoning is correct, the force of logic compels me to think otherwise.
      john7334
      • RE: Outages beyond the recent Amazon event

        Whoa! It seems I learned so many things from reading your article! I especially find this part intriguing: "Vinnie's right - few outages actually get press coverage. Usually, when outages are made public, they often are on shared systems, like cloud or outsourcing sites. Few companies really see any upside with going public when their internal systems fail.

        Moreover, many (not all) failures, from the anecdotal accounts I'm aware of, often involve equipment failures and not some sort of IT malfeasance or neglect. If that's true, then failures can happen to anyone. It's not a phenomenon that only affects a cloud provider like Amazon." I hope to see more...

        apollosan
    • RE: Outages beyond the recent Amazon event

      That was a very interesting read. I look forward to checking back in the future to see if any relevant content has been added. Thank you for making this available to us.
      epark732
    • RE: Outages beyond the recent Amazon event

      @Chris S

      "The other thing that may surprise folks is the last time many on-premise data centers ran a full disaster recovery drill. They have their own disasters and plenty of down time - they just are not that public or reported in blogs, newspapers or Twitter."

      And this is exactly where I see the greatest issue! The newspapers pretty much decide everything these days! I mean, seriously, they control what people hear. If a newspaper does not like Apple, it can write something bad about Apple even though the same issue happened to Microsoft and Google, for example.
      runeklan
  • RE: Outages beyond the recent Amazon event

    How about Amazon making it less prohibitively expensive to span an application across zones? Diversify the storage and processing across zones to reduce the impact of one zone going down.

    Oh, and I love the statement "And, finally, let's have more discussion around how to eliminate these outages all together." It made me giggle.
    FizzyOrangeDrink
    • RE: Outages beyond the recent Amazon event

      @FizzyOrangeDrink If that's true, then failures can happen to anyone. It's not a phenomenon that only affects a cloud provider like Amazon.
      lasvegasbacon
    • RE: Outages beyond the recent Amazon event

      @FizzyOrangeDrink This is the "think outside the box" suggestion: diversifying the processing is the way to go. Short of a major disaster, a nationwide outage is unlikely.
      walterJR
  • RE: Outages beyond the recent Amazon event

    Few outages actually get press coverage. Usually, when outages are made public, they often are on shared systems, like cloud or outsourcing sites. Few companies really see any upside with going public when their internal systems fail.
    lasvegasbacon