Whether it's Amazon or Microsoft, there's (still) no foolproof cloud

Talk about strange timing: Yesterday, I heard from a business user of Microsoft's Windows Azure cloud platform who said that his company had been taken down by an Azure storage outage that lasted for six hours on April 15.

A day later, the Web is abuzz with news about an Amazon EC2 outage (going on 10 hours as I type this post) that seems to be centered around the company's cloud storage components.

Like Amazon does with AWS, Microsoft maintains visible dashboard pages showing the real-time status of all of its Azure-related components. From the Azure Storage page, it looks like there've been Azure storage problems resulting in "service degradations" on not just April 15 (in the North Central and South Central regions), but also on April 19 (in East Asia and Western Europe).

I've asked Microsoft for more details about what specifically happened on April 15 that caused the reported downtime and am awaiting word back.

Update (4/22): Microsoft isn't saying much about the outage, other than to acknowledge it happened. The official response, delivered through a company spokesperson:

"At 6:40 AM PDT on April 15th, Microsoft became aware of an issue that affected some customers using the Windows Azure Storage service in the North Central and South Central US regions. This issue has been resolved.  We regret any inconvenience the outage may have caused our impacted customers. As always, we will investigate the cause of this issue and take steps to better ensure it doesn’t happen again."

The user who contacted me, who asked not to be named, said he believed a misconfiguration during a storage deployment had hit both the North Central and South Central U.S. regions at the same time, affecting the way the load balancers were routing traffic. He wanted to know more about exactly what happened and what Microsoft is doing to head off similar problems in the future.

I'm not posting this to downplay what's going on with Amazon's EC2. Nor am I doing so because I've heard Microsoft or Microsoft partners trying to use Amazon's EC2 outage as a way to paint Azure as superior. (In fact, one member of the Azure team tweeted today that he hoped no one at Microsoft would do such a thing.)

Outages and glitches happen across the cloud, not just on the infrastructure side, but on the cloud apps side, too. They're a good reminder about the importance of backup/redundancy and the need to distribute one's cloud storage across multiple geographic locations, if and when possible, as one of my ZDNet UK colleagues tweeted today.
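To make the point about spreading storage across geographic locations concrete, here is a minimal sketch, assuming an application that writes every object to several regions and reads from the first region that can answer. The `RegionStore` class, the `replicated_put` and `failover_get` helpers, and the region labels are hypothetical, in-memory stand-ins, not Azure or AWS APIs; a real deployment would rely on each provider's own replication options and SDKs.

```python
from typing import Dict, List


class RegionUnavailable(Exception):
    """Raised when a (hypothetical) regional storage endpoint cannot serve a request."""


class RegionStore:
    """In-memory stand-in for one region's storage service (illustration only)."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise RegionUnavailable(self.name)
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise RegionUnavailable(self.name)
        return self._blobs[key]


def replicated_put(regions: List[RegionStore], key: str, data: bytes) -> int:
    """Write to every reachable region and return how many copies were stored."""
    copies = 0
    for region in regions:
        try:
            region.put(key, data)
            copies += 1
        except RegionUnavailable:
            pass  # a real system would queue a re-sync for this region
    if copies == 0:
        raise RegionUnavailable("all regions")
    return copies


def failover_get(regions: List[RegionStore], key: str) -> bytes:
    """Read from the first region, in preference order, that can serve the key."""
    errors = []
    for region in regions:
        try:
            return region.get(key)
        except (RegionUnavailable, KeyError) as exc:
            errors.append(f"{region.name}: {exc!r}")
    raise RegionUnavailable("; ".join(errors))


if __name__ == "__main__":
    # Hypothetical region labels; two nearby regions plus one on another continent.
    regions = [
        RegionStore("north-central-us"),
        RegionStore("south-central-us"),
        RegionStore("west-europe"),
    ]
    replicated_put(regions, "orders.db", b"snapshot")
    # Simulate both U.S. regions failing at once.
    regions[0].available = False
    regions[1].available = False
    print(failover_get(regions, "orders.db"))  # prints b'snapshot'
```

The usage mirrors the April 15 scenario described above: two U.S. regions going down together, which is why the sketch keeps one replica far away. It deliberately ignores consistency; a write made while a region is down leaves that region's copy stale until something repairs it.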

About

Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

Talkback

  • RE: Whether it's Amazon or Microsoft, there's (still) no foolproof cloud

    Make it foolproof, and a better fool will be invented. Not my original words, but they are true.
    michiel@...
  • Banking on cloud is like banking your retirement on Social Security

    Just like you'd be better off managing retirement on your own, you'd be better off going through the due diligence and managing your IT infrastructure on your own.
    LBiege
  • RE: Whether it's Amazon or Microsoft, there's (still) no foolproof cloud

    Outages and glitches happen in house, too. They may or may not be more problematic or more long lasting than in the cloud. Or less so.
    John Baxter
    • RE: Whether it's Amazon or Microsoft, there's (still) no foolproof cloud

      @John Baxter
      Don't worry, if your machines don't break, your people will get sick. Murphy's Law.
      Robert Hahn
    • In house outages are a different animal entirely.

      @John Baxter
      First off, if the outage is purely an in-house problem, then it's at least a problem of your own making, so to speak. Of course, in-house outages may stem from sources you have little or no direct control over, but at least you have the sense that it's your problem and that you are taking steps to deal with it.

      And therein lies the rub. When you are the one dealing with the problem, it brings an entirely different dynamic to the whole coping strategy and optics, as opposed to simply knowing your services are down and hoping to heaven that whoever is in control will bring them back up ASAP.

      When it's your problem and you're the one working on it, the situation brings many things to the table that matter for long-term decision making. First, you are likely to get far more informative and timely updates on the current status of your outage. This, above all, is of paramount importance for those in positions of responsibility who make the decisions. It at least brings some confidence to the recovery process if the answers one is getting indicate that everything that can be done is being done. It also helps to know that the people working on the problem are working on your problem specifically, as opposed to doing things in a way that is perhaps best for the hosting company in general, even if it means further delays for you.

      Could the in-house outage be longer than the cloud outage? Of course; it's kind of a crazy question, actually. It's like asking which is likely to be worse, falling ill at home or in a hospital. Without any further parameters characterizing the question, it's almost pointless. You might catch a cold at home but some kind of flesh-eating disease at a hospital. Or vice versa. Or whatever. A more important point is this: any company providing reliable cloud-based services should by all accounts have numerous backup and fail-safe protocols that the average small business just doesn't get into. In that respect, cloud-based services are generally more reliable when serious issues arise that require those kinds of safeguards. On the other hand, when something goes really wrong at one of these big service providers, it can be very bad, because with the kind of backup they have, only the really bad problems would typically have an impact.

      It's like my father used to say about four-wheel-drive vehicles: they don't get stuck often, but when they do, it's a disaster.

      Without an in-house backup, you are really left at the mercy of the powers that be with cloud computing. You don't get detailed, timely, on-the-spot updates on the recovery process, and you have little to no say in any remedial plans to keep the same thing from happening again. And of course, as an individual person or organization, your interests get little to no priority over those of the provider or of the multitudes of other customers it serves. The fact is, many of the better in-house backup plans might well call into question the need for cloud-based services at all.

      Cloud: not yet ready for today.
      Cayble
  • using the cloud is fine IF...

    ...you understand its limitations and don't put all your eggs in one basket. Cloud != guaranteed uptime. Never has been, never will be. Setting up your own DR and redundancy always was, and still is, a damned good idea.
    gary@...
  • Message has been deleted.

    mmichalik
    • RE: Whether it's Amazon or Microsoft, there's (still) no foolproof cloud

      @mmichalik - I've read two cloud-based articles. You just cut'n'paste the same stuff, complete with spamming your own website.
      HypnoToad72
      • Message has been deleted.

        mmichalik
      • mmichalik, the offense is from you

        @HypnoToad72
        but not in the way you feel it is.

        You are here, with a link to your website in hand, likely for no other reason than to rack up page hits, nothing more.

        The offensive part is that you consider the intelligence of those here so low that they won't see what you are attempting.
        I assume your business is not what you wish it to be, so you feel posting here will garner some possible business.

        I realize my words offend you, but it's the truth.
        :|
        Tim Cook
  • "We don't need no steenkin' server outages!"

    Forget about the servers possibly taking down your cloud apps - every bit of wire and hardware between you and the servers can (and eventually will) take you down.
    fairportfan
  • I'll stick to running my own cloud services

    Then when it fails, it's no one's fault but my own - and I can deal with my own paranoia. :)
    TheWerewolf
  • In the end, it's still hardware

    The marketing wonks have done a good job of abstracting away the core reality that cloud-based solutions are hardware-based solutions: generally better distributed and more fault-tolerant than non-cloud solutions, but not necessarily so. As OS vendors move inexorably toward virtualization, every group of two or more servers will start to become "cloudlike" in its operations, and everyone will have their own cloud (with better privacy and security, IMHO). It's in everyone's interest to demystify the mechanics of "clouds" so that we can stop pitting one storage solution against another. As a data preservation company, we spend more time defending different technologies than protecting customer data, and it's a big waste of time for us.
    dpsAndrew
  • These outages are all about money not technology.

    I am involved in a very large computing environment that is distributed all over the world (over 90 countries), where our business case requires us to fail over storage and processing to a site a minimum of 500 miles away. We are not allowed to strand even a single message in a queue for more than 2 seconds, or it puts our company at large financial risk. We test this every week and are able to achieve failovers of 350 ms, with a full SAN and processing switchover, without stranding messages for longer than that.

    Our business case was to save the company huge amounts of money, and it worked well. So the technology is there to do it. It would literally take a natural disaster hitting all 4 of our major data centers at once to cause the type of outage that Amazon and MS are having.

    However, and this is a BIG however, it was costly to do. The investment was in the billions of dollars.

    What these companies are doing is a cost analysis, determining for you what acceptable downtimes are. They are probably right for 90% of their customers, or whatever percentage they choose to plan for. They have obviously not designed for the type of business case we have. It is one of the reasons I think cloud computing has its place, but it is not the silver bullet everyone is proclaiming. It is like public education, where everyone gets taught the same way; it does not work for everyone. Imagine the application for my pacemaker monitor going down for 6 hours. I could die before my Dr. is notified of any problem.

    There is no reason Amazon and MS cannot eliminate these outages, except that the cost would probably price them out of the market for the average cloud user.

    For this reason, you will not see mission-critical applications in the cloud unless these HA costs come down.
    thebeefman
    • RE: Whether it's Amazon or Microsoft, there's (still) no foolproof cloud

      @thebeefman

      The difference here is that your company, in essence, created a private cloud: you invested the necessary capital in it and you control it, so it works as well as you designed it. MS and Amazon are operating under the premise of "how can we do this as cheaply as possible and still attract and keep our customer base?" 90% is what they believe customers will tolerate, since (1) most customers cannot afford to implement it, or (2) they are lazy and would rather just have the outage and take it in the rear.
      MLHACK