Microsoft apologizes for spate of recent Online Services outages

TOPICS: Microsoft, Outage

Microsoft officials posted an apology the evening of September 8 for three recent outages for customers of its Microsoft-hosted cloud applications.

The first of the North American outages hit Business Productivity Online Suite (BPOS) customers in late August. Two more occurred in early September. (The most recent, which happened at the start of this week, seemed to center on Exchange Online, based on customer reports I received.)

Morgan Cole, a director with Microsoft's Online Services team, posted an apology for the outages on Microsoft's Online Services Team blog, along with additional details about what caused some of them. Cole explained:

"Specific to the August 23 event: our proactive efforts to upgrade to next generation network infrastructure caused unforeseen problems that affected access to some services. Operations and Engineering quickly identified a design issue in the upgrade that caused unexpected impact, but the issue resulted in a 2-hour period of intermittent access for BPOS organizations served from North America.

"The August 23 event was remediated, but the solution did not resolve another underlying issue which created subsequent problems on September 3rd and 7th. BPOS customers experienced brief periods of service degradation, primarily affecting the sign-in service and administrative portals. The impact during the afternoon of September 7th had more widespread customer impact, although the duration was relatively short. We performed emergency maintenance to isolate suspect traffic, which has proven successful in stabilizing the service. We continue to monitor the network and all services to ensure stable operations. Needless to say we, like you, find the events unacceptable and have 24/7 efforts underway to ensure we do not have a repeat of these events."

Microsoft has scheduled maintenance for Exchange Online and SharePoint Online in North America this coming Saturday, September 11. The planned maintenance period begins at 4 a.m. GMT and may last through 10 p.m. GMT, company officials have told customers.

Cole's post did not say whether Microsoft plans to compensate users affected by the three outages. Microsoft Small Business Specialist Guy Gregory raised the question in the comments section of the blog post:

"Given the 2 hour outage equates to 99.7% for August, will you be honoring your pledge to refund affected users? My understanding was that the 99.9% uptime promise was backed by a money-back guarantee."
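Gregory's percentage follows from simple arithmetic. The sketch is a minimal illustration, assuming availability is measured as the share of wall-clock hours in a calendar month and that the entire 2-hour window counts as downtime; Microsoft's actual SLA may define availability differently (for example, per service, or excluding scheduled maintenance). The function names are hypothetical.

```python
# Hedged sketch: assumes availability = (total hours - downtime hours) / total hours
# over one calendar month. The real BPOS SLA terms may measure this differently.
import calendar


def monthly_uptime_percent(year: int, month: int, downtime_hours: float) -> float:
    """Percentage of the month's hours during which the service was up (hypothetical helper)."""
    days = calendar.monthrange(year, month)[1]  # number of days in the month
    total_hours = days * 24
    return 100.0 * (total_hours - downtime_hours) / total_hours


def allowed_downtime_hours(year: int, month: int, target_percent: float) -> float:
    """Maximum downtime, in hours, that still meets the uptime target (hypothetical helper)."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * (1 - target_percent / 100.0)


# The year is illustrative only; August has 31 days in every year.
august = monthly_uptime_percent(2010, 8, downtime_hours=2)
budget = allowed_downtime_hours(2010, 8, target_percent=99.9)
print(f"August uptime with a 2-hour outage: {august:.2f}%")
print(f"Downtime allowed at 99.9%: {budget * 60:.1f} minutes")
```

Under these assumptions a 31-day month has 744 hours, so 2 hours of downtime leaves about 99.73% availability, while a 99.9% target leaves room for only about 0.744 hours (roughly 45 minutes) of downtime in the month.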

I've heard directly via e-mail from a few other customers who are worried about the effect of these kinds of outages on their businesses and those of their customers. One partner mentioned "a huge hit to our credibility from the various outages" in the eyes of his customers, leading him to question the wisdom of migrating to hosted Exchange.

Customers said they wanted and needed more communication from Microsoft about service interruptions, both while they are happening and afterward. Commenter David Girdner noted:

"What is being done to improve communication when there are issues? On 9/7, on the Online Services admin site, the Service Status showed services were 'Healthy' during a time when the services were not accessible. Additionally, the information provided by the RSS feeds is frustratingly vague and not timely."


About

Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).


Talkback

  • RE: Microsoft apologizes for spate of recent Online Services outages

    Sure, they were down for a few hours, but moving forward means taking on the risk that mistakes will be made as we move a complex system forward. I would rather mistakes be made from time to time and lessons learned so we can continue to evolve. Lessons coming out of this incident include producing more real-time communication and a test of the uptime guarantee. If we get better status communication and see the uptime guarantee fulfilled, then I would be more comfortable that things are moving in the right direction. If NASA can't put a ship in orbit without the risk of it blowing up, how can we expect the rest of the world to move forward without the occasional issue?
    colinbowern
    • Take Risks? With thousands of customers running in production?

      @colinbo - The people on the space shuttle understand those risks; the people selling and supporting BPOS do not. They tout the stability, redundancy, and capability of the Microsoft team to manage and support this platform. What we have quickly found over the past 18 months is that it's all smoke and mirrors. There is no redundancy, the contractors they have hired don't know how to support enterprise systems and customers, and it seems they are not telling the full truth when outages occur. If they really could fail over to a separate datacenter, or to redundant systems, during an outage, why wouldn't they do that when they had these issues? They even removed all RSS posts older than 90 days so new customers can't see the platform's vulgar past performance. Like others, we have had 4-hour outages that the RSS feed shows as only 1.5 hours.

      https://rss.microsoftonline.com/feeds.aspx?center=default&chan=notifications&lang=en-us

      You and Loverock must work on the BPOS team or in the MS marketing department. Real companies are affected here; they are paying real money for the Enterprise service they were sold. I do not think it is fair to expect customers to fund their long, drawn-out learning curve. It's clear that while Microsoft is great at developing enterprise software, they are horrible at supporting it.
      msbrianp
      • And further, an upcoming scheduled 18-hour outage?? They're useless.

        @msbrianp

        "The planned maintenance period begins at 4 a.m. GMT and may last through 10 p.m. GMT, company officials have told customers."

        Wow! Why can't they shift the workload to avoid up to 18 hours of outage time? And they say they know how to run a cloud? Think about it.
        Plain Logic
    • Well, when you're an also-ran, more room for mistakes

      @colinbo ... not a whole lot of people are going to notice Microsoft's outages.
      HollywoodDog
  • RE: Microsoft apologizes for spate of recent Online Services outages

    Very cool of Microsoft to remedy the situation so quickly. For those asking about compensation, they need to read their contract. I would not cancel BPOS over this, considering how short the downtime was. It's more like people are just looking for an excuse. If they want more communication, they can call Microsoft's technical support.
    Loverock Davidson
    • Are you trying to be funny?

      This is NOT funny.
      OS Reload
    • RE: Microsoft apologizes for spate of recent Online Services outages

      @Loverock Davidson Typical brown nosing comment from Loverock.

      Apparently you don't actually use their services. We do, and it was extremely annoying to lose email, which has happened several times this summer, especially in the middle of a project that required me to email documents and information to off-site employees.
      Stuka
    • RE: Microsoft apologizes for spate of recent Online Services outages

      @Loverock Davidson
      You are an MS apologist, aren't you!!!

      You declared that Apple should not have released Ping because iTunes failed to handle some invite requests when it was flooded by the massive uptake of the new service, yet you say this when MS has had many outages affecting many, many users.

      I experienced a problem on Monday when obtaining a licence key for a user; I gave up after several hours and left it for the user to handle the next day.

      I have not screamed about an outage - I was put out and had time wasted, and the user's time wasted - but I do not make stupid statements like 'MS should not have released Office because their online server could not generate the key for some time'.

      You do make such stupid statements about Apple - then try to spin MS's outage as a fast response when they have had multiple outages.

      I used to be in a position where an online outage of 14 seconds was sufficient reason for my clients to ask for a written explanation - and this was in a 3-person start-up.

      MS is not a 3-person start-up - and yet you consider several hours on several occasions to be good response time?

      It clearly wasn't good response time, and they have made mistakes in upgrading services - I give them credit for trying though.

      Look - just pretend it was an Apple outage - and then blast them!!! You'll feel better for it.
      richardw66
      • Do not....

        @richardw66

        try to use logic in discussions with LD. There is none. As a matter of fact, it is best to avoid discussions altogether. LD lives in a parallel universe where everything MS is wonderful and the greatest and everything else sucks.

        His mindless stupidity amuses me - most of the time.

        Maybe he will make a "Look, he mentioned me" post below. That's how imbecilic this guy is.
        Economister
  • MS really screwed this one up

    They should have simply blamed AT&T.
    NonZealot
    • No, true to form...

      @NonZealot... Monkey boy Ballmer will stomp on the floor, beat his chest, throw a chair at some innocent bystander, and then fire the next person he sees. That's the same way the Courier project died.
      Snooki_smoosh_smoosh
      • Courier is going to be released in November

        @JM1981

        I bet that Steve Jobs sighed a HUGE sigh of relief when MS announced that they had canceled Courier. It was a trick, though, to lull Apple to sleep. Courier is going to be released in November, and December iPad sales are going to hit 0. :)
        NonZealot
      • JM1981: If Ballmer

        stomped on the floor, beat his chest, and threw a chair at some innocent bystander, Steve Jobs would just say that it's another example of MS copying Apple, as you know that's what Jobs did after the whole iPhone 4 fiasco!

        (and Jobs one-upped him, as he fired a whole bunch of people he saw, starting with Papermaster and ending with some innocent engineering student) ;)
        John Zern
  • Manning up and accepting fault is the right thing to do

    IMO, MS should compensate the businesses affected by this outage even if the contract may not specifically call for it. This would show that MS does care for its customers not just their money.
    DontBeEvil
    • I'm with you

      @DontBeEvil

      I didn't have any clients affected and have had no real issues with BPOS at all. If I were in Enterprise IT and looking into BPOS, it would be a hosted/on-premises setup anyway. I wouldn't rely 100% on any cloud vendor, assuming we're talking enterprise-level configurations. Services like this really shine for SMBs because they give them an affordable way to leverage these technologies. And it delivers better uptime than they could achieve with a single-server configuration, which many offices have. If I had an office with 100 users, BPOS would be a no-brainer. If I'm looking at 1,000 users, it's a different story.

      They should offer a month's credit to those who were impacted by the outage. I do think they should require that the company opened a case during the outage in order to qualify. Even if we were talking 50k users, a month's subscription is a drop in the bucket compared to the goodwill it could create.
      LiquidLearner
  • Hrmm..

    "We are sorry; We suck" This is what they have become....

    I just can't wait for the next Mac commercial. LOL
    ctunk
    • Same can be said about Apple as well

      @ctunk
      With their iPhone 4 call-dropping debacle, Apple is also in the "We suck" category. The only difference is that Steve Jobs' hubris led to an arrogant retort telling his customers to "hold the phone in a different way"; then their greed led them to suggest buying the $30 rubber band from the Apple Store; and, having finally exhausted all their antics, they agreed to give a non-apology by saying that the cellphone issue had been blown out of proportion but that, for the benefit of mankind, they were willing to give those rubber bands away for free.

      In my book, that's quite sucky.
      DontBeEvil
      • RE: Microsoft apologizes for spate of recent Online Services outages

        @DontBeEvil

        What call dropping debacle?

        Oh yes, that's right the one where the bloggers blew a non-problem into a major international issue!!!

        Yes that one!!!

        I have had the supposed problem replicated on an iPhone 4 - by having someone use one in a very marginal signal area and cover the phone with their palm - just like my very, very reliable Sony Ericsson.

        So where are you bloggers lambasting Sony Ericsson for their antennagate issue?

        Thought not!!!

        Get real!!!
        richardw66
      • RE: Microsoft apologizes for spate of recent Online Services outages

        @richardw66 "Oh yes, that's right the one where the bloggers blew a non-problem into a major international issue!!!"

        Oh, that non-problem, the one that made Steve call a conference, get on stage to pronounce 'we're not perfect', and agree to give out iPhone condoms for free.

        That non-problem? ROFL.
        Badgered
      • richardw66

        I have seen the issue with two different iPhone 4 units for myself, there was no blowing anything out of proportion.

        The truth is that it was a real issue that Apple felt the need to address.

        Are you actually asking us to believe that one blogger with a non-issue could force Apple to hold a press conference and hand out free bumpers to those affected?

        What that one blogger did was entice people from news organizations around the world to try their own tests, which found that the issue was real and not repeatable on many competing phones.

        It is you, without a doubt, who must "get real"
        :|
        Tim Cook