Microsoft: Here's what caused our cloud outage this week

Summary: Microsoft is informing customers hit by its Office 365 cloud outage this week that a "networking interruption" caused the problem, and that the company is planning to offer them a 25 percent credit for their trouble.

Microsoft officials are starting to share some details with customers and partners about what led to several cloud-service outages this week.

On August 17, many North American users of Microsoft Office 365 and SkyDrive were unable to access their email and calendars due to a three-plus-hour outage.

Some Dynamics CRM Online users also experienced service problems that day, but Microsoft execs are not saying the two sets of issues were due to the same root cause. The Dynamics CRM team has declined to provide information on what led to Wednesday's outage or on how many users were affected. (Microsoft officials have said that Microsoft is planning to add CRM Online to the company’s hosted Office 365 suite — which currently includes Microsoft-hosted Exchange, SharePoint and Lync — before year-end.)

Update: The Dynamics team sent this update via a company spokesperson:

“The root cause of the Microsoft Dynamics CRM Online service has been identified as a site configuration issue. A configuration change was made in all data centers that should prevent this from happening again. This was not a complete outage and separate from any other service issue experienced by customers.”

While not sharing exact details, Microsoft officials are attributing the Office 365 problems to "a networking interruption" in one of its North American datacenters. One of my contacts said he believed faulty Cisco networking gear was the culprit -- something a Microsoft spokesperson didn't confirm (or deny) when I asked.

Microsoft sent out notes to Office 365 customers using the affected Microsoft-hosted services on August 18, informing them of its initial findings and its plans to credit affected users with 25 percent of their monthly invoices. Here is a copy of the note Microsoft e-mailed to customers:

Dear Customer:

The Office 365 team strives to provide exceptional service to all of our customers. On August 17, customers served from one of our North America data center lost access to email services included in the Office 365 suite. We apologize for the inconvenience this may have caused you and your employees.

We are committed to communicating with our customers in an open and honest manner about service issues and the steps we’re taking to prevent recurrences.

• What happened?

º Preliminary investigation indicates that a networking interruption in one of our North America data centers caused Office 365 Exchange Online to be inaccessible by some customers.
º This incident lasted from approximately 11:30 AM PDT to 2:40 PM PDT, during which time customers were not able to access the Outlook Web App or send and receive email through Exchange Online.
º The Service Health Dashboard was updated regularly during the event to notify customers of the problem, though there was a brief period of intermittent access issues to that dashboard.

• What actions have been taken to prevent a recurrence?

º The data center’s networking facilities have been remediated and we are investigating the root cause.
º We continue to monitor the overall network very closely to maintain high levels of service to customers.

We understand that any disruption in service may result in a disruption to your business. As a gesture of our commitment to ensuring the highest quality service experience Microsoft is proactively providing your organization a credit equal to 25% of your monthly invoice. The credit will appear on a future invoice, and you do not need to contact Microsoft to receive this credit. Please note, processing of the credit may take as long as 90 days.

If you have additional questions, please do not hesitate to contact us at the Office 365 community site.

Thank you for choosing Office 365 to host your business productivity applications. We appreciate your business.

Sincerely,

The Office 365 Team

Microsoft launched Office 365 at the end of June and has on-boarded a number of customers and partners since then. Microsoft also has moved some of its existing BPOS (Business Productivity Online Suite) users onto Office 365, but has advised the majority of BPOS users interested in Office 365 to wait until September, when the migration process will begin in earnest.

Topics: Outage, Microsoft

About

Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

Talkback

51 comments
  • RE: Microsoft: Here's what caused our cloud outage this week

    the 25% credit is standard for Office365 - paying it without customers having to ask for it isn't...
    mary.branscombe
    • RE: Microsoft: Here's what caused our cloud outage this week

      @mary.branscombe That is not true; automatic remuneration is part of the SLA. Other cloud providers require you to ask for a refund and set a time limit.
      Your Non Advocate
  • Two hour, limited interruption...

    ...really isn't that catastrophic, considering that most businesses hosting things on-site have that kind of downtime a lot more frequently. Plus, hardware failure is inevitable once in a while.

    What's bigger news here is how they responded to it. They not only identified and resolved the issue, but they seem to have communicated the information quickly. The offer to discount the service for the inconvenience is a nice touch too.
    GoodThings2Life
    • I'd also point out...

      ... that since it only affected access to Outlook Web App, and send/receive new messages, that anyone using ActiveSync connections on their phone as well as Outlook client still had access to most of their information... an option certain other services wouldn't have had.
      GoodThings2Life
      • RE: Microsoft: Here's what caused our cloud outage this week

        @GoodThings2Life - The outage did not just affect the Outlook Web App; the regular Outlook connection, which uses Outlook Anywhere (RPC over HTTP), and ActiveSync connections were down as well. Granted any old data was available to those users, but new mail could not be sent or received in any way as all connections to the Exchange portion of Office 365 were down. As for your comment about "other services," I assume this is a thinly veiled reference to Google Apps. While I agree that Office 365 is a better overall service, customers of the paid version of Google Apps get ActiveSync access and Outlook syncing using the Google Apps Outlook Sync application, so they would have the same access to old data as Office 365 users do in an outage situation.
        reidjim76
      • actually

        @GoodThings2Life anyone running Outlook and ActiveSync with Google Apps would have had that option as well.
        @...
    • RE: Microsoft: Here's what caused our cloud outage this week

      @GoodThings2Life - The outage was actually a little over 3 hours, not 2. Also, the frustrating thing was that the "Service Health" console showed that everything was fine for the first hour of the outage, before finally being updated to show a problem. However, what really was bad for me personally was that I had just migrated a client from in-house Exchange to Office 365 over last weekend and then, just three days later, I was left answering questions about the viability of the service and my recommendation of it. When I forwarded my client the Microsoft apology letter this morning, the CEO responded that maybe Microsoft should change the name to Office 364.99.
      reidjim76
    • RE: Microsoft: Here's what caused our cloud outage this week

      @GoodThings2Life

      No outage here to our Data Center's Exchange boxes over the past 3 years. I know a lot of companies that have been able to keep their fail-over data centers up continuously over the past few years. Sure, individual sites might go down, but most can keep the services out of a datacenter up pretty continuously.
      ploco@...
    • RE: Microsoft: Here's what caused our cloud outage this week

      @GoodThings2Life
      I have email clients that haven't been down in years. Not sure where a few hours of downtime each month became the standard. But hey, cloud.
      FarVision
  • I must say...

    ...a credit for 25% of the monthly fee for a *three hour* outage is very generous.

    How many hours in a month? 28*24= 672 / 4 (25%) = 168 hours / 3 (hours out) = 56 *times* the number of hours out! :)

    Can you imagine anyone who pays you 56 times what you lost as a thank you? (chuckle)
    wolf_z
    • RE: Microsoft: Here's what caused our cloud outage this week

      @wolf_z - Office 365 has a 99.99% uptime guarantee*, so after 446.4 minutes of downtime in a 31 day month, they are required to give a 25% discount. So while this outage was only 190 minutes and Microsoft is giving the discount proactively instead of waiting for customers to request it, I wouldn't say it was "very generous," especially considering they had already had a partial Exchange outage just 5 days earlier for a short amount of time. They made an uptime guarantee, so they have to live up to it.

      The math goes:
      60 min * 24 hrs * 31 days = 44,640 total minutes in August
      44,640 min * .01% maximum downtime = 446.4 total minutes max downtime for a 31 day month

      * - http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=8094

      EDIT - I had a complete brain fart and did the math wrong. Yaksplat below is correct and the SLA is a 99.9% uptime guarantee. The math should be:

      60 min * 24 hrs * 31 days = 44,640 total minutes in August
      44,640 min * .001 maximum downtime = 44.64 total minutes max downtime for a 31 day month

      @yaksplat - Thanks for the correction.
      reidjim76
      • RE: Microsoft: Here's what caused our cloud outage this week

        @reidjim76
        check your math. .1% downtime = .001 or 44.64 minutes.
        yaksplat
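
The corrected arithmetic in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `downtime_budget_minutes` and the use of `Decimal` are choices made for this sketch, and it computes only the raw downtime allowance implied by an uptime percentage. It does not model the actual credit tiers or terms of Microsoft's SLA.

```python
from decimal import Decimal


def downtime_budget_minutes(days_in_month: int, uptime_pct: str) -> Decimal:
    """Minutes of downtime allowed in a month at a given uptime percentage.

    uptime_pct is passed as a string (e.g. "99.9") so that Decimal keeps
    the arithmetic exact instead of picking up binary floating-point error.
    """
    total_minutes = 60 * 24 * days_in_month
    allowed_fraction = (Decimal(100) - Decimal(uptime_pct)) / Decimal(100)
    return total_minutes * allowed_fraction


# Figures taken from the thread: a 31-day August and a 190-minute outage.
OUTAGE_MINUTES = Decimal(190)
budget = downtime_budget_minutes(31, "99.9")
exceeded = OUTAGE_MINUTES > budget

print(f"Downtime budget at 99.9%: {budget} minutes")
print(f"Outage exceeded the budget: {exceeded}")
```

Passing "99" instead of "99.9" reproduces the 446.4-minute figure from the first version of the calculation, which shows where the decimal point slipped.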
  • Welcome to the cloud, folks.

    I'm sure that now the problem has been fixed, this will never happen again. (chuckle)
    Userama
    • Agreed!

      @Userama
      ;)
      William Farrell
  • RE: Microsoft: Here's what caused our cloud outage this week

    A little sarcasm here? Since this is "the cloud", how could a network interruption at one data center bring down a service for any length of time? Shouldn't users have just been redirected to a different data center?
    thensley@...
    • RE: Microsoft: Here's what caused our cloud outage this week

      @thensley@...

      "Let me explain. No, there is too much. Let me sum up."

      It seems like they suffered from a "single point of failure" (kinda like what happened with Frontier.com webmail, where that service was unavailable for over 14 hours a couple of days ago and affected Frontier customers nationwide), and *that* is not a forgivable error in this day and age, where redundancy should be the default, not the exception.
      PollyProteus
    • RE: Microsoft: Here's what caused our cloud outage this week

      @thensley@...

      So a Microsoft product works sometimes and not at others due to what they describe as something they had no control over. Is there news here? I think I would feel safer trusting my business with a service that always worked and had several layers of redundancy. This is not an Xbox, this is serious business.
      john_gillespie@...
  • RE: Microsoft: Here's what caused our cloud outage this week

    There goes their promise of 99.98% uptime ^^
    Ambiorix2
    • RE: Microsoft: Here's what caused our cloud outage this week

      @Ambiorix2 Not really; in fact, the missing 0.02% of time is for events like these!
      cbrcoder
  • what good is a credit when you are running a business

    Availability of the system is much more important than an advertised SLA or a credit. Let's face it, Microsoft wants to be a services company, and the reality is they have priced Office 365 where they have because they simply don't have the proven track record of delivering software services for enterprises. The trade-off of the lower price is spotty performance.

    Office 365 is appropriate for the SMB not larger companies.
    smtp22