Microsoft: Here's what caused our cloud outage this week
Summary: Microsoft is informing customers hit by its Office 365 cloud outage this week that a "networking interruption" caused the problem, and that the company is planning to offer them a 25 percent credit for their trouble.
Microsoft officials are starting to share some details with customers and partners about what led to several cloud-service outages this week.
On August 17, many North American users of Microsoft Office 365 and SkyDrive were unable to access their email and calendars due to a three-plus-hour outage.
Some Dynamics CRM Online users also experienced service problems that day, but Microsoft execs are not saying the two sets of issues were due to the same root cause. The Dynamics CRM team has declined to provide information on what led to Wednesday's outage or on how many users were affected. (Microsoft officials have said that Microsoft is planning to add CRM Online to the company’s hosted Office 365 suite — which currently includes Microsoft-hosted Exchange, SharePoint and Lync — before year-end.)
Update: The Dynamics team sent this update via a company spokesperson:
“The root cause of the Microsoft Dynamics CRM Online service has been identified as a site configuration issue. A configuration change was made in all data centers that should prevent this from happening again. This was not a complete outage and separate from any other service issue experienced by customers.”
While not sharing exact details, Microsoft officials are attributing the Office 365 problems to "a networking interruption" in one of its North American datacenters. One of my contacts said he believed faulty Cisco networking gear was the culprit -- something a Microsoft spokesperson didn't confirm (or deny) when I asked.
Microsoft sent out notes to Office 365 customers using the affected Microsoft-hosted services on August 18 informing them of their initial findings and plans to credit affected users with 25 percent of their monthly invoices. Here is a copy of the note Microsoft e-mailed to customers:
Dear Customer:
The Office 365 team strives to provide exceptional service to all of our customers. On August 17, customers served from one of our North America data center lost access to email services included in the Office 365 suite. We apologize for the inconvenience this may have caused you and your employees.
We are committed to communicating with our customers in an open and honest manner about service issues and the steps we’re taking to prevent recurrences.
• What happened?
º Preliminary investigation indicates that a networking interruption in one of our North America data centers caused Office 365 Exchange Online to be inaccessible by some customers.
º This incident lasted from approximately 11:30 AM PDT to 2:40 PM PDT, during which time customers were not able to access the Outlook Web App or send and receive email through Exchange Online.
º The Service Health Dashboard was updated regularly during the event to notify customers of the problem, though there was a brief period of intermittent access issues to that dashboard.
• What actions have been taken to prevent a recurrence?
º The data center’s networking facilities have been remediated and we are investigating the root cause.
º We continue to monitor the overall network very closely to maintain high levels of service to customers.
We understand that any disruption in service may result in a disruption to your business. As a gesture of our commitment to ensuring the highest quality service experience Microsoft is proactively providing your organization a credit equal to 25% of your monthly invoice. The credit will appear on a future invoice, and you do not need to contact Microsoft to receive this credit. Please note, processing of the credit may take as long as 90 days.
If you have additional questions, please do not hesitate to contact us at the Office 365 community site.
Thank you for choosing Office 365 to host your business productivity applications. We appreciate your business.
Sincerely,
The Office 365 Team
Microsoft launched Office 365 at the end of June and has on-boarded a number of customers and partners since then. Microsoft also has moved some of its existing BPOS (Business Productivity Online Suite) users onto Office 365, but has advised the majority of BPOS users interested in Office 365 to wait until September, when the migration process will begin in earnest.
Talkback
RE: Microsoft: Here's what caused our cloud outage this week
Two hour, limited interruption...
What's bigger news here is how they responded to it. They not only identified and resolved the issue, but they seem to have communicated the information quickly. The offer to discount the service for the inconvenience is a nice touch too.
I'd also point out...
RE: Microsoft: Here's what caused our cloud outage this week
actually
RE: Microsoft: Here's what caused our cloud outage this week
No outage here to our Data Center's Exchange boxes over the past 3 years. I know a lot of companies that have been able to keep their fail-over data centers up continuously over the past few years. Sure, individual sites might go down, but most can keep the services out of a datacenter up pretty continuously.
RE: Microsoft: Here's what caused our cloud outage this week
I have email clients that haven't been down in years. Not sure where a few hours of downtime each month became the standard. But hey, cloud.
I must say...
How many hours in a month? 28*24= 672 / 4 (25%) = 168 hours / 3 (hours out) = 56 *times* the number of hours out! :)
Can you imagine anyone who pays you 56 times what you lost as a thank you? (chuckle)
RE: Microsoft: Here's what caused our cloud outage this week
EDIT - I had a complete brain fart and did the math wrong. Yaksplat below is correct and the SLA is a 99.9% uptime guarantee. The math should be:
60 min * 24 hrs * 31 days = 44,640 total minutes in August
44,640 min * .001 maximum downtime = 44.64 total minutes max downtime for a 31 day month
@yaksplat - Thanks for the correction.
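The corrected arithmetic in this thread can be checked with a short sketch. It assumes the 99.9% uptime guarantee cited by the commenters (not confirmed in Microsoft's note) and takes the outage window, 11:30 AM to 2:40 PM PDT, from the customer email. The function names are hypothetical and exist only for illustration.

```python
from datetime import datetime


def downtime_budget_minutes(days_in_month: int, uptime_guarantee: float) -> float:
    """Minutes of downtime a month allows under a given uptime guarantee."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1.0 - uptime_guarantee)


def outage_minutes(start: str, end: str) -> float:
    """Length of an outage in minutes, given same-day 24-hour HH:MM times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Assumption: 99.9% uptime guarantee, as quoted in the comments above.
budget = downtime_budget_minutes(days_in_month=31, uptime_guarantee=0.999)

# Outage window from Microsoft's note: 11:30 AM to 2:40 PM PDT on August 17.
outage = outage_minutes("11:30", "14:40")

print(f"Allowed downtime in a 31-day month: {budget:.2f} minutes")
print(f"Reported outage length: {outage:.0f} minutes")
print(f"Outage as a multiple of the monthly budget: {outage / budget:.1f}x")
```

Under those assumptions, the reported outage alone runs to roughly four times the downtime a 99.9% guarantee would allow for August.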
RE: Microsoft: Here's what caused our cloud outage this week
check your math. .1% downtime = .001 or 44.64 minutes.
Welcome to the cloud, folks.
Agreed!
;)
RE: Microsoft: Here's what caused our cloud outage this week
"Let me explain. No, there is too much. Let me sum up."
It seems like they suffered from a "single point of failure" (kinda like what happened with Frontier.com webmail, where that service was unavailable for over 14 hours a couple of days ago and affected Frontier customers nationwide) and *that* is not a forgivable error in this day and age, where redundancy should be the default, not the exception.
RE: Microsoft: Here's what caused our cloud outage this week
what good is a credit when you are running a business
Office 365 is appropriate for the SMB not larger companies.