Office 365 outage Thursday night

Any major cloud service outage is bad, but good communications can make it less stressful. Microsoft did OK with last night's Exchange Online outage.
Written by Larry Seltzer, Contributor

My personal domain is run on Office 365. I've generally been very happy with it, but last night was one of those times that makes for unhappy customers: a major outage. 

Late in the evening, email just stopped working. Multiple computers and devices on different networks were all timing out on Exchange ActiveSync access, so it seemed pretty clear that the problem was the service. I also couldn't log into the Office 365 web portal.

At this point, my first thought was to check the Office 365 Twitter feed; surely Microsoft would post notice of a major outage there. But there was no sign of anything wrong, so I asked.

The Office Twitter folks seemed to know nothing about it, but another user told me there was a "major outage". At this point I thought to check the service status page for my account. Below is the content for the full incident last night. When I first checked, only the bottom 2 or 3 statuses had been posted.

Incident: EX3459

Date and time: Aug 30, 2013 12:32 AM
Status: Service restored
Details: Closure Summary: On Friday, August 30, 2013 at approximately 12:50 AM UTC, Microsoft identified an issue where some customers served from the Americas may have experienced problems connecting to the Exchange Online Service. Affected users were unable to access email via Outlook, OWA, and mobile devices. Investigation determined that upgrades to the environment had impacted domain controller health, causing an unexpected outage. Engineers implemented a fix to restore domain controller health across the environment, and then confirmed all Exchange services were restored. The issue was successfully fixed on Friday, August 30, 2013 at 3:45 AM UTC. A complete post-incident report will be available on the Service Health Dashboard within five business days.

Date and time: Aug 29, 2013 11:35 PM
Status: Restoring service
Details: Microsoft has identified an issue where some customers served from the Americas may be experiencing problems connecting to the Exchange Online Service. Affected users may be unable to access email via Outlook, OWA, and mobile devices. Engineers have identified that impact was caused by upgrades to the environment, and are implementing a fix to restore service.

Date and time: Aug 29, 2013 10:40 PM
Status: Service degradation
Details: Microsoft has identified an issue where some customers served from the Americas may be experiencing problems connecting to the Exchange Online Service. Affected users may be unable to access email via Outlook, OWA, and mobile devices. Engineers have identified that impact was caused by upgrades to the environment, and are currently working on steps to restore service.

Date and time: Aug 29, 2013 9:40 PM
Status: Service degradation
Details: Microsoft has identified an issue where some customers served from the Americas may be experiencing problems connecting to the Exchange Online Service with multiple protocols, including Outlook and OWA. Engineers are currently investigating the issue.

Date and time: Aug 29, 2013 9:27 PM
Status: Investigating
Details: We are investigating a service alert. Multiple protocols may be down impacting access to the Exchange Online Service. We will provide more information shortly.

I'm pretty sure that the times in the table are East Coast times (because I'm on the East Coast and the table is generated for my account). Based on the final (top) status, the outage lasted almost 3 hours and was caused by some unspecified upgrade. Let's call it 3 hours. At 99.9% availability, 3 hours of downtime would meet the contract only over a 3,000-hour period, which is 125 days. I'll have to re-read the SLA to see if they owe me something.
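For anyone who wants to check the back-of-the-envelope math above, here is a minimal sketch. The function name is made up for illustration, and it assumes the 99.9% figure is simply uptime divided by total time over the measurement period; the real SLA may define its measurement window and exclusions differently.

```python
from fractions import Fraction


def period_needed_hours(outage_hours, availability):
    """Return the period length (in hours) over which `outage_hours` of
    downtime still satisfies the given availability target.

    Assumption: availability = uptime / total time, with no exclusions.
    """
    allowed_downtime_fraction = 1 - Fraction(availability)
    return Fraction(outage_hours) / allowed_downtime_fraction


# Passing "0.999" as a string keeps the arithmetic exact (no float rounding).
period_hours = period_needed_hours(3, "0.999")
period_days = period_hours / 24

print(f"{period_hours} hours, or {period_days} days")
```

Put the other way around: under this assumption, a 30-day month at 99.9% leaves room for only about 43 minutes of downtime, so a 3-hour outage would not fit inside a single monthly window.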

Overall, I'm not especially mad. These things happen in any sufficiently large and complex environment. I also don't know how big the outage was; it could have been relatively few users and I was just unlucky enough to be one of them.

But I am disappointed in Microsoft's outreach on this. I can't be too mad because they did have the neat status page I embedded above, but I had to remember to go check that. Microsoft has an alternate email address for me as part of my profile; its stated purpose is password recovery, but it would be useful in these cases. I'd be surprised if Microsoft had no idea which users were affected.

Even if they didn't know which users were affected, the Twitter feed would be a good place to note that there were outages and to link to a page with more detail. To my embarrassment I discovered this morning that they do have a feed for just this, @Office365Status. It's all in there.

So the more I look into this, the less I have to complain about. This sort of outage is rare but inevitable, so what matters is how you handle it. Microsoft handled it well enough.
