Microsoft's BPOS cloud customers hit by multi-day email outage

What the heck has been going on with Microsoft's Business Productivity Online Services (BPOS) platform this week?
Written by Mary Jo Foley, Senior Contributing Editor

On Monday, May 10, Microsoft officials warned customers of BPOS, its bundle of hosted Exchange, SharePoint and Lync, that an upgrade of Exchange Online was slated to begin on May 12. Microsoft didn't tell users to expect any downtime as a result of the upgrade. Here's the update customers got on Monday:

But it seems something went wrong before May 12's upgrade ever began. Users are reporting there have been intermittent, multi-hour Exchange Online outages on May 10, 11 and 12 at different times in North America, Europe and Asia.

My ZDNet colleague Larry Dignan noted user reports in the Microsoft Online Tech Services forum of multiple BPOS outages over the past few days.

From what I can tell from talking to a handful of users, the outage seems to be specific to Exchange Online and to have affected customers for periods of up to four hours over the past few days.

I asked Microsoft for more details about what went wrong and when the problem was expected to be fully resolved. I received this official statement from a company spokesperson, and was told this was all the Softies had to say:

"On May 12 starting at 9:10AM PDT, some BPOS-Standard customers served from the Americas region began experiencing delays sending and receiving e-mail. Microsoft Operations and Engineering teams are actively working the incident, and we are communicating with customers via our normal incident communication channels. We sincerely apologize to our customers for any inconvenience this incident may have caused them."

Shortly after I received that statement, the spokesperson sent me an update:

"This is a short update on work underway to resolve problems that have occurred with the Exchange Online Service on May 12 2011 and the actions that the team is taking to resolve these problems.

"Starting at 9:10am PDT, service monitoring detected malformed email traffic on the service. This malformed email traffic resulted in problems sending and receiving email until 10:03am PDT, when the problem was rectified. The offending mail was removed from the service, and service restored. Email was delayed by ~45 minutes during this time.

"A second issue was detected via monitoring at 11:35am PDT, with email stuck in end users outboxes. The issue was remediated at 12:04pm PDT. During this time, more than 1.5 million messages had queued on the service awaiting delivery. This email is now flowing through the system, however because of this large volume of email, we are experiencing delays of as long as 3 hours.

"The team continues to work to fully resolve the issue, and will provide a full post mortem of this incident following service restoration, and also will provide additional updates on how our service level agreement (SLA) was impacted."

Microsoft declined to acknowledge or comment on the May 10 and 11 problems reported by users.

Meanwhile, I saw this update from the Microsoft Online account on Twitter around 3 p.m. ET on May 12:

The dashboard referred to in the tweet above is visible only to paying BPOS customers. One of those North American customers sent me a picture that shows problems have been happening over the past three days:

"There have been various outages over last 48 hours. Not a good situation," said one Microsoft partner with whom I spoke. "Microsoft does not have a formal response ....if they had isolated the problem they would have issued a response. When this happened previously they responded quickly with the cause and SLA (service level agreement) remedy."

The Microsoft spokesperson said the team also declined to say whether the Exchange Online problems were related in any way to Microsoft's move from BPOS to Office 365, the successor to BPOS. The Office 365 launch has been rumored to be set for early June 2011.

The last widespread BPOS outages occurred in the fall of 2010.

Anyone out there know more about what caused the earlier BPOS issues this week?

Update: Around 9 p.m. ET on May 12, Corporate Vice President of Online Services Dave Thompson posted a more detailed explanation of the problems that hit Exchange Online, acknowledging that there were problems on Tuesday, as well. Thompson also apologized to customers in his blog post. Several commenters on the post asked why Microsoft's health-status dashboard is password-protected, rather than freely visible to the public.

I've heard from several BPOS customers since my post that they've had serious service degradation complaints about Exchange Online for several weeks running and have not received much in the way of explanation from Microsoft support about what was going on.
