Microsoft grapples with Windows Azure outage

Written by Jack Clark, Contributor

A component of Windows Azure has experienced a worldwide outage for the past eight hours, preventing customers from carrying out management operations for technology that uses the cloud management service.

The worldwide outage of the Windows Azure Service Management technology began at 1:45am GMT on Wednesday and, at the time of writing, Microsoft was in the process of rolling out a fix to deal with the problem.

Windows Azure Service Management lets customers manage their deployments, hosted services and storage accounts on the cloud platform-as-a-service.

The most recent update by Microsoft on the Windows Azure status page said it had broadened a hotfix patch to cover all sub-regions.

"As we proceed through the rollout, we will progressively enable service management back for customers," the company wrote at 10:30am. "Further updates will be published to keep you apprised of the situation. We apologise for any inconvenience this causes our customers."

At the time of writing Microsoft had not responded to requests for further information on the outage.

Update: 12:02pm GMT

After rolling out the hotfix to deal with the Windows Azure Service Management fault, Microsoft began reporting problems with Windows Azure Compute across North America.

"Incoming traffic may not go through for a subset of hosted services in this sub-region," the company wrote at 10:55am GMT. It assured customers that applications will continue to run and there is no impact on storage.

Update: 12:36pm GMT

ZDNet UK is now unable to access the Windows Azure Service Status dashboard.

Update: 1:54pm GMT

ZDNet UK continues to have trouble accessing the service dashboard. Microsoft has acknowledged requests for further information but has not responded to questions.

Update: 2:20pm GMT

The service management issue has been "mitigated" and service has been brought back for the majority of customers, Microsoft wrote on its periodically inaccessible service dashboard at 1:30pm GMT. However, "We still need to work through some issues before we can completely restore service management."

Meanwhile, the company continues to investigate the Azure Compute issue and is in the process of "verifying the most probable cause," it wrote at 1:30pm GMT.

Update: 4pm GMT

Microsoft has started trying to restore Azure Compute service in the North Europe, North Central US and South Central US regions, the company said at 3:30pm, without giving details on what caused the fault.

According to a Microsoft dashboard update at 2:30pm, around 37 percent of Azure Compute services in North Europe, 6.7 percent in North Central US and 28 percent in South Central US were affected by the problems.

Service Management is still unavailable for some customers in the North Central US, South Central US and North Europe regions, Microsoft wrote at 3:30pm.

At some point during the day the Access Control 2.0 service experienced an outage in the North Europe and South Central US regions. A dashboard post at 11:15am said customers were not able to access their ACS namespaces. The post has not been updated since then.

At the time of writing, Microsoft had still not made anyone available for ZDNet UK to speak to about the outages.

Update: 4:58pm GMT

Microsoft has acknowledged the issue and admitted that some customers in the North Central US, South Central US and North Europe remain affected.

"On February 28th, 2012 at 5:45 PM PST Microsoft became aware of an issue impacting Windows Azure service management in a number of regions. Windows Azure engineering teams developed, validated and deployed a fix that resolved the issue for the majority of our customers," a Microsoft spokesperson said in a statement delivered to ZDNet UK at 4:55pm, over 12 hours after problems started being flagged on the service dashboard. "Some customers in 3 sub regions — North Central US, South Central and North Europe — remain affected. Engineering teams are actively working to resolve the issue as soon as possible We will update the Service Dashboard, hourly until this incident is resolved."

Update: 5:05pm GMT

One ZDNet UK reader called in to give some details on how the Azure problems affected their business.

"Our live site's been down all day now, so we've been losing money. The address it's on is not resolving, you can't even ping it," Ashley Rudland, who runs a startup travel site hosted on Azure named worldcitycard.com, told ZDNet UK. "Everything I've been told is that the management portal is the only thing that's got down, but the thing is I can go in and see my machines are running in the cloud, they all say they are ready and green and fine, but they're completely inaccessible."

Rudland said he shares an office with a major Microsoft partner that does cloud integration for the public sector and large companies. "Their biggest clients are all offline," he said. "It's ridiculous, it's crazy, you need so much confidence to use these cloud services."

Earlier in the day the Government's G-Cloud application store, CloudStore, was also down due to the Azure problems, though it was brought back online in the late afternoon.

Update: 7:50pm GMT

Faults spread across Azure as Microsoft tries to restore service

Failures are propagating across the Azure cloud in America and Northern Europe as Microsoft tries to get its cloud online.

As Microsoft attempts to restore Compute functionality in the North Central US, South Central US and North Europe regions, functionality has been degraded or knocked out on a range of Azure services. These include the Windows Azure Marketplace Datamarket in South Central US; SQL Reporting in North Europe; SQL Azure Data Sync across the East Asia, North Central US, North Europe, South Central US, Southeast Asia and West Europe regions; the Service Bus, Access Control & Caching Portal worldwide; and the Service Bus in South Central US. There are also continuing problems with Access Control 2.0 in the North Europe and South Central US regions and with Access Control in the South Central US region.

The faults for the above services are reported on Microsoft's flaky Windows Azure Service Dashboard as all having occurred earlier in the day. ZDNet UK has been checking the dashboard since the problems first emerged and can attest that Microsoft appears to be posting fault information about Azure services retroactively, rather than as the faults happen.

Microsoft is keeping quiet on the entire thing. "There will not be any briefings around this issue," one of Microsoft's PR companies informed ZDNet UK on Wednesday afternoon.

Update: 11:35pm GMT

Azure customers continue to be affected by the outage as Microsoft attempts to bring the cloud back online.

Microsoft posted an update on the core Azure Compute component to the cloud platform-as-a-service's dashboard at 10:30pm GMT. "Recovery efforts are still underway. Further updates will be published to keep you apprised of the situation. We apologise for any inconvenience this causes our customers," the company wrote.

Update: 11:50pm GMT

Microsoft is 70 percent through the recovery process for Azure Compute in the North Europe, South Central US and North Central US regions, the company wrote at 11:30pm GMT.

Update: 1 March: 10:20am

Microsoft said it restored all service management functionality for customers in the North Europe region at 1:25am. At 3:00am it said: "we are working on stabilising the Windows Azure Platform as well as following-up with all customers who were impacted by this incident."

Azure Compute problems in the South Central US and North Central US regions persisted. By 4:30am Microsoft was 85 percent through the recovery efforts for both regions.

By 7:30am recovery efforts were complete in both regions, it said. However, "a small number of customers in [both regions] may face long delays during service management operations."

If customers continue to have problems they can contact Microsoft via the support channel described on a company webpage.
