Microsoft works to resolve Windows Azure compute issue affecting users worldwide

Summary: Microsoft has fixed an issue with its Azure cloud service that affected some users attempting to move a cloud service from staging to production. It's still repairing some related cloud services.

TOPICS: Cloud, Microsoft, Windows

Microsoft has fixed an issue with its Windows Azure Compute service that affected users worldwide, starting on Tuesday, and is working to completely resolve other affected cloud services.

At 2:35 a.m. UTC on October 30 (10:35 p.m. ET on October 29), the Azure status dashboard noted that customers worldwide were experiencing a "partial performance degradation." At 8 p.m. UTC (4 p.m. ET) on October 30, the Windows Azure dashboard noted a problem with Azure Compute that might affect service management operations across the world.

The dashboard explanation noted that "Manual actions to perform Swap Deployment operations on Cloud Services may error, which will then restrict Service Management functions. At this time we advise customers to delay any Swap Deployment operations." Users were still able to run applications and compute; it was the swap deployment function that was affected, officials said.

Via the dashboard, Microsoft provided updates every few hours through 10:45 a.m. UTC (6:45 a.m. ET) on October 31. At that time this morning, Microsoft's dashboard noted that "compute service management functionality has been restored in all regions." Microsoft Support is working to repair any cloud services affected by the issue, the dashboard update added.

One of the services that seems to be affected by the Compute/Swap issue is Azure Web Sites. An update on the Microsoft Azure dashboard (6:46 a.m. UTC) noted that there's a partial FTP Service interruption that is affecting users of that service worldwide. From the dashboard:

"Web Site customers are advised to publish content using Web Deploy or Git which are fully functional. For details on using these methods, visit Azure.com and search for 'Websites with Webmatrix' or 'Publishing with Git'. We apologize for any inconvenience this causes our customers and will provide an update at 2pm UTC."

The swap functionality on Azure is what customers use to move a cloud service from staging to production. "(W)hen you decide to deploy a new release of a cloud service, you can stage and test your new release in your cloud service staging environment while your customers are using the current release in production. When you're ready to promote the new release to production, you can use Swap to switch the URLs by which the two deployments are addressed," according to Microsoft's explanation of the functionality.
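
To make the swap idea concrete, here is a minimal sketch in Python. It is a toy model only, not Azure's API: the `CloudService` class, its `production` and `staging` fields, and the `swap` method are hypothetical names invented for illustration. The point it shows is that a swap exchanges which release each URL addresses, rather than redeploying anything.

```python
from dataclasses import dataclass


@dataclass
class CloudService:
    """Toy model of a cloud service with two deployment slots (hypothetical)."""

    production: str  # release currently addressed by the production URL
    staging: str     # release currently addressed by the staging URL

    def swap(self) -> None:
        # Exchange which release each URL points to; nothing is redeployed.
        self.production, self.staging = self.staging, self.production


# Customers use v1 in production while v2 is tested in staging.
service = CloudService(production="v1", staging="v2")

# Promote the new release: the two deployments trade places.
service.swap()
print(service.production)  # v2
print(service.staging)     # v1
```

In this model a second call to `swap` would put the old release back in production, which is one reason a staging slot is convenient for rollbacks.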

Normally when there are Azure problems, I hear almost immediately from users via Twitter and e-mail. I didn't hear from anyone this time. In fact, the only reason I learned of this latest compute problem was that the official Azure account tweeted about it.

That said, the Azure team will no doubt publish a post-mortem about this one at some point in the coming weeks.

Update (November 1): The Windows Azure account tweeted that Microsoft finally resolved the Azure Websites FTP access problem around 2 a.m. ET.

About

Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

Talkback

  • That's the problem with "the cloud"...

    What would be a problem for a single site becomes a problem for the entire business world.

    There is nothing to be gained by putting all the eggs in one basket.
    jessepollard
    • Shrug

      It is a problem with any data center.

      If you host on prem, and your electricity goes out, and your generator won't start up... you're done, as soon as the batteries die.

      What is required is a hard nosed look at whether your uptime is going to be better than Microsoft's (or Amazon's), and balance that against the costs. I think people might find that achieving Azure class uptime is harder than it looks.
      Mac_PC_FenceSitter
  • So deployed apps/sites were uninterrupted, right

    This only affected some administration capabilities, right? Even so, I'm sure it will get over-hyped anyway as a reason Azure is so much worse than trying to do it all yourself.
    scH4MMER
    • Cloud Atlas Shrugged

      I'm theorizing that cloud admin services will never be as indestructible as the clouds they manage -- you can't have a cloud manage a cloud (or can you?). Even if all the compute instances are globally distributed and redundant, to administer them you need a controller that talks to them all - but it's of key importance that a problem in that layer does not cause an end-user service interruption. (although long or ill-timed admin outages could be catastrophic on a case-by-case basis, such as when you're half-way through a deployment)
      scH4MMER