"The Windows Azure Service Management is / has been down worldwide for about 12 hours or more. Have a look at the Service Dashboard of Windows Azure (if you can reach it): https://www.windowsazure.com/en-us/support/service-dashboard/. It looks like the Service Management works again, but despite that I just see more warnings and errors popping up on the dashboard latest hours...
"I'm really astonished how this can happen world wide, and for such a long time. And glad we don't have anything in production yet (just playing around so far). How reliable and mature is Azure at the moment?"
I can't access the Azure dashboard myself.
Update No. 1 (11:25 am ET): I finally got the dashboard to load. I see it saying that the Management Service is still experiencing an outage worldwide. Compute and Access Control services are experiencing "performance degradation."
But another Azure customer told me that "reports are that 6.7, 28 and 35 percent of users are experiencing problems in the three data centers. Report says they’re investigating the cause of the problem."
I've seen some speculating on Twitter that all of these problems could stem from some kind of Leap Year bug. Microsoft officials said they had an update for me. I will add it to this post once I get it and will continue tracking the issue.
Update No. 2 (12:05 pm ET): Here's an update from a Microsoft Azure spokesperson. Still no word from Microsoft as to what is causing the rolling series of problems:
"On February 28th, 2012 at 5:45 PM PST Microsoft became aware of an issue impacting Windows Azure service management in a number of regions. Windows Azure engineering teams developed, validated and deployed a fix that resolved the issue for the majority of our customers. Some customers in 3 sub regions - North Central US, South Central and North Europe – remain affected. Engineering teams are actively working to resolve the issue as soon as possible. We will update the Service Dashboard hourly until this incident is resolved."
Update No. 3 (12:30 pm ET): I missed this February 29 piece on Data Center Knowledge, which says Microsoft officials earlier confirmed that a cert issue (which sounds like it is Leap Year-related) does seem to be to blame for at least some of what's gone wrong.
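For readers wondering how a leap day could trip up something like a certificate, here is a minimal sketch of one common way this class of bug arises. It is an illustration only: it assumes the code works out a "valid until" date by bumping the year by one, which Microsoft has not confirmed in any of the statements above. The function names are hypothetical and are not taken from Azure.

```python
from datetime import date


def naive_one_year_later(start: date) -> date:
    # Hypothetical helper: bump the year, keep month and day unchanged.
    # On Feb. 29 this asks for Feb. 29 of a non-leap year, which does not exist.
    return start.replace(year=start.year + 1)


def safe_one_year_later(start: date) -> date:
    # Same idea, but fall back to Feb. 28 when the target year has no Feb. 29.
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        return start.replace(year=start.year + 1, day=28)


leap_day = date(2012, 2, 29)

try:
    naive_one_year_later(leap_day)
    naive_failed = False
except ValueError:
    # Python refuses to build Feb. 29, 2013.
    naive_failed = True

print(naive_failed)                    # True
print(safe_one_year_later(leap_day))   # 2013-02-28
```

The naive version works on every other day of the year, which is why a bug like this can sit unnoticed until a leap day comes around.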
Update No. 4 (3:15 pm ET): No new update from Microsoft for the past three hours, but it doesn't look like things are resolved by a long shot.
I am hearing from more and more customers that they are being affected across a variety of Azure services. A new check on the status dashboard is showing SQL Azure Data Sync is down for most of the U.S. Compute is still iffy in North Central and South Central U.S., as well as Northern Europe. Service Bus is down totally in South US. And Service Management is still totally down worldwide. ZDNet UK is likewise monitoring the dashboard and keeping up with the latest service degradation and outage reports across the Azure service stack.
From Laing's update, which noted that even after a fix was applied, some customers still had issues:
"(S)ome sub-regions and customers are still experiencing issues and as a result of these issues they may be experiencing a loss of application functionality. We are actively working to address these remaining issues. Customers should refer to the Windows Azure Service Dashboard for latest status. Windows Azure Storage was not impacted by this issue."
Microsoft plans to share more of its analysis of the root cause of today's outage once it is resolved, Laing added.
Update No. 6 (7:45 am ET on March 1): The dashboard is looking almost all green this morning, with the exception of some ongoing performance degradation in the South Central US region. Looks like it's all systems go for Azure customers.