This has not been a good month for the Internet's core address system: the Domain Name System (DNS). First, a Turkish cracker mounted a man-in-the-middle attack on the users of numerous Web sites. Now, according to Microsoft, many of its online services were disabled by a DNS failure.
At first, some people thought this collapse of Office 365, Hotmail, SkyDrive, and other Windows Live programs might be due to trouble with the Windows Azure cloud or other Windows server problems. It quickly became apparent, though, that it was a DNS problem.
Microsoft's senior vice president for Windows Live, Chris Jones, has been keeping users up to date on how the company is handling the problem on the Inside Windows Live blog. By 12:45 AM Eastern time, Jones reported that "We believe we have restored service for all customers at this time. We will continue our investigation into the root cause of these issues and post an update following our investigation. Again we appreciate your patience and apologize for the inconvenience."
While it was an easy fix, it wasn't an instant one. At that time, Microsoft had corrected the problem only on its own master DNS servers.
As Jones went on to explain to puzzled users, "We're aware of reports including the comments posted below that some customers still are seeing issues. We are working on propagating the DNS configuration changes and so it will take some time to restore service to everyone. Again we appreciate your patience."
What Jones was describing is the result of DNS's distributed design. While DNS is a worldwide system, it has no single master control that can push a change across the globe in seconds. Instead, other DNS servers keep cached copies of the records they have looked up until those copies expire, so a DNS change, even from a company the size of Microsoft or Google, can take hours to reach them all. Thus, so long as the DNS server you use for all your Web addressing needs still held the wrong address information for Windows Live services, your Web browser or application couldn't easily reach Microsoft's services.
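To make that delay concrete, here is a minimal sketch of a caching DNS server, written in Python. It is a toy model under stated assumptions, not a description of Microsoft's setup: the host name, the addresses, the one-hour time-to-live (TTL), and the CachingResolver class are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    address: str
    expires_at: float  # time, in seconds, after which the answer must be refetched


class CachingResolver:
    """Toy stand-in for an ISP or office DNS server that caches answers."""

    def __init__(self, authoritative: dict):
        # authoritative maps a name to (address, ttl_seconds); it plays the
        # role of the domain owner's master DNS servers.
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached.expires_at:
            # Cached answer is still fresh, so the master is not consulted.
            return cached.address
        # Cache miss or expired entry: ask the master and remember the answer.
        address, ttl = self.authoritative[name]
        self.cache[name] = CachedRecord(address, now + ttl)
        return address


# Hypothetical name, documentation-range addresses, and a one-hour TTL.
master = {"mail.example.com": ("203.0.113.99", 3600)}  # the bad record
resolver = CachingResolver(master)

before_fix = resolver.resolve("mail.example.com", now=0)

# The fix is applied at the master, but the resolver's cache is untouched.
master["mail.example.com"] = ("198.51.100.10", 3600)

shortly_after = resolver.resolve("mail.example.com", now=600)
after_expiry = resolver.resolve("mail.example.com", now=3600)

print(before_fix, shortly_after, after_expiry)
```

Ten minutes after the fix, the toy server still hands out the old address because its cached copy has not expired; only once the hour is up does it ask the master again and pick up the correction. Multiply that by every caching DNS server on the Internet, each with its own expiry clock, and a fix made in one place can plausibly take hours to be seen everywhere.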
By 2:49 AM, Jones reported that "We have completed propagating our DNS configuration changes around the world, and have restored service for most customers. Depending on your location you may still experience issues over the next 30 minutes as the changes make their way through the network." A quick check with my thousands of Twitter and Google+ friends around the world at 10:45 AM Eastern time revealed that everyone, in my circles at least, was now able to reach Microsoft's online services.
So what happened? We don't know yet. It could have been a cracker getting into Microsoft's own DNS servers and making an unauthorized change. Or it might simply have been a blunder by a Microsoft network administrator.
What we do know, though, is that even as we use the Internet more and more for our daily work, the fragility of its fundamental infrastructure is becoming ever more painfully clear.