Skype has issued its official response to its nearly two-day outage: A software bug surfaced after a wave of restarts triggered by a Microsoft patch download.
Russell Shaw has more, but here's what Skype had to say:
On Thursday, 16th August 2007, the Skype peer-to-peer network became unstable and suffered a critical disruption. The disruption was triggered by a massive restart of our users’ computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update.
The high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.
Normally Skype’s peer-to-peer network has an inbuilt ability to self-heal, however, this event revealed a previously unseen software bug within the network resource allocation algorithm which prevented the self-healing function from working quickly. Regrettably, as a result of this disruption, Skype was unavailable to the majority of its users for approximately two days.
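Skype's statement doesn't say how its clients retry log-ins, so what follows is not Skype's method. It is only a minimal sketch of one common way to soften a flood of simultaneous log-in requests: exponential backoff with random jitter, so clients that restart at the same moment don't all retry at the same moment. The function name `backoff_delays` and the `base` and `cap` values are hypothetical and chosen for illustration.

```python
import random


def backoff_delays(attempts, base=1.0, cap=300.0, rng=None):
    """Return one randomized wait (in seconds) per retry attempt.

    Hypothetical helper for illustration only. The upper bound doubles
    with each attempt (base, 2*base, 4*base, ...) up to `cap`, and the
    actual wait is drawn uniformly between 0 and that bound, so clients
    spread their retries out instead of retrying in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    # Show the waits a single client might use across six failed log-ins.
    for attempt, delay in enumerate(backoff_delays(6), start=1):
        print(f"retry {attempt}: wait {delay:.1f}s")
```

The randomness is the point: two clients that fail at the same instant will almost certainly pick different waits, which flattens the spike the statement describes.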
So what are the key takeaways here (Techmeme has more)?
1. Patch management is a headache. As anyone who reads Ryan Naraine knows, patch management is a pain--a monthly one. You download the patches and applications break. The systems and processes you have in place for managing patches are critical. Hopefully, patch management is automated to some degree. And it's not just Microsoft patches. The entire industry patches at about the same time. That means you'd better have a strategy to ease the pain.
2. Skype's reputation as a phone replacement took a hit. Skype did a nice job of keeping people updated during the crisis, but its reputation for reliability suffered. The rub: You really have to wonder if Skype can replace your land line. Is that fair? Maybe not, but at last check my plain old telephone wasn't impacted by patches, algorithms and software bugs. The damn thing just works.
3. Peer to peer isn't perfect. Skype noted that it had self-healing functions, but they stumbled. There's a bit of a debate over whether Skype's outage reflects on P2P. Once you delve into the nitty-gritty, Skype's outage may not say much about P2P in general. But because Skype is the poster child of P2P, its outage will hurt perception.
4. Skype's goals are unclear. If Skype is supposed to be a phone service that could replace a land line, this line from its statement should probably have been edited:
"This disruption was unprecedented in terms of its impact and scope. We would like to point out that very few technologies or communications networks today are guaranteed to operate without interruptions."
Two-day interruptions don't fly at the corporate--or even consumer--level.