Skype blames Patch Tuesday PC reboots for outage
(See update below)
Skype is blaming last week's two-day outage on millions of Windows machines restarting after the installation of Microsoft's security patches.
The massive number of reboots caused a flood of log-in requests (by default, Skype logs in automatically when the machine restarts), causing "a chain reaction that had a critical impact."
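To make the log-in flood concrete, here is a toy Python simulation. It is not Skype's client or server logic. The client count, the ten-minute window, and the hypothetical function `peak_logins_per_second` are illustrative assumptions. The function counts how many log-in attempts land in the busiest one-second bucket in two cases: every rebooted machine logs in immediately, or each machine first waits a random delay.

```python
import random
from collections import Counter


def peak_logins_per_second(num_clients: int, max_jitter_s: float, seed: int = 0) -> int:
    """Hypothetical model: return the most log-in attempts landing in any one-second bucket."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    buckets = Counter()
    for _ in range(num_clients):
        # Each rebooted client waits a random delay in [0, max_jitter_s] before logging in;
        # with max_jitter_s == 0 every client logs in at t = 0.
        delay = rng.uniform(0, max_jitter_s) if max_jitter_s > 0 else 0.0
        buckets[int(delay)] += 1  # group attempts by whole second
    return max(buckets.values())


clients = 100_000
print(peak_logins_per_second(clients, 0))    # prints 100000: everyone in the same second
print(peak_logins_per_second(clients, 600))  # spread over ten minutes
```

With no delay, all 100,000 attempts hit the same second. Spreading them across ten minutes cuts the peak to a few hundred. Randomized start-up delay ("jitter") is a common way to soften synchronized surges like this. Skype's statement does not say whether its client used it.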
In a note posted on the Skype home page, the eBay-owned company said that the peer-to-peer network that powers its Internet phone service has a self-healing component, and that this component failed because of a software bug.
[This] event revealed a previously unseen software bug within the network resource allocation algorithm which prevented the self-healing function from working quickly. Regrettably, as a result of this disruption, Skype was unavailable to the majority of its users for approximately two days.
The issue has now been identified explicitly within Skype. We can confirm categorically that no malicious activities were attributed or that our users’ security was not, at any point, at risk.
The Windows Update explanation seems a bit bizarre. After all, Microsoft has been delivering automatic updates (and simultaneous reboots) every month since 2003. Something still isn't adding up.
[UPDATE: August 21, 2007 @ 10:46 AM] Skype has posted another explanation to clarify the Microsoft Patch Tuesday connection and explain why this never happened before:
2. What was different about this set of Microsoft update patches?
In short – there was nothing different about this set of Microsoft patches. During a joint call soon after problems were detected, Skype and Microsoft engineers went through the list of patches that had been pushed out. We ruled each one out as a possible cause for Skype’s problems. We also walked through the standard Windows Update process to understand it better and to ensure that nothing in the process had changed from the past (and nothing had). The Microsoft team was fantastic to work with, and after going through the potential causes, it appeared clearer than ever to us that our software’s P2P network management algorithm was not tuned to take into account a combination of high load and supernode rebooting.
3. How come previous Microsoft update patches didn’t cause disruption?
That’s because the update patches were not the cause of the disruption. In previous instances where a large number of supernodes in the P2P network were rebooted, other factors of a “perfect storm” had not been present. That is, there had not been such a combination of high usage load during supernode rebooting. As a result, P2P network resources were allocated efficiently and self-healing worked fast enough to overcome the challenge.
Talkback
Doesn't ring true
Surely if this was the simple cause then there would have been a slim chance that each individual login attempt would be processed by the server. This chance should increase with each subsequent successful attempt until service resumed as normal.
I can't believe that the login process is so much more demanding than the polling that occurs while the app is logged in that it could take two days to clear the backlog of requests.
I agree with you Ryan, this just doesn't add up.
Yes it does!
I smell something rotten here.
Skype is lying.
Why not?
Umm, because it was two days later...
I hate to say this
Or if they had a backup plan it sure as heck didn't seem like it.
imPOSSIBLE that there was a problem related to MS
MS makes trash software. Skype probably uses it!
LOL
NATG defending MS.
What a hoot!
BS!
Skype better wake up fast
Every company has minor flaws in its code that end up causing massive problems, even the greats: MS, Google, Oracle.
It is usually thought of in the dev cycle as that one area we don't have time for and will get back to. By the end of the development cycle, there are wars going on, everybody hates one another, and there is no time left to go back and fix things that were not considered important in the first place.
It is just that sometimes you don't want to over-engineer and make a problem more complicated than it is. For Skype, the important thing is for this not to happen again. This happening once is understandable for any developer who has worked in a commercial world where schedules are tight and deadlines must be met.
Skype has had a couple of mishaps recently, and this had better not continue or it will get a reputation for being unreliable.
Ummm..
Something already smells funny.
Add to that the likelihood that "millions of users" are simultaneously rebooting their systems, and they all have the same connection speed, so they are all logging back on at the same time?
Now it's starting to straight up smell.
Sounds like the citrix black hole effect.
About that swampland in Florida you wanted to buy... (NT)
Skype outage
I hate to get an update from most software houses because it causes me so much trouble. QuickBooks is another one that kills me on most updates.
Why can't they just do it right the first time and not have so many problems. If we had to update and fix our cars or airplanes that often it would be a mess.
Updates
not lucky at all...
As for "get it right the first time": I would love to see MS do that, but then again I do not want to pay $30,000 for an OS. And as for cars getting it right the first time, my last new car had a recall on the accelerator, which caused a decent number of accidents. Correctness is proportional to the risk involved. If my computer crashes, it is a mere irritation and perhaps some important missing data about my finances. If my car crashes, my kids are dead. See, the analogy the previous poster tried to make does not scale.
I am an open source proponent, but one of my most frustrating issues is getting fixes in a timely and inexpensive manner. Much of the time I lack the skills to fix the things I need fixed and have to wait on the whims of the unpaid labor of OSS. I know you can hire people to do the work, but the one time I tried to hire the right group of consultants to get the work done, it was going to cost me close to $10K, more money than running Windows on all my small company's servers and clients... Granted, it was the one time I really needed the fix, and with MS I would have needed a fix each month. Not sure what my point is...
Oh, I know...all software sucks and we have yet to discover the correct model for creating and supporting it.
Sounds like they are not quite ready for Prime Time
BS!!! Use SUS dumbasses
Don't blame Microsoft for your employees not doing what they are paid for.
Donald
SUS Recommendation
Probably only part of the problem
Hotmail was essentially unavailable when the improved version came out recently as everyone tried to get it. These surges do make a difference - even to Microsoft.
But if the default behavior is to log in again at reboot, perhaps a redesign is in order. MS will continue to have Patch Tuesday and will continue to need reboots to fix gaping holes in the kernel. Always has, always will.