A Game of Clue: What Killed Skype

Summary: It was a server failure, with a Windows application crash, over a peer-to-peer network.

Days after Skype, the popular Voice-over-Internet-Protocol (VoIP) service, crashed, we finally know why it died. Before launching into what blasted Skype, though, you need to know how Skype works.

You need to keep in mind that Skype is a true peer-to-peer (P2P) network application. Indeed, if you trace back Skype's ancestry you'll find that its developers first cut their teeth on the Kazaa P2P file-sharing program. What's important about that is that Skype, unlike client-server programs, relies on its client PCs to help carry voice communications.

If you're a Skype user, your PC may not just be an ordinary client; it may be working as a Super Node (SN) as well. When you log in to Skype, the odds are you're not logging directly into the Skype login servers but into an SN instead. The SN, in turn, stores your Skype name, your e-mail address, and an encrypted version of your password.

Skype automatically and constantly modifies its network as users go off and on the service. With Skype installed, your PC may be used as a SN and you'll never know it. As a SN, your PC will store the addresses of up to several hundred Skype users. If your PC isn't behind a firewall and/or NAT (Network Address Translation), it may also be used to route calls.

The program is designed so that Skype won't be using your system when you're in the middle of a big project. In addition, even if you're watching for Skype traffic, you're not likely to be able to crack it since voice traffic is encrypted with 256-bit Advanced Encryption Standard (AES, aka Rijndael).

The idea behind all this is to make Skype extremely scalable without requiring the company to maintain a large (read: expensive) server infrastructure. Technically, this worked well for Skype until December 22, 2010.

Then, as Lars Rabbe, Skype's CIO, explained, bad things started to happen: "A cluster of support servers responsible for offline instant messaging became overloaded. As a result of this overload, some Skype clients received delayed responses from the overloaded servers. In a version of the Skype for Windows client (version 5.0.0.152), the delayed responses from the overloaded servers were not properly processed, causing Windows clients running the affected version to crash."

That wasn't the latest version of Skype for Windows, but "around 50% of all Skype users globally were running the 5.0.0.152 version of Skype for Windows, and the crashes caused approximately 40% of those clients to fail. These clients included 25-30% of the publicly available supernodes, [which] also failed as a result of this problem."

Those of you who know how cascade problems work can already see where this is going. "Once a supernode has failed, even when restarted, it takes some time to become available as a resource to the P2P network again. As a result, the P2P network was left with 25-30% fewer supernodes than normal. This caused a disproportionate load on the remaining available supernodes." How big a load on the last standing supernodes? Try "about 100 times what would normally be expected at that time of day."

Whoops.

Well, that worked about as well as you might think it would: "Regrettably, as a result of the confluence of events - server overload, a bug in Skype for Windows clients (version 5.0.0.152), and the decline in available supernodes - Skype's functionality became unavailable to many of our users for approximately 24 hours."
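To put the supernode numbers quoted above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that load spreads evenly across whatever supernodes survive; Skype's explanation doesn't say how load is actually distributed, and the function names are made up for this example.

```python
def load_factor(lost_fraction: float, demand_multiplier: float = 1.0) -> float:
    """Per-supernode load relative to normal.

    Assumption (not from Skype): total demand is shared evenly by the
    supernodes that are still running.
    """
    if not 0.0 <= lost_fraction < 1.0:
        raise ValueError("lost_fraction must be in [0, 1)")
    return demand_multiplier / (1.0 - lost_fraction)


def demand_needed(lost_fraction: float, observed_factor: float) -> float:
    """Demand multiplier required to produce observed_factor per survivor."""
    if not 0.0 <= lost_fraction < 1.0:
        raise ValueError("lost_fraction must be in [0, 1)")
    return observed_factor * (1.0 - lost_fraction)


if __name__ == "__main__":
    # 25-30% of supernodes lost and a 100x load are the figures Skype reported.
    for lost in (0.25, 0.30):
        print(
            f"{lost:.0%} of supernodes lost: "
            f"{load_factor(lost):.2f}x load from redistribution alone, "
            f"{demand_needed(lost, 100):.0f}x demand needed to reach 100x"
        )
```

Under these assumptions, redistribution alone accounts for only about 1.33 to 1.43 times the normal load, so demand itself would have had to rise roughly 70 to 75 times to hit the reported figure. Rabbe's quoted explanation doesn't break the number down, so treat this as arithmetic, not a diagnosis.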

To combat this, Skype started adding its own "dedicated supernodes, which we nick-named 'mega-supernodes,' to provide enough temporary supernode capacity to accelerate the recovery of the peer-to-peer cloud." While Skype officially claims that the problem lasted only about 24 hours, Rabbe admitted that "The supernodes stabilized overnight on Thursday and by Friday, several tens of thousands of supernodes were supporting the P2P network. During Friday, we withdrew a significant proportion of the mega-supernodes from service, leaving some in operation to ensure stability of the P2P network over Christmas and New Year." So, the problem really lasted over 48 hours. Skype seems to be working just fine now.

To prevent this kind of thing from happening again, Skype will be working on improving its Windows client quality assurance. The company is also working on adding to its small number of core servers so that if the P2P side of Skype goes down, there will be a bit more robustness in the service's infrastructure.

I'd also ask Skype to improve its automatic software update functionality. There's no way that so many old versions of the Windows client software should have been out there when the failure hit. The latest version of Skype, version 5.0.0.156, which proved able to resist the problem, had been released a week before the crash. Almost all Windows users should have been automatically upgraded to it.
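The first step in any update check is deciding whether the installed client is older than the latest release. A minimal Python sketch of that comparison follows; it is not how Skype's updater works (the article doesn't describe that), and the function names are hypothetical. It compares dotted version numbers as tuples of integers, because plain string comparison would wrongly rank "5.0.0.99" above "5.0.0.152".

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '5.0.0.152' into (5, 0, 0, 152)."""
    return tuple(int(part) for part in text.split("."))


def needs_update(installed: str, latest: str) -> bool:
    """True when the installed version is strictly older than the latest."""
    return parse_version(installed) < parse_version(latest)


if __name__ == "__main__":
    latest = "5.0.0.156"  # the release that resisted the crash
    for installed in ("5.0.0.152", "5.0.0.156"):
        status = "update available" if needs_update(installed, latest) else "up to date"
        print(f"{installed}: {status}")
```

Whether the client then installs the update silently or merely tells the user about it is a policy decision layered on top of this check.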

I've said it before, I'll say it again: People should be forced to upgrade their systems if they're going to be on the Internet. One way to do that is to make sure applications, like Skype, which depend on the Internet, can be automatically updated. Yes, that can be a headache for system administrators, but then so is having out-of-date software on the loose that contributed to taking down an important service.

After all, what would you rather do? Play a game of Clue with what the heck just happened to a major Internet service or deal with automatic push software upgrades? I'd rather deal with the patches myself.

Topics: Operating Systems, Browser, Collaboration, Hardware, Networking, Servers, Software, Windows, Social Enterprise

Talkback

22 comments
  • Allowing automatic updates opens up the possibility of mass infection

    If the update mechanism is hit by a bug or an internal employee plays along with hackers, you get the possibility of all the client computers being upgraded to a mass worm/virus/botnet infestation.
    laxamar
    • RE: A Game of Clue: What Killed Skype

      @laxamar And we don't have that now? I mean, seriously, Windows' security is already awful--there's a reason why Patch Tuesday happens every month. But that aside, most of the trouble happens to users who don't use Windows Update and who are then hit with exploits of already patched holes. Seriously, 1 in 20 PCs out there is still running IE 6.

      Steven

      Steven
      sjvn@...
      • What SHOULD Happen . . .

        @sjvn@...

        Is that the software should automatically check for updates and then NOTIFY THE USER THAT AN UPDATE IS AVAILABLE.

        I checked mine today, and I still had the old software. So I did an auto update, which failed the first time, and succeeded on the second attempt.

        I don't think users should be forced to update, but they should at least be notified when an update exists, and given the opportunity to download and install it. I honestly think that most of them don't upgrade (which is what IE6-8 are, not updates) or update simply because they use the default settings, which until recently were, for the most part, set not to auto-update. This means they aren't even aware of the updates, and quite frankly, grandma and grandpa may not even realize that they need to do it.

        I choose not to auto-update until I'm sure the update actually works (please reference the recent McAfee disaster, when an important Windows file was misidentified as an infected file and removed, due to an automatic update).
        JLHenry
      • RE: A Game of Clue: What Killed Skype

        Gotta agree with Henry. Updating software without the user's knowledge or consent is both a violation of the user's property rights and also really annoying sometimes. Especially closing programs or rebooting the system without the user's explicit consent. I'm surprised there hasn't been a class action lawsuit against Microsoft over Windows Update's automatic reboots, for example. When I go to lunch and come back to find myself sitting at a blank desktop, with unsaved work lost, it makes me want to head down to Redmond with a baseball bat...
        masonwheeler
      • RE: A Game of Clue: What Killed Skype

        @sjvn@...

        I agree with JLHenry and masonwheeler.

        The first rule of working with computers, mason, is "save early, save often." Losing work by leaving that work unsaved is the user's fault.
        Cardhu
  • I cannot understand how Skype servers became overloaded

    as they are Linux based, and this does not happen to Linux, from what I have read here.
    :|
    Tim Cook
    • You missed the bit about...

      It being to do with a Windows client.
      zkiwi
      • RE: A Game of Clue: What Killed Skype

        @zkiwi
        But don't go down that path of blaming it on the OS. It had NOTHING to do with the OS and everything to do with their own buggy Windows client, though outdated. In the end it still was the Linux servers that went down. "Then, as Lars Rabbe, Skype's CIO explained, bad things started to happen: 'A cluster of support servers responsible for offline instant messaging became overloaded. As a result of this overload, some Skype clients received delayed responses from the overloaded servers.'" Those clients were the Skype client for Windows. Nothing to do with the Windows OS. But it all cascaded from the servers being overloaded.
        smfrazz
      • RE: A Game of Clue: What Killed Skype

        @zkiwi
        Compute:
        Application Code <> Operating System
        12312332123
      • Hmmm...

        @smfrazz
        I blamed the client, not windows. And "in the end" it wouldn't have mattered what the servers were running, they would have been toast.

        @Traxxion
        Perhaps you could let mr_Spock know that and for that matter smfrazz too.
        zkiwi
  • Does not compute

    If 30% of the SNs failed ... then the remaining 70% would be bearing about 43% extra load ... and since SKYPE surely does not max out a client's PC ... I don't understand why traffic increased 100 fold. Indeed I would have thought the bandwidth of broadband connections and CPU power available would mean a 43% increase was easily absorbed. I would also have thought that half the network with the new software would carry on talking to that half ... and just think everyone else had gone to sleep (having crashed, the sleepers wouldn't be generating any traffic for a while ... given that recovery is reported as slow).

    OK, so it's me who doesn't understand how the cascade propagates :-(

    I guess the failed support nodes and clients kept taking down working nodes faster than they were recovering ...
    ... or is it the case that the redundancy mechanism failed and that a crashing client took down part of the working segment too?

    "The latest version of Skype, version 5.0.0.156, which proved able to resist the problem, had been released a week before the crash. Almost all Windows users should have been automatically upgraded to it."
    No!
    Imagine what would happen after a buggy version was pushed out. (No software quality assurance programme can eliminate ALL bugs.)
    A much slower release ... accompanied by monitoring of failures after the same ... would surely be preferable.

    Also I am not entirely comfortable with a forced upgrade: I currently value a feature of Photoshop 3 which has been removed from subsequent versions so I haven't upgraded. In the case of Skype I think they need to make it clear to customers in advance that their machine is part of the network and not entirely their own AND ASK PERMISSION for automatic-ish upgrades. (OK, then if you don't grant permission - you don't get the software!)
    jacksonjohn
  • RE: A Game of Clue: What Killed Skype

    All very well to state that everyone should be forced to upgrade their client software all the time, but personally I like to keep my software portable, not installed. I only use a portable version of Skype - I'm not sure if it was an official one or not at this stage (it may well be able to update itself), but you get my point....
    12312332123
  • What kind of idiot company would take down a significant

    portion of the mega supernode computers they put up to fix this problem while all those same old versions of the client are still out there in use and it's still the holidays? Spend a few bucks cheap ass!

    The lesson here is that skype seems to know jack sh** about cloud computing. Peer to peer is for rinky dink kids guys...
    Johnny Vegas
  • Auto-updating has to be optional

    There is no way we would ever want software to start updating itself en masse without it being optional. It is great to have the ability but it has to be optional. I will never allow anything on my computers that does not make silent updating optional. Even Windows Update is optional though I do leave that enabled.

    This was purely a Skype failure and suggesting that all internet software should silent update beyond user control to protect us from sloppy companies is not well thought out. It could have just as easily happened in reverse where a new version was the culprit and if silent updating was in place the problem would have been even worse.

    I only use Skype when I'm on the road so I'm sure I'm one of the ones with an older version and I have no intention of updating it until the next time I need it.
    Mythos7
  • Autoupdating can really be a headache

    We have seen this over the last year with AVG, McAfee, and Xbox. Automatic updates can automatically hose your system. Who does the average consumer turn to when that happens? Sure, auto-update, but then set aside some money to pay damages when your update toasts somebody's PC and they have to call out the Geek Squad.
    zmud
  • The real story here is that Skype can crash your computer

    based solely on how many OTHER people may be using Skype.
    frgough
  • RE: A Game of Clue: What Killed Skype

    Skype killed Skype.
    james347
  • RE: A Game of Clue: What Killed Skype

    Where I work, certain Microsoft updates are not optional ... they arrive often unannounced, in the middle of big projects, and create huge headaches for those of us who must explain to our bosses why our PCs simply won't respond for an hour or two at a time. While MS does allow us to make these updates to OS and browser optional, our company doesn't. Is this a good thing or a bad thing? Well, on the plus side, we are as up to date as it is possible to be, across the board, all the time. That's great. On the down side, being mid-project, facing tight deadlines, only to discover your system is on vacation for a few hours can be quite upsetting.

    It always seemed to me there should be a third option between "optional" and "forced / NOW!" ... that being something along these lines:

    "Your (insert name of application here) application requires an important and urgent update. This update will demand considerable system resources for anywhere from (min) to (max) minutes/hours, etc. This update is not optional. You do have the option of processing the update now or, if you are in the middle of something critical, delaying the update for up to XX hours (where XX is determined by the application creator)."

    Given this sort of option, I could get my most urgent work done, then authorize the update to process when the red hot work was complete.
    justin.donie@...
  • Allow opting out of auto-update, but then disallow SN status

    As a former Network Admin for a large company, and as a home user, I ALWAYS want to be able to opt out of auto-updates for the reasons others have given in this discussion. In Skype's case, one solution would be to not allow nodes with older clients to become or remain SuperNodes. Of course, this would have to be combined with more robust software update pushes to users who have NOT opted out.
    rschoonh@...
  • Example of forced update

    I used Firefox for years because other browsers have problems; now I find many problems with Firefox, namely the loss of functionality from Firefox updates. I went back to something else, Firefox 2.0.0.20, to get functionality, and for doing that I get comments about an unapproved browser. If I used what they expected I would only use Internet Explorer. Safari for Windows has no place to get the URL or search term working; it works by sheer luck. The most recent Firefox will not allow add-ons, the favorites or bookmarks never work as well as Internet Explorer's, and bookmarks for Safari are near useless. They call those approved browsers.
    troubled241