The Skype outage earlier in December was caused by a fault in one version of its software client, the company's chief information officer has explained.
On 22 and 23 December Skype's underlying peer-to-peer network crashed, cutting service for consumer and enterprise users of the internet telephony service.
"On Wednesday, December 22, a cluster of support servers responsible for offline instant messaging became overloaded. As a result of this overload, some Skype clients received delayed responses from the overloaded servers," Lars Rabbe, Skype's chief information officer, wrote on the company's main blog. "In a version of the Skype for Windows client (version 5.0.0152), the delayed responses from the overloaded servers were not properly processed, causing Windows clients running the affected version to crash."
Around 50 percent of Skype's global users were running the 5.0.0152 version of the software client. The bug caused around 40 percent of those clients to crash. This ultimately took down 25 to 30 percent of all publicly available "supernodes", Rabbe wrote.
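As a rough back-of-the-envelope check, the two percentages above can be multiplied to estimate what share of all Skype clients crashed. The sketch below assumes the "around 40 percent" figure applies only to clients running the affected version, and treats both figures as the approximations Rabbe gave; the variable names are illustrative.

```python
# Approximate figures from Rabbe's account (both are "around" values).
share_on_affected_version = 0.50   # clients running version 5.0.0152
crash_rate_among_affected = 0.40   # of those clients, the share that crashed

# Assumption: crashes occurred only among clients on the affected version.
crashed_share_of_all_clients = share_on_affected_version * crash_rate_among_affected

print(f"Roughly {crashed_share_of_all_clients:.0%} of all clients crashed")
```

On those assumptions, roughly one in five Skype clients worldwide would have failed.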
Supernodes are part of the backbone of Skype's distributed peer-to-peer communications network. Supernodes provide the addressing details of other Skype clients, route data, create local address clusters and help to connect different Skype clients with one another.
Each supernode failure had a cascading effect, Rabbe wrote. The responsibilities of a failed supernode were passed to the remaining supernodes, which then came under such load that they exceeded their expected operational parameters and began to shut themselves down, he continued. The strain was heightened as users whose 5.0.0152 Skype client had crashed restarted the software, adding further load to the surviving supernodes.
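The cascade Rabbe describes can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Skype's actual protocol: load is assumed to be shared equally among surviving nodes, each node is assumed to shut down once its share exceeds a fixed capacity, and the function name, capacities and load figures are all hypothetical.

```python
def simulate_cascade(capacities, total_load, initially_failed):
    """Return the number of surviving nodes after each redistribution round.

    capacities: per-node load limits (hypothetical units)
    total_load: load shared equally among surviving nodes (an assumption)
    initially_failed: set of node indices that are down at the start
    """
    alive = [c for i, c in enumerate(capacities) if i not in initially_failed]
    rounds = [len(alive)]
    while alive:
        share = total_load / len(alive)
        # A node shuts itself down when its share exceeds its capacity.
        survivors = [c for c in alive if c >= share]
        if len(survivors) == len(alive):
            break  # the remaining nodes can carry the load; cascade stops
        alive = survivors
        rounds.append(len(alive))
    return rounds


capacities = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

# All ten nodes up: each carries 10 units and none is overloaded.
print(simulate_cascade(capacities, 100, set()))        # [10]

# Three of ten nodes lost (30 percent): the rest are pushed over their limits.
print(simulate_cascade(capacities, 100, {7, 8, 9}))    # [7, 4, 0]
```

In this toy network, losing 30 percent of nodes is enough to bring down the rest within two rounds, because every shutdown raises the load on the nodes still standing.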
Traffic to the supernodes was about 100 times what Skype would typically expect for that time of day, Rabbe wrote.
"Regrettably, as a result of the confluence of events – server overload, a bug in Skype for Windows clients (version 5.0.0.152) and the decline in available supernodes – Skype's functionality became unavailable to many of our users for approximately 24 hours," Rabbe said.
Skype was able to restore service by injecting thousands of "mega-supernodes" into the Skype network over the course of 22 December. The mega-supernodes did the jobs typically assigned to supernodes, while helping to stabilise other supernodes in the network, Rabbe wrote. Resources normally dedicated to Skype's Group Video Calling (GVC) features were used to deploy new supernodes and this caused downtime in GVC, which was restored by 24 December.
In light of the outage, Skype will look at how it delivers software updates to ensure that all users are up to date, and will examine its testing process for new software, Rabbe wrote.