Facebook outage due to internal errors, says company

Summary: A misconfiguration led to a feedback storm of errors that could only be resolved by taking the social-networking site offline, according to Facebook's director of software engineering.

Facebook's worst outage for four years was due to an internal configuration error, the company disclosed on Friday.

The 150-minute outage, during which the site was taken offline completely, was the result of a single incorrect setting that produced a cascade of erroneous traffic, Facebook software engineering director Robert Johnson said in a post on the site.

"Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second," Johnson said.

"To make matters worse, every time a client got an error attempting to query one of the databases, it interpreted it as an invalid value and deleted the corresponding cache key," he added. "This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover."
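The loop Johnson describes can be illustrated with a toy simulation. This is a minimal sketch, not Facebook's code: the number of cache keys, clients per key, database capacity per tick and the round-robin arrival order are all hypothetical assumptions. It compares clients that delete the shared cache key on any error (the behaviour described above) with clients that simply ignore errors.

```python
"""Toy model of the cache-invalidation feedback loop.

Hypothetical assumptions (not from Facebook's post): a fixed set of shared
cache keys, a fixed number of clients per key, a database cluster that can
serve `capacity` queries per tick, and queries arriving round-robin by key.
"""
from dataclasses import dataclass


@dataclass
class TickResult:
    tick: int
    queries: int        # queries sent to the database this tick
    missing_after: int  # cache keys still missing at the end of the tick


def simulate(num_keys: int, clients_per_key: int, capacity: int,
             delete_on_error: bool, max_ticks: int = 20) -> list[TickResult]:
    # Start just after the bad value was corrected: every key is invalidated.
    missing = list(range(num_keys))
    history = []
    for tick in range(max_ticks):
        if not missing:
            break
        # Every client whose key is missing queries the database.
        # Round r sends one query per missing key; only the first
        # `capacity` queries in arrival order are served.
        served = {key: 0 for key in missing}
        position = 0
        for _ in range(clients_per_key):
            for key in missing:
                if position < capacity:
                    served[key] += 1
                position += 1
        queries = position  # equals len(missing) * clients_per_key
        still_missing = []
        for key in missing:
            any_success = served[key] > 0
            any_error = served[key] < clients_per_key
            if delete_on_error:
                # Buggy behaviour: an error is read as an invalid value and the
                # client deletes the shared key, undoing any successful refill.
                recovered = any_success and not any_error
            else:
                # One successful query refills the shared key; errors are ignored.
                recovered = any_success
            if not recovered:
                still_missing.append(key)
        missing = still_missing
        history.append(TickResult(tick, queries, len(missing)))
    return history


if __name__ == "__main__":
    for label, flag, cap in [("delete on error", True, 5000),
                             ("ignore errors", False, 300)]:
        print(label)
        for r in simulate(1000, 10, cap, delete_on_error=flag):
            print(f"  tick {r.tick}: {r.queries} queries, {r.missing_after} keys missing")
```

Under these assumptions the deleting clients make no progress at all while capacity is at or below (clients_per_key - 1) x missing keys, so the query load never falls, whereas clients that ignore errors drain the backlog even with far less capacity. Shrinking the client population, as Facebook did by switching the site off, is one way to get below that threshold.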

Network analyst company Arbor Networks, which collates global internet statistics from 80 ISPs, reported that Facebook traffic fell from 60Gbps to 10Gbps between 5:30pm and 6:30pm BST on 23 September. It subsequently slumped to under 5Gbps before recovering fully shortly after 9pm.

Facebook serves information from MySQL databases running the InnoDB storage engine, and the company is active in the open-source database community. On 15 September, it released OSC (Online Schema Change), a tool it developed to make rapid changes to MySQL schemas on live systems.


About

Editor, ZDNet UK. Ex technology/technical editor of ZDNet UK, IT Week, PC Magazine, Computer Life, Mac User, Alfa Systems, Amstrad, Sinclair, Micronet 800, Marconi Space and Defence Systems, and a dodgy TV repair shop in the back streets of Plymouth. Can still swap out a gassy PL509 with the best of 'em.
