Facebook's worst outage for four years was due to an internal configuration error, the company disclosed on Friday.
The 150-minute outage, during which the site was taken completely offline, was the result of a single incorrect setting that produced a cascade of erroneous traffic, Facebook software engineering director Robert Johnson said in a post on the site.
"Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second," Johnson said.
"To make matters worse, every time a client got an error attempting to query one of the databases, it interpreted it as an invalid value and deleted the corresponding cache key," he added. "This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover."
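The loop Johnson describes can be sketched in a few lines. The simulation below is illustrative only, not Facebook's actual code: a herd of clients all read the same poisoned cache value at once, query the database to repair it, and treat a database error as yet another invalid value, deleting the cache key again. The names (`run_round`, `DB_CAPACITY`, the `"config"` key) and the per-round capacity figure are assumptions for the sketch.

```python
# Illustrative sketch of the feedback loop (not Facebook's code).
# Clients that see a bad cached value delete the key and query the
# database; a database ERROR is also treated as an invalid value,
# so failed queries keep deleting the key and the herd never stops.

DB_CAPACITY = 2    # queries the database can absorb per round (assumed)
NUM_CLIENTS = 10   # clients all reacting to the same bad value

def run_round(cache, buggy=True):
    """One tick: every client reads the cache at the same moment, then
    each client that saw a bad value queries the database to fix it."""
    load = 0
    value_seen = cache.get("config")           # same snapshot for everyone
    queriers = NUM_CLIENTS if value_seen in (None, "INVALID") else 0
    failures = 0
    for _ in range(queriers):
        load += 1
        if load <= DB_CAPACITY:                # query succeeds
            cache["config"] = "good"           # repair the cache
        else:                                  # database overwhelmed
            failures += 1
            if buggy:
                # Error misread as an invalid value: delete the key,
                # guaranteeing another full herd of queries next round.
                cache.pop("config", None)
    return queriers, failures

cache = {"config": "INVALID"}                  # the bad config push
for tick in range(3):
    q, f = run_round(cache)
    print(f"tick {tick}: {q} queries, {f} failed, cache={cache}")
```

Each tick sends the full herd of ten queries at a database that can serve two, and the deletions by the eight failures leave the cache empty again, so the next tick repeats identically. Passing `buggy=False` (leave the key alone on error) lets the two successful queries repopulate the cache, and the query storm stops after one round, which matches Johnson's point that the loop, not the original bad value, was what kept the databases down.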
Network analysis company Arbor Networks, which collates global internet statistics from 80 ISPs, reported that Facebook traffic fell from 60Gbps to 10Gbps between 5:30pm and 6:30pm BST on 23 September. It subsequently slumped to under 5Gbps before returning to normal shortly after 9pm.
Facebook uses MySQL with the InnoDB storage engine to serve information, and the company is active in the open-source database community. On 15 September, it released OSC (Online Schema Change), a tool it developed to make rapid changes to MySQL schemas on live systems.