Facebook outage due to internal errors, says company

Summary: A misconfiguration led to a feedback storm of errors that could only be stopped by pulling the plug on the social-networking site, according to Facebook's software engineering director.

TOPICS: Networking

Facebook's worst outage for four years was due to an internal configuration error, the company disclosed on Friday.

The 150-minute outage, during which the site was turned off completely, was the result of a single incorrect setting that produced a cascade of erroneous traffic, Facebook software engineering director Robert Johnson said in a posting to the site.

"Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second," Johnson said.

"To make matters worse, every time a client got an error attempting to query one of the databases, it interpreted it as an invalid value and deleted the corresponding cache key," he added. "This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover."
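The loop Johnson describes can be illustrated with a toy model. The sketch below is not Facebook's code: the class, function names, client count and database capacity are all hypothetical, chosen only to show how deleting the cache key on every database error keeps the query storm going even once the stored value is correct. The `delete_on_error=False` run is an illustrative contrast, not a fix the company has described.

```python
# Toy model of the feedback loop; every name and number here is hypothetical.

VALID = "good-setting"


class OverloadedError(Exception):
    """Raised when the toy database receives more queries than it can serve."""


class Database:
    def __init__(self, capacity_per_tick):
        self.capacity_per_tick = capacity_per_tick
        self.queries_this_tick = 0

    def start_tick(self):
        self.queries_this_tick = 0

    def query(self):
        self.queries_this_tick += 1
        if self.queries_this_tick > self.capacity_per_tick:
            raise OverloadedError("too many queries this tick")
        return VALID  # the persistent copy has already been corrected


def run_tick(cache, db, num_clients, delete_on_error):
    """Run one time step and return how many clients queried the database."""
    db.start_tick()
    # All clients inspect the cache at the same moment, before any refresh lands.
    refreshing = num_clients if cache.get("config") != VALID else 0
    for _ in range(refreshing):
        try:
            cache["config"] = db.query()
        except OverloadedError:
            if delete_on_error:
                # The behaviour Johnson describes: an error is treated like an
                # invalid value, so the cache key is deleted.
                cache.pop("config", None)
    return refreshing


def simulate(delete_on_error, ticks=5, num_clients=10, capacity=3):
    cache = {"config": "bad-setting"}  # the invalid value every client sees
    db = Database(capacity)
    return [run_tick(cache, db, num_clients, delete_on_error) for _ in range(ticks)]


if __name__ == "__main__":
    print(simulate(delete_on_error=True))   # [10, 10, 10, 10, 10]
    print(simulate(delete_on_error=False))  # [10, 0, 0, 0, 0]
```

In the first run, the clients that hit the overloaded database wipe out the good value that the successful clients had just cached, so every client queries again on the next tick and the database never gets a chance to recover. In the second run, the cached value survives the errors and the queries stop after one tick.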

Network analysis company Arbor Networks, which collates global internet statistics from 80 ISPs, reported that Facebook traffic fell from 60Gbps to 10Gbps between 5:30pm and 6:30pm BST on 23 September. It then slumped to under 5Gbps before recovering fully shortly after 9pm.

Facebook uses MySQL with the InnoDB storage engine to serve information, and the company is active in the open-source database community. On 15 September, it released OSC (Online Schema Change), a tool it has developed to make rapid changes to MySQL schemas on live systems.

Rupert Goodwins
