Facebook's giant outage: This change caused all the problems

Facebook says a configuration issue knocked its social media apps offline on Monday, October 4.
Written by Owen Hughes, Senior Editor

Facebook blamed its six-hour outage on Monday on a faulty configuration change that affected its vast social media platforms and internal systems.

Facebook, alongside WhatsApp and Instagram, suffered a global outage on Monday, October 4 that began at approximately 11:44 EDT and dragged on well into the afternoon.

The social media giant's services were back online as of 17:28 EDT.

In a subsequent blog post, Facebook's VP of infrastructure, Santosh Janardhan, said the outage had been caused by a technical issue affecting its Border Gateway Protocol (BGP) routing system, which had "a cascading effect on the way our data centers communicate, bringing our services to a halt."

Monday's outage also took down internal tools at Facebook, which made diagnosing and fixing the problem more difficult, said Janardhan. According to the New York Times, the outage rendered engineers' access cards useless, meaning staff couldn't get into the buildings where the affected servers were housed.

"Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication," said Janardhan.

"Our services are now back online and we're actively working to fully return them to regular operations. We want to make clear at this time we believe the root cause of this outage was a faulty configuration change."

BGP was originally designed to interconnect internet service providers across the globe. It now forms the routing backbone of the internet.

Facebook also uses BGP as a foundation for its data center routing design. In a blog post published in May 2021, Facebook researchers said the routing design was intended to allow the company to "build our network quickly and provide high availability of our services, while keeping the design itself scalable."

However, the researchers also noted that BGP "requires tight codesign with the data center topology, configuration, switch software, and data center–wide operational pipeline." Ironically, Facebook's data center routing configuration was designed specifically to minimize the impact of failures.

No user data was compromised in Monday's outage, Facebook said.
