CloudFlare pins outage on bad rule for Juniper routers

Content distribution firm "drops off the internet" after outage.
Written by Liam Tung, Contributing Writer

Content distribution company CloudFlare suffered a worldwide outage for around an hour over the weekend after applying a bad change to its Juniper edge routers that replicated across its network.

CloudFlare "effectively dropped off the internet" after a network-wide failure hit all 23 of its nodes, located in 14 countries across the world. Visitors to a site on its network during the outage, between 09.47 UTC and 10.49 UTC, would have received a DNS error, CloudFlare's CEO Matthew Prince explained on the company's blog.

The outage affected its DNS and any services that rely on its web proxy, which is an important component for clients — such as WikiLeaks and around 500,000 other organisations — that rely on it for web optimisation and uptime in the face of distributed denial of service attacks.

Indeed, the outage occurred after its engineers applied a "bad rule" to a Juniper edge router while fending off a DDoS attack against one of its clients. The rule then spread across its network of Juniper edge routers via the Flowspec protocol.

The rule was designed to filter an attack that was sending packets between 99,971 and 99,985 bytes long to the client’s DNS server, but it caused the router to malfunction.

CloudFlare's network-wide outage. Credit: CloudFlare

"Flowspec accepted the rule and relayed it to our edge network. What should have happened is that no packet should have matched that rule because no packet was actually that large. What happened instead is that the routers encountered the rule and then proceeded to consume all their RAM until they crashed," Prince said.

Some routers failed to reboot automatically, forcing network operations teams at the datacentres to physically access them and perform a hard reboot to get them up and running again.
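
To illustrate why the rule should never have matched anything, here is a minimal Python sketch. It assumes IPv4 traffic, where the 16-bit total length field caps a packet at 65,535 bytes. The `PacketLengthRule` class and its methods are hypothetical names invented for this illustration; they are not CloudFlare's tooling or Juniper's rule syntax.

```python
from dataclasses import dataclass

# Largest size an IPv4 packet can declare (16-bit total length field).
MAX_IPV4_PACKET_BYTES = 65_535


@dataclass(frozen=True)
class PacketLengthRule:
    """Hypothetical stand-in for a filter rule that matches on packet length."""

    min_bytes: int
    max_bytes: int

    def matches(self, packet_bytes: int) -> bool:
        # A packet matches when its length falls inside the inclusive range.
        return self.min_bytes <= packet_bytes <= self.max_bytes

    def can_ever_match(self) -> bool:
        # The rule is satisfiable only if its range is well formed and
        # overlaps the lengths an IPv4 packet can actually have.
        return (
            self.min_bytes <= self.max_bytes
            and self.max_bytes >= 0
            and self.min_bytes <= MAX_IPV4_PACKET_BYTES
        )


# The range reported for the attack profile in this incident.
rule = PacketLengthRule(min_bytes=99_971, max_bytes=99_985)
print(rule.can_ever_match())  # False
print(rule.matches(MAX_IPV4_PACKET_BYTES))  # False
```

A check along these lines could flag an unsatisfiable rule before it is pushed, although nothing in CloudFlare's account says such a check was or is in use.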

The company said it is investigating whether Juniper is aware of any bugs and will begin testing whether Flowspec rule updates can be targeted to specific datacentres rather than applied network-wide.
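
As a rough illustration of what targeting updates at specific datacentres could look like, the Python sketch below applies a change to a small set of sites first and continues only if they stay healthy. It is an assumption-laden sketch, not a description of Flowspec or of CloudFlare's plans: `staged_rollout`, `apply_rule`, `is_healthy` and the site names are all hypothetical.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    sites: Iterable[str],
    apply_rule: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    canary_count: int = 1,
) -> List[str]:
    """Apply a rule to a few canary sites, then to the rest only if they stay healthy.

    Returns the list of sites the rule was actually applied to.
    """
    all_sites = list(sites)
    canaries = all_sites[:canary_count]
    remainder = all_sites[canary_count:]

    applied: List[str] = []
    for site in canaries:
        apply_rule(site)
        applied.append(site)

    # Stop here if any canary is unhealthy, limiting the damage to those sites.
    if not all(is_healthy(site) for site in canaries):
        return applied

    for site in remainder:
        apply_rule(site)
        applied.append(site)
    return applied


# Hypothetical site names; the health check simulates a rule that crashes routers.
sites = ["dc-a", "dc-b", "dc-c"]
touched = staged_rollout(sites, apply_rule=lambda s: None, is_healthy=lambda s: False)
print(touched)  # ['dc-a']
```

In this simulated run the bad rule reaches only the first site, rather than every site at once.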

CloudFlare intends to issue service credits to accounts covered by service level agreements.

"Any amount of downtime is completely unacceptable to us and the whole CloudFlare team is sorry we let our customers down this morning," said Prince.