Cloudflare disclosed that the snags have affected a slew of products at the data plane and edge level. These include Logpush, WARP / Zero Trust device posture, Cloudflare dashboard, Cloudflare API, Stream API, Workers API, and Alert Notification System.
Other services are still running, but you can't modify their settings. These include Magic Transit, Argo Smart Routing, Workers KV, WAF, Rate Limiting, Rules, WARP / Zero Trust Registration, Waiting Room, Load Balancing and Healthchecks, Cloudflare Pages, Zero Trust Gateway, DNS Authoritative and Secondary, Cloudflare Tunnel, Workers KV namespace operations, and Magic WAN.
The root cause was a data center power failure, compounded by services failing to switch over from the affected data centers to those still functioning.
Cloudflare gave ZDNET a fuller explanation of what happened:
We operate in multiple redundant data centers in Oregon that power Cloudflare's control plane (dashboard, logging, etc). There was a regional power issue that impacted multiple facilities in the region. The facilities failed over to generator power overnight. Then, this morning, there were multiple generator failures that took the facilities entirely offline. We have failed over to our disaster recovery facility and most of our services are restored. This data center outage impacted Cloudflare's dashboards and APIs, but it did not impact traffic flowing through our global network. We are working with our data center vendors to investigate the root cause of the regional power outage and generator failures. We expect to publish multiple blogs based on what we learn and can share those with you when they're live.
Cloudflare is still working to resolve the problem. But since the fault lies with data center power outages rather than its software, the fix may be outside its control. Hang in there, folks: this may take a while. That said, no one expected the outage to drag on this long.