A switch failure in the London Internet Exchange (Linx) yesterday led to a cascading failure and temporary loss of throughput for a number of UK ISPs. Linx runs a fully redundant network, with two parallel topologies built on equipment from two different manufacturers, Brocade and Extreme Networks. Only one side failed, but Linx declined to say which.
"The network is in a stable state now and we're working with our switch partner to resolve the problem completely", Malcolm Hutty, public affairs director for Linx, told ZDNet UK. "We're still investigating, but the problem appears to be most likely software related".
Statistics from Linx covering the problem period showed short but significant outages, along with several periods of restoration followed by further restrictions. This pattern arose because connected ISP routers discovered the problem independently and rerouted to alternatives at different times, said Hutty.
"We have two peering lans, but not every member is connected to the second one.", Hutty said. "Some of our members are linked to the high availability network, others have connections to other networks such as DE-CIX in Frankfurt and so on, and switch to those if there's a problem. There were various phases of the outage, and in the early phase, if packets are being dropped but the peering session stays up, the ISP's router doesn't know there's a problem. When the session fails, it's a cue to the member router that there's a problem and it switches to an alternate network. Before that point there'll be degraded service."
Linx is the biggest Internet exchange point in the UK, and the third largest in Europe by average throughput. It has over 300 member ISPs connected to it.