The internet is a swan. It sails smoothly across our screens, delivering apps and pages with seamless connectivity.
But under the surface it's full of ISPs paddling away, frantically negotiating peering deals, hosting servers and content-delivery networks — struggling to keep latency to a minimum and to ensure connectivity for customers.
Keeping the internet working is a 24/7 struggle, one I became familiar with when I ran the technology side of one of the UK's national ISPs. Recently I spent some time talking to business ISP and hosting provider Internap about the work it had done to improve the performance of the Sahara Force India Formula 1 team's web presence.
With the season about to restart after the summer break, it's a site that's about to get very busy indeed. Much of what Internap had done was the result of something it called MIRO, the Managed Internet Route Optimizer.
Under the surface of the internet lie some powerful protocols. One, BGP, manages the routing tables that handle how data travels from A to B, and which networks it uses to traverse the world.
Thanks to those routing tables, a request from my desk in London might travel over the fibre of any of a dozen or so tier-1 network providers, depending on the peering arrangements of my broadband provider and the site host.
BGP is often ignored, but it is incredibly powerful. That power was illustrated in 2008, when Pakistan tried to censor YouTube by configuring ISP core routers to send its traffic to a null address; the route was accidentally announced via BGP to the wider net, blocking YouTube for much of the globe.
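The leak was so effective because BGP routers prefer the most specific matching prefix. A minimal sketch of that longest-prefix-match rule, using the prefixes public reports attributed to the incident (YouTube's /22 and the leaked, more-specific /24; the next-hop names here are illustrative):

```python
from ipaddress import ip_address, ip_network

# Simplified routing table: prefix -> next hop. The /22 is YouTube's
# legitimate announcement; the /24 is the leaked, more-specific route.
routes = {
    ip_network("208.65.152.0/22"): "youtube-transit",
    ip_network("208.65.153.0/24"): "pakistan-telecom",
}

def lookup(dest: str) -> str:
    """Longest-prefix match: the most specific matching route wins."""
    addr = ip_address(dest)
    matches = [net for net in routes if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

# An address inside the leaked /24 is drawn toward the hijacker,
# even though the legitimate /22 also covers it.
print(lookup("208.65.153.238"))  # -> pakistan-telecom
print(lookup("208.65.152.1"))    # -> youtube-transit
```

Because the /24 is more specific than the /22, every router that accepted the leaked route sent YouTube-bound traffic the wrong way.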
If you take a traditional approach to large-scale internet architectures, you're left with a lot of single points of failure — often at the border between one network and the next. What's needed is a resilient architecture that not only keeps a service online, but also aims to ensure the best possible network performance.
There's another problem here. The internet is fluid, changing day to day and week to week, so there's no single best provider for any given route, and that makes things hard to manage.
Internap suggests that for between 80 and 90 percent of destinations you're going to get sub-optimal performance, with high latency that can add to costs and reduce site throughput. That's where MIRO comes in, balancing routes across multiple network providers and picking the best-performing one for each destination.
The architecture of a MIRO point of presence is familiar enough. Two border routers are dual-homed to a set of backbone switches, in a standard failover architecture. The backbone switches are connected to the site's core routers and to the internet via several network providers. Everything is resilient, with separate PSUs, leaving the datacentre itself as the only single point of failure.
MIRO works at the core layer, using BGP to work with the POP's core routers. It's software that monitors the data flow, sampling TCP packet time stamps to determine latency for each network. The software can then choose the best routes for a specific site, and delivers a BGP announcement of the new routes to the core routers, setting the next hop for the data.
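The decision MIRO makes for each destination can be sketched roughly as follows. This is not Internap's implementation: the provider names, RTT samples, and next-hop addresses are all hypothetical, and a real system would announce the result via a BGP session rather than build a plain record.

```python
from statistics import median

# Hypothetical per-provider RTT samples (ms) for one destination prefix,
# of the kind an optimiser might derive from TCP packet timestamps.
samples = {
    "provider-a": [48, 52, 47, 95],   # one congested outlier
    "provider-b": [31, 33, 30, 32],
    "provider-c": [60, 58, 61, 59],
}

# Illustrative next-hop addresses for each upstream provider.
next_hops = {
    "provider-a": "10.0.0.1",
    "provider-b": "10.0.1.1",
    "provider-c": "10.0.2.1",
}

def best_route(prefix: str) -> dict:
    """Pick the provider with the lowest median RTT and build the route
    update that would be announced to the core routers as a next hop."""
    best = min(samples, key=lambda p: median(samples[p]))
    return {"prefix": prefix, "next_hop": next_hops[best], "via": best}

update = best_route("203.0.113.0/24")
print(update)  # provider-b wins on median latency despite provider-a's bursts
```

Using the median rather than the mean keeps a single congested sample (provider-a's 95ms outlier) from masking an otherwise good path.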
Updates are regular, with the aim of keeping data always flowing on the most effective routes — and without users seeing anything unless they're keeping an eye on their traceroutes. MIRO handles about 15 million routing calculations in a 24-hour period, over five billion in the past year.
The result is a set of routes that changes every 90 seconds, when MIRO deploys a new set of optimised routes. There's no need to change everything, just 20,000 to 50,000 of the four million or so routes that make up the cross-country and international connections of the internet. MIRO only monitors the routes customers are actually using (currently about half the routing table), and the result is a routing table that ebbs and flows, with traffic always on the lowest-latency path.
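That "only change what moved" behaviour amounts to diffing the new optimised decisions against the previous ones and re-announcing just the delta. A small sketch, with entirely hypothetical prefixes and providers:

```python
# Previous and freshly computed best next hops per prefix
# (documentation prefixes and made-up provider names).
old = {
    "198.51.100.0/24": "provider-a",
    "203.0.113.0/24":  "provider-b",
    "192.0.2.0/24":    "provider-c",
}
new = {
    "198.51.100.0/24": "provider-a",
    "203.0.113.0/24":  "provider-c",   # this path got slower; switch
    "192.0.2.0/24":    "provider-c",
}

# Only prefixes whose best next hop changed need a fresh announcement.
delta = {prefix: hop for prefix, hop in new.items() if old.get(prefix) != hop}
print(delta)  # {'203.0.113.0/24': 'provider-c'}
```

Announcing only the delta is what keeps each 90-second cycle down to tens of thousands of updates rather than millions.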
Internap is certainly taking an intriguing approach to improving performance for its customers, though it's one that requires connections to a large percentage of the main global network providers. But sites do see the difference, with an average latency improvement of 25ms. That may not seem a lot, but a big e-commerce site may take 40 to 50 roundtrips to download a page, and that quickly adds up.
A saving of over a second a page increases e-commerce throughput and retention or improves gameplay in a multiplayer online game. There's also an interesting side effect, as the ability to switch carrier on the fly means it's possible to route around carrier black- and brownouts.
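A quick back-of-envelope check on those figures, using the article's numbers (25ms saved per roundtrip, 40 to 50 roundtrips per page):

```python
saving_ms = 25  # average per-roundtrip latency improvement

for roundtrips in (40, 50):
    saved_s = saving_ms * roundtrips / 1000  # seconds saved per page load
    print(f"{roundtrips} roundtrips: {saved_s:.2f}s saved")
```

At 40 roundtrips that's a full second per page, and at 50 it's 1.25 seconds, so "over a second a page" is exactly what the arithmetic gives.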
A faster, more reliable internet? Perhaps it's not so far away.