Enough information to fill multiple CDs every second is flowing across the world on a network 1,000 times faster than home broadband.
Terabytes of data are streaming through dedicated fibre-optic links between laboratories and universities globally in preparation for the world's largest particle accelerator, the Large Hadron Collider (LHC), being switched on in August at Cern in Geneva, Switzerland.
The Large Hadron Collider Computing Grid (LCG), a super high-bandwidth network, will channel about 15 petabytes — 15 million gigabytes — of data from the LHC to about 5,000 scientists in 500 institutions every year for at least 10 years.
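Those figures imply a substantial sustained data rate. As a back-of-the-envelope check (the decimal-unit convention and 365-day year are assumptions, not figures from Cern):

```python
# Average data rate implied by 15 petabytes per year.
PETABYTE = 10**15                 # bytes, decimal convention used in networking
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_bytes = 15 * PETABYTE
avg_rate_gbps = annual_bytes * 8 / SECONDS_PER_YEAR / 10**9
print(f"{avg_rate_gbps:.2f} Gbps")  # roughly 3.8 Gbps, sustained year-round
```

That average of nearly 4Gbps, running continuously for a decade, is what the dedicated fibre links are built to carry.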
The particle accelerator will smash protons, a type of sub-atomic particle, into each other at more than 99 percent of the speed of light, spraying huge amounts of energy and particles into its detectors.
The LCG will allow researchers to tap into the distributed processing power of almost 100,000 CPUs, crunching through vast amounts of data from the detectors and speeding their hunt for clues about the fundamental nature of the universe.
Rutherford Appleton Laboratory (RAL), near Oxford, has a 10Gbps connection to Cern capable of 1,250 megabytes per second (MBps) upstream and downstream that will pipe in almost raw data from the LHC via the UK part of the LCG — the GridPP.
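That 10Gbps figure also squares with the opening line about filling CDs every second. A quick sanity check (assuming a 700MB CD and decimal units):

```python
# How many CDs' worth of data a 10Gbps link can carry per second.
LINK_GBPS = 10
link_bytes_per_s = LINK_GBPS * 10**9 / 8   # 1.25e9 bytes/s, i.e. 1,250 MBps
cd_bytes = 700 * 10**6                     # assumed 700MB CD capacity

cds_per_second = link_bytes_per_s / cd_bytes
print(f"{cds_per_second:.2f} CDs per second")  # roughly 1.8
```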
Twenty years to catch up
Andrew Sansum, tier-one manager at RAL, said its connection with Cern is about 1,000 times faster than the download speeds on a home broadband connection.
He said it may be less than two decades before commercial networks catch up: "Video and other media services are going to push the speed of consumer network connections up as the demand is going to be huge.
"We were at today's speed of about 10Mbps about 10 to 15 years ago, so you could take that as a precedent for how long it will take for the commercial networks to catch up with us today."
RAL and other "tier one" sites across the world in the LCG will shape the mass of data from the LHC into chunks that can be usefully analysed by physicists and pass it on to hundreds of "tier two" universities and laboratories in their respective countries.
Sansum said: "The LHC experiment would not be possible without the power and throughput of the LCG. Cern has not got the capacity to solely process the vast amount of data on site. The tier one sites will be busy refining the data and enhancing the software that analyses it, growing the processing operations of the grid.
"Our role is to make sure that those physicists are getting the most useful and relevant data. Grid technology is transforming the way that experiments are being carried out. Ten years ago these institutions were working on their own, now they work closely together."
Sansum said RAL and the GridPP are prepared for the LHC going live: "We have run it up to 250 to 300Mbps each way sustained over several days so far. We are in the final shakedown at the moment and seem to be in good shape to face the challenges the LHC will throw at us.
"But there are bound to be surprises around the corner. The biggest challenge is for the software to work out which of the 200 or so tier-two sites has which data. You need to be able to move vast amounts of data from site to site, check it has all got there, flag up any problems and correct those immediately — it quickly gets immensely complicated."
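The bookkeeping Sansum describes — confirming that data has all arrived and flagging problems — comes down to comparing checksums at source and destination. A minimal sketch of the idea (this is an illustration, not the actual GridPP transfer software; function names are hypothetical):

```python
import hashlib

def file_checksum(path, algo="sha256", chunk_size=1 << 20):
    """Hash a file in 1MB chunks so large files don't exhaust memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_transfer(source_path, dest_path):
    """True if the copy matches the original; a mismatch flags a re-transfer."""
    return file_checksum(source_path) == file_checksum(dest_path)
```

Multiplied across hundreds of sites and millions of files, it is the tracking of which replica lives where, and the automatic retry of failed transfers, that makes the problem "immensely complicated".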
A wide range of projects is already tapping into the vast number-crunching capabilities and fat pipes of the GridPP during its downtime, including searches for anti-malarial drugs, efforts to combat avian flu and an image search engine.
There are various grid projects around the world analysing weather data or collaborating on other scientific and academic projects, but none match the scale and sustained throughput of the LCG.
Grid technology will continue to grow in use, according to Sansum, linking up diverse data, such as climate information and localised cancer rates, offering insight and driving scientific progress in ways never before possible.