Researchers at CERN, the world's largest particle physics laboratory, face an immense storage challenge.
The laboratory's latest project, the Large Hadron Collider (LHC), is being built to study particles and the forces that
bind them together. Due to become fully operational around September 2007, the
LHC will fire billions of protons round a 27km circuit, 150m below ground.
Each beam carries 3,000 bunches of 100 billion protons, whose
paths are bent round the circuit by supercooled (-271°C) superconducting
magnets, and are made to collide at the centre of four detectors in the tunnel.
The interactions between the protons are measured there at 40 million
events per second.
In short, this means that CERN's scientists have an awful
lot of data on their hands. They use computers to filter the events down to a
few hundred "good" events per second, but even this can generate between 100 and
1,000 megabytes of data per second.
That equates to 15 petabytes of data per year for four
experiments, which will be stored on magnetic tape and disk.
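The figures quoted above can be sanity-checked with some back-of-envelope arithmetic. The sketch below uses the article's numbers (a few hundred events per second, 100 to 1,000 megabytes per second, 15 petabytes per year); the assumption of roughly 10^7 seconds of data-taking per year is mine, not the article's, since an accelerator does not record data around the clock.

```python
# Rough sanity check of the data rates quoted in the article.
MB = 10**6   # bytes in a megabyte (decimal, as storage vendors count)
PB = 10**15  # bytes in a petabyte

events_per_sec = 300                        # "a few hundred good events per second"
rate_low, rate_high = 100 * MB, 1000 * MB   # quoted range, bytes per second

# Implied size of a single recorded event: a few hundred kilobytes
# to a few megabytes, depending on where in the range you sit.
event_size_low = rate_low / events_per_sec   # ~0.3 MB per event

# ASSUMPTION: ~10^7 seconds of live data-taking per year.
live_seconds = 10**7
yearly_low = rate_low * live_seconds / PB    # 1.0 PB/year
yearly_high = rate_high * live_seconds / PB  # 10.0 PB/year

print(f"{yearly_low:.0f}-{yearly_high:.0f} PB/year")  # prints "1-10 PB/year"
```

The result lands within a factor of two of the quoted 15 petabytes per year for four experiments, which is as close as a back-of-envelope estimate can be expected to get.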
"This is far too large for a single datacentre," said Dr
Helge Meinhard, technical coordinator for CERN IT in Switzerland. "The information is federated across more than 120 datacentres worldwide."
The processing power currently required by CERN is
equivalent to 30,000 CPU servers, Meinhard told ZDNet UK, speaking at the
Storage Networking World event in Frankfurt.
Experimental event data is sent via optical links to CERN
computer centres. One data stream is stored on magnetic tape, another is
sent to one or two of CERN's 11 "Tier 1" centres, while a third goes to
CERN's CPUs for analysis and to map the particle events.
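That three-way fan-out can be sketched in a few lines of Python. Everything here is illustrative: the centre names, the `fan_out` function, and the batch format are hypothetical stand-ins for CERN's actual data-distribution software, which the article does not describe in detail.

```python
# Hypothetical sketch of the three-stream fan-out described above.
# Each batch of filtered event data is copied to tape, to one or two
# Tier 1 centres, and to CERN's own CPUs for analysis.
import random

# The article mentions 11 Tier 1 centres; names are invented here.
TIER1_CENTRES = [f"tier1-{i:02d}" for i in range(11)]

def fan_out(event_batch):
    """Return the three destinations for one batch of event data."""
    return {
        "tape": "cern-tape-archive",                            # archival copy
        "tier1": random.sample(TIER1_CENTRES,                   # one or two
                               k=random.choice([1, 2])),        # Tier 1 centres
        "analysis": "cern-cpu-farm",                            # local analysis
    }

print(fan_out({"run": 1, "events": 300}))
```

The point of the structure is redundancy: every batch exists on tape at CERN and at at least one remote centre before any scientist touches it.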
This network is dubbed the DataGRID, and CERN's scientists
will be able to access data from anywhere on the network.
Storage is made more complex by each centre being
autonomous, although there are commonalities: all the centres run Linux on x86
architecture, which CERN uses on 98 percent of its own systems, according to Meinhard.
"The main reason is cost," said Meinhard. "It gives us the
best value for money. You don't have to pay per machine, which is a significant saving."
Another CERN scientist, who preferred not to be named, said
that it wouldn't be possible to fund CERN projects if they had to rely on
proprietary software, because of the cost of licensing.
CERN physicists also keep costs down by developing their own
"homemade" software, and relying on commodity or off-the-shelf equipment as far as possible.
With the collisions beginning in earnest in the LHC by late
summer 2007, the physicists hope to find the Higgs boson, a hypothetical elementary particle.
American scientists working on the LHC project got a boost
last week when two high-speed research networks, ESnet and Internet2, announced they would work together to develop a "highly reliable, high-capacity network" across the United States.