Argonne scientists perform huge file transfers to model the makeup of the Universe

A team led by Argonne National Laboratory scientists moved 2.9 petabytes of data -- in a single file transfer -- as part of a project involving some of the largest-ever cosmological simulations.


A team of scientists from Argonne National Laboratory is moving massive amounts of data to study the evolution and content of the universe. In a single file transfer, the team moved 2.9 petabytes of data on the Summit supercomputer at Oak Ridge National Laboratory. The file transfer was the largest ever conducted by Globus, a data management service used by hundreds of research institutions and high-performance computing (HPC) facilities worldwide.

"Storage is, in general, a very large problem in our community -- the Universe is just very big, so our work can often generate a lot of data," Katrin Heitmann, Argonne physicist and computational scientist, said in a statement.

Working in a field known as computational cosmology, Heitmann's team is generating extreme-scale simulations to model different scenarios of the makeup of the Universe. They carried out three different simulations on Summit, each of which resulted in a file transfer of 2-3 petabytes.

"We are trying to understand the subtle differences in the distribution of matter in the Universe when we change the underlying model slightly," Heitmann explained in an interview

Her team was granted early access to Summit to perform the unique simulations. After the simulations finished, the team had to make a backup copy of the data to HPSS (High Performance Storage System), the facility's tape-based archival storage.

"One major problem we have with our simulations is that we can't get the data analyzed as fast as the computing centers want us to get them off their machines," she said. "So the copy to tape has two functions: make sure we have a copy of the data set in case something bad happens to the disk (which does occur rather regularly), and also ensure we can pull it back if we need to do new analysis tasks."