A team of scientists at Argonne National Laboratory has broken a data transfer record by moving a staggering 2.9 petabytes of data for a research project.
The data – from three large cosmological simulations – was generated and stored on the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), which is currently rated as the world’s fastest supercomputer on the Top500 list at nearly 149 Linpack petaflops.
“We carried out three different simulations on Summit (each simulation resulted in a file transfer of 2-3PB) to model three different scenarios of the makeup of the Universe,” said Dr. Katrin Heitmann, a physicist at Argonne and lead researcher on the project. “We are trying to understand the subtle differences in the distribution of matter in the Universe when we change the underlying model slightly.”
The ambitious project required time-consuming analysis of data that Heitmann calls "precious", but, she explains, the team faced a major problem: they couldn't analyze the data as fast as the computing centers needed it removed from their machines. The answer was to transfer it from disk to tape, both to keep a copy in case of disk failure and to free up space for new analysis tasks when necessary.
The team used Globus – a research data management service – to transfer the files. Globus, which was originally developed in 1997 to help enable grid computing, allows subscribers to move, share, and discover data through a single interface, and also allows developers to build applications and gateways. Heitmann said the team chose Globus for "speed, reliability, and ease of use."
“The implementations in Globus are extremely convenient,” she elaborated. “For example, it reminds me when my credentials expire, so a job basically never times out. Also, the Globus interface is easy to use, and it provides excellent monitoring interfaces (first thing in the morning when I get to my office is making a coffee, second thing is very often checking in on the transfers!), and I can really fully rely on it due to the checksums. In addition, this work would not have been possible without Oak Ridge’s excellent setup of data transfer nodes, enabling the use of Globus for HPSS transfers.”
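Heitmann's reliance on checksums reflects a general idea: after a copy, a digest is computed independently on the source and on the destination, and the transfer counts as good only if the two match. The short Python sketch below shows that principle on local files. It is not how Globus itself is implemented. The function names `file_checksum` and `copy_and_verify` are hypothetical, and MD5 is used only as an example algorithm.

```python
import hashlib
import shutil


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so very large files never need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dst, algorithm="md5"):
    """Copy src to dst, then recompute both digests and fail loudly if they differ."""
    shutil.copyfile(src, dst)
    src_sum = file_checksum(src, algorithm)
    dst_sum = file_checksum(dst, algorithm)
    if src_sum != dst_sum:
        raise OSError(f"checksum mismatch for {dst}: {src_sum} != {dst_sum}")
    return dst_sum
```

Hashing in chunks matters at this scale. Individual simulation outputs can be far larger than available memory, and the digest is identical no matter what chunk size is used.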
“We could not be prouder of our role in helping scientists do their world-changing work,” said Ian Foster, co-founder of Globus and director of Argonne’s Data Science and Learning Division. “We’re happy to see projects like this one continue to push the boundaries of what Globus can achieve. Congratulations to Dr. Heitmann and team!”