The Arecibo Observatory in Puerto Rico stood as the world’s largest single-aperture telescope for more than half a century, its grandeur earning it a turn as a major filming location in the James Bond movie GoldenEye. Around six months ago, though, the National Science Foundation (NSF) announced that the observatory would be decommissioned in the wake of dangerous cable failures. Just weeks later, the telescope collapsed of its own accord.
This failure, however, didn’t just leave behind a fascinating – if now useless – telescope dish: it left behind more than three petabytes of data from more than 1,700 projects spanning sixty years, all stored in a datacenter left undamaged by the collapse. Within a month, six institutions had teamed up to transfer the massive data store to a new facility.
“The collapse of the Arecibo Observatory platform certainly raised a sense of urgency within our team,” said Julio Alvarado, manager of Arecibo’s big data program. Alvarado reached out to the Office of Research at the University of Central Florida (UCF) for help, but UCF struggled to identify either a viable path for transmitting the data or a viable endpoint for storing it.
So UCF, in turn, directed Alvarado to two NSF-supported projects: the Engagement and Performance Operations Center (EPOC), which is a “production platform for operations, applied training, monitoring, and research and education support”; and the Cyberinfrastructure Center of Excellence (CI CoE) Pilot project, which “provides expertise and active support to cyberinfrastructure practitioners … in order to accelerate the data lifecycle and ensure the integrity and effectiveness of the cyberinfrastructure upon which research and discovery depends.”
Together, EPOC and the CI CoE Pilot masterminded a plan to transfer the mammoth data load. “Migrating the entire Arecibo data set … would take many months or even years if done inefficiently, but could take only weeks with proper hardware, software and configurations,” said Hans Addleman, EPOC’s principal network systems engineer. To that end, EPOC’s staff supplied infrastructure resources for the design of a data transfer framework while the CI CoE team worked to ensure that the transferred data would remain accessible to the scientific community. The team enlisted Globus, a data management and transfer service based at the University of Chicago, to ensure that the data was transferred quickly, securely and reliably.
“NSF is committed to supporting Arecibo Observatory as a vital scientific, educational, and cultural center, and part of that will be making sure that the vast amounts of data collected by the telescope continue to drive discovery,” said Alison B. Peck, an NSF program officer. “We’re gratified to see that this partnership will not only safely store copies of Arecibo Observatory’s data but also provide enhanced levels of access for current and future generations of astronomers.”
The team found a destination, as well: Ranch, a massive, high-performance archival file system at the Texas Advanced Computing Center (TACC). Ranch is a DDN SFA14K declustered RAID system managed by Quantum StorNext and supported by a Quantum Scalar i6000 tape library. Ranch’s capacity exceeds 70 petabytes, with the potential for expansion up to an exabyte (1,000 petabytes).
The team began transferring the data – which is spread across hard drives, a tape library and offsite locations – in January, starting with the hard drives. Once loaded onto mobile storage devices, the data is transferred to the University of Puerto Rico at Mayaguez (for continuing use) and to the Engine-4 coworking space for upload, which proceeds at a pace of 12 terabytes per day. “Further phases will copy the Arecibo tape library to hard drives and then to TACC,” Alvarado said, “and a later phase will copy data from offsite locations to TACC.”
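As a rough, back-of-the-envelope illustration (not a figure from the project team), the sketch below estimates how long an upload would take if the full archive of roughly three petabytes moved at the quoted 12 terabytes per day. It assumes decimal units (1 petabyte = 1,000 terabytes), a constant rate, and a single upload path for all of the data; the function name is hypothetical.

```python
def days_to_transfer(total_petabytes: float, terabytes_per_day: float) -> float:
    """Days needed to move a data set at a constant daily rate (decimal units)."""
    if terabytes_per_day <= 0:
        raise ValueError("terabytes_per_day must be positive")
    total_terabytes = total_petabytes * 1000  # assumption: 1 PB = 1,000 TB
    return total_terabytes / terabytes_per_day


# Assumed inputs: ~3 PB archive, 12 TB/day upload pace (both figures from the article).
estimate = days_to_transfer(3, 12)
print(f"{estimate:.0f} days (about {estimate / 30:.1f} months)")
# prints "250 days (about 8.3 months)"
```

Under those simplifying assumptions the upload alone would run for about 250 days. The real schedule may differ, since the tape library and offsite data are handled in separate, later phases.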
The Arecibo Observatory’s data journey will not, however, end at TACC’s Ranch. Once safely stored in Ranch, the data will await a new, permanent home, currently under joint development by Arecibo, EPOC and the CI CoE Pilot.
“Arecibo data has led to hundreds of discoveries over the last 50 years,” said Francisco Cordova, director of the Arecibo Observatory. “Preserving it, and most importantly, making it available to researchers and students worldwide will undoubtedly help continue the legacy of the facility for decades to come. With advanced machine learning and artificial intelligence tools available now, and in the future, the data provides opportunity for even more discoveries and understanding of recently discovered physical phenomena.”