TACC, the Texas Advanced Computing Center, knows all about big data. As a leading center of computational excellence in the United States, TACC relies on advanced computing technologies to enable discoveries that advance science and society. Of course, all the data that is generated requires a repository – that’s where Corral comes in. The large-scale data repository was deployed in 2009 to support the storing and sharing of research data at the University of Texas.
A recent article on TACC’s website highlights an important milestone for Corral. The DataDirect Networks storage system recently crossed the one petabyte mark in total data stored, and it now hosts over 100 unique data collections. The diverse assortment of datasets ranges from measurements of Earth’s gravity field to whale songs to mass spectrometry data, according to the piece by science writer Aaron Dubrow.
Usage of the system continues to climb. For the last six months, usage has increased 10 percent per month.
“We’ve seen ever-increasing growth in the number and diversity of collections on Corral over the past several years,” said Chris Jordan, manager of the data management and collections group at TACC. “This shows how important a resource dedicated to data collections is to modern research practices, both for the researchers who are creating data and the worldwide community of researchers who use public data collections to further their own research.”
Corral is not the only storage system at TACC, but it is unique for hosting large collections that are actively serving the community. TACC’s 100-petabyte Ranch tape archive serves as a long-term repository for archived work. The site’s newest petascale supercomputer, Stampede, includes more than 15 petabytes of dedicated storage, and a scalable global file system adds another 20 petabytes. Both the Stampede storage and the global file system are used for short-term data retention to support ongoing simulations and analyses.
Corral, which has a current raw capacity of six petabytes, was designed and optimized to support complex large-scale collections and a collaborative research environment. With a high-speed connection to TACC’s other advanced computing systems, scientists can easily share data and results.
According to Niall Gaffney, TACC’s Director of Data Intensive Computing, “Corral is leading the way in the preservation and dissemination of data for researchers who are discovering that global, on-demand access to large quantities of data leads to previously unachievable results.”