The Mass Storage System (MSS) at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado, is one of the largest archives in the world dedicated to geoscience research. Some of the data it stores originate from field experiments and observations: international climate records from the past 100 years include data from weather stations, ships, planes, and satellites. The bulk of MSS data, however, is generated by global climate simulations and other Earth systems models that run on high-performance computers.
As these computers become larger and faster, they generate ever-larger volumes of output data to be archived. Even greater demands for archiving data result from the growing use of coupled atmosphere, ocean, and sea ice climate models.
In January 2006, total data holdings on the MSS exceeded 2.5 petabytes (the equivalent of 2.5 billion 500-page paperback novels), and the volume of unique data is growing by about 35 terabytes each month. Each day, the MSS handles an average of 40,000 requests for data, transporting more than 3.8 terabytes of data to and from NCAR supercomputers.
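The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal units (1 terabyte = 10^12 bytes) and treats the quoted values as exact, and the function names are hypothetical helpers, not part of any MSS software.

```python
# Back-of-the-envelope check of the January 2006 figures quoted in the article.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15


def bytes_per_novel(holdings_bytes: int, novels: int) -> float:
    """Size of one 'paperback novel' implied by the article's comparison."""
    return holdings_bytes / novels


def mean_bytes_per_request(daily_bytes: int, daily_requests: int) -> float:
    """Average data moved per request, reads and writes combined."""
    return daily_bytes / daily_requests


holdings = 2_500 * TB          # 2.5 petabytes of total holdings
daily_traffic = 3_800 * 10**9  # 3.8 terabytes moved per day

print(bytes_per_novel(holdings, 2_500_000_000) / 10**6, "MB per novel")
print(mean_bytes_per_request(daily_traffic, 40_000) / 10**6, "MB per request")
```

Under these assumptions, the comparison implies about 1 megabyte per novel, and the daily traffic works out to roughly 95 megabytes per request on average.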
Ensuring that this enormous amount of information can be stored and accessed speedily, safely, and reliably by geoscientists around the world is the job of NCAR's Scientific Computing Division (SCD), which designed the MSS in the mid-1980s and has been extending its capabilities ever since.
In January 2006, SCD made a significant upgrade to the MSS by completing the transition from High Performance Parallel Interface (HiPPI) technology to Fibre Channel technology.
“HiPPI is being retired after 13 years of faithful service,” said John Merrill, head of SCD's Mass Storage Systems Group. “For several years we've been moving from HiPPI to Gigabit Ethernet and Fibre Channel as a means to access the MSS. We have now decommissioned the last of the HiPPI devices and eliminated all remaining HiPPI data traffic. All MSS reads and writes are currently being carried through Fibre Channel.”
Data files can now be transferred three to four times faster than before, although users may not notice a sudden change because the transition has been gradual. In addition, the storage capacity of Fibre Channel-enabled media and tape drives is 3.3 times greater than that of the devices they replace.
These improvements will allow the MSS to expand yet further into the multi-petabyte range, while reducing the latency to access MSS files.
The move to Gigabit Ethernet and Fibre Channel is part of the larger, two-decade evolution of the MSS toward technologies that are faster and more reliable.
In the 1980s, the MSS consisted entirely of tapes that were mounted manually by human operators. In November 1989, SCD acquired the first StorageTek Powderhorn data silo, which employed robotic arms to mount tapes at the blazing speed of 350 per hour. In 1995, an upgrade increased the speed to 450 mounts per hour.
While early MSS tapes held only 200 megabytes of data, storage technology has advanced to the point that today's same-sized tapes hold 200 gigabytes, a thousandfold increase over the original cartridge capacity. MSS tape drives have also improved to accommodate higher storage densities and faster data transfers, while the number of data silos has increased to five.
The “brain” of the MSS, the computer that controls the entire storage facility, is called the Mass Storage Control Processor (MSCP). SCD has managed a steady succession of better, faster MSCPs; the current model is a high-speed, high-performance IBM z/890-320.
SCD is now at work on a new software implementation of the MSS metadata catalog, which will further increase bandwidth, accessibility, and reliability for data transfers.
As technology continues to evolve and computational output multiplies, SCD remains committed to providing cost-effective mass storage for the Earth sciences community — as it has since 1978, when NCAR's first rudimentary archival system contained less than one terabyte of data.
The Scientific Computing Division (SCD) is part of the Computational and Information Systems Laboratory (CISL) of the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. NCAR is operated by the University Corporation for Atmospheric Research under the primary sponsorship of the National Science Foundation.