When it comes to storage problems, no one is exempt. The exponential growth of scientific and technical big data is having a major impact on HPC storage infrastructures not only at the world’s largest organizations, but at small- to medium-sized companies as well.
The stumbling blocks at large organizations are highly visible. Over the years, government labs, educational institutions and major enterprises have built up complex and often highly dispersed storage infrastructures. These infrastructures are characterized by numerous distributed file systems running on multiple HPC systems.
As a result, storage silos have become the norm. The consequences include limited access to data, high latency, and increased storage, maintenance and retrieval costs. In particular, these IT infrastructures have a difficult time handling planned or unexpected peak period loads brought on by activities like checkpointing or unanticipated user demand.
Site-Wide Storage at NERSC
The National Energy Research Scientific Computing Center (NERSC) is an excellent example of how a major institution can solve these thorny storage problems.
More than 5,000 scientists use NERSC’s computational facilities every year. They perform scientific research on as many as 700 topics spanning such fields as solar energy, bioinformatics, fusion science, astrophysics and climate science. The Center currently has six state-of-the-art computer systems and advanced storage systems. Included is “Edison,” a Cray XC30 with a peak performance of over two petaflops.
Since 2006, NERSC has had to continuously address its storage problems, and recently the pace has quickened. Typically, up to 400 researchers a day from all over the world were using the Center to run hundreds of high-bandwidth applications to access, analyze and share research data. Because the facility relied on multiple file systems, delivering an optimum balance of capacity and throughput was a major problem.
“We were constantly moving data around within the center to ensure we had sufficient storage to handle new project growth while keeping our existing users happy,” says Jason Hick, group leader, storage systems for NERSC. “It took an inordinate amount of time and created an enormous amount of network traffic.”
Hick’s team debated the merits of deploying additional storage or moving to a centralized solution designed to meet both present and anticipated future growth. They opted for the latter.
Says Hick, “NERSC was a pioneer in moving away from local storage in favor of site-wide Global File systems and consolidated storage architecture.” He adds that performance and efficiency were the primary drivers for adopting a site-wide storage architecture.
DDN Storage Solution
At the heart of the solution is the DataDirect Networks (DDN) Storage Fusion Architecture® (SFA), which provides all the functionality needed to ingest, analyze and archive big data on a single platform.
This approach allows NERSC to deploy a centralized storage capability that can accommodate the requirements of the largest computer system on the network – including peak period bursts – as well as meeting the storage needs of the Center’s other five systems. And when the ultra-powerful NERSC 8 supercomputer “Cori” is installed in mid-2016, it will make full use of the scalable site-wide storage infrastructure.
Hick reports that the centralized infrastructure costs 30 percent less than a local file system, with savings running into several hundred thousand dollars. “Scratch” storage costs have been cut by more than 50 percent.
NERSC is just one of several large organizations that have moved to site-wide storage solutions based on DDN technology. Others include the Texas Advanced Computing Center (TACC), Oak Ridge National Laboratory (ORNL) and Los Alamos National Laboratory (LANL).
But the benefits of site-wide storage solutions are not the exclusive domain of these big government labs and major institutions. Smaller sites may not have the resources to buy, deploy and manage infrastructure on the scale of a TACC or an ORNL, but they can still enjoy the benefits of site-wide storage. One very successful approach is to “converge” parallel file systems and other applications with storage to create centralized storage building blocks that provide higher performance and lower latency. At the same time, this solution offers ease of purchase, deployment and management.
Dealing with Big Data at the University of Florida
The University of Florida is a good example. Its Interdisciplinary Center for Biotechnology Research (ICBR) has been in a rapid growth mode, generating increasing amounts of data as it adds new equipment such as next-generation sequencers and cryo-electron microscopy instruments. To handle this growth, ICBR wanted a flexible, low-footprint and simplified infrastructure that could scale as needed.
The Center chose DDN’s converged infrastructure (DDN In-Storage Processing™), which allows users to embed parallel file systems and key applications inside the storage controller. This approach has allowed the ICBR to eliminate data access latency as well as the need for additional servers, cabling, network switches and adapters, while reducing administrative overhead. Balanced storage and faster application burst performance mean that big data applications perform at optimal levels.
The solution provides the performance and advanced capabilities ICBR needs to handle its rapidly growing next-generation sequencing projects with their constantly changing application loads.
ICBR’s experience shows how a mid-range organization with limited resources can enjoy a satisfactory site-wide storage solution. The Center has deployed an adaptive and customizable architecture for storing, managing and analyzing large collections of distributed data running to billions of files and petabytes of storage across tens of federated data grids.
Optimized Storage for All
As the University of Florida and NERSC examples demonstrate, the benefits of site-wide, optimized storage are available to organizations both large and small. DDN, the HPC storage leader, is showing the way.