Tackling Big Data Storage Problems with Site-Wide Storage

May 19, 2014

When it comes to storage problems, no one is exempt. The exponential growth of scientific and technical big data is having a major impact on HPC storage infrastructures not only at the world’s largest organizations, but at small- to medium-sized companies as well.

The stumbling blocks at large organizations are highly visible. Over the years, government labs, educational institutions and major enterprises have built up complex and often highly dispersed storage infrastructures, characterized by numerous distributed file systems running on multiple HPC systems.

As a result, storage silos have become the norm. The consequences include limited access to data, high latency, and increased storage, maintenance and retrieval costs. In particular, these IT infrastructures have a difficult time handling planned or unexpected peak-period loads brought on by activities like checkpointing or unanticipated user demand.

Site-Wide Storage at NERSC

The National Energy Research Scientific Computing Center (NERSC) is an excellent example of how a major institution can solve these thorny storage problems.

More than 5,000 scientists use NERSC’s computational facilities every year, conducting research on as many as 700 topics spanning fields such as solar energy, bioinformatics, fusion science, astrophysics and climate science. The Center currently operates six state-of-the-art computer systems along with advanced storage systems, including “Edison,” a Cray XC30 with a peak performance of over two petaflops.

Since 2006, NERSC has had to continuously address its storage problems, and recently the pace has quickened. Typically, up to 400 researchers a day from all over the world were using the Center to run hundreds of high-bandwidth applications to access, analyze and share research data. Because the facility relied on multiple different file systems, delivering an optimum balance of capacity and throughput was a major problem.

“We were constantly moving data around within the center to ensure we had sufficient storage to handle new project growth while keeping our existing users happy,” says Jason Hick, group leader, storage systems for NERSC. “It took an inordinate amount of time and created an enormous amount of network traffic.”

Hick’s team debated the merits of deploying additional storage or moving to a centralized solution designed to meet both present and anticipated future growth. They opted for the latter.

Says Hick, “NERSC was a pioneer in moving away from local storage in favor of site-wide Global File systems and consolidated storage architecture.” He adds that performance and efficiency were the primary drivers for adopting a site-wide storage architecture.

DDN Storage Solution

At the heart of the solution is DataDirect Networks’ Storage Fusion Architecture® (SFA), which provides all the functionality needed to ingest, analyze and archive big data on a single platform.

This approach allows NERSC to deploy a centralized storage capability that can accommodate the requirements of the largest computer system on the network – including peak-period bursts – as well as meeting the storage needs of the Center’s other five systems. And when the ultra-powerful NERSC 8 supercomputer “Cori” is installed in mid-2016, it will make full use of the scalable site-wide storage infrastructure.

Hick reports that the cost of the centralized infrastructure is 30 percent less than a local file system, with savings running into several hundreds of thousands of dollars. “Scratch” storage costs have been cut by more than 50 percent.

NERSC is just one of several large organizations that have moved to site-wide storage solutions based on DDN technology. These include the Texas Advanced Computing Center (TACC), Oak Ridge National Laboratory (ORNL) and Los Alamos National Laboratory (LANL).

But the benefits of site-wide storage solutions are not the exclusive domain of these big government labs and major institutions. Smaller sites may not have the resources to buy, deploy and manage infrastructure on the scale of a TACC or an ORNL, but they can still enjoy the benefits of site-wide storage. One very successful approach is to “converge” parallel file systems and other applications with storage, creating centralized storage building blocks that deliver higher performance and lower latency while remaining easy to purchase, deploy and manage.

Dealing with Big Data at the University of Florida

The University of Florida is a good example. Its Interdisciplinary Center for Biotechnology Research (ICBR) has been in a rapid growth mode, generating increasing amounts of data as it adds new equipment such as next-generation sequencers and cryo-electron microscopy instruments. To handle this growth, ICBR wanted a flexible, low-footprint, simplified infrastructure that could scale as needed.

The Center chose DDN’s converged infrastructure (DDN In-Storage Processing™), which allows users to embed parallel file systems and key applications inside the storage controller. This approach has allowed the ICBR to eliminate data access latency as well as the need for additional servers, cabling, network switches and adapters, while reducing administrative overhead. Balanced storage and faster application burst performance mean that big data applications perform at optimal levels.

The solution provides the performance and advanced capabilities needed to handle the Center’s rapidly growing next-generation sequencing projects with their constantly changing application loads.

ICBR’s experience shows how a mid-range organization with limited resources can implement an effective site-wide storage solution. The Center has deployed an adaptive, customizable architecture for storing, managing and analyzing large collections of distributed data, running to billions of files and petabytes of storage across tens of federated data grids.

Optimized Storage for All

As the University of Florida and NERSC examples demonstrate, the benefits of site-wide, optimized storage are available to organizations both large and small. DDN, the HPC storage leader, is showing the way.
