Parallel Storage: A Remedy for HPC Data Management

By Christopher Lazou

January 18, 2008

The advent of more powerful compute systems has increased the capacity to generate data at a fantastic rate. To solve the associated data management issues, a combination of grid technology and other storage components is currently being deployed. Many solutions have been designed to address these petabyte-scale data management problems, including new software, NAS/NFS products and parallel storage solutions from IBM, Panasas and others. This involves handling and storing very large data sets accessed simultaneously by thousands of compute clients.

Extreme-scale projects such as the Large Hadron Collider (LHC) at CERN are producing 15 petabytes of data each year. Raw data are captured at the Tier-0 data centre at CERN and then, after an order of magnitude reduction, are passed on to Tier-1 data centres and so on. This activity involves 130 computer centres, of which 12 are very large.

In HPC there is a strong demand for parallel storage from users in the fields of computational physics, CFD, crash analysis, climate modelling, oceanography, seismic processing and interpretation, bioinformatics, cosmology, computational chemistry and materials sciences. The parallel storage requirement is being driven by the growing size of data sets, more complex analysis, the requirement to run more jobs, simulations with more iterations and the fact that the HPC solutions (Linux clusters) are using multicore processors and more nodes. Inherently the systems and applications are becoming more parallel, hence the requirement for parallel I/O increases.

Before we concentrate on HPC storage needs, let’s briefly review the disk storage market trends.

The disk storage market is expanding rapidly. By 2008 the HPC storage market will be well over $4 billion, according to IDC. The spread of broadband creates huge volumes of data, increasing data exchange in commercial transactions, email, images, video and music. Since interactions are global, this is happening 24 hours a day, 7 days a week. This data growth and non-stop operations put storage and data protection at the heart of this business, requiring high-speed processing for large data sets and high-speed backup for protection. Tape is insufficient for this purpose.

High-end NAS storage systems are likely to be using 10 Gbps TCP/IP (soon to be 20 Gbps) and could hold over 150 TB in a single rack. They are often connected to a SAN over fibre and can be expanded into a cluster.

To deliver a “best-in-class” solution, the compute server and data handling are decoupled. They are highly complementary, but need to be scaled together for balance to handle several petabytes of active data. Although data patterns vary, the system needs to be designed from the ground up for multiple petabyte capability and several millions, or even billions, of files. It is therefore imperative that the data handling systems scale and the network bandwidth does not become a bottleneck.

In today's rich digital content environment, the limitations of traditional NAS/SAN storage, namely scalability, performance bottlenecks and cost, are driving the industry to find new solutions. The industry's response has been the evolution to clustered storage. Vendors claim that clustered NFS storage provides customers with enormous benefits in this digital content environment, including massive scalability, 100X larger file systems, unmatched performance, 20X higher total throughput and industry-leading reliability. They also claim it is as easy to manage a 10-petabyte file system as a 1-terabyte file system. Clustered NFS solutions are fine for most large Web sites, but they simply don't handle the kind of large files typical of most HPC applications very well.

A typical cluster computing architecture consists of a software stack of applications and middleware, hundreds or thousands of processors/clients, a high-speed interconnect using, say, 10GigE, InfiniBand, Myrinet or Quadrics, thousands of direct network connections and hundreds of connections to physical storage.
 
Storage clusters, like compute clusters, transparently aggregate a large number of independent storage nodes so that they appear as a single entity. A storage cluster typically uses the same network technology as the compute cluster (InfiniBand or 10GigE), along with processing power (multicore CPUs, SMP), large amounts of globally coherent cache and disk drives (up to 1 TB each).
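To make that aggregation concrete, here is a minimal sketch of the kind of mapping a striped storage cluster performs internally so that many independent nodes present themselves as one logical drive. The stripe unit, node count and function name are invented for illustration and are not taken from any particular product.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative only: a round-robin striping layout, the sort of mapping a
 * storage cluster uses internally so that many nodes appear as one drive.
 * STRIPE_UNIT and NUM_NODES are made-up example values. */
#define STRIPE_UNIT (64 * 1024)   /* 64 KB per stripe unit */
#define NUM_NODES   8             /* storage nodes in the cluster */

/* Map a byte offset in the logical file to (node, offset-within-node). */
static void map_offset(uint64_t file_offset, int *node, uint64_t *node_offset)
{
    uint64_t stripe_index = file_offset / STRIPE_UNIT;  /* which stripe unit */
    *node = (int)(stripe_index % NUM_NODES);            /* round-robin node  */
    /* Each node stores every NUM_NODES-th stripe unit contiguously. */
    *node_offset = (stripe_index / NUM_NODES) * STRIPE_UNIT
                 + (file_offset % STRIPE_UNIT);
}

int main(void)
{
    uint64_t offsets[] = { 0, 65536, 262144, 1048576 };
    for (int i = 0; i < 4; i++) {
        int node;
        uint64_t node_off;
        map_offset(offsets[i], &node, &node_off);
        printf("file offset %8llu -> node %d, offset %llu\n",
               (unsigned long long)offsets[i], node,
               (unsigned long long)node_off);
    }
    return 0;
}
```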

A cluster file system is likely to use industry-standard protocols such as NFS, CIFS, HTTP, FTP, NDMP and SNMP, with ADS, LDAP and NIS for security, or some other protocol of similar standing. A cluster file system creates one giant drive, or NFS mount, across a fully symmetric cluster. Such a system is massively scalable to multiple petabytes, easy to manage and has plenty of growth potential. The management of LUNs, volumes or RAID is taken care of by the storage cluster management system and is normally hidden from the user.

The future of HPC is tied to larger data sets, more CPUs applied to each problem, and a requirement for parallel storage. Today's high-density 1U servers (typically with 8 cores each) have increased the number of processing cores per node, but I/O bandwidth has not evolved at the same rate. The reality is that the number of cores per node is still increasing; however, scientific and technical analysis requires a system that balances compute cores and I/O bandwidth.

With this increase in compute nodes, traditional single-server NFS solutions have quickly become a bottleneck. A first approach to solving this problem came in the form of clustered NFS. This, however, falls short of HPC requirements. Major HPC sites are therefore not significantly deploying clustered NFS, but are instead moving directly from NFS to parallel storage (such as Panasas, IBM GPFS and Lustre).

Government and academia users are already heavily deploying parallel storage and this is likely to become a requirement for all simulation and modelling applications deployed on clusters. Simply put, parallel compute clusters require parallel storage!
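As a concrete illustration of what "parallel compute clusters require parallel storage" means at the application level, the sketch below uses MPI-IO (part of the MPI-2 standard) to have every rank write its own non-overlapping block of a single shared file concurrently. The file name and block size are arbitrary choices for the example; with a parallel file system underneath, these writes can be serviced by many storage nodes at once, whereas behind a single NFS server they queue up at one head node.

```c
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE (1 << 20)   /* 1 MB per rank, an arbitrary example size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank fills its own buffer. */
    char *buf = malloc(BLOCK_SIZE);
    memset(buf, 'A' + (rank % 26), BLOCK_SIZE);

    /* All ranks open the same file collectively... */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared_output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* ...and each writes its block at a non-overlapping offset. On a
     * parallel file system these writes proceed concurrently; on a single
     * NFS server they serialise behind one head node. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK_SIZE;
    MPI_File_write_at_all(fh, offset, buf, BLOCK_SIZE, MPI_CHAR,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with, say, mpirun -np 64, each rank's megabyte lands in its own region of shared_output.dat.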

In the last few years, new storage companies have succeeded in taking a significant share of the file storage component of the HPC market from traditional storage providers such as Network Appliance, IBM, Sun, NEC and so on. For example, Panasas made news in 2007 when it was chosen to provide the data storage subsystem for the Roadrunner petaflop supercomputer, built by IBM for installation at Los Alamos. It is interesting to note that LANL chose Panasas parallel storage even over IBM's own parallel storage system, GPFS.

Another feather in Panasas' cap is that the company scooped the annual HPCwire readers' choice and editors' choice awards, for Panasas ActiveStor parallel storage and for the new Panasas Tiered Parity architecture respectively, at Supercomputing 2007 (SC07) in Reno, Nev.

To overcome the potential I/O bottleneck inherent in a system as large as Roadrunner, Panasas offered PanFS as part of its ActiveStor storage cluster architecture. This architecture is object-based and uses the DirectFLOW protocol to provide high scalability, reliability and manageability. It supports Red Hat, SUSE and Fedora, and its DirectorBlades enable metadata scalability by dividing the namespace into virtual volumes.
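How DirectorBlades actually divide the namespace is proprietary, but the general technique of partitioning a namespace across metadata servers can be sketched generically. In the toy example below, the first path component stands in for a "virtual volume" and is hashed to pick an owning metadata server; the hash, the server count and the function name are all invented for illustration and are not Panasas's algorithm.

```c
#include <stdio.h>
#include <string.h>

#define NUM_METADATA_SERVERS 4   /* example value only */

/* Generic illustration of namespace partitioning: the volume name (here,
 * simply the first path component) is hashed to select an owning metadata
 * server. This is NOT Panasas's actual scheme, just the general idea of
 * spreading metadata load by dividing the namespace. */
static int metadata_server_for(const char *path)
{
    /* Extract the first path component, e.g. "/projects/cfd/run1" -> "projects". */
    char volume[256] = {0};
    sscanf(path, "/%255[^/]", volume);

    /* Simple string hash (djb2-style). */
    unsigned long h = 5381;
    for (const char *p = volume; *p; p++)
        h = h * 33 + (unsigned char)*p;

    return (int)(h % NUM_METADATA_SERVERS);
}

int main(void)
{
    const char *paths[] = { "/projects/cfd/run1", "/climate/ocean/jan",
                            "/seismic/survey42/traces" };
    for (int i = 0; i < 3; i++)
        printf("%-28s -> metadata server %d\n",
               paths[i], metadata_server_for(paths[i]));
    return 0;
}
```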

PanFS is promoted by Panasas as the “best-in-class” file system for HPC environments. The company claims the system eliminates bottlenecks, solves manageability problems and improves overall reliability.

When Len Rosenthal, Panasas chief marketing officer, was asked what differentiates Panasas from other cluster storage vendors, he said: “The ‘parallel’ element of our offering differentiates us from the clustered storage vendors as we can provide massive speed-up for HPC applications and higher utilization of clusters through parallelism.”
 
“What is driving the need for ‘parallel storage’ in HPC is the combination of multiple factors: 1) Explosion of data sets due to the need to run large and more accurate models. 2) The massive use of x86 clusters and multicore CPUs, where users are applying 100s and 1000s of CPUs to simulation and modelling problems. 3) Currently deployed I/O and file systems based on NFS, and even clustered NFS, cannot handle the I/O requirements,” continued Rosenthal.

According to Panasas, the evolution to Parallel NFS (pNFS) is the ultimate proof that the computer storage world is going parallel. Even though pNFS is inspired by Panasas technology, IBM, Sun, EMC and NetApp are all committed to implementing it. One presumes that, despite being competitors, these companies also recognise the performance and scalability advantages of parallel storage, especially for future HPC, which is why they are working towards the standardisation of pNFS.

The merits of standards are well known. Standards drive product adoption, unlock markets, drive down costs, make interoperability possible and reduce risk to the client. The key storage vendors have existing, incompatible parallel file system products with no interoperability: IBM has GPFS, EMC has MPFSi (HighRoad), Panasas has ActiveScale, HP has PolyServe and so on. Similar interoperability concerns also apply to the open source Red Hat GFS and Lustre.

pNFS is an extension to the Network File System v4 protocol standard. It allows parallel, direct access from pNFS clients to storage devices over multiple storage protocols, and it essentially moves the NFS server out of the data path.

The pNFS standard defines the NFSv4.1 protocol extensions between the server and the client. The I/O protocol between the client and storage is specified elsewhere, for example: SCSI Block Commands (SBC) over Fibre Channel (FC), SCSI Object-based Storage Device (OSD) over iSCSI and Network File System (NFS). The control protocol between the metadata server and storage devices is also specified elsewhere, for example: SCSI Object-based Storage Device (OSD) over iSCSI.
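The architectural point of pNFS, that the metadata server hands the client a layout and then steps out of the data path, can be modelled with a small self-contained sketch. Everything below (the struct, the server names, the function layout_get) is invented purely to illustrate the control flow; it is not the NFSv4.1 API or wire protocol.

```c
#include <stdio.h>

/* Toy model of the pNFS split: the metadata server is consulted once for a
 * layout, then data flows directly between the client and the data servers.
 * All names here are invented for illustration only. */

#define NUM_DATA_SERVERS 4

struct layout {
    const char *data_servers[NUM_DATA_SERVERS];  /* who holds the file's data */
    int         count;
};

/* Stand-in for the NFSv4.1 LAYOUTGET operation sent to the metadata server. */
static struct layout layout_get(const char *path)
{
    printf("metadata server: layout granted for %s\n", path);
    struct layout l = { { "ds0", "ds1", "ds2", "ds3" }, NUM_DATA_SERVERS };
    return l;
}

int main(void)
{
    struct layout l = layout_get("/scratch/results.dat");

    /* The client now talks to the data servers directly (here, one request
     * per stripe in round-robin order); the metadata server is out of the
     * data path, which is the whole point of pNFS. */
    for (int stripe = 0; stripe < 8; stripe++)
        printf("client -> %s : read stripe %d\n",
               l.data_servers[stripe % l.count], stripe);

    return 0;
}
```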

In my view, this standards effort is admirable and should be supported across the storage industry. Potential benefits for users include improved sustained performance, accelerated time to results, and standards-based parallel storage with highly reliable performance. It offers a wider choice of parallel I/O capabilities from multiple storage vendors, the freedom to access parallel storage from any client, and the ability to mix and match best-of-breed vendor offerings. It also carries lower risk for the user community, since client constructs are tested and optimised by the operating system vendors and the customer is freed from vendor lock-in concerns. In short, it extends the benefits of the investment in storage systems.

In summary, vendors and users are recognising that the future of high-end file storage is parallel. Early adopters in government and academia are already on board, but anyone in the HPC space who is building clusters with hundreds of CPU cores and generating terabytes of data will require parallel storage.

IBM's GPFS, Lustre and Panasas are the primary parallel storage systems deployed in government and academia, but Panasas is also a strong, viable option for supplying parallel storage to large commercial companies, such as those in the energy, manufacturing and financial markets. Panasas customers include Boeing, BP, Petroleum GeoServices, Fairfield Industries, Hyundai Automotive Technical Center, Statoil, BMW/Sauber F1 Motor Sports, Paradigm, Northrop Grumman, PetroChina, Novartis and dozens of others. Thus, companies that use HPC to accelerate product development and to profit from their HPC infrastructure are increasingly turning to parallel storage as their preferred solution. Remember the old saying: “The proof of the pudding is in the eating.”

—–

Copyright (c) Christopher Lazou. January 2008. Brands and names are the property of their respective owners.
