Breaking Boundaries with Supercomputer Storage

By Andy Morris, IBM Cognitive Infrastructure

November 4, 2019

[Attend the IBM LSF & HPC User Group Meeting at SC19 in Denver on November 19!]

In grid computing, computers on a network can work on a task together, acting as a single supercomputer. Grid storage extends this concept by employing multiple interconnected nodes that can communicate with any other storage node without the data having to pass through a centralized switch. Each storage node contains its own storage medium, CPUs, grid management software, and robust data services.

Grid storage offers a number of advantages over other storage architectures. First of all, it is more fault-tolerant and redundant. If one storage node fails or a pathway between two nodes is interrupted, the network can reroute access another way or to a redundant node, reducing the need for online maintenance and practically eliminating downtime. The existence of multiple paths between each pair of nodes ensures that storage grids can maintain optimum performance under conditions of fluctuating load. Also, grid storage is scalable by design. If a new storage node is added, it is automatically recognized by the rest of the grid and its resources are added to the whole. Grid storage performance, capacity, and even resilience can grow easily by adding nodes.[1]
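
To make that fault tolerance and scalability concrete, here is a minimal conceptual sketch in Python (hypothetical class and node names, not IBM code): each block is replicated on several nodes, a read transparently falls back to a surviving replica when a node is unreachable, and a new node simply adds its resources to the grid.

    class StorageGrid:
        def __init__(self, replicas):
            self.replicas = replicas            # block id -> nodes holding a copy
            self.down = set()                   # nodes currently unreachable

        def read(self, block_id):
            for node in self.replicas[block_id]:
                if node not in self.down:       # reroute past failed nodes
                    return f"read {block_id} from {node}"
            raise IOError(f"no healthy replica for {block_id}")

        def add_node(self, block_id, node):
            # A newly added node's resources become part of the grid.
            self.replicas.setdefault(block_id, []).append(node)

    grid = StorageGrid({"blk-42": ["node-a", "node-b"]})
    print(grid.read("blk-42"))                  # read blk-42 from node-a
    grid.down.add("node-a")                     # simulate a node failure
    print(grid.read("blk-42"))                  # rerouted: read blk-42 from node-b
    grid.add_node("blk-42", "node-c")           # scale out: new node joins the grid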

[Also read: With HPC the Future is Looking Grid]

One of the best-known and most highly regarded grid storage solutions on the market today is IBM Elastic Storage Server (ESS), built with the IBM Spectrum Scale data management system. IBM Spectrum Scale started out as a massively parallel high-performance computing (HPC) file management solution called General Parallel File System (GPFS). Today, it’s still used to manage the storage for some of the world’s most powerful computing platforms. In fact, the two fastest supercomputers on the planet right now – Summit and Sierra – both leverage the capabilities of IBM Spectrum Scale, as does the fastest commercial supercomputer, Pangea III.[2] And all three of these behemoths employ grid storage solutions in the form of IBM ESS implementations.

Many supercomputing facilities currently employ grid storage solutions. For example, the Ohio Supercomputer Center (OSC) is working with IBM to expand the Center’s HPC storage capacity by 8.6 petabytes. Each year, OSC systems serve the computational and storage needs of more than 3,000 academic, government, and industry researchers. The new OSC storage will be provided by an IBM ESS grid storage solution powered by IBM Spectrum Scale. The new solution expands capacity for both temporary and long-term project storage. And thanks to the capabilities of the IBM software-defined storage (SDS) elements involved, it also allows OSC to offer data encryption and full file system audit capabilities for the secure storage of sensitive and personally identifiable information – broadening the range of customers and workloads OSC can support.

IBM ESS incorporates a number of cutting-edge technologies that improve storage performance, efficiency, and data protection. For example, IBM Spectrum Scale erasure coding implements a sophisticated data layout scheme that spreads data and redundancy information across all the drives of an ESS grid. This enhances data protection, and it also changes what happens when a drive fails: instead of a lengthy rebuild, the surviving drives simply exchange the data needed to restore full redundancy and move on. A disk rebuild in a RAID 6 array with 4TB drives can degrade the performance of the entire system for 17 to 24 hours, but with IBM Spectrum Scale erasure coding the impact can be limited to a few minutes.
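
The following is a deliberately simplified sketch of the general erasure-coding idea (single XOR parity in Python, with hypothetical function names; IBM Spectrum Scale’s declustered erasure coding is far more sophisticated): a block is split into data fragments plus a parity fragment, the fragments are scattered across “drives”, and any single lost fragment can be rebuilt from the survivors.

    from functools import reduce

    def xor_all(fragments):
        """XOR a list of equal-length byte strings together."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), fragments)

    def encode(block, k):
        """Split `block` into k data fragments and append one XOR parity fragment."""
        size = -(-len(block) // k)                       # fragment size, rounded up
        padded = block.ljust(size * k, b"\0")
        frags = [padded[i * size:(i + 1) * size] for i in range(k)]
        return frags + [xor_all(frags)]

    def rebuild(frags):
        """Reconstruct a single missing fragment (marked None) from the survivors."""
        missing = frags.index(None)
        frags[missing] = xor_all([f for f in frags if f is not None])
        return frags

    drives = encode(b"supercomputer storage", k=4)       # 4 data + 1 parity "drives"
    drives[2] = None                                     # simulate one drive failure
    restored = rebuild(drives)
    print(b"".join(restored[:4]).rstrip(b"\0"))          # b'supercomputer storage'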

Innovation within the basic IBM ESS platform is continuing at a brisk pace. Recently, IBM announced a new ESS solution called IBM Elastic Storage System 3000 (ESS 3000). The new solution is designed to be one of the simplest ways to deploy IBM Spectrum Scale’s world-class file system technology. It leverages the ultra-low latency and massive throughput advantages offered by Non-Volatile Memory Express (NVMe)-enabled flash storage. ESS 3000 uses containerized software delivery for simplified installation and upgrades. It offers capacities ranging from 23TB to 370TB and blisteringly fast 40GB/sec performance within each efficient 2U building block, with exabyte scalability. The new solution brings ground-breaking capabilities to Artificial Intelligence (AI) applications and workloads, enabling full GPU resource utilization in servers such as the IBM Power AC922 and delivering the fastest benchmark results with NVIDIA DGX systems.

[Related Article: Analyst Review: Six Recent IBM Innovations in High-Performance Computing]

New supercomputing storage solutions are always a cornerstone of The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), which will be held next month at the Colorado Convention Center in Denver, Colorado. Over 13,000 people attended the conference last year.

The technical program is considered the heart of any supercomputing conference. This year, presentations, tutorials, panels, and discussion forums will address a wide range of scientific and engineering research, as well as technological development, innovation, breakthroughs, and inspired new areas of computing.

Among a rich program of technical seminars offered by the IBM team at SC19 is a topic that deserves particular attention and some deeper exploration. Imagine a list of storage infrastructure requirements that seem almost impossible to address all in one solution:

  • The storage solution must be able to start small but grow virtually without limit.
  • Performance and capacity must scale easily, from terabytes well into the petabyte range.
  • It must utilize a better data protection scheme than traditional RAID.
  • Availability must be high, and it must improve as the solution expands.
  • It must easily accommodate highly variable workloads.
  • The solution must work on commodity servers but also be available using NVMe-enhanced storage arrays.
  • It must be as easy to install as any software-defined storage solution.
  • The storage solution must leverage leading-edge innovation and yet all its elements must be already proven in the most demanding, extreme research and business environments.

Along with exploring topics ranging from the crucial role of Information Architecture to the effects of containers and Kubernetes in HPC environments, IBM experts will introduce the new ESS 3000 platform in detail during the Technical Seminars at SC19. Grid storage isn’t new to the HPC audience, but ongoing innovation is keeping this powerful architecture very relevant for today’s AI and Big Data workloads. It’s a topic worth learning a lot more about, soon.
