HPC Leadership Computing Trusts DDN

By Rob Farber

June 13, 2016

As we approach an Exascale future, the focus is on how to provision and use that computational capability. In order to realize the full societal impact of Exascale computing, storage systems to support Exascale supercomputers are equally important else those valuable (and expensive) compute cycles will be wasted during IO operations. Thought leadership in the HPC community agrees that increasing core count is clearly the direction for computation (although there are strong differences of opinion on how those increased core counts are to be implemented). However, the storage picture is more complicated.

Unlike computation or in memory systems, data retained in storage is persistent over a period of years and even decades. Further, storage systems cannot ever risk losing or delivering bad data at any point in the data lifecycle.  This requires deep technical capabilities and experience in large scale computing and data management.  As we look to the future, past performance truly is a predictor of future success in the storage market, which is why more than 2/3 of the world’s fastest computers rely on DDN (Data Direct Networks) for their storage needs. From pre-Petascale supercomputers to the current generation of double-digit PF/s (petaflop per second) machines, DDN has preferentially been selected to partner with end users and technology integrators to expand the limits of HPC computing. Looking to an Exascale world, DDN is investing 10’s of millions of dollars and opening new research and development facilities to create the end-to-end storage technologies that will meet the data requirements of current users and future Exascale supercomputers. DDN storage simply works, is fast, expandable, power efficient, and cost effective, which is why DDN is the storage vendor of choice for HPC professionals and those tasked with advancing the state-of-the-art in leadership class supercomputing.

The recent announcement of the Japanese Oakforest-PACS 25 PF/s supercomputer is the latest double-digit Petascale machine that will utilize a combination of DDN burst buffer, application acceleration, SSD and file system technologies together to achieve results faster than conceived possible even just 2 years ago.  The Oakforest storage system is comprised of 25 DDN IME14KX caching appliances to provide 1.4 TB/s of low-latency flash-based cache. These cache devices will work in conjunction with DDN supplied storage to deliver 400 GB/s of peak Lustre bandwidth to meet the storage bandwidth needs of this latest generation multi-PF/s supercomputer. As can be seen in the figure below, Lustre is just one option as tiered DDN storage works with any parallel file-system.

Figure 1: DDN devices work with any parallel file-system
Figure 1: DDN devices work with any parallel file-system

Infinite Memory Engine

DDN’s IME (Infinite Memory Engine) represents a new IO tier for HPC that treats small IO in precisely the same manner as large sequential IO. This is a revolutionary change from existing parallel filesystems results in near wire-speed performance regardless of random IO patterns, IO size, and shared file access. DDN’s IME product line also has the ability to work with future storage media such as 3D XPoint and others.

Figure 2: Rack performance IME (Image courtesy Cray Users Group)
Figure 2: Rack performance IME (Image courtesy Cray Users Group)

IME “burst-buffers”

The DDN IME intelligently decouples storage performance from the traditional view of ‘storage’ to greatly accelerate HPC workloads – especially for frequently performed checkpoint/restart operations.

As can be seen in the figure below, Burst Bandwidth has traditionally required overprovisioning of storage to meet peak bandwidth needs. Checkpoint/restart operations are an example of a common IO operation that requires storage overprovisioning to quickly move the data and prevent wasting valuable compute cycles. The DDN IME caches can be configured to act as burst buffers that can quickly handle bursts of extremely high IO activity. This is the reason why the Oakforest-PACS supercomputer has been provisioned with 1.4 TB/s of DDN IME bandwidth.

Figure 3: Bursty IO patterns require overprovising
Figure 3: Bursty IO patterns require overprovising

IME positions HPC for the Exascale

Looking ahead to the Exascale, DDN IME caches can save significant capital and operational dollars by reducing the number of devices required to achieve Exascale-capable levels of storage performance. To put this in perspective, Gary Grider famously pointed out in his 2009 presentation, Preparing Applications for Next Generation IO/Storage that plotting Exascale storage costs of millions of dollars in log scale means you have hit the big time!

Figure 4: 2009 projected costs of storage for an Exascale system (image courtesy HPC User Forum)
Figure 4: 2009 projected costs of storage for an Exascale system (image courtesy HPC User Forum)

In contrast, the Oakforest-PACS procurement only required 25 DDN IME14KX caching appliances. As the industry leader, DDN has dramatically redefined the storage landscape and costs associated with Exascale storage systems since 2009 as shown in the graphic below.

Figure 5: DDN has redefined the storage landscape since 2009
Figure 5: DDN has redefined the storage landscape since 2009

For HPC, DDN IME devices makes high-performance clusters, multi-PF/s systems, and Exascale computation both possible and affordable.

Figure 6: A DDN IME14k (click to see more)
Figure 6: A DDN IME14k (click to see more)

The many uses of IME

Of course, IME storage works great for databases, out-of-core solvers, and a variety of other scientific and commercial HPC workloads.

Figure 7: Additional uses of a DDN IME product
Figure 7: Additional uses of a DDN IME product
  1. A Write Accelerating Burst Buffer absorbing the bulk application data into the IME14K NVMe solid state cache significantly faster than the file system can absorb it.
  2. A File System Accelerator and Application Optimizer as IME reorders application I/O to optimize flushing the cache to long term storage (enabling purchasing as little expensive cache possible).
Figure 8: Dataflow in the client
Figure 8: Dataflow in the client
  1. A Read-optimized Application-I/O Accelerator that enables out-of-band API configuration of the IME appliance to optimize both reads and writes, allowing more simultaneous job runs, shortening the job queue and enabling significantly faster application run time to the user. The API integrates IME with the job schedulers and pre-stages / warms the cache for new jobs, accelerating first read.

Standard script operations make utilization of DDN IME appliance capabilities straight-forward. The following shows how to use the DDN IME as an application accelerator.

Figure 9: IME acts as an application IO accelerator
Figure 9: IME acts as an application IO accelerator

Robustness and Scalability are key!

Cost and power savings are for naught if the storage solution is not robust and scalable as well.

DDN gives the customer the option of using a technique called erasure coding to protect against storage failures. Erasure codes are primarily used in scale-out object storage systems where erasure encoded data blocks are distributed across multiple storage nodes to provide protection against both media and node failures. Erasure encoding can literally save racks of storage nodes when compared to the alternative, three- or four-way mirroring/replication [For more information click here].

Option 1: Data protection is optional. The IME server and associated storage media are considered “just cache” where the data can be recreated if lost.

Option 2: Erasure coding is calculated at the client:

  • Exhibits excellent scaling and can run with high client counts.
  • Servers don’t get clogged up.
  • There is a tradeoff as erasure coding does reduce usable client bandwidth and IME capacity according to IME count by roughly 11% (in an 8+1 configuration) to 25% (in a 3+1 configuration).
Figure 10: Erasure encoding distributed across multiple IMEs
Figure 10: Erasure encoding distributed across multiple IMEs

Managing the full spectrum of end-to-end data lifecycle management

Robust, scalable, and performant storage are but part of the HPC storage picture as data archive must also be considered as well as full life cycle data management and distributed cloud based storage. Similarly, questions are being raised about the efficacy of POSIX based file-systems in future HPC systems. For this reason, object storage systems are undergoing rapid development.

To address current and future end-user storage needs – even at the Exascale – DDN has created a complete portfolio of end-to-end storage products that work together as an extremely flexible data lifecycle management toolset. DDN claims these tools that can be applied anywhere and at any scale.

Figure 11: DDN end-to-end big data lifecycle management
Figure 11: DDN end-to-end big data lifecycle management

Briefly, the DDN storage portfolio covers:

  • Fast data and compute: Addressed through the DDN family of IME products.
  • File-system appliances: DDN products include the GRIDScaler® and EXAScaler®.
  • Persistent data: Persistent data for a variety of commercial and big data workloads are addressed via the SFA14k™ storage array products.
  • Object and cloud storage: The WOS® Object storage for private and hybrid clouds take DDN customers beyond traditional file-systems. WOS is described in the DDN white paper, WOS® 360° full spectrum object storage.
Figure 12: WOS object storage
Figure 12: WOS object storage

For more information

For more information, visit http://www.ddn.com.


Rob Farber is a global technology consultant and author with an extensive background in HPC and storage technologies that he applies at national labs and commercial organizations. He can be reached at [email protected]

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

GTC 2019: Chief Scientist Bill Dally Provides Glimpse into Nvidia Research Engine

March 22, 2019

Amid the frenzy of GTC this week – Nvidia’s annual conference showcasing all things GPU (and now AI) – William Dally, chief scientist and SVP of research, provided a brief but insightful portrait of Nvidia’s rese Read more…

By John Russell

ORNL Helps Identify Challenges of Extremely Heterogeneous Architectures

March 21, 2019

Exponential growth in classical computing over the last two decades has produced hardware and software that support lightning-fast processing speeds, but advancements are topping out as computing architectures reach thei Read more…

By Laurie Varma

Interview with 2019 Person to Watch Jim Keller

March 21, 2019

On the heels of Intel's reaffirmation that it will deliver the first U.S. exascale computer in 2021, which will feature the company's new Intel Xe architecture, we bring you our interview with our 2019 Person to Watch Jim Keller, head of the Silicon Engineering Group at Intel. Read more…

By HPCwire Editorial Team

HPE Extreme Performance Solutions

HPE and Intel® Omni-Path Architecture: How to Power a Cloud

Learn how HPE and Intel® Omni-Path Architecture provide critical infrastructure for leading Nordic HPC provider’s HPCFLOW cloud service.

powercloud_blog.jpgFor decades, HPE has been at the forefront of high-performance computing, and we’ve powered some of the fastest and most robust supercomputers in the world. Read more…

IBM Accelerated Insights

Insurance: Where’s the Risk?

Insurers are facing extreme competitive challenges in their core businesses. Property and Casualty (P&C) and Life and Health (L&H) firms alike are highly impacted by the ongoing globalization, increasing regulation, and digital transformation of their client bases. Read more…

What’s New in HPC Research: TensorFlow, Buddy Compression, Intel Optane & More

March 20, 2019

In this bimonthly feature, HPCwire highlights newly published research in the high-performance computing community and related domains. From parallel programming to exascale to quantum computing, the details are here. Read more…

By Oliver Peckham

GTC 2019: Chief Scientist Bill Dally Provides Glimpse into Nvidia Research Engine

March 22, 2019

Amid the frenzy of GTC this week – Nvidia’s annual conference showcasing all things GPU (and now AI) – William Dally, chief scientist and SVP of research, Read more…

By John Russell

At GTC: Nvidia Expands Scope of Its AI and Datacenter Ecosystem

March 19, 2019

In the high-stakes race to provide the AI life-cycle solution of choice, three of the biggest horses in the field are IBM, Intel and Nvidia. While the latter is only a fraction of the size of its two bigger rivals, and has been in business for only a fraction of the time, Nvidia continues to impress with an expanding array of new GPU-based hardware, software, robotics, partnerships and... Read more…

By Doug Black

Nvidia Debuts Clara AI Toolkit with Pre-Trained Models for Radiology Use

March 19, 2019

AI’s push into healthcare got a boost yesterday with Nvidia’s release of the Clara Deploy AI toolkit which includes 13 pre-trained models for use in radiolo Read more…

By John Russell

It’s Official: Aurora on Track to Be First US Exascale Computer in 2021

March 18, 2019

The U.S. Department of Energy along with Intel and Cray confirmed today that an Intel/Cray supercomputer, "Aurora," capable of sustained performance of one exaf Read more…

By Tiffany Trader

Why Nvidia Bought Mellanox: ‘Future Datacenters Will Be…Like High Performance Computers’

March 14, 2019

“Future datacenters of all kinds will be built like high performance computers,” said Nvidia CEO Jensen Huang during a phone briefing on Monday after Nvidia revealed scooping up the high performance networking company Mellanox for $6.9 billion. Read more…

By Tiffany Trader

Oil and Gas Supercloud Clears Out Remaining Knights Landing Inventory: All 38,000 Wafers

March 13, 2019

The McCloud HPC service being built by Australia’s DownUnder GeoSolutions (DUG) outside Houston is set to become the largest oil and gas cloud in the world th Read more…

By Tiffany Trader

Quick Take: Trump’s 2020 Budget Spares DoE-funded HPC but Slams NSF and NIH

March 12, 2019

U.S. President Donald Trump’s 2020 budget request, released yesterday, proposes deep cuts in many science programs but seems to spare HPC funding by the Depar Read more…

By John Russell

Nvidia Wins Mellanox Stakes for $6.9 Billion

March 11, 2019

The long-rumored acquisition of Mellanox came to fruition this morning with GPU chipmaker Nvidia’s announcement that it has purchased the high-performance net Read more…

By Doug Black

Quantum Computing Will Never Work

November 27, 2018

Amid the gush of money and enthusiastic predictions being thrown at quantum computing comes a proposed cold shower in the form of an essay by physicist Mikhail Read more…

By John Russell

The Case Against ‘The Case Against Quantum Computing’

January 9, 2019

It’s not easy to be a physicist. Richard Feynman (basically the Jimi Hendrix of physicists) once said: “The first principle is that you must not fool yourse Read more…

By Ben Criger

ClusterVision in Bankruptcy, Fate Uncertain

February 13, 2019

ClusterVision, European HPC specialists that have built and installed over 20 Top500-ranked systems in their nearly 17-year history, appear to be in the midst o Read more…

By Tiffany Trader

Why Nvidia Bought Mellanox: ‘Future Datacenters Will Be…Like High Performance Computers’

March 14, 2019

“Future datacenters of all kinds will be built like high performance computers,” said Nvidia CEO Jensen Huang during a phone briefing on Monday after Nvidia revealed scooping up the high performance networking company Mellanox for $6.9 billion. Read more…

By Tiffany Trader

Intel Reportedly in $6B Bid for Mellanox

January 30, 2019

The latest rumors and reports around an acquisition of Mellanox focus on Intel, which has reportedly offered a $6 billion bid for the high performance interconn Read more…

By Doug Black

Looking for Light Reading? NSF-backed ‘Comic Books’ Tackle Quantum Computing

January 28, 2019

Still baffled by quantum computing? How about turning to comic books (graphic novels for the well-read among you) for some clarity and a little humor on QC. The Read more…

By John Russell

Contract Signed for New Finnish Supercomputer

December 13, 2018

After the official contract signing yesterday, configuration details were made public for the new BullSequana system that the Finnish IT Center for Science (CSC Read more…

By Tiffany Trader

It’s Official: Aurora on Track to Be First US Exascale Computer in 2021

March 18, 2019

The U.S. Department of Energy along with Intel and Cray confirmed today that an Intel/Cray supercomputer, "Aurora," capable of sustained performance of one exaf Read more…

By Tiffany Trader

Leading Solution Providers

SC 18 Virtual Booth Video Tour

Advania @ SC18 AMD @ SC18
ASRock Rack @ SC18
DDN Storage @ SC18
HPE @ SC18
IBM @ SC18
Lenovo @ SC18 Mellanox Technologies @ SC18
NVIDIA @ SC18
One Stop Systems @ SC18
Oracle @ SC18 Panasas @ SC18
Supermicro @ SC18 SUSE @ SC18 TYAN @ SC18
Verne Global @ SC18

Deep500: ETH Researchers Introduce New Deep Learning Benchmark for HPC

February 5, 2019

ETH researchers have developed a new deep learning benchmarking environment – Deep500 – they say is “the first distributed and reproducible benchmarking s Read more…

By John Russell

IBM Quantum Update: Q System One Launch, New Collaborators, and QC Center Plans

January 10, 2019

IBM made three significant quantum computing announcements at CES this week. One was introduction of IBM Q System One; it’s really the integration of IBM’s Read more…

By John Russell

IBM Bets $2B Seeking 1000X AI Hardware Performance Boost

February 7, 2019

For now, AI systems are mostly machine learning-based and “narrow” – powerful as they are by today's standards, they're limited to performing a few, narro Read more…

By Doug Black

The Deep500 – Researchers Tackle an HPC Benchmark for Deep Learning

January 7, 2019

How do you know if an HPC system, particularly a larger-scale system, is well-suited for deep learning workloads? Today, that’s not an easy question to answer Read more…

By John Russell

HPC Reflections and (Mostly Hopeful) Predictions

December 19, 2018

So much ‘spaghetti’ gets tossed on walls by the technology community (vendors and researchers) to see what sticks that it is often difficult to peer through Read more…

By John Russell

Arm Unveils Neoverse N1 Platform with up to 128-Cores

February 20, 2019

Following on its Neoverse roadmap announcement last October, Arm today revealed its next-gen Neoverse microarchitecture with compute and throughput-optimized si Read more…

By Tiffany Trader

Move Over Lustre & Spectrum Scale – Here Comes BeeGFS?

November 26, 2018

Is BeeGFS – the parallel file system with European roots – on a path to compete with Lustre and Spectrum Scale worldwide in HPC environments? Frank Herold Read more…

By John Russell

France to Deploy AI-Focused Supercomputer: Jean Zay

January 22, 2019

HPE announced today that it won the contract to build a supercomputer that will drive France’s AI and HPC efforts. The computer will be part of GENCI, the Fre Read more…

By Tiffany Trader

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This