HPC Leadership Computing Trusts DDN

By Rob Farber

June 13, 2016

As we approach an Exascale future, the focus is on how to provision and use that computational capability. To realize the full societal impact of Exascale computing, the storage systems that support Exascale supercomputers are equally important; otherwise those valuable (and expensive) compute cycles will be wasted waiting on IO operations. Thought leaders in the HPC community agree that increasing core count is clearly the direction for computation (although there are strong differences of opinion on how those increased core counts should be implemented). The storage picture, however, is more complicated.

Unlike computation or in-memory systems, data retained in storage persists for years and even decades. Further, storage systems can never risk losing or delivering bad data at any point in the data lifecycle. This requires deep technical capability and experience in large-scale computing and data management. As we look to the future, past performance truly is a predictor of future success in the storage market, which is why more than 2/3 of the world’s fastest computers rely on DDN (DataDirect Networks) for their storage needs. From pre-Petascale supercomputers to the current generation of double-digit PF/s (petaflops per second) machines, DDN has been the preferred partner for end users and technology integrators pushing the limits of HPC. Looking to an Exascale world, DDN is investing tens of millions of dollars and opening new research and development facilities to create the end-to-end storage technologies that will meet the data requirements of current users and future Exascale supercomputers. DDN storage simply works and is fast, expandable, power efficient, and cost effective, which is why DDN is the storage vendor of choice for HPC professionals and those tasked with advancing the state of the art in leadership-class supercomputing.

The recently announced Japanese Oakforest-PACS 25 PF/s supercomputer is the latest double-digit Petascale machine to use a combination of DDN burst buffer, application acceleration, SSD, and file system technologies to achieve results faster than was thought possible even two years ago. The Oakforest storage system comprises 25 DDN IME14KX caching appliances that provide 1.4 TB/s of low-latency, flash-based cache. These cache devices will work in conjunction with DDN-supplied storage to deliver 400 GB/s of peak Lustre bandwidth to meet the storage bandwidth needs of this latest-generation multi-PF/s supercomputer. As can be seen in the figure below, Lustre is just one option, as tiered DDN storage works with any parallel file system.

Figure 1: DDN devices work with any parallel file-system

Infinite Memory Engine

DDN’s IME (Infinite Memory Engine) represents a new IO tier for HPC that treats small IO in precisely the same manner as large sequential IO. This is a revolutionary change from existing parallel file systems, and it results in near wire-speed performance regardless of random IO patterns, IO size, or shared file access. The DDN IME product line is also designed to work with future storage media such as 3D XPoint.
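One common way to make small, scattered writes look like large sequential IO is a log-structured write path: every incoming write, whatever its offset or size, is appended to a log, and an index remembers where each extent landed. The article does not describe IME internals, so the sketch below only illustrates that general idea, assuming a single in-memory log; `LogStructuredBuffer` and its methods are hypothetical names, not a DDN API.

```python
from dataclasses import dataclass, field


@dataclass
class LogStructuredBuffer:
    """Toy log-structured write buffer (hypothetical, not DDN's IME design).

    Every write becomes a sequential append to `log`, regardless of the
    file offset the application targeted or how small the write is.
    """
    log: bytearray = field(default_factory=bytearray)
    # file offset -> (position in log, length) of the most recent write there
    index: dict = field(default_factory=dict)

    def write(self, file_offset: int, data: bytes) -> int:
        """Append `data` to the log and return the log position it landed at."""
        pos = len(self.log)
        self.log.extend(data)
        self.index[file_offset] = (pos, len(data))  # last writer wins
        return pos

    def read(self, file_offset: int) -> bytes:
        """Return the extent last written at `file_offset`.

        Simplification: only exact offsets previously written are supported;
        partially overlapping extents are not resolved.
        """
        pos, length = self.index[file_offset]
        return bytes(self.log[pos:pos + length])


# Small writes at scattered file offsets, as many ranks might issue them.
buf = LogStructuredBuffer()
positions = [buf.write(off, bytes([i]) * 4)
             for i, off in enumerate([4096, 0, 1 << 20, 512])]
print(positions)          # [0, 4, 8, 12]: strictly sequential in the log
print(buf.read(1 << 20))  # b'\x02\x02\x02\x02'
```

The scattered file offsets land at consecutive log positions, which is what lets a device stream them at near-sequential speed; a real system would also need persistence, overlap handling, and eventual flushing to the file system.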

Figure 2: Rack performance IME (Image courtesy Cray Users Group)

IME “burst-buffers”

The DDN IME intelligently decouples storage performance from the traditional view of ‘storage’ to greatly accelerate HPC workloads – especially for frequently performed checkpoint/restart operations.

As can be seen in the figure below, meeting peak burst bandwidth has traditionally required overprovisioning storage. Checkpoint/restart is a common example of an IO operation that requires overprovisioned storage to move data quickly and avoid wasting valuable compute cycles. DDN IME caches can be configured as burst buffers that absorb these bursts of extremely high IO activity. This is why the Oakforest-PACS supercomputer has been provisioned with 1.4 TB/s of DDN IME bandwidth.
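A back-of-the-envelope model shows why a burst buffer helps with checkpoints. The sketch below uses the two Oakforest-PACS figures quoted earlier (1.4 TB/s of IME bandwidth and 400 GB/s of peak Lustre bandwidth) and a hypothetical 100 TB checkpoint; it assumes every transfer runs at peak bandwidth with no contention, which real systems will not achieve.

```python
def checkpoint_timing(size_tb: float, burst_tb_s: float, pfs_tb_s: float) -> dict:
    """Idealized seconds spent on one checkpoint (peak bandwidth, no contention)."""
    return {
        # Without a burst buffer, compute stalls until the file system has everything.
        "stall_direct_s": size_tb / pfs_tb_s,
        # With a burst buffer, compute stalls only until the data is in the cache.
        "stall_burst_s": size_tb / burst_tb_s,
        # The cache then drains to the file system in the background; if it holds
        # only one checkpoint, the next checkpoint must wait for this to finish.
        "drain_s": size_tb / pfs_tb_s,
    }


IME_TB_S = 1.4         # Oakforest-PACS IME bandwidth, from the article
LUSTRE_TB_S = 0.4      # Oakforest-PACS peak Lustre bandwidth, from the article
CHECKPOINT_TB = 100.0  # hypothetical checkpoint size, for illustration only

t = checkpoint_timing(CHECKPOINT_TB, IME_TB_S, LUSTRE_TB_S)
for name, seconds in t.items():
    print(f"{name:>15}: {seconds:6.1f} s")
```

With these inputs the compute stall drops from 250 s to about 71 s, a 3.5x reduction that simply mirrors the ratio of the two bandwidths; the 250 s drain is overlapped with computation instead of blocking it.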

Figure 3: Bursty IO patterns require overprovisioning

IME positions HPC for the Exascale

Looking ahead to Exascale, DDN IME caches can save significant capital and operating costs by reducing the number of devices required to reach Exascale-capable levels of storage performance. To put this in perspective, Gary Grider famously pointed out in his 2009 presentation, “Preparing Applications for Next Generation IO/Storage,” that when you have to plot Exascale storage costs in millions of dollars on a log scale, you know you have hit the big time!

Figure 4: 2009 projected costs of storage for an Exascale system (image courtesy HPC User Forum)

In contrast, the Oakforest-PACS procurement required only 25 DDN IME14KX caching appliances. As the industry leader, DDN has dramatically redefined the storage landscape and the costs associated with Exascale storage systems since 2009, as shown in the graphic below.
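The device-count argument comes down to simple division. Assuming the 1.4 TB/s quoted for Oakforest-PACS is spread evenly over its 25 IME14KX appliances, each contributes about 56 GB/s. The sketch below uses that derived figure (not a vendor specification) to estimate how many appliances a hypothetical bandwidth target would need, assuming bandwidth scales linearly and ignoring network and file-system limits.

```python
import math

OAKFOREST_APPLIANCES = 25
OAKFOREST_IME_GB_S = 1400.0  # 1.4 TB/s aggregate IME bandwidth, from the article
# Derived, not a vendor specification: assumes an even split across appliances.
PER_APPLIANCE_GB_S = OAKFOREST_IME_GB_S / OAKFOREST_APPLIANCES


def appliances_needed(target_gb_s: float,
                      per_device_gb_s: float = PER_APPLIANCE_GB_S) -> int:
    """Smallest whole number of devices whose combined bandwidth meets the
    target, assuming perfectly linear scaling."""
    return math.ceil(target_gb_s / per_device_gb_s)


print(PER_APPLIANCE_GB_S)          # 56.0
print(appliances_needed(1400.0))   # 25, consistent with the Oakforest-PACS figure
print(appliances_needed(10000.0))  # hypothetical 10 TB/s target -> 179
```

The same function applied to a slower per-device bandwidth shows why fewer, faster devices translate directly into fewer racks, less power, and lower cost.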

Figure 5: DDN has redefined the storage landscape since 2009

For HPC, DDN IME devices make high-performance clusters, multi-PF/s systems, and Exascale computation both possible and affordable.

Figure 6: A DDN IME14K

The many uses of IME

Of course, IME storage works great for databases, out-of-core solvers, and a variety of other scientific and commercial HPC workloads.

Figure 7: Additional uses of a DDN IME product
  1. A Write-Accelerating Burst Buffer that absorbs bulk application data into the IME14K NVMe solid-state cache significantly faster than the file system can absorb it.
  2. A File System Accelerator and Application Optimizer, as IME reorders application I/O to optimize flushing the cache to long-term storage (so that as little expensive cache as possible needs to be purchased).
  3. A Read-optimized Application I/O Accelerator that enables out-of-band API configuration of the IME appliance to optimize both reads and writes, allowing more simultaneous job runs, shortening the job queue, and delivering significantly faster application run times to the user. The API integrates IME with job schedulers and pre-stages (warms) the cache for new jobs, accelerating first reads.
Figure 8: Dataflow in the client
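The second use above says IME reorders application I/O before flushing it to long-term storage. The article does not say how; one generic way to do this is to sort the buffered extents by file offset and merge those that touch or overlap, so the file system sees a few large sequential writes instead of many small scattered ones. The sketch below, built around a hypothetical `coalesce_extents` helper, illustrates only that idea.

```python
def coalesce_extents(extents):
    """Sort buffered (offset, length) extents by offset and merge those that
    touch or overlap, so a flush can issue fewer, larger sequential writes.
    Hypothetical helper for illustration, not a DDN API."""
    merged = []  # list of [start, end) runs, end exclusive
    for offset, length in sorted(extents):
        end = offset + length
        if merged and offset <= merged[-1][1]:
            # Touches or overlaps the previous run: extend that run.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    # Convert [start, end) runs back to (offset, length) pairs.
    return [(start, end - start) for start, end in merged]


# Extents in the order ranks happened to issue them.
arrivals = [(8192, 4096), (0, 4096), (4096, 4096), (65536, 512), (12288, 4096)]
print(coalesce_extents(arrivals))  # [(0, 16384), (65536, 512)]
```

Five small writes collapse into two, one of which is a single 16 KiB sequential run; the same principle at larger scale is what lets a cache flush efficiently with only a modest amount of expensive flash.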

Standard script operations make it straightforward to use DDN IME appliance capabilities. The following shows how to use the DDN IME as an application accelerator.

Figure 9: IME acts as an application IO accelerator

Robustness and Scalability are key!

Cost and power savings are for naught if the storage solution is not robust and scalable as well.

DDN gives customers the option of using a technique called erasure coding to protect against storage failures. Erasure codes are primarily used in scale-out object storage systems, where erasure-encoded data blocks are distributed across multiple storage nodes to protect against both media and node failures. Compared with the alternative, three- or four-way mirroring/replication, erasure encoding can literally save racks of storage nodes.
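The capacity saving can be made concrete. With a k+m layout (k data chunks plus m parity chunks), storing U bytes of usable data takes U(k+m)/k bytes of raw capacity, and n-way replication is the special case k = 1, m = n - 1. The sketch below compares the configurations mentioned in this article for a hypothetical 1 PB of usable data. Note that the protection levels differ: 3-way replication survives two lost copies, while a k+1 code survives one lost chunk per stripe.

```python
def raw_capacity_pb(usable_pb: float, k: int, m: int) -> float:
    """Raw capacity needed for `usable_pb` of data stored as k data + m parity
    chunks (n-way replication is k=1, m=n-1)."""
    return usable_pb * (k + m) / k


layouts = {
    "3-way replication": (1, 2),
    "4-way replication": (1, 3),
    "3+1 erasure code": (3, 1),
    "8+1 erasure code": (8, 1),
}
for name, (k, m) in layouts.items():
    raw = raw_capacity_pb(1.0, k, m)  # hypothetical 1 PB of usable data
    print(f"{name:>17}: {raw:.3f} PB raw, {k / (k + m):.0%} usable")
```

For 1 PB of usable data, 3-way replication needs 3 PB of raw capacity while an 8+1 code needs 1.125 PB, which is where the "racks of storage nodes" saving comes from.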

Option 1: Data protection is optional. The IME server and associated storage media are considered “just cache” where the data can be recreated if lost.

Option 2: Erasure coding is calculated at the client:

  • Exhibits excellent scaling and can run with high client counts.
  • Servers don’t become a bottleneck.
  • There is a tradeoff: erasure coding reduces usable client bandwidth and IME capacity by roughly 11% (in an 8+1 configuration) to 25% (in a 3+1 configuration), depending on how many IMEs each stripe spans.
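The article does not say which erasure code IME uses. The simplest code with the k+1 shape mentioned above is single XOR parity (as in RAID 5): the client splits a buffer into k chunks, computes one parity chunk as their bytewise XOR, and sends the k+1 chunks to different IMEs; any one lost chunk can be rebuilt by XOR-ing the survivors. The sketch below assumes that scheme purely for illustration, and `encode` and `reconstruct` are hypothetical helpers. Sending one extra chunk for every k data chunks is where the roughly 1/(k+1) bandwidth and capacity cost comes from (1/9 for 8+1, 1/4 for 3+1).

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk.
    Assumed single-parity scheme for illustration; runs on the client."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]  # k + 1 chunks, one per IME


def reconstruct(stripe: list, lost: int) -> bytes:
    """Rebuild the chunk at index `lost` by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(stripe) if i != lost]
    return reduce(xor_bytes, survivors)


data = b"checkpoint data from many ranks"
stripe = encode(data, k=8)              # 8+1 configuration: nine chunks
rebuilt = reconstruct(stripe, lost=3)   # pretend the IME holding chunk 3 failed
print(rebuilt == stripe[3])             # True
```

Because the encoding happens on the client, the CPU cost is spread across many compute nodes rather than concentrated on the servers, matching the scaling argument in the bullets above; production codes such as Reed-Solomon generalize this to more than one parity chunk.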
Figure 10: Erasure encoding distributed across multiple IMEs

The full spectrum of end-to-end data lifecycle management

Robust, scalable, and performant storage is only part of the HPC storage picture; data archiving must also be considered, along with full lifecycle data management and distributed cloud-based storage. Similarly, questions are being raised about the efficacy of POSIX-based file systems in future HPC systems. For this reason, object storage systems are undergoing rapid development.

To address current and future end-user storage needs – even at Exascale – DDN has created a complete portfolio of end-to-end storage products that work together as an extremely flexible data lifecycle management toolset. DDN claims these tools can be applied anywhere and at any scale.

Figure 11: DDN end-to-end big data lifecycle management

Briefly, the DDN storage portfolio covers:

  • Fast data and compute: Addressed through the DDN family of IME products.
  • File-system appliances: DDN products include the GRIDScaler® and EXAScaler®.
  • Persistent data: Persistent data for a variety of commercial and big data workloads is addressed via the SFA14k™ storage array products.
  • Object and cloud storage: WOS® object storage for private and hybrid clouds takes DDN customers beyond traditional file systems. WOS is described in the DDN white paper, WOS® 360° full spectrum object storage.
Figure 12: WOS object storage

For more information

For more information, visit http://www.ddn.com.


Rob Farber is a global technology consultant and author with an extensive background in HPC and storage technologies that he applies at national labs and commercial organizations.
