HPC Leadership Computing Trusts DDN

By Rob Farber

June 13, 2016

As we approach an Exascale future, the focus is on how to provision and use that computational capability. To realize the full societal impact of Exascale computing, the storage systems that support Exascale supercomputers are equally important; otherwise those valuable (and expensive) compute cycles will be wasted waiting on IO operations. Thought leaders in the HPC community agree that increasing core counts is clearly the direction for computation (although there are strong differences of opinion on how those increased core counts should be implemented). The storage picture, however, is more complicated.

Unlike computation or in-memory systems, data retained in storage persists for years and even decades. Further, storage systems can never risk losing data or delivering bad data at any point in the data lifecycle. This requires deep technical capabilities and experience in large-scale computing and data management. As we look to the future, past performance truly is a predictor of future success in the storage market, which is why more than two-thirds of the world’s fastest computers rely on DDN (DataDirect Networks) for their storage needs. From pre-Petascale supercomputers to the current generation of double-digit PF/s (petaflop per second) machines, DDN has preferentially been selected to partner with end users and technology integrators to expand the limits of HPC computing. Looking to an Exascale world, DDN is investing tens of millions of dollars and opening new research and development facilities to create the end-to-end storage technologies that will meet the data requirements of current users and future Exascale supercomputers. DDN storage simply works and is fast, expandable, power efficient, and cost effective, which is why DDN is the storage vendor of choice for HPC professionals and those tasked with advancing the state of the art in leadership-class supercomputing.

The recently announced Japanese Oakforest-PACS 25 PF/s supercomputer is the latest double-digit Petascale machine that will use a combination of DDN burst buffer, application acceleration, SSD, and file system technologies to achieve results faster than was thought possible even two years ago. The Oakforest-PACS storage system comprises 25 DDN IME14KX caching appliances that provide 1.4 TB/s of low-latency flash-based cache bandwidth. These cache devices will work in conjunction with DDN-supplied storage to deliver 400 GB/s of peak Lustre bandwidth to meet the storage bandwidth needs of this latest-generation multi-PF/s supercomputer. As can be seen in the figure below, Lustre is just one option, as tiered DDN storage works with any parallel file system.

Figure 1: DDN devices work with any parallel file-system

Infinite Memory Engine

DDN’s IME (Infinite Memory Engine) represents a new IO tier for HPC that treats small IO in precisely the same manner as large sequential IO. This revolutionary change from existing parallel file systems results in near wire-speed performance regardless of random IO patterns, IO size, and shared file access. DDN’s IME product line can also work with future storage media such as 3D XPoint and others.

Figure 2: Rack performance IME (Image courtesy Cray Users Group)

IME “burst-buffers”

The DDN IME intelligently decouples storage performance from the traditional view of ‘storage’ to greatly accelerate HPC workloads – especially for frequently performed checkpoint/restart operations.

As can be seen in the figure below, bursty IO has traditionally required overprovisioning of storage to meet peak bandwidth needs. Checkpoint/restart is an example of a common IO operation that requires storage overprovisioning so that data can be moved quickly without wasting valuable compute cycles. The DDN IME caches can be configured to act as burst buffers that quickly absorb bursts of extremely high IO activity. This is why the Oakforest-PACS supercomputer has been provisioned with 1.4 TB/s of DDN IME bandwidth.

Figure 3: Bursty IO patterns require overprovisioning
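
To make the burst-buffer argument concrete, the following minimal sketch compares how long a checkpoint would take to land on the flash cache versus directly on the parallel file system. The two bandwidth figures and the appliance count are the Oakforest-PACS numbers quoted in this article; the 100 TB checkpoint size is a hypothetical value chosen only for illustration, and the calculation assumes idealized sustained bandwidth with decimal units (1 TB = 1000 GB).

```python
# Figures quoted in the article for Oakforest-PACS.
IME_CACHE_BW_GBPS = 1400.0  # 1.4 TB/s of IME flash cache bandwidth
LUSTRE_BW_GBPS = 400.0      # peak Lustre bandwidth
IME_APPLIANCES = 25         # number of IME14KX appliances


def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized time to move size_gb at a sustained bandwidth in GB/s."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gbps


def per_appliance_bandwidth(total_gbps: float, appliances: int) -> float:
    """Average bandwidth each appliance must contribute, in GB/s."""
    if appliances <= 0:
        raise ValueError("appliance count must be positive")
    return total_gbps / appliances


if __name__ == "__main__":
    # Hypothetical checkpoint size (assumption, not from the article).
    checkpoint_gb = 100_000.0

    to_cache = transfer_seconds(checkpoint_gb, IME_CACHE_BW_GBPS)
    to_lustre = transfer_seconds(checkpoint_gb, LUSTRE_BW_GBPS)

    print(f"Per-appliance bandwidth: "
          f"{per_appliance_bandwidth(IME_CACHE_BW_GBPS, IME_APPLIANCES):.0f} GB/s")
    print(f"Checkpoint to IME cache: {to_cache:.1f} s")
    print(f"Checkpoint to Lustre:    {to_lustre:.1f} s")
    print(f"Compute stall reduced by {to_lustre / to_cache:.1f}x")
```

Under these assumptions the application is blocked for roughly a quarter of the time it would otherwise spend writing straight to the file system, while the cache drains to long-term storage in the background.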

IME positions HPC for the Exascale

Looking ahead to the Exascale, DDN IME caches can save significant capital and operational dollars by reducing the number of devices required to achieve Exascale-capable levels of storage performance. To put this in perspective, Gary Grider famously pointed out in his 2009 presentation, Preparing Applications for Next Generation IO/Storage, that when you have to plot Exascale storage costs in millions of dollars on a log scale, you have hit the big time!

Figure 4: 2009 projected costs of storage for an Exascale system (image courtesy HPC User Forum)

In contrast, the Oakforest-PACS procurement only required 25 DDN IME14KX caching appliances. As the industry leader, DDN has dramatically redefined the storage landscape and costs associated with Exascale storage systems since 2009 as shown in the graphic below.

Figure 5: DDN has redefined the storage landscape since 2009

For HPC, DDN IME devices make high-performance clusters, multi-PF/s systems, and Exascale computation both possible and affordable.

Figure 6: A DDN IME14k

The many uses of IME

Of course, IME storage works great for databases, out-of-core solvers, and a variety of other scientific and commercial HPC workloads.

Figure 7: Additional uses of a DDN IME product
  1. A Write Accelerating Burst Buffer absorbing the bulk application data into the IME14K NVMe solid state cache significantly faster than the file system can absorb it.
  2. A File System Accelerator and Application Optimizer, as IME reorders application I/O to optimize flushing the cache to long-term storage (enabling the purchase of as little expensive cache as possible).
Figure 8: Dataflow in the client
  3. A Read-optimized Application-I/O Accelerator that enables out-of-band API configuration of the IME appliance to optimize both reads and writes, allowing more simultaneous job runs, shortening the job queue, and delivering significantly faster application run times to the user. The API integrates IME with job schedulers and pre-stages (warms) the cache for new jobs, accelerating the first read.
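
The I/O reordering mentioned in the list above can be illustrated with a generic sketch. The article does not describe how IME performs its reordering, so the code below is not DDN's algorithm; it only shows the general idea of sorting small, out-of-order writes by file offset and merging neighbors so the backing file system sees fewer, larger sequential requests. The `Extent` type and the `coalesce_writes` function are hypothetical names introduced for this example.

```python
from typing import List, Tuple

# An extent is (offset, length) in bytes. Hypothetical type for this sketch.
Extent = Tuple[int, int]


def coalesce_writes(writes: List[Extent]) -> List[Extent]:
    """Sort extents by offset and merge those that touch or overlap."""
    merged: List[Extent] = []
    for offset, length in sorted(writes):
        if length <= 0:
            continue  # ignore empty or invalid requests
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This extent starts inside or right at the end of the previous
            # one, so grow the previous extent instead of adding a new one.
            prev_offset, prev_length = merged[-1]
            end = max(prev_offset + prev_length, offset + length)
            merged[-1] = (prev_offset, end - prev_offset)
        else:
            merged.append((offset, length))
    return merged


if __name__ == "__main__":
    # Four 4 KiB writes arriving out of order, three of them contiguous.
    cached = [(8192, 4096), (0, 4096), (4096, 4096), (20480, 4096)]
    for offset, length in coalesce_writes(cached):
        print(f"flush offset={offset} length={length}")
```

In this example four small random writes collapse into two flush requests, one of 12 KiB and one of 4 KiB, which is the kind of pattern a disk-based file system handles far more efficiently.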

Standard script operations make it straightforward to use the capabilities of the DDN IME appliance. The following shows how to use the DDN IME as an application accelerator.

Figure 9: IME acts as an application IO accelerator

Robustness and Scalability are key!

Cost and power savings are for naught if the storage solution is not robust and scalable as well.

DDN gives the customer the option of using a technique called erasure coding to protect against storage failures. Erasure codes are primarily used in scale-out object storage systems, where erasure-encoded data blocks are distributed across multiple storage nodes to protect against both media and node failures. Erasure encoding can literally save racks of storage nodes when compared to the alternative of three- or four-way mirroring/replication.

Option 1: Data protection is optional. The IME server and associated storage media are considered “just cache” where the data can be recreated if lost.

Option 2: Erasure coding is calculated at the client:

  • Exhibits excellent scaling and can run with high client counts.
  • Servers don’t get clogged up.
  • There is a tradeoff: erasure coding reduces usable client bandwidth and IME capacity by roughly 11% (in an 8+1 configuration) to 25% (in a 3+1 configuration), depending on the number of IMEs in the stripe.
Figure 10: Erasure encoding distributed across multiple IMEs
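
The 11% and 25% figures follow directly from the ratio of parity blocks to total blocks in a stripe: 1/9 for 8+1 and 1/4 for 3+1. The sketch below reproduces that arithmetic and, as an illustration only, shows how a k+1 stripe computed at the client can rebuild one lost block. The article does not say which code IME uses, so bytewise XOR parity is an assumption here, chosen because it is the simplest single-parity scheme; all function names are hypothetical.

```python
from typing import List, Optional


def parity_overhead(data_blocks: int, parity_blocks: int = 1) -> float:
    """Fraction of raw capacity and bandwidth consumed by parity."""
    return parity_blocks / (data_blocks + parity_blocks)


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data: List[bytes]) -> List[bytes]:
    """Client-side encode: append one XOR parity block to k data blocks."""
    return list(data) + [xor_blocks(data)]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Return the data blocks, rebuilding at most one lost block (None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired[:-1]  # drop the parity block


if __name__ == "__main__":
    for k in (8, 3):
        print(f"{k}+1 overhead: {parity_overhead(k):.1%}")

    data = [b"abcd", b"efgh", b"ijkl"]   # a 3+1 stripe across four servers
    stripe: List[Optional[bytes]] = list(encode(data))
    stripe[1] = None                      # simulate losing one server
    print(recover(stripe) == data)
```

Because the parity is computed on the client before the blocks are distributed, the servers do no encoding work, which is consistent with the scaling behavior described in the list above.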

Managing the full end-to-end data lifecycle

Robust, scalable, and performant storage is only part of the HPC storage picture: data archiving, full lifecycle data management, and distributed cloud-based storage must also be considered. Similarly, questions are being raised about the efficacy of POSIX-based file systems in future HPC systems. For this reason, object storage systems are undergoing rapid development.

To address current and future end-user storage needs, even at the Exascale, DDN has created a complete portfolio of end-to-end storage products that work together as an extremely flexible data lifecycle management toolset. DDN claims that these tools can be applied anywhere and at any scale.

Figure 11: DDN end-to-end big data lifecycle management

Briefly, the DDN storage portfolio covers:

  • Fast data and compute: Addressed through the DDN family of IME products.
  • File-system appliances: DDN products include the GRIDScaler® and EXAScaler®.
  • Persistent data: Persistent data for a variety of commercial and big data workloads is addressed via the SFA14k™ storage array products.
  • Object and cloud storage: WOS® object storage for private and hybrid clouds takes DDN customers beyond traditional file systems. WOS is described in the DDN white paper, WOS® 360° full spectrum object storage.
Figure 12: WOS object storage

For more information

For more information, visit http://www.ddn.com.


Rob Farber is a global technology consultant and author with an extensive background in HPC and storage technologies that he applies at national labs and commercial organizations. He can be reached at [email protected]
