HPC Leadership Computing Trusts DDN

By Rob Farber

June 13, 2016

As we approach an Exascale future, the focus is on how to provision and use that computational capability. To realize the full societal impact of Exascale computing, the storage systems that support Exascale supercomputers are equally important; otherwise, valuable (and expensive) compute cycles will be wasted waiting on IO operations. Thought leaders in the HPC community agree that increasing core count is clearly the direction for computation (although there are strong differences of opinion on how those increased core counts are to be implemented). However, the storage picture is more complicated.

Unlike computation or in-memory systems, data retained in storage persists over periods of years and even decades. Further, storage systems can never risk losing or delivering bad data at any point in the data lifecycle. This requires deep technical capabilities and experience in large-scale computing and data management. As we look to the future, past performance truly is a predictor of future success in the storage market, which is why more than two-thirds of the world’s fastest computers rely on DDN (DataDirect Networks) for their storage needs. From pre-Petascale supercomputers to the current generation of double-digit PF/s (petaflop/s) machines, DDN has been the preferred partner for end users and technology integrators seeking to expand the limits of HPC. Looking to an Exascale world, DDN is investing tens of millions of dollars and opening new research and development facilities to create the end-to-end storage technologies that will meet the data requirements of current users and future Exascale supercomputers. DDN storage simply works, is fast, expandable, power efficient, and cost effective, which is why DDN is the storage vendor of choice for HPC professionals and those tasked with advancing the state of the art in leadership-class supercomputing.

The recently announced Japanese Oakforest-PACS 25 PF/s supercomputer is the latest double-digit Petascale machine to combine DDN burst buffer, application acceleration, SSD, and file system technologies to achieve results faster than was thought possible just two years ago. The Oakforest-PACS storage system includes 25 DDN IME14KX caching appliances that provide 1.4 TB/s of low-latency, flash-based cache. These cache devices will work in conjunction with DDN-supplied storage to deliver 400 GB/s of peak Lustre bandwidth, meeting the storage bandwidth needs of this latest-generation multi-PF/s supercomputer. As can be seen in the figure below, Lustre is just one option, as tiered DDN storage works with any parallel file-system.
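The bandwidth figures quoted above can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 TB/s = 1,000 GB/s) and an even split of the cache bandwidth across the 25 appliances. The function and constant names are illustrative only.

```python
# Figures quoted for Oakforest-PACS; decimal units assumed (1 TB/s = 1,000 GB/s).
NUM_IME_APPLIANCES = 25      # DDN IME14KX caching appliances
IME_CACHE_BW_GBS = 1400.0    # 1.4 TB/s aggregate flash cache bandwidth, in GB/s
LUSTRE_PEAK_BW_GBS = 400.0   # peak Lustre bandwidth, in GB/s


def per_appliance_bandwidth(total_gbs: float, count: int) -> float:
    """Average bandwidth per appliance, assuming an even split across appliances."""
    return total_gbs / count


def cache_to_fs_ratio(cache_gbs: float, fs_gbs: float) -> float:
    """How many times faster the cache tier is than the file system tier."""
    return cache_gbs / fs_gbs


if __name__ == "__main__":
    print(f"Per appliance: {per_appliance_bandwidth(IME_CACHE_BW_GBS, NUM_IME_APPLIANCES):.0f} GB/s")
    print(f"Cache/Lustre ratio: {cache_to_fs_ratio(IME_CACHE_BW_GBS, LUSTRE_PEAK_BW_GBS):.1f}x")
```

Under these assumptions, each IME14KX contributes about 56 GB/s. The cache tier is roughly 3.5 times faster than the peak Lustre bandwidth.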

Figure 1: DDN devices work with any parallel file-system

Infinite Memory Engine

DDN’s IME (Infinite Memory Engine) represents a new IO tier for HPC that treats small IO in precisely the same manner as large sequential IO. This is a revolutionary change from existing parallel filesystems. It results in near wire-speed performance regardless of IO pattern (including random IO), IO size, or shared-file access. DDN’s IME product line is also designed to work with future storage media such as 3D XPoint and others.
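The article does not describe IME's internals. The sketch below is a hedged illustration of the general idea of treating small random IO like large sequential IO. It shows a generic log-structured write buffer: every write, whatever its size or offset, is appended to a single sequential log, and an index records where the newest copy of each byte lives. The class and method names are hypothetical, and this is not DDN's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LogStructuredBuffer:
    """Toy write buffer: every write is appended to one sequential log.

    Hypothetical illustration of the generic log-structured technique;
    not DDN's IME design.
    """
    log: bytearray = field(default_factory=bytearray)
    # Maps (file_name, byte_offset) -> log position of the newest copy of that byte.
    # A per-byte index is deliberately naive; real systems index whole extents.
    index: dict = field(default_factory=dict)

    def write(self, file_name: str, offset: int, data: bytes) -> None:
        start = len(self.log)
        self.log.extend(data)  # always a sequential append, even for a small random write
        for i in range(len(data)):
            self.index[(file_name, offset + i)] = start + i

    def read(self, file_name: str, offset: int, length: int) -> bytes:
        out = bytearray()
        for i in range(length):
            pos = self.index.get((file_name, offset + i))
            if pos is None:
                raise KeyError(f"byte {offset + i} of {file_name} is not buffered")
            out.append(self.log[pos])  # newest copy wins after overwrites
        return bytes(out)


if __name__ == "__main__":
    buf = LogStructuredBuffer()
    buf.write("ckpt", 4096, b"abc")  # three small writes at scattered offsets...
    buf.write("ckpt", 0, b"xy")
    buf.write("ckpt", 4097, b"Z")    # ...including an overwrite
    print(bytes(buf.log), buf.read("ckpt", 4096, 3))
```

The three small writes at scattered offsets become three back-to-back appends to the log. Reads still see the newest data. A production design would also need garbage collection of stale log entries.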

Figure 2: IME rack performance (image courtesy Cray User Group)

IME “burst-buffers”

The DDN IME intelligently decouples storage performance from the traditional view of ‘storage’ to greatly accelerate HPC workloads – especially for frequently performed checkpoint/restart operations.

As can be seen in the figure below, burst bandwidth has traditionally required overprovisioning storage to meet peak bandwidth needs. Checkpoint/restart is a common example of an IO operation that requires overprovisioned storage to move data quickly and avoid wasting valuable compute cycles. DDN IME caches can be configured to act as burst buffers that quickly absorb bursts of extremely high IO activity. This is why the Oakforest-PACS supercomputer has been provisioned with 1.4 TB/s of DDN IME bandwidth.
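To see why a burst buffer saves compute time, consider how long an application stalls while writing a checkpoint. The sketch below uses the Oakforest-PACS figures quoted earlier: 1,400 GB/s of IME bandwidth and 400 GB/s of peak Lustre bandwidth. It also assumes a hypothetical 140,000 GB (140 TB) checkpoint. Both tiers are assumed to sustain their peak rates, which real workloads rarely do. All names are illustrative.

```python
# Bandwidths quoted for Oakforest-PACS, in GB/s (decimal units assumed).
IME_BW_GBS = 1400.0
LUSTRE_BW_GBS = 400.0
CHECKPOINT_GB = 140_000.0  # hypothetical checkpoint size (140 TB)


def transfer_seconds(size_gb: float, bandwidth_gbs: float) -> float:
    """Time to move size_gb at a sustained bandwidth of bandwidth_gbs."""
    return size_gb / bandwidth_gbs


def stall_seconds_saved(size_gb: float, burst_gbs: float, fs_gbs: float) -> float:
    """Application stall avoided per checkpoint by writing to the burst buffer
    instead of writing directly to the file system."""
    return transfer_seconds(size_gb, fs_gbs) - transfer_seconds(size_gb, burst_gbs)


def drain_fits(size_gb: float, fs_gbs: float, checkpoint_interval_s: float) -> bool:
    """True if a buffered checkpoint can drain to the file system in the
    background before the next checkpoint begins."""
    return transfer_seconds(size_gb, fs_gbs) <= checkpoint_interval_s


if __name__ == "__main__":
    print(f"Stall writing to IME:    {transfer_seconds(CHECKPOINT_GB, IME_BW_GBS):.0f} s")
    print(f"Stall writing to Lustre: {transfer_seconds(CHECKPOINT_GB, LUSTRE_BW_GBS):.0f} s")
    print(f"Saved per checkpoint:    {stall_seconds_saved(CHECKPOINT_GB, IME_BW_GBS, LUSTRE_BW_GBS):.0f} s")
```

Under these assumptions, the application stalls for 100 s instead of 350 s per checkpoint. The buffer can drain to Lustre in the background provided checkpoints are at least 350 s apart. The design point is that the file system is sized for average bandwidth, while the burst buffer is sized for peak bandwidth.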

Figure 3: Bursty IO patterns require overprovisioning

IME positions HPC for the Exascale

Looking ahead to the Exascale, DDN IME caches can save significant capital and operating costs by reducing the number of devices required to achieve Exascale-capable levels of storage performance. To put this in perspective, Gary Grider famously pointed out in his 2009 presentation, Preparing Applications for Next Generation IO/Storage, that when you have to plot Exascale storage costs in millions of dollars on a log scale, you have hit the big time!

Figure 4: 2009 projected costs of storage for an Exascale system (image courtesy HPC User Forum)

In contrast, the Oakforest-PACS procurement required only 25 DDN IME14KX caching appliances. As the industry leader, DDN has dramatically redefined the storage landscape, and the costs associated with Exascale storage systems, since 2009, as shown in the graphic below.

Figure 5: DDN has redefined the storage landscape since 2009

For HPC, DDN IME devices make high-performance clusters, multi-PF/s systems, and Exascale computation both possible and affordable.

Figure 6: A DDN IME14k

The many uses of IME

Of course, IME storage works great for databases, out-of-core solvers, and a variety of other scientific and commercial HPC workloads.

Figure 7: Additional uses of a DDN IME product
  1. A Write Accelerating Burst Buffer that absorbs bulk application data into the IME14K NVMe solid-state cache significantly faster than the file system can absorb it.
  2. A File System Accelerator and Application Optimizer, as IME reorders application I/O to optimize flushing the cache to long-term storage (so that as little expensive cache as possible needs to be purchased).
Figure 8: Dataflow in the client
  3. A Read-optimized Application I/O Accelerator that enables out-of-band API configuration of the IME appliance to optimize both reads and writes. This allows more simultaneous job runs, shortens the job queue, and delivers significantly faster application run times to the user. The API integrates IME with job schedulers and pre-stages (warms) the cache for new jobs, accelerating first reads.
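The reordering described in the second use above can be illustrated with a generic write-coalescing pass. Buffered extents are sorted by file offset, and extents that touch end-to-start are merged, so the backing file system sees fewer, larger, sequential writes. This is a hypothetical sketch of the general technique, not IME's flushing algorithm. It assumes the buffered extents do not overlap.

```python
from typing import List, Tuple

Extent = Tuple[int, bytes]  # (file offset, data) buffered in the cache


def coalesce_for_flush(extents: List[Extent]) -> List[Extent]:
    """Sort buffered extents by offset and merge contiguous ones.

    Assumes extents do not overlap; overlap resolution is omitted for brevity.
    """
    merged: List[Tuple[int, bytearray]] = []
    for offset, data in sorted(extents, key=lambda e: e[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            merged[-1][1].extend(data)  # contiguous: grow the previous extent
        else:
            merged.append((offset, bytearray(data)))  # gap: start a new flush write
    return [(off, bytes(buf)) for off, buf in merged]


if __name__ == "__main__":
    # Small writes that arrived out of order from the application.
    buffered = [(8, b"CD"), (0, b"AB"), (2, b"xyz"), (20, b"Q"), (5, b"...")]
    for offset, data in coalesce_for_flush(buffered):
        print(offset, data)
```

In this example, five small, out-of-order writes collapse into two flush operations: one 10-byte sequential write at offset 0 and one at offset 20.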

Standard scripting operations make it straightforward to use DDN IME appliance capabilities. The figure below shows how to use the DDN IME as an application accelerator.

Figure 9: IME acts as an application IO accelerator

Robustness and Scalability are key!

Cost and power savings are for naught if the storage solution is not robust and scalable as well.

DDN gives customers the option of using a technique called erasure coding to protect against storage failures. Erasure codes are primarily used in scale-out object storage systems. In these systems, erasure-coded data blocks are distributed across multiple storage nodes to protect against both media and node failures. Compared with the alternative, three- or four-way mirroring/replication, erasure coding can literally save racks of storage nodes.
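The capacity savings follow from simple arithmetic. n-way replication stores n full copies. A k+m erasure code stores k data chunks plus m parity chunks, so raw capacity grows only by a factor of (k+m)/k. The sketch below compares the two for a hypothetical 10 PB of usable data. The 8+2 layout is an illustrative choice: an MDS code such as Reed-Solomon in that layout tolerates the loss of any two pieces, as three-way replication does.

```python
def raw_capacity_replication(usable_pb: float, copies: int) -> float:
    """Raw capacity needed when every byte is stored `copies` times."""
    return usable_pb * copies


def raw_capacity_erasure(usable_pb: float, data_chunks: int, parity_chunks: int) -> float:
    """Raw capacity needed for a k+m erasure code: usable * (k + m) / k."""
    return usable_pb * (data_chunks + parity_chunks) / data_chunks


if __name__ == "__main__":
    usable = 10.0  # hypothetical usable capacity, in PB
    print(f"3-way replication: {raw_capacity_replication(usable, 3):.1f} PB raw")
    print(f"4-way replication: {raw_capacity_replication(usable, 4):.1f} PB raw")
    print(f"8+2 erasure code:  {raw_capacity_erasure(usable, 8, 2):.1f} PB raw")
```

For 10 PB of usable data, three-way replication needs 30 PB of raw capacity and four-way needs 40 PB, whereas an 8+2 code needs 12.5 PB.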

Option 1: No data protection. The IME servers and associated storage media are treated as “just cache”, where the data can be recreated if lost.

Option 2: Erasure coding is calculated at the client:

  • Exhibits excellent scaling and can run with high client counts.
  • Servers don’t get bogged down computing parity.
  • There is a tradeoff: erasure coding reduces usable client bandwidth and IME capacity, depending on the number of IMEs in the stripe, by roughly 11% (in an 8+1 configuration) to 25% (in a 3+1 configuration).
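The 11% and 25% figures match the parity fraction of a k+1 layout, 1/(k+1): one chunk in nine for 8+1 and one in four for 3+1. The article does not say which code IME uses. The sketch below therefore assumes the simplest k+1 code, a single XOR parity chunk (as in RAID 5). This parity would be computed on the client before the k+1 chunks are sent to separate IMEs. All names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero-padded) and append one XOR parity chunk."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def reconstruct(chunks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing chunk (None) by XOR-ing all surviving chunks."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can recover only one lost chunk")
    rebuilt = list(chunks)
    if missing:
        rebuilt[missing[0]] = reduce(xor_bytes, [c for c in chunks if c is not None])
    return rebuilt


def parity_overhead(k: int) -> float:
    """Fraction of raw bandwidth/capacity spent on parity in a k+1 layout."""
    return 1 / (k + 1)


if __name__ == "__main__":
    stripes = encode(b"checkpoint-data!", 8)  # 8 data chunks + 1 parity chunk
    damaged = list(stripes)
    damaged[3] = None                         # simulate losing one IME
    print(reconstruct(damaged) == stripes)
    print(f"8+1 overhead: {parity_overhead(8):.1%}, 3+1 overhead: {parity_overhead(3):.1%}")
```

Computing parity on the client, as the bullets above describe, spreads this XOR work across many clients instead of concentrating it on the servers.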
Figure 10: Erasure encoding distributed across multiple IMEs

Managing the full end-to-end data lifecycle

Robust, scalable, and performant storage is only part of the HPC storage picture. Data archiving, full lifecycle data management, and distributed cloud-based storage must also be considered. Similarly, questions are being raised about the efficacy of POSIX-based file-systems in future HPC systems. For this reason, object storage systems are undergoing rapid development.

To address current and future end-user storage needs, even at the Exascale, DDN has created a complete portfolio of end-to-end storage products that work together as an extremely flexible data lifecycle management toolset. DDN claims these tools can be applied anywhere and at any scale.

Figure 11: DDN end-to-end big data lifecycle management

Briefly, the DDN storage portfolio covers:

  • Fast data and compute: Addressed through the DDN family of IME products.
  • File-system appliances: DDN products include GRIDScaler® and EXAScaler®.
  • Persistent data: Persistent data for a variety of commercial and big data workloads is addressed by the SFA14k™ storage array products.
  • Object and cloud storage: WOS® object storage for private and hybrid clouds takes DDN customers beyond traditional file-systems. WOS is described in the DDN white paper, WOS® 360° full spectrum object storage.
Figure 12: WOS object storage

For more information

For more information, visit http://www.ddn.com.


Rob Farber is a global technology consultant and author with an extensive background in HPC and storage technologies that he applies at national labs and commercial organizations.
