Reliability Matters – Your HPC Workloads are Thirsty for Enterprise Quality

By Nicole Hemsoth

November 12, 2012

Is your current HPC data storage solution experiencing issues with disk drives? Are you seeing performance degradation, with HPC projects taking longer to complete than they should? Is your performance situation normal, or are there reliable alternatives for achieving sustained performance at large HPC scale?

To help address these and other questions you might have when evaluating your data storage infrastructure, Seagate and Xyratex co-authored a white paper, “Achieving Rapid Scale in Enterprise and Cloud Data Centers with SAS.” The paper [1] provides insight into selecting the right disk drive for your application environment and your specific performance, scalability and reliability needs. Anyone currently experiencing high rates of what appear to be drive-related issues, or anyone considering purchasing or leasing high-density storage solutions, would be well advised to consider these points. Those who aim to achieve reliable, sustained performance efficiently on HPC, enterprise or mission-critical applications would also benefit from reading this paper.

The Importance of Drive Design

Design tolerances and the features built into disk drives have a direct impact on performance in multi-spindle environments. Drives that are not optimized to handle rotational vibration (RV) have been shown in testing to lose more than 50 percent of their performance. Also, the RV mitigation features provided in enterprise-class drives will not perform as effectively without adequate RV isolation designed into the multi-drive enclosure. Drive RV mitigation and enclosure RV isolation must act together to deliver a well-crafted RV management solution. If RV is not taken into account in the design of the drive and the multi-drive enclosure, the force of RV can push the disk drive head off track, causing missed revolutions and delays in data transfers. These delayed read/write operations are the root of all vibration-induced I/O degradation.
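To make the cost of a missed revolution concrete, the sketch below estimates how much throughput a drive retains when a share of its I/Os must wait one extra platter rotation. It is a simplified illustration and not a model taken from the white paper: the function names, the fixed one-revolution penalty and the example service time are all assumptions chosen for clarity.

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60_000.0 / rpm


def retained_throughput(base_io_ms: float, rpm: float, miss_probability: float) -> float:
    """Fraction of nominal throughput kept when some I/Os miss a revolution.

    Assumption (hypothetical model): every affected I/O waits exactly one
    extra revolution before the head is back on track. Real drives may
    retry more than once or recover sooner.
    """
    if base_io_ms <= 0:
        raise ValueError("base_io_ms must be positive")
    if not 0.0 <= miss_probability <= 1.0:
        raise ValueError("miss_probability must be between 0 and 1")
    # Average extra wait per I/O caused by vibration-induced off-track events.
    penalty_ms = miss_probability * revolution_time_ms(rpm)
    return base_io_ms / (base_io_ms + penalty_ms)


if __name__ == "__main__":
    rpm = 7200            # typical nearline drive spindle speed
    base_io_ms = 10.0     # assumed average service time without vibration
    for p in (0.0, 0.1, 0.5, 1.0):
        kept = retained_throughput(base_io_ms, rpm, p)
        print(f"miss probability {p:.1f}: {kept:.0%} of nominal throughput")
```

At 7,200 RPM one revolution takes roughly 8.3 ms, so under these assumptions even a modest share of off-track events adds a delay comparable to the I/O service time itself.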

Seagate and Xyratex point out in this new paper that “the use of lower-end commodity technologies derived from department-level and workgroup clients as well as the blending, merging and displacement of former data center and enterprise techniques underscore the need for broad industry education regarding the facts about storage technologies.” In most cases, poor drive reliability is a result of deploying the wrong type of storage device within an enterprise-class system, or for a specific enterprise-class workload. Hard disk drives, being mechanical devices, are designed with specific features and components for specific workloads.

Improper management of RV can be subtle. It can be introduced into your project by selecting a disk drive class that is inappropriate for the application load, or an enclosure that lacks design margin relative to the application and the selected disk drive. These factors matter less if reliable, sustained performance is not a key purchasing criterion, because there are plenty of archive and low-performance bulk storage applications where attention to RV is not as critical. However, in the case of high-density HPC data storage, reliable and sustained performance at massive scale is paramount.

Because HPC storage solutions provide numerous data protection methods, improper management of RV does not automatically translate into something as obvious as data loss. Instead, it can result in a lingering performance impact, intermittent errors and escalating service costs, which are quite literally built into the storage system for given application load levels. To overcome these avoidable design limitations, Seagate and Xyratex contrast disk drive types and point out the range of mission-critical design characteristics available with high-performance, enterprise-class, nearline SAS drives.

Drive Testing Critical to Improving Performance

In addition to selecting the right drive type, the white paper describes the intensive solution and component test methods adopted by Xyratex. These methods improve drive reliability and system robustness by detecting individual drive weaknesses or defects early and by thoroughly exercising enclosure-level RV isolation design techniques. Xyratex’ four-stage Integrated System Testing Platform (ISTP)[2] includes a highly efficient and scalable storage test that exposes, identifies and eliminates devices with inherent defects, or with defects resulting from manufacturing aberrations, that cause time- and stress-dependent failures. The test removes hidden quality problems and significantly reduces in-the-field component failures. This attention to drive quality and solution robustness goes beyond business-as-usual expectations and offers a useful perspective on what is attainable in raising the bar on solution performance and reliability among HPC storage providers.

Xyratex’ ISTP process draws on the fact that 50 percent of the world’s disk drives are produced using Xyratex disk drive test and processing technologies. Further, Xyratex is the industry’s largest OEM storage manufacturer, with over 25 years of experience and innovation in end-to-end engineering design, manufacturing and field failure analysis, supporting the entire market from entry and mid-range enterprises to emerging HPC, cloud and solid state storage platforms.

Performance Solution Possibilities

The Xyratex ClusterStor™ 6000 is an example of a scale-out HPC data storage solution designed to satisfy the linear file system processing and data capacity scaling needs for state-of-the-art HPC systems, supporting hundreds of GB/s to 1TB/s Lustre® file system throughput and beyond.  ClusterStor features enterprise-class, nearline SAS drives that are tested, packaged and sourced using Xyratex’ attention to comprehensive quality and high-density solution-level robustness.

Xyratex goes above and beyond with all components of the ClusterStor high-density solution, including metadata servers, object storage servers and object storage targets that are factory-integrated, tested and supported by one company.  Xyratex’ methodical attention to integral solution quality drives ClusterStor’s seamless integration from the lowest level component to highest-level management interface, as well as its linear file system processing and capacity scaling capabilities.  In addition, Xyratex has unique partnerships with drive suppliers, providing insights into low-level drive testing as well as extensive high-density storage design experience. Accordingly, Xyratex data storage solutions are designed to routinely exceed the quality and reliability figures of other industry offerings.[2]

This white paper points out the range of mission-critical design characteristics available with enterprise-class, nearline SAS drives and provides insight into leading high-density solution design methods that raise the bar on solution performance and reliability among HPC storage providers.

The Seagate and Xyratex white paper is available here

[1] “Achieving Rapid Scale in Enterprise and Cloud Data Centers with SAS,” November 2012, Seagate & Xyratex Whitepaper, Topic: Enterprise Nearline vs. Desktop.

[2] “How Do You Get To 1TB/s? Quality.” HPCwire, October 29, 2012.

 
