Intelligent Application of SSDs to Accelerate HPC Workloads

By Nicole Hemsoth

October 1, 2012

Introduction

In most industries today (whether financial services, manufacturing, academic research, healthcare and life sciences, or energy exploration), data analysis, modeling, and visualization efforts are critical to success.

To gain a competitive edge, most organizations are incorporating ever-larger data sets and more variable data formats into these computational workflows to help derive better information upon which to make smarter decisions.

These big data applications are placing new demands on the high performance computing (HPC) solutions used to run the algorithms and process the raw data. Given the larger volumes and greater variety of data types, as well as the desire to use more robust analysis, modeling, and visualization routines, HPC solutions must provide high sustained I/O and throughput while being optimized to cost-effectively handle highly variable workflows.

The essential element in all of this work is a need for speed. Organizations need fast time-to-results so that they can make the right decisions (which well to drill, which new drug candidate to develop, which product design to produce, which customer to award a lower-rate loan to) before their competitors do.

Complications and challenges that can impede HPC workflows

When looking to accelerate HPC workloads, there are several factors that can play a major role in overall performance.

To start, today’s analysis, modeling, and visualization efforts are carried out using much more sophisticated algorithms to derive more detailed and realistic results. The output from these routines offers finer spatial or temporal resolution and consequently results in much larger output data sets. In a typical workflow, those output files might be used as input to another analysis, modeling, or visualization application.

These operations can impede HPC workflows because the great volumes of data produced by the initial run must be written to disk and then ingested by yet another routine. Both operations can place high I/O and throughput demands on an infrastructure. If the infrastructure cannot sustain these data transfers, the computational workflows can slow significantly.

Another factor has to do with the data that is being used in today’s analysis, modeling, and visualization efforts. Nearly every industry is now making use of much larger data sets, richer data sets (such as those produced by newer seismic imaging tools or next-generation sequencers), and many more types of data. However, most users, even those who primarily have large data sets, also have large numbers of small files, even if those files consume a relatively small percentage of the total capacity.
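To make the small-file observation concrete, the following is a minimal sketch of how one might profile an existing directory tree: it counts how many files fall below a size threshold and how much of the total capacity they occupy. The function name `small_file_profile` and the 60KB default threshold are assumptions chosen for illustration (the threshold mirrors the small-file cutoff cited later for ActiveStor 14, with 1KB taken as 1,024 bytes); it is not a tool described in this article.

```python
import os


def small_file_profile(root, threshold_bytes=60 * 1024):
    """Walk a directory tree and summarize small files versus all files.

    Returns (small_count, total_count, small_bytes, total_bytes).
    The threshold is an illustrative assumption, not a vendor setting.
    """
    small_count = total_count = small_bytes = total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            total_count += 1
            total_bytes += size
            if size < threshold_bytes:
                small_count += 1
                small_bytes += size
    return small_count, total_count, small_bytes, total_bytes
```

A tree in which most files are small but those files hold only a small share of the bytes is the mixed workload the paragraph above describes.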

Big data and HPC solutions must therefore not only be capable of quickly accessing the large volumes of data required for the computations, but must also intelligently stage the different types of data, which come in varying file formats and sizes, on suitably high-performance storage.

Required storage solution characteristics

Organizations continually deploy new servers with more powerful CPUs to improve and speed up their analysis, modeling, and visualization efforts. To make the best use of such computing resources, an HPC solution must have a suitable storage solution to sustain HPC workflows.

A storage solution for today’s big data and HPC environments must be able to easily scale. Some solutions offer help meeting the growing data volume demands, but fall short when trying to keep CPUs satiated. To help accelerate HPC workflows, a storage solution must also scale in performance so that as the data volumes grow, the system supports the higher I/O and throughput required to get faster results.

Finally, a storage solution must be optimized for today’s HPC and big data workflows, which draw on data sets made up of files of all sizes. If all the data were in the same format (a structured database, for example) or of roughly the same file size, a solution could be highly optimized for that specific data. Working with the mixed data sets used today requires a storage solution that optimizes workflow performance for each data type.

Panasas introduces an integrated SSD/SATA approach

Panasas ActiveStor storage systems have a modular blade architecture integrated with its PanFS parallel file system. The design eliminates the bottleneck of a single RAID controller to deliver high-performance, scalable storage. Prior generations of ActiveStor have been based solely on SATA drives and were well-tuned for high throughput.

With the fifth-generation ActiveStor 14, Panasas has taken a unique approach, combining lightning-fast SSDs with high-capacity SATA disks to improve storage performance while keeping costs down. Rather than use SSD for caching or for “most recent” file access as many other vendors have done, ActiveStor 14 stores all metadata and small files (less than 60KB) on the SSDs and larger files on SATA drives.
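The placement rule described above can be sketched as a simple decision function. This is only an illustration of the stated policy, not Panasas code: the name `choose_tier`, the tier labels, the interpretation of 60KB as 60 × 1,024 bytes, and the treatment of a file of exactly 60KB as "large" are all assumptions.

```python
SMALL_FILE_LIMIT_BYTES = 60 * 1024  # assumed meaning of "60KB"


def choose_tier(size_bytes, is_metadata=False):
    """Pick a storage tier by object type and size, not by access recency.

    Metadata and files smaller than the limit go to SSD; everything
    else goes to SATA. Hypothetical helper for illustration only.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if is_metadata or size_bytes < SMALL_FILE_LIMIT_BYTES:
        return "ssd"
    return "sata"
```

Unlike a cache, a rule of this kind is static: where an object lives depends on what it is and how big it is, so small-file and metadata performance does not depend on whether the data was touched recently.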

Metadata is accessed frequently, so fast metadata access benefits all types of workloads. All file operations, including reads and writes, require access to metadata. In many cases, such as directory listings, access to the metadata is all that is required to satisfy an I/O request. Storing metadata on SSD boosts performance for all storage operations, especially for directory functions (listing, searches, etc.) and RAID rebuilds in the event of a drive error. Rebuild performance has been improved so that the new 4TB drives can be rebuilt in the same amount of time as the 3TB drives in the prior-generation ActiveStor 12, maintaining a high level of data integrity and system reliability.
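As a rough check on the rebuild claim, the arithmetic below shows what "same rebuild time for a larger drive" implies. It assumes rebuild time is simply the amount of data divided by the rebuild rate and that both drives are equally full; the function name is hypothetical, and actual rebuild behavior depends on factors the article does not give.

```python
def implied_rebuild_rate_ratio(new_capacity_tb, old_capacity_tb):
    """Ratio of rebuild rates needed for equal rebuild time.

    Assumes time = capacity / rate, so equal time means the rate
    must scale with capacity. Illustrative model only.
    """
    if new_capacity_tb <= 0 or old_capacity_tb <= 0:
        raise ValueError("capacities must be positive")
    return new_capacity_tb / old_capacity_tb


ratio = implied_rebuild_rate_ratio(4, 3)
print(f"{ratio:.2f}x")  # 4/3, printed as 1.33x
```

Under these assumptions, rebuilding a 4TB drive in the time previously needed for a 3TB drive corresponds to roughly a one-third higher effective rebuild rate.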

Small file access can be disproportionately slow when read from, or written to, standard hard disk drives. Accesses of less than a full sector are inefficient, particularly for random I/O. Furthermore, reads and writes of small files can conflict with streaming reads or writes of large files to the same disk. By maintaining small files on SSD, such conflicts are eliminated. In addition, ActiveStor 14 stores the first 12KB of all files inside the file system metadata, improving SSD efficiency while increasing small file performance. This efficient storage of small files on SSD dramatically improves response time and IOPS, as evidenced by the SPECsfs2008 NFS IOPS results that Panasas has published.
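The idea of keeping the start of every file alongside its metadata can be illustrated with a short sketch. This is not the PanFS on-disk format; `split_inline`, the 12 × 1,024-byte interpretation of 12KB, and the in-memory byte strings are assumptions used only to show the split. Where the remainder is placed is not modeled here.

```python
INLINE_BYTES = 12 * 1024  # assumed meaning of "12KB"


def split_inline(data: bytes, inline_bytes: int = INLINE_BYTES):
    """Split file contents into an inline head and a remainder.

    The head is what would sit with the metadata; the remainder is
    whatever is left for the regular data path. Illustration only.
    """
    return data[:inline_bytes], data[inline_bytes:]


head, tail = split_inline(b"a" * 20000)
print(len(head), len(tail))  # 12288 7712
```

A file no larger than the inline limit leaves an empty remainder, so reading it requires only the metadata access that every file operation already performs.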

ActiveStor 14 is available in three configurations with varying sizes of SSD, SATA and cache. The amount of SSD for acceleration ranges from 1.5 percent up to 10.7 percent of total storage capacity. The bulk of the storage capacity, however, is on cost-effective SATA drives, keeping the overall cost per terabyte lower than the prior generation, and very competitive in the market today.

The importance of ease of use and management

As important as the performance and reliability of any storage system is the ease of use and management of the product. With ActiveStor, organizations can simply add blade enclosures to non-disruptively increase capacity and performance of the global file system as storage requirements grow. Parallel access to data and automated load balancing ensure that performance is optimized. This makes it easy to linearly scale capacity to over eight petabytes and performance to 150GB/s or 1.4M IOPS.

Conclusion

The end result is a high-performance storage system that delivers high throughput and IOPS, is ideal for the most demanding HPC and big data workloads, and accelerates time-to-results. ActiveStor delivers unmatched scale-out NAS performance in addition to the manageability, reliability, and value required by demanding computing organizations in the biosciences, energy, finance, government, manufacturing, media, and other research sectors.

To learn more about how the Panasas ActiveStor 14 can help your organization, register for the live webinar: http://www.panasas.com/news/webinars

 



