Reading List: Fault Tolerance Techniques for HPC

August 6, 2015

Among the chief challenges of deploying useful exascale machines, resilience looms large. Today's error rates combined with tomorrow's node counts cannot susta Read more…

By Tiffany Trader

Toward a Fault-Tolerant Cloud

June 23, 2011

With the proliferation of public cloud infrastructures, our dependence on them has increased. Many vital services in research, industry, and even everyday life have moved onto the cloud en masse. What happens, then, when the cloud services we depend on go down? Dr. Jose Luis Vazquez-Poletti shares some key aspects of how the scientific community can provide answers to this problem. Read more…

By Jose Luis Vazquez-Poletti

Looking to Fault-Tolerant Software

November 9, 2010

Achieving workable software-based fault tolerance will require a fresh approach for developers. Read more…

By Tiffany Trader

The Other Exascale Challenge

June 10, 2010

Supercomputing apps may have to ditch the checkpoint-restart model. Read more…

By Michael Feldman

Embrace Failure!

April 22, 2009

Can smart checkpoints and fault-resilient applications avert a Malthusian Catastrophe? Read more…

By Elizabeth Leake, TeraGrid, and Anne Heavey, iSGTW

