HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

The Week in HPC Research


The top research stories of the week have been hand-selected from prominent journals and leading conference proceedings. Here's another diverse set of items, including lessons learned from system failures; a cross-platform OpenCL implementation; the best memory for extracting the GPU's potential; innovative ideas for next-generation interconnects; and the benefits of cloud storage to HPC applications.

Learning from Failure

A recent paper [PDF] authored by Charng-Da Lu, Computational Scientist at the Center for Computational Research at SUNY at Buffalo, investigates the important topic of HPC system failures. The research team presents 8-24 months of actual failure data generated by three HPC systems at the National Center for Supercomputing Applications (NCSA).

Lu explains the impetus for the research as follows: "Continuous availability of high performance computing (HPC) systems built from commodity components have become a primary concern as system size grows to thousands of processors. To design more reliable systems, a solid understanding of failure behavior of current systems is in need."

Learning from mistakes is essential to progress, and Lu argues that failure data analysis of HPC systems has three main goals:

1. It highlights dependability bottlenecks and serves as a guideline for designing more reliable systems.

2. Real data can be used to drive numerical evaluation of performability models and simulations, which are an essential part of reliability engineering.

3. It can be applied to predict node availability, which is useful for resource characterization and scheduling.

The analysis shows that the three systems had an availability of between 98.7% and 99.8%. Lu finds that most outages were caused by software halts, while downtime per outage was highest in the case of hardware halts or scheduled maintenance. His team employed failure clustering analysis to identify several correlated failures.
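For readers who want to see how an availability figure of this kind is derived, here is a minimal sketch. It assumes availability is simply uptime divided by the length of the observation window; the function name and the outage durations are hypothetical and are not taken from Lu's paper or the NCSA data.

```python
def availability(total_hours, outage_hours):
    """Fraction of the observation window during which the system was up.

    total_hours:  length of the observation window, in hours
    outage_hours: iterable of individual outage durations, in hours
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    downtime = sum(outage_hours)
    if downtime > total_hours:
        raise ValueError("total downtime exceeds the observation window")
    return (total_hours - downtime) / total_hours


# Hypothetical example: a 30-day window with three outages (durations in hours).
window = 30 * 24
outages = [2.0, 0.5, 4.7]
print(f"availability: {availability(window, outages):.1%}")
```

The same outage list could also be grouped by cause (software halt, hardware halt, scheduled maintenance) to compare downtime per outage, which is the other measure the analysis reports.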

Box Counting Algorithm on GPU and multi-core CPU

In the prestigious Journal of Supercomputing, Jesús Jiménez and Juan Ruiz de Miras from the Department of Computer Science, University of Jaén in Spain, have authored a paper recounting their work with a cross-platform OpenCL implementation of the box-counting algorithm – one of the most popular methods for estimating the Fractal Dimension.

The Fractal Dimension, they explain, is an effective but time-consuming image analysis method used in many disciplines, including the biomedical field, environmental science, materials science and computer graphics. When it comes to the analysis of 3D images, box counting proves especially slow-going.

"Unlike parallel programming models that strictly depend on the hardware type and manufacturer, like CUDA," the team writes, "OpenCL allows us to provide an implementation suitable for execution on both GPUs and multi-core CPUs, whatever the hardware manufacturer."

Drawing on the work of earlier research, the authors design an OpenCL algorithm that has been specifically optimized according to the type of the target device. They claim average speedups of 7.46× and 4× when executed on the GPU and the multicore CPU respectively, compared to a single-threaded (sequential) CPU implementation.
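To make the box-counting idea concrete, here is a minimal sequential sketch in Python. It is not the authors' OpenCL implementation: it works on a 2D set of occupied pixels rather than a 3D image, and the function names and test shapes are assumptions chosen for illustration. The estimate is the least-squares slope of log N(s) against log(1/s), where N(s) is the number of boxes of side s that contain at least one occupied pixel.

```python
import math


def box_count(points, size):
    """Count the size-by-size boxes that contain at least one occupied pixel."""
    return len({(x // size, y // size) for x, y in points})


def fractal_dimension(points, sizes):
    """Estimate the box-counting dimension as the slope of log N(s) vs log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(sizes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity checks on shapes whose dimension is known in advance.
sizes = [1, 2, 4, 8, 16]
filled_square = [(x, y) for x in range(64) for y in range(64)]
straight_line = [(x, 0) for x in range(64)]
print(round(fractal_dimension(filled_square, sizes), 3))  # close to 2
print(round(fractal_dimension(straight_line, sizes), 3))  # close to 1
```

In this sketch the count for each box size is independent of the counts for the other sizes, and each pixel is assigned to its box independently of the other pixels, which gives a sense of why the method is a natural candidate for GPUs and multi-core CPUs.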

Can PCM Benefit GPU?

A new technical report from the College of William & Mary Department of Computer Science examines the benefits of deploying phase change memory (PCM) in tandem with GPU systems.

The seven-member research team starts with the following premise:

"Recent years have seen a rapid adoption of Graphic Processing Units (GPU) for computing beyond graphics processing. As a massively parallel architecture, GPU has demonstrated appealing energy efficiency and tremendous throughput. However, the energy efficiency of current GPU systems is still far from meeting the requirement of extreme-scale computing."

"Can PCM Benefit GPU?" - this is the question posed by the researchers and the title of their 11-page paper [PDF]. They point to recent studies that highlight PCM's energy efficiency potential when teamed with CPU systems that have a modest level of parallelism. But would the same benefits apply for GPU-like massively parallel systems?

The authors claim that their work is the "first systematic investigation into this question." They conclude that the promise of PCM-based memory for increasing the energy efficiency of parallel CPU-based systems does not hold true for GPU computing. In fact, the use of PCM in tandem with GPUs significantly degrades energy efficiency. The authors point to a "mismatch between those designs and the massive parallelism in GPU" and further note that repairing the mismatch requires "innovations in both hardware and software support."

Ultimately, their work reconciles a hybrid memory design with GPU massive parallelism for enhanced energy efficiency. It is this design that they say yields average energy savings of 15.6% and 40.1% compared to DRAM and PCM respectively, with a performance hit of less than 3.9%.
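The two savings figures can be combined into a rough, back-of-the-envelope comparison of PCM-only and DRAM-only memory. The sketch below assumes both percentages refer to the same workloads and that a ratio of averages is meaningful, neither of which the summary above confirms, so treat the result as an illustration rather than a number from the report. The function name is hypothetical.

```python
def implied_relative_energy(saving_vs_a, saving_vs_b):
    """Energy of configuration B relative to configuration A.

    If a third design uses (1 - saving_vs_a) of A's energy and
    (1 - saving_vs_b) of B's energy, then B / A is the ratio of the two.
    """
    return (1.0 - saving_vs_a) / (1.0 - saving_vs_b)


# Hybrid design: 15.6% saving vs DRAM-only, 40.1% saving vs PCM-only.
ratio = implied_relative_energy(0.156, 0.401)
print(f"PCM-only energy relative to DRAM-only: about {ratio:.2f}x")
```

Under those assumptions, a PCM-only configuration would use roughly 1.4 times the energy of a DRAM-only one, which is in line with the finding that PCM on its own degrades GPU energy efficiency.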

Interconnects for Exascale

As the coming generation of supercomputers reaches into exaflop-class territory, the HPC community faces fundamental challenges to the way that such systems are designed and operated. One of the biggest hurdles will be powering and cooling these mammoth machines. Optical interconnects could help alleviate some of these issues and thus have been proposed as a potential exascale enabler, but they are not without challenges themselves, especially with regard to manufacturability.

The feasibility of implementing chip-to-board interconnects for high-performance computing is discussed in a recent paper published in the Feb. 22, 2013 edition of Proceedings of SPIE. Written by a team of European researchers, the paper makes the case for integrating optical interconnect technologies at the module and chip level.

The researchers argue that "the introduction of optical links into High Performance Computing (HPC) could be an option to allow scaling the manufacturing technology to large volume manufacturing. This will drive the need for manufacturability of optical interconnects, giving rise to other challenges that add to the realization of this type of interconnection."

The authors envision a solution that puts optical components on the module level, integrating optical chips, laser diodes or PIN diodes as components. They note the method is analogous to constructing a surface-mount device (SMD), which has its components mounted directly onto the surface of printed circuit boards. This new class of 3-dimensional optical link is symbolic of the "fundamental paradigm shifts" that will usher in the exaflop future.

Evaluating Cloud Storage Services for Tightly-Coupled Applications

This week's HPC cloud item comes from a team of researchers from INRIA and Argonne National Laboratory. Their work "Evaluating Cloud Storage Services for Tightly-Coupled Applications" was published as a chapter in Euro-Par 2012: Parallel Processing Workshops.

Noting that past HPC cloud research primarily focused on performance as a way to quantify the HPC capabilities of public and private clouds, the team sets out to address the topic of data storage as it relates to traditional HPC applications.

"Tightly-coupled applications are a common class of scientific HPC applications, which exhibit specific requirements previously addressed by supercomputers," write the authors. They're referring to the fact that tightly-coupled applications work best when paired with a custom-tuned parallel file system (PFS). And while virtual machines can be outfitted with any file system, including PFS, the setup introduces issues around data persistency.

The research team elects to test a cloud-based storage service, opting for an open source platform rather than Amazon. They select the Nimbus Cloud framework and its S3-compatible storage service, Cumulus.

The group runs several experiments using an atmospheric modeling application running in a private Nimbus cloud. The results show that the application is able to scale with the size of the data and the number of processes (up to 144 running in parallel), while storing 50 GB of output data on the Cumulus cloud storage service.
