The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
April 25, 2008
A research paper exploring ways to make a popular scientific analysis code run smoothly on different types of multicore computers won a Best Paper Award at the IEEE International Parallel and Distributed Processing Symposium (IPDPS) this month.
The paper's lead author and CRD researcher, Samuel Williams, and his collaborators chose lattice Bolzmann code to explore a broader issue: how to make best use of multicore supercomputers. The multicore trend started recently, and the computing industry is expected to add more cores per chip to boost performance in the future. The paper described how the researchers developed a code generator that could efficiently and productively optimize a lattice Bolzmann code to deliver better performance on a new breed of supercomputers built with multicore processors.
The multicore trend is taking flight without an equally concerted effort by software developers. "The computing revolution towards massive on-chip parallelism is moving forward with relatively little concrete evidence on how to best to use these technologies for real applications," Williams wrote in the paper.
The researchers settled on the lattice Bolzmann code used to model turbulence in magnetohydrodynamics simulations that play a key role in areas of physics research, from star formation to magnetic fusion devices. The code, LBMHD, typically performs poorly on traditional multicore machines.
The optimization research resulted in a great improvement to the code performance -- substantially higher than any published to date. The researchers also gained insight into building effective multicore applications, compilers and other tools.
The paper, "Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms," won the Best Paper Award in the application track. Jonathan Carter of NERSC, Lenny Oliker of CRD, John Shalf of NERSC and Kathy Yelick of NERSC co-authored the paper. The researchers presented their paper at the IPDPS in Miami. Yelick, who is NERSC Director, also was a keynote speaker at the symposium.
Oliker, Carter and Shalf also authored a paper that won the same award last year. The paper, "Scientific Application Performance on Candidate PetaScale Platforms," was co-authored by CRD researchers Andrew Canning, Costin Iancu, Michael Lijewski, Shoaib Kamil, Hongzhang Shan and Erich Strohmaier. Stephane Ethier from the Princeton Plasma Physics Laboratory and Tom Goodale from Louisiana State University also contributed to the work.The researchers first looked at why the original LBMHD performs poorly on these multicore systems. Williams and his fellow researchers found that, contrary to conventional wisdom, memory bus bandwidth didn't present the biggest obstacle. Instead, lack of resources for mapping virtual memory pages, insufficient cache bandwidth, high memory latency, and/or poor functional unit scheduling did more to hamper the code's performance, Williams said.
The researchers created a code generator abstraction for LBMHD in order to optimize it for different multicore architectures. The optimization efforts included loop restructuring, code reordering , software prefetching, and explicit SIMDization. The researchers characterized their effort as akin to the "auto-tuning methodology exemplified by libraries like ATLAS and OSKI."
The results showed a wide range of performances on different processors and pointed to bottlenecks in the hardware that prevented the code from running well. The optimization efforts also resulted in a huge gain in performance -- the speed of the optimized code ran up to 14 times faster than the original version. It also achieved sustained performance for this code that is higher than any published to date: over 50 percent of peak flops on two of the processor architectures.
(Digg, Technorati, more)
White Paper: HPC in a Green and Modular Solution Building Block
Learn how the Appro GreenBlade™ System helps consolidate server, storage, network, power and simplified management capabilities in a single package while providing the performance-density, energy-efficiency and best ROI for your business.
Petascale Computing: Algorithms and Applications, edited by David A. Bader, is the first book in CRC's Computational Science Series, edited by Horst Simon. Although the book is a collection of papers, Bader has done an excellent job of creating a compilation that holds together and covers a broad topic very well.
Read More...
Cilk++ used in parallelization of the FP-tree algorithm for pattern mining; Istanbul benchmark results posted; and the latest on the NVIDIA Tesla shortage. John West recaps those stories and more in our weekly wrap-up.
Read More...
Last week's International Supercomputing Conference (ISC'09) was a convenient excuse for vendors to announce a raft of new products, but three, in particular, stood out.
Read More...
Jul 01 | GenomeWeb Daily News | The popularity of cloud computing in the life sciences community was on full display at April's Bio-IT World conference. Read more...
Jul 01 | Linux Magazine | How can getting to the ocean help with HPC computing? Read more...
Jun 29 | GCN.com | Agency issues RFI for "Ubiquitous High Performance Computing" systems. Read more...
Jun 29 | Computerworld | The bottom of the TOP500 reveals the coming revolution in truly accessible high-end computing. Read more...
Jun 18 | EE Times | Parallel software also takes spotlight at Stanford confab. Read more...
Apr 14 | | Many HPC IT departments are feeling the rising pressure to deliver more capacity computing and performance while trying to reduce the total cost of ownership. This white paper discusses how an environmentally-friendly and open-standards HPC building block based computing system using flexible interconnect options helps address capacity computing needs.
Source: Addison Snell, GM/VP, Tabor Research; sponsored by Dell
Many organizations that could benefit from the use of HPC clusters find that it is complicated to get the systems up and running because of limited IT resources or the complexities of the clusters themselves. Learn how the Intel Cluster Ready program, for which Dell was an original partner, seeks to address this challenge for entry level and mid-range HPC users.
BlueArc's Titan architecture represents an evolutionary step in file servers by creating a hardware-based file system that can scale bandwidth, IOPS, and overall data capacity well beyond conventional software-based devices. With its ability to virtualize a massive storage pool of up to four usable petabytes of tiered storage, Titan can scale with growing data requirements, offering a competitive advantage for businesses, researchers, or other enterprises seeking to better manage data growth while still ensuring optimal performance.
Sun Studio Compilers and Tools and Sun HPC ClusterTools allow you to create high performance parallel applications for OpenSolaris, Solaris and Linux. Sun Studio Express 11/08 includes MPI performance analysis capabilities and full OpenMP 3.0 compiler support. Learn about all this and the latest in Sun HPC ClusterTools 8.1.