The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
June 30, 2006
Introduction
Every year, the International Supercomputing Conference (ISC) reviews the major directions and key advances in the field of high performance computing during the preceding year -- from the previous June to the time of the current conference. At ISC 2006, significant accomplishments since June of 2005 were discussed, touching on some of the most interesting events and trends. Unlike last year, the year just concluded has seen no new major breakthroughs in high performance computing.
But such an observation would obscure the important and rapid progress achieved along the new path first established during the previous year. Described then as "high density computing," this change in direction has emerged as the dominant strategy for continuing to exploit Moore's Law. Taking two widely variant forms, high density computing achieves increased performance through continued increases in device density while limiting the growth in power consumption that has characterized recent microprocessor product deployment. Multi-core components, which integrate multiple processor cores on a single chip, are driving this new path. A second strategy is heterogeneous computing, which mixes processors of diverse form and function to provide different modalities of superior sustained performance. These two techniques are being employed, sometimes in combination, as the basis for perhaps the most notable direction of the last year: the beginning of the campaign to develop general-purpose Petaflops scale computer systems by the end of this decade.
While the means and methods for reaching a Petaflops capability have been under serious consideration since at least 1994, this year marked a turning point with the preparation for projects to develop such machines. But the year also saw, perhaps less dramatically but still of importance, continued improvement and maturation of many of the foundation elements of the HPC arena. Among them were new releases of several heavily relied upon software packages, including more than one release of MPI, a mainstay of parallel programming. These and other aspects of this year's progress are highlighted in this brief discussion.
Multi-Core
Since the 1980s, microprocessor technology has moved toward very powerful single-chip uniprocessor designs. First limited by available logic devices and later by latencies due to on-chip execution pipelines and off-chip memory accesses, microprocessor architecture has grown into highly complex designs. Unfortunately, the point of diminishing returns has been reached, such that adding more devices yields ever smaller performance improvements. At the same time, power consumption continued to rise with increases in clock speed and total device count, to a point that was judged to border on the impractical for future commercial systems.
Enter multi-core. Last year, commercial vendors introduced dual-core components. Performance gain would no longer be achieved through ever larger and more complicated processor design but rather through the integration of multiple processors on the same component chip. Over the last year, multi-core has come to dominate both mainstream commercial systems, reaching as far down as the laptop, and supercomputer system design. The emerging generation of MPPs and clusters is employing multi-core processor components to deliver sustained growth in performance. These include the IBM Blue Gene/L, which now dominates the high end of the Top500 list, the next generation of Cray XT3 systems, and commodity clusters from more than one vendor using Intel and AMD 64-bit extended x86 architectures. While the majority of such systems are dual-core, next generation systems are rapidly moving to quad-core. And it is expected that this trend will continue with Moore's Law over several iterations.
However, the shift to multi-core brings its own challenges, especially for the mainstream markets. In a sense, the HPC community is better prepared for multi-core than the general commercial markets because parallel processing, which the new technology trend demands, is already a mainstay of supercomputing. Even so, for the world of supercomputing this trend will impose a demand for increasing parallelism. If, as is expected, the trend continues, then the amount of parallelism required of user applications may easily increase by two orders of magnitude over the next decade. Also, with more processors being put on the same die, the ratio of off-chip communications demand to I/O pin bandwidth is getting worse, making the exploitation of locality even more critical than before. Yet with more cores on a chip, the allocation of caches becomes more complicated, with smaller L1 caches per core and possibly fragmented shared L2 or L3 caches. With little or no architecture support for managing global parallelism, these challenges will have to be addressed by new software methods or by more extreme resource management on the part of the application programmer.
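The "two orders of magnitude" figure can be checked with simple compounding arithmetic. The sketch below is illustrative only: the doubling periods of 1.5 and 2 years are assumptions chosen for the example and are not stated in the article, and the function name is hypothetical. It counts cores per chip alone; any growth in the number of chips per system would multiply the result.

```python
def core_growth_factor(years: float, doubling_period_years: float) -> float:
    """Growth in cores per chip if the count doubles once per period.

    Both arguments are illustrative assumptions, not figures from the article.
    """
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    decade = 10.0
    # Assumed doubling periods, in years, for cores per chip.
    for period in (1.5, 2.0):
        factor = core_growth_factor(decade, period)
        print(f"doubling every {period} years: about {factor:.0f}x per decade")
```

Under these assumptions, doubling every 18 months gives roughly a hundredfold increase in a decade, while doubling every two years gives about 32x, so reaching two orders of magnitude depends on the faster pace or on additional growth in system size.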
Heterogeneous Computing
This year has seen a marked increase in interest in heterogeneous computing for high performance. The interest was spawned in part by the significant performance demonstrated by special-purpose devices such as graphics processing units (GPUs). The idea of leveraging these industry investments for more general-purpose technical computing has become enticing, and a number of projects in many countries, mostly in academia but also in national laboratories, are dedicating time to it. But the move towards heterogeneous computing is driven by more than the perceived opportunity of "low hanging fruit."