May 04, 2007
by Tobias Gradl (Institute of Informatics at the University of Erlangen)
and Reinhold Bader (Leibniz Computing Centre)
On April 17, the Leibniz Computing Centre Munich (LRZ) announced the completion of phase 2 of the installation of the national supercomputing system HLRB II. The SGI Altix 4700 based computer has been upgraded to 9728 cores and 39 TBytes of memory. With the newly installed dual-core Intel Itanium 2 "Montecito" processors, the system's LINPACK performance has more than doubled to 56.5 TFlop/s. In normal operation, a single operating system image of Novell's SuSE Linux Enterprise Server 10 (SLES 10) spans 512 cores connected by NUMAlink 4. This gives a user access to 2 TBytes of shared memory at a time, more than any comparable computer in the current Top500 list offers. In benchmark tests, a single SLES 10 image ran productively on up to 1024 cores.
Besides the LINPACK benchmark, other large-scale codes have been used to put HLRB II through its paces. One of them is HHG ("Hierarchical Hybrid Grids"), a multigrid solver for the finite element method on unstructured grids. HHG was developed by Benjamin Bergen at the Chair for System Simulation (LSS), University of Erlangen-Nuremberg, Germany, under the auspices of the Bavarian KONWIHR supercomputing research consortium, and is now being maintained and refined by Tobias Gradl at LSS. It received the 2006 award of the International Supercomputing Conference (ISC). The software is designed to solve the largest possible simulations as fast as possible. Both goals are achieved through a compromise between structured and unstructured meshes: unstructured coarse grid patches are refined in a structured way, which keeps the storage needed for the operator stencils to a minimum and yields high MFlop/s rates thanks to regular memory access patterns.
Using 9170 cores of HLRB II, HHG solved a finite element simulation with 307 billion unknowns in 93 seconds (7.75 seconds per V-cycle of the multigrid method). These figures are impressive, presumably a world record, and the scaling is very good as well. With the problem size per processor core held constant, a V-cycle takes 4.93 seconds on 64 cores, 5.68 seconds on 4080 cores, and 6.33 seconds on 6120 cores.
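The timings above can be turned into weak-scaling efficiencies with a few lines of arithmetic. The following Python sketch is only an illustration: it assumes the common definition of weak-scaling efficiency as the baseline time divided by the time on more cores, and it uses the 64-core run as the baseline. The 9170-core run is left out of the efficiency table because the article does not state that it used the same problem size per core. All variable and function names are made up for this example.

```python
# V-cycle timings quoted in the article (constant problem size per core).
vcycle_seconds = {64: 4.93, 4080: 5.68, 6120: 6.33}


def weak_scaling_efficiency(timings, baseline_cores):
    """Return {cores: t_baseline / t_cores}; 1.0 would be perfect weak scaling."""
    t_base = timings[baseline_cores]
    return {cores: t_base / t for cores, t in sorted(timings.items())}


efficiency = weak_scaling_efficiency(vcycle_seconds, baseline_cores=64)
for cores, eff in efficiency.items():
    print(f"{cores:5d} cores: {vcycle_seconds[cores]:.2f} s per V-cycle, "
          f"efficiency {eff:.2f}")

# Figures for the full-machine run: 93 s in total at 7.75 s per V-cycle,
# 307 billion unknowns spread over 9170 cores.
vcycles_full_run = 93 / 7.75
unknowns_per_core = 307e9 / 9170
print(f"V-cycles in the 93 s run: {vcycles_full_run:.0f}")
print(f"Unknowns per core: {unknowns_per_core / 1e6:.1f} million")
```

Under these assumptions, a V-cycle on 6120 cores still runs at roughly three quarters of the 64-core speed, and the record run corresponds to 12 V-cycles.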
Of the 9728 total cores, 6656 constitute the "high bandwidth" part of HLRB II. In this part, every dual-core processor has its own memory channel, which it can access at full speed (8.5 GByte/s). The remaining 3072 cores form the "high density" part, in which two processors (four cores) share one memory channel. This inhomogeneity leads to performance figures that vary with the part of the machine a program runs on. The effect is visible in the timing results above: on up to 6120 cores, i.e., within the "high bandwidth" part, the scaling is good, but when the whole computer is used, it deteriorates slightly, to 7.75 seconds per V-cycle on 9170 cores.
How strongly memory bandwidth influences performance depends heavily on an application's memory access pattern. SGI and the LRZ HPC support team around Dr. Matthias Brehm therefore eagerly awaited the first performance measurements on the new installation phase, to compare them with those from phase 1, in which a single-core Itanium 2 "Madison" processor could use the whole memory bandwidth of 8.5 GByte/s. As they were happy to see, the reduced memory bandwidth per core did not have as severe an impact as could have been expected. Per-core MFlop/s rates averaged over all applications appear to be at essentially the same level as in phase 1; the very memory-intensive fluid dynamics code, however, shows a per-core performance decrease of up to 40 percent. HHG's per-core performance, compared to phase 1, is reduced by a factor of 1.1-1.4 when running on the "high bandwidth" part, and by a factor of 1.9-2.0 on the "high density" part. This is a very pleasing result, considering that the available memory bandwidth per core has been reduced by factors of 2 and 4, respectively.
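To put the HHG slowdown factors next to the bandwidth reduction, the short Python sketch below works through the arithmetic. It assumes that the 8.5 GByte/s of a memory channel is divided evenly among the cores sharing it, which matches the reduction factors of 2 and 4 quoted above but is a simplification of real hardware behaviour. The names used are hypothetical.

```python
FULL_CHANNEL_GBYTES_PER_S = 8.5  # bandwidth of one memory channel (article)

# Cores sharing one memory channel in each configuration (from the article).
cores_per_channel = {"phase 1": 1, "high bandwidth": 2, "high density": 4}

# HHG slowdown relative to phase 1 as (low, high) factors (from the article).
hhg_slowdown = {"high bandwidth": (1.1, 1.4), "high density": (1.9, 2.0)}


def bandwidth_per_core(partition):
    """Per-core bandwidth, assuming an even split among sharing cores."""
    return FULL_CHANNEL_GBYTES_PER_S / cores_per_channel[partition]


# Worst-case slowdown as a fraction of the bandwidth reduction factor.
# A purely bandwidth-bound code would be expected to reach 1.0 here.
slowdown_vs_bandwidth_cut = {
    partition: high / cores_per_channel[partition]
    for partition, (low, high) in hhg_slowdown.items()
}

for partition, (low, high) in hhg_slowdown.items():
    print(f"{partition}: {bandwidth_per_core(partition):.3f} GByte/s per core, "
          f"bandwidth cut {cores_per_channel[partition]}x, "
          f"HHG slowdown {low}-{high}x "
          f"(at most {slowdown_vs_bandwidth_cut[partition]:.0%} of the cut)")
```

In both partitions the worst observed slowdown stays clearly below the bandwidth reduction factor, which is the point the measurements make.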
HLRB II is especially well suited for multigrid solvers like HHG. Multigrid is among the most computationally efficient methods for solving partial differential equations (PDEs), but it is hard to implement on large-scale parallel computers, because its hierarchy of coarse meshes creates an inherent lack of parallelism. HLRB II provides 4 GBytes of main memory per core, much more than some other comparable supercomputers. Because of that, larger subdomains of the finite element mesh can be assigned to each processor, and the coarse meshes are still large enough to allow high MFlop/s rates.
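The memory figures in this article can be cross-checked with a short calculation. The Python sketch below assumes decimal units (1 GByte = 10^9 bytes) and derives an upper bound on the memory available per unknown in the 9170-core run. This is the memory installed on the cores used, not HHG's measured consumption, which the article does not report. The names are chosen for this example only.

```python
GBYTE = 10**9   # assumption: decimal gigabytes
TBYTE = 10**12  # assumption: decimal terabytes

memory_per_core_bytes = 4 * GBYTE  # main memory per core (article)
total_cores = 9728                 # cores in HLRB II after phase 2
run_cores = 9170                   # cores used in the record run
unknowns = 307e9                   # unknowns in the record run

# Total memory of the machine; the article rounds this to 39 TBytes.
total_memory_tbytes = total_cores * memory_per_core_bytes / TBYTE

# Upper bound on the memory available per unknown in the record run.
bytes_per_unknown = run_cores * memory_per_core_bytes / unknowns

print(f"Total memory: {total_memory_tbytes:.1f} TBytes")
print(f"At most {bytes_per_unknown:.0f} bytes available per unknown")
```

With only about 120 bytes available per unknown for solution, right-hand side, and all other data, a problem of this size leaves little room for per-unknown operator storage, which is consistent with the structured refinement approach described above.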
About Leibniz Computing Centre (Leibniz-Rechenzentrum, LRZ)
Leibniz Computing Centre is a facility of the commission for information science of the Bavarian Academy of Sciences, with around 170 employees. As a modern service enterprise, LRZ constitutes a scientific computing centre for all universities in Munich and the Academy of Sciences, as well as being a national centre for scientific supercomputing and a centre for large-scale archiving of data. It is responsible for planning, upgrading and deployment of the Munich Scientific Network and acts as a state-wide competence centre for data communication networks. For further information please visit www.lrz.de.
-----
Source: Leibniz Computing Centre