HPCwire

Taking Supercomputing in Germany to a New Level


by Tobias Gradl (Institute of Informatics at the University of Erlangen)
and Reinhold Bader (Leibniz Computing Centre)

On April 17, the Leibniz Computing Centre Munich (LRZ) announced the completion of phase 2 of the installation of the National Supercomputing System HLRB II. The SGI Altix 4700-based computer has been upgraded to 9728 cores and 39 TBytes of memory. With the newly installed dual-core Intel Itanium 2 Montecito processors, the system's LINPACK performance has more than doubled to 56.5 TFlop/s. In normal operation, a single operating system image of Novell's SuSE Linux Enterprise Server 10 (SLES 10) spans 512 cores connected by NUMAlink 4. This gives the user access to 2 TBytes of shared memory at a time; no comparable computer in the current Top500 list offers more. In benchmark tests, a single SLES 10 image ran productively on up to 1024 cores.
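The quoted system figures can be cross-checked with a few lines of arithmetic. The following is only a sketch: it assumes binary units (1 TByte = 1024 GBytes) and an even distribution of memory over all cores, neither of which is stated in the announcement, and the function names are chosen here for illustration.

```python
# Figures quoted in the announcement.
TOTAL_CORES = 9728
TOTAL_MEMORY_TBYTES = 39
CORES_PER_OS_IMAGE = 512

# Assumption: binary units, 1 TByte = 1024 GBytes.
GBYTES_PER_TBYTE = 1024


def os_image_count(total_cores, cores_per_image):
    """Number of single-system-image partitions needed to cover all cores."""
    images, leftover = divmod(total_cores, cores_per_image)
    return images + (1 if leftover else 0)


def memory_per_core_gbytes(total_tbytes, total_cores):
    """Average memory per core, assuming an even distribution (an assumption)."""
    return total_tbytes * GBYTES_PER_TBYTE / total_cores


images = os_image_count(TOTAL_CORES, CORES_PER_OS_IMAGE)
per_core = memory_per_core_gbytes(TOTAL_MEMORY_TBYTES, TOTAL_CORES)
per_image_tbytes = per_core * CORES_PER_OS_IMAGE / GBYTES_PER_TBYTE

print(f"OS images:        {images}")
print(f"Memory per core:  {per_core:.2f} GBytes")
print(f"Memory per image: {per_image_tbytes:.2f} TBytes")
```

Under these assumptions the machine divides into 19 images of 512 cores, each with roughly 2 TBytes of shared memory, which is consistent with the figures quoted above.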

Besides the LINPACK benchmark, other large-scale codes have been used to put HLRB II through its paces. One of them is HHG ("Hierarchical Hybrid Grids"), a multigrid solver for the finite element method on unstructured grids. HHG was developed by Benjamin Bergen at the Chair for System Simulation (LSS), University of Erlangen-Nuremberg, Germany, under the auspices of the Bavarian KONWIHR supercomputing research consortium, and is now being maintained and refined by Tobias Gradl at LSS. It received the 2006 award of the International Supercomputing Conference (ISC). The software is designed to solve the largest possible simulations as quickly as possible. Both goals are achieved through a compromise between structured and unstructured meshes: unstructured coarse grid patches are refined in a structured way, which results in minimal storage space for the operator stencils and in high MFlop/s rates thanks to regular memory access patterns.

Using 9170 cores of HLRB II, HHG solved a finite element simulation with 307 billion unknowns in 93 seconds (7.75 seconds per V-cycle of the multigrid method). These figures are impressive, presumably a world record, and the scaling is also very good. With the problem size per processor core held constant, a V-cycle takes 4.93 seconds on 64 cores, 5.68 seconds on 4080 cores, and 6.33 seconds on 6120 cores.
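One way to read these timings is as weak-scaling efficiency, i.e. the 64-core V-cycle time divided by the time at a larger core count. The sketch below assumes that the 9170-core run used the same problem size per core as the other three runs, which the article implies but does not state outright; the function and variable names are illustrative only.

```python
# V-cycle times in seconds quoted in the article, keyed by core count.
V_CYCLE_SECONDS = {64: 4.93, 4080: 5.68, 6120: 6.33, 9170: 7.75}


def weak_scaling_efficiency(timings, base_cores=64):
    """Efficiency relative to the smallest run: t(base) / t(n)."""
    base_time = timings[base_cores]
    return {cores: base_time / t for cores, t in sorted(timings.items())}


efficiency = weak_scaling_efficiency(V_CYCLE_SECONDS)
for cores, eff in efficiency.items():
    print(f"{cores:5d} cores: {eff:.0%} of the 64-core rate")

# Derived quantities for the largest run (307 billion unknowns, 93 seconds).
unknowns_per_core = 307e9 / 9170
v_cycles = 93 / 7.75
print(f"Unknowns per core: {unknowns_per_core / 1e6:.1f} million")
print(f"V-cycles in the full solve: {v_cycles:.0f}")
```

By this measure the efficiency is about 87 percent on 4080 cores, 78 percent on 6120 cores, and 64 percent on 9170 cores, and the full solve corresponds to 12 V-cycles on roughly 33.5 million unknowns per core.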

Of the 9728 total cores, 6656 constitute the "high bandwidth" part of HLRB II. In this part, every dual-core processor can access its own memory channel at full speed (8.5 GByte/s). The remaining 3072 cores, the "high density" part, are organized in groups of two processors (four cores) per memory channel. This inhomogeneity leads to varying performance figures, depending on which part of the machine a program is running on. The effect is visible in the timing results mentioned above. Using up to 6120 cores, i.e., staying within the "high bandwidth" part, the scaling is good. But when using the whole computer, it deteriorates slightly.
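The bandwidth available to each core in the two parts of the machine follows directly from these numbers. The sketch below assumes that a memory channel's 8.5 GByte/s is shared evenly among the cores attached to it, which is a simplification; the dictionary layout and names are chosen here for illustration.

```python
# Peak bandwidth of one memory channel, as quoted in the article.
CHANNEL_BANDWIDTH_GBYTES_PER_S = 8.5

# Core counts and cores sharing one memory channel, from the article.
PARTS = {
    "high bandwidth": {"cores": 6656, "cores_per_channel": 2},
    "high density": {"cores": 3072, "cores_per_channel": 4},
}


def bandwidth_per_core(cores_per_channel):
    """Even split of one channel among its cores (a simplifying assumption)."""
    return CHANNEL_BANDWIDTH_GBYTES_PER_S / cores_per_channel


total_cores = sum(part["cores"] for part in PARTS.values())
per_core = {
    name: bandwidth_per_core(part["cores_per_channel"])
    for name, part in PARTS.items()
}

print(f"Total cores: {total_cores}")
for name, gbytes_per_s in per_core.items():
    print(f"{name}: {gbytes_per_s} GByte/s per core")
```

With an even split, a core in the high-bandwidth part sees 4.25 GByte/s and a core in the remaining part sees 2.125 GByte/s.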

How strongly memory bandwidth influences performance depends heavily on an application's memory access patterns. SGI and the LRZ HPC support team around Dr. Matthias Brehm therefore eagerly awaited the first performance measurements on the new installation phase, to compare them with those from phase 1, in which a single-core Itanium 2 Madison processor could use the whole memory bandwidth of 8.5 GByte/s. As they were happy to see, the reduced memory bandwidth per core did not have as severe an impact as might have been expected. Per-core MFlop/s rates averaged over all applications appear to be at essentially the same level as in phase 1; the very memory-intensive fluid dynamics code, however, shows a per-core performance decrease of up to 40 percent. HHG's performance, compared to phase 1, is reduced by a factor of 1.1-1.4 when running on the "high bandwidth" part, and by a factor of 1.9-2.0 on the "high density" part. This is a very pleasing result, considering that the available memory bandwidth per core has been reduced by factors of 2 and 4, respectively.
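To put these factors side by side, the sketch below divides each reported HHG slowdown by the corresponding bandwidth reduction. A purely bandwidth-bound code would give a ratio of 1.0; this interpretation and the helper names are assumptions made here, not part of the LRZ measurements.

```python
# Reduction in memory bandwidth per core relative to phase 1 (from the article).
BANDWIDTH_REDUCTION = {"high bandwidth": 2.0, "high density": 4.0}

# Reported range of HHG slowdown factors relative to phase 1 (from the article).
HHG_SLOWDOWN = {"high bandwidth": (1.1, 1.4), "high density": (1.9, 2.0)}


def slowdown_share(slowdown, reduction):
    """Ratio of observed slowdown to bandwidth reduction (1.0 = fully bandwidth bound)."""
    return slowdown / reduction


def slowdown_from_decrease(fractional_decrease):
    """Convert a performance decrease (e.g. 0.40) into a slowdown factor."""
    return 1.0 / (1.0 - fractional_decrease)


shares = {
    part: tuple(slowdown_share(s, BANDWIDTH_REDUCTION[part]) for s in bounds)
    for part, bounds in HHG_SLOWDOWN.items()
}

for part, (low, high) in shares.items():
    print(f"{part}: slowdown is {low:.0%} to {high:.0%} of the bandwidth reduction")

# The 40 percent decrease quoted for the fluid dynamics code, as a factor.
print(f"40% decrease = slowdown factor {slowdown_from_decrease(0.40):.2f}")
```

On this reading, HHG loses roughly half to two thirds of what a fully bandwidth-bound code would lose, and a 40 percent performance decrease corresponds to a slowdown factor of about 1.67.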

HLRB II is especially well suited to multigrid solvers like HHG. Multigrid is among the most computationally efficient methods for solving PDEs, but it is hard to implement on large-scale computers, because its hierarchy of coarse meshes creates an inherent lack of parallelism. HLRB II provides 4 GBytes of main memory per core, much more than some other comparable supercomputers. Because of that, larger subdomains of the finite element mesh can be assigned to each processor, and the coarse meshes are still large enough to allow high MFlop/s rates.
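The loss of parallelism on coarse meshes can be illustrated with a small calculation. The sketch below assumes that each coarsening step reduces the number of unknowns by a factor of 8, as is typical for uniform refinement in three dimensions; the article does not describe HHG's actual grid hierarchy, so the factor, the starting size (307 billion unknowns on 9170 cores, taken from the run described above), and the function name are illustrative assumptions.

```python
def unknowns_per_level(fine_unknowns, coarsening_factor=8, min_unknowns=1):
    """Unknowns per core on each multigrid level, finest level first.

    Assumes a constant coarsening factor (8 is typical for uniform 3D
    refinement); this is an illustration, not HHG's actual hierarchy.
    """
    levels = []
    unknowns = float(fine_unknowns)
    while unknowns >= min_unknowns:
        levels.append(unknowns)
        unknowns /= coarsening_factor
    return levels


# Starting point: unknowns per core in the 9170-core run quoted in the article.
levels = unknowns_per_level(307e9 / 9170)

for depth, unknowns in enumerate(levels):
    print(f"level {depth}: about {unknowns:,.0f} unknowns per core")
```

Under these assumptions the work per core drops from about 33 million unknowns to about 2 within eight coarsening steps, which is why a larger fine-level subdomain per core helps keep the coarse levels busy.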

About Leibniz Computing Centre (Leibniz-Rechenzentrum, LRZ)

Leibniz Computing Centre is a facility of the commission for information science of the Bavarian Academy of Sciences, with around 170 employees. As a modern service enterprise, LRZ constitutes a scientific computing centre for all universities in Munich and the Academy of Sciences, as well as being a national centre for scientific supercomputing and a centre for large-scale archiving of data. It is responsible for planning, upgrading and deployment of the Munich Scientific Network and acts as a state-wide competence centre for data communication networks. For further information please visit www.lrz.de.

-----

Source: Leibniz Computing Centre

