The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
October 06, 2006
Steve Neuner, the director for Linux engineering at SGI, has been pushing Linux up the scalability ladder for the better part of the 21st century. In August of this year, SGI announced that it was able to run a single system image of the Linux OS across 1024 processors on an Itanium-based Altix 4700 supercomputer. How was this feat accomplished? This week at the Gelato Itanium Conference and Expo (ICE) in Singapore, Neuner presented a session that described the Linux kernel modifications that helped to make this possible. HPCwire caught up with him before the conference to ask him about the Linux improvements and where the future of single system image scalability is headed.
HPCwire: Can you give us a brief timeline of how Linux has scaled from 8 processors to 1024 processors over the last five years?
Neuner: In the summer of 2001, we built an early 32-processor prototype system in the lab. SGI used it extensively to begin identifying and fixing scaling issues. This development system was later increased to 64 processors, which became our initial configuration limit for a single system image of the Linux kernel when we launched SGI Altix in February of 2003. A year later, that limit was increased to 256 processors.
Later, in February of 2005, we started shipping the 2.6 Linux kernel, which was a major step forward that enabled support for 512-processor systems. In August of this year, this limit was increased to our current limit of 1024 processors.
HPCwire: Can you describe the types of changes that were made to the Linux 2.6 kernel to get a single image of the OS to run on a 1024-processor system?
Neuner: The changes usually fall into one of two categories. The first is getting the system to boot and recognize all the hardware. This typically involves increasing the size of data structures throughout the kernel that contain information related to the number of nodes, processors, or memory on a NUMA system. SGI uses a hardware simulator to find and fix most of these problems before we have a system of that size in the lab. For example, when engineering received the first 1024-processor system for testing, it booted right up the very first time.
Once Linux can boot and run on a larger system, the next category of fixes is getting Linux to perform well. This work often involves running benchmark tests and various HPC applications so that hot locks, contended cache lines, timing windows, and race conditions can be exposed and pinpointed in order to improve Linux's efficiency on very large systems.
Surprisingly, most of the changes going from 512 processors to 1024 processors fell into the first category of enabling the kernel to recognize and boot on a 1024-processor system. It turned out that the performance scaling work done earlier with our 512p system paid off, since the issues had already been found and fixed. So going from 512p to 1024p became more of a testing and validation exercise. As a result, we were able to officially support 1024 processors for our customers a year ahead of plan.
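The first category of change Neuner describes, enlarging data structures sized by processor count, can be illustrated with a minimal sketch. This is not SGI's or the kernel's actual code: `CpuMask` and `bits_to_longs` are hypothetical names (the latter loosely modeled on the kernel's `BITS_TO_LONGS` macro), and 64-bit words are assumed. It shows only that a bitmap with one bit per possible processor must be built for the larger limit before processors above the old limit can be recorded at all.

```python
BITS_PER_LONG = 64  # assumption: 64-bit machine words


def bits_to_longs(nbits):
    """Number of 64-bit words needed to hold nbits bits, rounded up."""
    return (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG


class CpuMask:
    """Hypothetical fixed-size bitmap with one bit per possible processor."""

    def __init__(self, max_cpus):
        self.max_cpus = max_cpus
        self.words = [0] * bits_to_longs(max_cpus)

    def set_cpu(self, cpu):
        # A processor number beyond the configured limit cannot be recorded.
        if not 0 <= cpu < self.max_cpus:
            raise ValueError(f"cpu {cpu} exceeds configured limit {self.max_cpus}")
        self.words[cpu // BITS_PER_LONG] |= 1 << (cpu % BITS_PER_LONG)

    def is_set(self, cpu):
        return bool((self.words[cpu // BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1)


old_mask = CpuMask(512)   # sized for the earlier 512-processor limit
new_mask = CpuMask(1024)  # sized for the 1024-processor limit
print(len(old_mask.words), len(new_mask.words))  # prints "8 16"
new_mask.set_cpu(1023)    # the highest-numbered processor now fits
```

Calling `old_mask.set_cpu(1023)` raises `ValueError` in this sketch, which mirrors why structures of this kind have to be enlarged before a bigger machine can be fully recognized.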
HPCwire: Can you talk about some of the other 2.6 Linux kernel enhancements that have been added for HPC functionality?
Neuner: As processor counts increase, so does memory. Significant improvements were made in 2.6 in memory handling and in supporting larger memory sizes. Some examples in this area include support for over 10 TB of memory, improved node locality and NUMA awareness in various kernel memory allocation mechanisms, 4-level page tables, page migration, out-of-memory error handling improvements, and fault containment of double-bit uncorrectable memory errors.