HPCwire

The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing

HPCwire >> Features

SGI Expanding the Reach of Linux


Page:  1  of  3
1 | 2 | 3   All  »  

Steve Neuner, the director for Linux engineering at SGI, has been pushing Linux up the scalability ladder for the better part of the 21st century. In August of this year, SGI announced that they were able to run a single system image of the Linux OS over 1024 processors on an Itanium-based Altix 4700 supercomputer. How was this feat accomplished? This week at the Gelato Itanium Conference and Expo (ICE) in Singapore, Neuner presented a session that described the Linux kernel modification that helped to make this possible. HPCwire caught up with him before the conference to ask him about the Linux improvements and where the future of single system image scalability is headed.

HPCwire: Can you give us a brief time line of how Linux has scaled from 8 processors to 1024 processors over the last five years?

Neuner: In the summer of 2001, we built an early 32 processor prototype system in the lab. SGI used it extensively to begin identifying and fixing scaling issues. This development system was later increased to 64 processors, which became our initial configuration limit for a single system image of the Linux kernel when we launched SGI Altix in February of 2003. A year later, that limit was increased to 256 processors.

Later in February of 2005, we started shipping the 2.6 Linux kernel, which was a major step forward that enabled support for 512 processor systems. In August of this year, this limit was increased to our now current limit of 1024 processors.

HPCwire: Can you describe the types of changes that were made to the Linux 2.6 kernel to get a single image of the OS to run on a 1024-processor system?

Neuner: The changes usually fall into one of two categories. The first is getting the system to boot and recognize all the hardware. This typically involves increasing the size of data structures throughout the kernel that contain information related to the amount of nodes, processors, or memory on a NUMA system. SGI uses a hardware simulator to find and fix most of these problems before we have a system of that size in the lab. For example, when engineering received the first 1024 processor system for testing, it booted right up the very first time.

Once Linux can boot and run on a larger system, the next category of fixes is getting Linux to perform well. This work often involves running benchmark tests and various HPC applications, so hot-locks, cache lines, timing windows, and race conditions can be exposed and pin-pointed in order to improve Linux's efficiency on very large systems.

Surprisingly, most of the changes going from 512 processors to 1024 processors fell into the first category of enabling the kernel to recognize and boot on a 1024 processor system. It turned out that the performance scaling work done earlier with our 512p system paid off since issues were already found and fixed. So going from 512p to 1024p became more of a testing and validation exercise. As a result, we were able to officially support 1024 processors for our customers a year ahead of plan.

HPCwire: Can you talk about some of the other 2.6 Linux kernel enhancements that have been added for HPC functionality?

Neuner: As processor counts increase, so does memory. Significant improvements in 2.6 were made in memory handling and supporting larger memory sizes. Some examples in this area include support for over 10 TB of memory, improved node locality and NUMA awareness in various kernel memory allocations mechanisms, 4-level page table, page migration, out-of-memory error handling improvements, and fault containment of double-bit uncorrectable memory errors.

Page:  1  of  3
1 | 2 | 3   All  »  

HPCwire on Twitter

Article Tools

  • Print This Page
  • Bookmark This Article

Share Options

(Digg, Technorati, more)


Subscribe

Discussion

There are 0 discussion items posted.  

HPC in the Cloud Part 2
People to Watch 2010


Top Headlines

Australia Commissions Cray Supercomputer

Mar 19 | OfficialWire | New super to support intelligence work Down Under. Read more...

Intel Partners See 'Easy' Upgrade Path With Xeon 5600 Chips

Mar 18 | ChannelWeb | Westmere parts already showing up in HPC machines. Read more...

AMD: OEMs primed for Opteron 6100s

Mar 17 | The Register | But what about the tier ones? Read more...

Arrival of the Desktop Supercomputer

Mar 17 | Cadalyst Magazine | A new generation of workstations is changing the nature of technical computing. Read more...

Scheduling HPC In The Cloud

Mar 17 | Linux Magazine | Latest iteration of Sun Grid Engine able to tap into Cloud. Read more...

Featured Whitepapers

Virtualization for Aggregation And The vSMP Architecture™

Jan 12 | | In-depth look at vSMP Foundation server virtualization technology, technical implementation, use cases and capabilities. The technical whitepaper provides an architectural overview and details on the three vSMP Foundation products: vSMP Foundation for SMP, vSMP Foundation for Cluster and vSMP Foundation for Cloud.

Copper Cable Technologies for High Performance Computing

Jan 18 | | This white paper discusses Gore’s copper cable assemblies, and how they continue to exceed the standards for providing reliable, cost-effective solutions for high-performance computer applications.

Multimedia

Webcast: Virtualized Data Center Roundtable

Join this online panel discussion for live Q&A with leading industry experts, analysts, and end-users to discuss the latest innovations, best practices, barriers to implementation, and measurable benefits of server virtualization with a particular focus on today's real world solutions.

Webcast: Watch SC09 Birds of a Feather Video: Scalable Fault-Tolerant HPC Supercomputers

Learn about scalable fault-tolerant architectures and examples of energy efficient and scalable supercomputing clusters using dual QDR InfiniBand to combine capacity computing with network failover capabilities with the help of programming languages such as MPI and a robust Linux cluster management package.

Webcast: High Performance Computing for a Smarter Planet

LIVE@SCO9: The IBM team discusses new innovations in hardware, software and services that help clients better understand their workloads and get insight from their R&D efforts. Technology demonstrations include the soon-to-be-released Power7 HPC processor, the DCS990 system with 2.4 petabytes of storage, the xCAT management tool, secure HPC cloud computing and more. Winners of two HPCwire Readers' and Editors’ Choice Awards! Take the IBM virtual tour at SC09 or more information go online to: http://www-03.ibm.com/systems/deepcomputing/sc09.html

SC09 HPC in the Cloud

Newsletters

Stay informed! Subscribe to HPCwire email Newsletters.






HPC Job Bank


Featured Events

HPC User Forum DICE
2010 High Performance Computing Linux Financial Markets
Cloud Computing Expo
Cloud Lab
ESC
DEISA PRACE Symposium