SGI Expanding the Reach of Linux

By Nicole Hemsoth

October 6, 2006

Steve Neuner, the director for Linux engineering at SGI, has been pushing Linux up the scalability ladder for the better part of the 21st century. In August of this year, SGI announced that it had run a single system image of the Linux OS across 1024 processors on an Itanium-based Altix 4700 supercomputer. How was this feat accomplished? This week at the Gelato Itanium Conference and Expo (ICE) in Singapore, Neuner presented a session describing the Linux kernel modifications that helped make this possible. HPCwire caught up with him before the conference to ask about the Linux improvements and where single system image scalability is headed.

HPCwire: Can you give us a brief timeline of how Linux has scaled from 8 processors to 1024 processors over the last five years?

Neuner: In the summer of 2001, we built an early 32-processor prototype system in the lab. SGI used it extensively to begin identifying and fixing scaling issues. This development system was later expanded to 64 processors, which became our initial configuration limit for a single system image of the Linux kernel when we launched SGI Altix in February 2003. A year later, that limit was increased to 256 processors.

Then, in February 2005, we started shipping the 2.6 Linux kernel, which was a major step forward that enabled support for 512-processor systems. In August of this year, this limit was increased to our current limit of 1024 processors.

HPCwire: Can you describe the types of changes that were made to the Linux 2.6 kernel to get a single image of the OS to run on a 1024-processor system?

Neuner: The changes usually fall into one of two categories. The first is getting the system to boot and recognize all the hardware. This typically involves increasing the size of data structures throughout the kernel that contain information related to the number of nodes, processors, or memory on a NUMA system. SGI uses a hardware simulator to find and fix most of these problems before we have a system of that size in the lab. For example, when engineering received the first 1024-processor system for testing, it booted right up the very first time.

Once Linux can boot and run on a larger system, the next category of fixes is getting Linux to perform well. This work often involves running benchmark tests and various HPC applications so that hot locks, cache lines, timing windows, and race conditions can be exposed and pinpointed in order to improve Linux's efficiency on very large systems.

Surprisingly, most of the changes going from 512 processors to 1024 processors fell into the first category of enabling the kernel to recognize and boot on a 1024-processor system. It turned out that the performance scaling work done earlier with our 512p system paid off, because the issues had already been found and fixed. So going from 512p to 1024p became more of a testing and validation exercise. As a result, we were able to officially support 1024 processors for our customers a year ahead of plan.

HPCwire: Can you talk about some of the other 2.6 Linux kernel enhancements that have been added for HPC functionality?

Neuner: As processor counts increase, so does memory. Significant improvements in 2.6 were made in memory handling and in supporting larger memory sizes. Some examples in this area include support for over 10 TB of memory, improved node locality and NUMA awareness in various kernel memory allocation mechanisms, 4-level page tables, page migration, out-of-memory error handling improvements, and fault containment of double-bit uncorrectable memory errors.

Process scheduling is another area that has seen significant advances. Some examples include the O(1) scheduler, which maintains an almost constant level of system overhead regardless of the system size; CPU affinity support for placing processes on specific processors; CPUSETS, which allow a user to set aside specific processors and reserve local memory for exclusive use; and dynamic scheduling domains.
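To make the "almost constant overhead" point concrete, here is a minimal sketch of the idea behind an O(1) run queue: one FIFO per priority level plus a bitmap marking which levels are non-empty, so picking the next task costs the same however many tasks are queued. This is an illustration only, not SGI's or the kernel's code. The class and method names are hypothetical, the 140 priority levels are taken from the 2.6 scheduler rather than from this interview, and the real scheduler's active and expired arrays are left out.

```python
from collections import deque

# Assumption: 140 levels, as in the 2.6 scheduler; not stated in the interview.
NUM_PRIORITIES = 140


class RunQueue:
    """Toy run queue: one FIFO per priority plus a bitmap of non-empty levels."""

    def __init__(self, num_priorities=NUM_PRIORITIES):
        self.queues = [deque() for _ in range(num_priorities)]
        self.bitmap = 0  # bit p is set while queues[p] holds at least one task

    def enqueue(self, task, priority):
        # Append to the FIFO for this level and mark the level as non-empty.
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        # No bits set means nothing is runnable.
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit; a lower number means a higher priority.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[priority].popleft()
        # Clear the bit once the level has drained.
        if not self.queues[priority]:
            self.bitmap &= ~(1 << priority)
        return task


rq = RunQueue()
rq.enqueue("batch-job", 120)
rq.enqueue("realtime-job", 5)
print(rq.pick_next())  # the priority-5 task is chosen first
```

The cost of `pick_next` is bounded by the fixed number of priority levels, not by the number of queued tasks, which is the property the scheduler's name refers to.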

Other areas of improvement include the incorporation of XFS for high bandwidth and large file systems, support for a large number of disks, an overhaul of the block and driver layers to enable large and parallel I/Os, high performance networking with 10 Gigabit Ethernet and InfiniBand, timer resolution, and the new thread library.

All these improvements, along with 2.6's performance and scaling improvements, enable Linux to continue to expand into other areas of deployment. For example, the same general-purpose Linux kernel used from small-to-large or enterprise-to-HPC servers can now also be deployed and used in real-time applications, providing support and capabilities previously found only on proprietary or specialized real-time operating systems.

HPCwire: What elements of the Linux HPC work are done by SGI versus others in the community?

Neuner: While SGI often focuses on HPC and I/O related kernel issues, it's not unusual for us to encounter a problem that's already being worked on or addressed by someone in the community, since many performance, error handling and robustness improvements needed for HPC environments also benefit or affect enterprise environments.

However, our access to and use of very large systems also means we are first to find various HPC, scaling or performance related problems. This is because one of the best ways to shake out and find problems faster is to “turn up the stress knobs” on a system by using very large system configurations for testing. Systems with large numbers of processors and large amounts of memory and I/O are therefore crucial and heavily relied upon for all our kernel development and testing.

Also, as community acceptance is critical to all kernel work SGI does, virtually all of the work we do involves collaboration with some subset of the Linux community.

HPCwire: Do you think the open source nature of Linux has sped up development of HPC OS features or made it a more complex undertaking?

Neuner: At SGI, OS engineers continue to work on kernel issues and improvements on Linux as we did on IRIX. The main difference now is how we deliver these improvements to our customers. Seeking acceptance and agreement on a proposed change from others within the Linux community seemed like an extra hurdle at first, but over time it became clear that this collaboration combined with the high quality standards is why Linux has become highly versatile, robust, and stable for all workload environments including HPC. The Linux community software development model enables our customers to benefit from improvements made by the entire Linux community rather than just improvements made by SGI engineers.

HPCwire: What are the practical limits for single system image scalability? Are they inherent in the kernel design or just the result of hardware limitations?

Neuner: The hardware, OS, and HPC application all need to scale in order for users to see the performance gains from adding more processors to their system. With HPC applications, scaling can occur in two ways. The first is with the already numerous existing “embarrassingly parallel” applications that are ready to exploit large CPU counts using the hardware as a “capability server.” The second way is when a system is used as a “capacity server,” where multiple applications each use only a subset of the total available processors. Either way, many HPC applications and environments can usually take advantage of a larger system when more processors are added.

For hardware, SGI systems are designed with hardware scalability and performance as paramount. The operating system scalability typically lags behind, especially since one really needs to get access to the hardware first in order to go after and solve the OS issues. The hardware limit for our current generation of Altix is 4096 processors for running a single system image of the operating system.

With the operating system, the practical limit is hit when a highly specialized, lightweight, and dedicated operating system customized for a specific hardware architecture must be used over a general-purpose one. Today, SGI uses the same general-purpose Linux kernel whether running with 2 or 1024 processors, which is incredible and a testament to the excellent design and work by everyone within the Linux community.

We've already successfully booted Linux in the lab on 1742 processors, at which point we encountered more internal kernel issues that will need to be addressed. So it's an ongoing process, and given Linux's impressive track record, it's impossible to predict its upper limit.

-----

Steve Neuner is the Linux Engineering Director at SGI and has been working on Linux and Itanium-based systems since joining SGI 7 years ago. Prior to SGI, Steve worked at Digital Equipment Corporation, Sequent Computer Systems, and MAI Basic Four. He has been involved with Linux and UNIX kernel development for over 20 years.
