Flexibility is Key to Cluster Administration Software

By Bruce Potter and Jennifer Cranfill

February 2, 2007

Introduction

When companies purchase a significant number of machines and cluster them together to meet their computing needs, their site environment often drives specific requirements for their clusters. These requirements vary and often include specific networking configurations, specific applications that need managing, specific approaches to software installation and maintenance, and existing management software and procedures that must be accommodated. The key to successful cluster administration software is that it be flexible enough to accommodate many of these environments. For optimum flexibility, the systems management software must have the following characteristics:

  • It must have some fundamental capabilities that can be used to accomplish a wide variety of tasks. These include capabilities like parallel command execution, configuration file management, and software maintenance.
  • The out-of-band hardware control must be extensible to support a wide variety of hardware.
  • It must support a variety of node installation methodologies, for example: direct installation using the native installer, cloning nodes, or running diskless nodes.
  • It must support a variety of networking configurations, including routers, firewalls, low bandwidth networks, and high security environments.
  • The monitoring capabilities must be configurable and extensible, and must support standards.
  • The management software must have the proper APIs and command line interfaces necessary to support running it in a hierarchical fashion for very large clusters or subdivided clusters.
  • It must be modular and customizable so that it can fit into companies' existing structures and processes (CLI, extensibility, use of isolated parts, etc.).
  • It must have mechanisms for allowing frequent updates and user contributions.

This article will discuss each of these characteristics in turn and give examples of cluster administration software that possesses these qualities.

Flexible Fundamental Capabilities

Cluster administration encompasses a wide variety of tasks that are often unique to the cluster or to the cluster's purpose. Therefore, cluster management software needs to provide simple tools that can accomplish many different tasks; the more inherent flexibility in these tools, the better. Basic functionality needed for cluster management includes:

  • Support for multiple distributions: Tools that work across multiple operating systems and architectures can be used in more environments. While Red Hat Enterprise Linux (RHEL) and SuSE Linux Enterprise Server (SLES) are two of the main distributions for enterprise clusters, support for free distributions like Fedora, CentOS, Scientific Linux, and Debian is also desired by many cluster users.
  • Distributed command execution: A distributed shell is an essential clustering component, as it allows the administrator to quickly perform command line tasks across the entire cluster or a subset of nodes. This capability is a catch-all, because it allows the administrator to perform tasks that are not specifically supported by the rest of the administration software. Required flexibility includes timeout values (for nodes that are not responding), skipping of offline nodes, and the ability to use any underlying remote shell.
  • Distribution of files: Distribution of files is a close second among essential clustering capabilities. There are two modes of file distribution: a one-time copy, and a repository of files kept consistent throughout the cluster. The latter mode is useful for maintaining configuration files across the cluster or on a subset of nodes, and it gains flexibility by allowing different versions of files for different node groups and by automatically running user-defined scripts before and after files are copied to the nodes.
  • Software maintenance: Software maintenance – the ability to install and upgrade software after a node has been installed – lets administrators update individual applications without reinstalling the node. This feature must also automatically install prerequisite RPMs.

With the basic tools above it is possible to accomplish a large number of complex cluster tasks, including installation and startup of the high performance computing (HPC) application stack, cluster-wide user management, configuration and startup of services, and addition of nodes to workload queues. For instance, installation and startup of HPC software can be done with software maintenance and the distributed shell. Services like NTP and the automounter, as well as user management, can be configured mainly through the distribution of configuration files from the management server.
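
As a rough illustration of what a distributed shell does, the following Python sketch fans one command out to a list of nodes in parallel, honors a per-node timeout, and skips nodes that do not answer a ping. The node names, the ssh transport, and the timeout value are illustrative assumptions, not the behavior of any particular product.

```python
#!/usr/bin/env python
"""Minimal distributed-shell sketch: run one command on many nodes in
parallel, honoring a per-node timeout and skipping offline nodes.  The node
names, the ssh transport, and the Linux 'ping' options are assumptions."""
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = ["node001", "node002", "node003"]   # hypothetical node names
TIMEOUT = 15                                # seconds before giving up on a node

def node_is_online(node):
    # Skip nodes that do not answer a single ping (assumes Linux iputils ping).
    return subprocess.call(["ping", "-c", "1", "-W", "2", node],
                           stdout=subprocess.DEVNULL) == 0

def run_on_node(node, command):
    if not node_is_online(node):
        return node, None, "skipped (offline)"
    try:
        result = subprocess.run(["ssh", node, command], capture_output=True,
                                text=True, timeout=TIMEOUT)
        return node, result.returncode, result.stdout.strip()
    except subprocess.TimeoutExpired:
        return node, None, "timed out"

def dsh(command):
    # Fan the command out to all nodes in parallel and collect the results.
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        for node, rc, output in pool.map(lambda n: run_on_node(n, command), NODES):
            print(f"{node}: rc={rc}: {output}")

if __name__ == "__main__":
    dsh("uptime")
```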

Examples of cluster administration tools that include forms of the above functionality are: xCAT, the C3 tools in OSCAR, Scali Manage, and CSM. Other tools can do just one of the tasks above. For example, Red Hat Network (up2date) and YUM provide software maintenance capabilities.
 
Extensible Hardware Control

Hardware Control provides the key capability of managing the cluster hardware (powering on/off, query, console, firmware flash) without having to be physically present with the hardware. Most cluster hardware provides native mechanisms to accomplish these tasks, but often the mechanisms vary between hardware types. This provides a challenging environment for remote hardware control software, since many clusters consist of heterogeneous hardware. Even if all the compute nodes are the same machine type, there are still I/O nodes and non-node devices such as switches, SAN controllers, and terminal servers to be managed.

To support the ever-growing number of power methods, the administration software must support user-defined power methods that can be plugged into the main power commands. A pluggable method allows the software to more easily support new hardware, and allows the user to run the same command to all the nodes and devices, despite their different control methods. It also allows other software components, such as installation, to drive the power control to the various hardware.
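
The pluggable idea can be pictured as a small registry of per-hardware-type drivers sitting behind a single power command, roughly as in the Python sketch below. The plugin set, the node-to-method table, and the ipmitool invocation are illustrative assumptions rather than the interface of any product mentioned in this article.

```python
#!/usr/bin/env python
"""Sketch of a pluggable power-control front end: one 'rpower' function that
dispatches to per-hardware-type plugins.  The plugin set, node-to-method
table, and ipmitool credentials are illustrative assumptions."""
import subprocess

def ipmi_power(node, action):
    # BMC-based nodes: shell out to ipmitool (credentials are placeholders).
    subprocess.run(["ipmitool", "-I", "lan", "-H", node + "-bmc",
                    "-U", "admin", "-P", "changeme", "power", action], check=True)

def apc_power(node, action):
    # Hypothetical plugin for an APC power switch; details depend on the unit.
    print(f"(would toggle the APC outlet feeding {node}: {action})")

# Registry of power methods; a site can add its own entry for new hardware.
POWER_METHODS = {"ipmi": ipmi_power, "apc": apc_power}

# Per-node control method (would normally come from the cluster database).
NODE_METHOD = {"node001": "ipmi", "node002": "apc"}

def rpower(nodes, action):
    """Run the same power action against nodes with different control methods."""
    for node in nodes:
        POWER_METHODS[NODE_METHOD[node]](node, action)

if __name__ == "__main__":
    rpower(["node001", "node002"], "on")
```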

In addition to power control of the cluster hardware, remote console is another area that requires pluggable methods. There is a wide variety of terminal servers and serial over LAN (SOL) implementations on the market, and each has its own intricacies for establishing a remote console session to a node.

The method for flashing the firmware of the nodes is usually very hardware specific, but at least some flexibility can be achieved by allowing additional drivers to be added to the flashing environment, and by allowing flashing to be done either in-band (while the node is running) or pre-OS (before the operating system is installed on the node).

In addition to writing your own console method for new terminal servers, “in house” development of power and console methods can allow more flexibility when upgrading cluster hardware: instead of being required to wait for and upgrade to the latest version of the management software to support new hardware, you can script your own solution. Examples of simple hardware control methods that cluster administrators can easily develop are: power on through Wake on LAN, power off through a distributed shell, and power control via a power switch such as APC or Baytech. Cluster products that provide extensible power control include xCAT, Scali Manage, and CSM.
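
To give a feel for how little code such an “in house” method can require, here is a minimal Wake on LAN power-on sketch; the MAC address and broadcast settings are placeholders.

```python
#!/usr/bin/env python
"""Minimal Wake on LAN power-on method: broadcast a magic packet made of six
0xFF bytes followed by the target MAC address repeated sixteen times."""
import socket

def wake_on_lan(mac, broadcast="255.255.255.255", port=9):
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, (broadcast, port))

if __name__ == "__main__":
    wake_on_lan("00:11:22:33:44:55")   # placeholder MAC address of the node
```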

Variety of Node Installation Methods

Installing the operating system and applications on nodes is one of the most important functions of cluster administration software, because it can take so long to do manually. Because the method of installation affects the other administration processes, it is important for the software to support a variety of installation methods.

For clusters in which the nodes are not all identical and for which there exists a separate software maintenance procedure, the approach of directly installing the RPMs from the distribution media (over the network) is generally the most useful. This allows the administrator to initiate an install with just the distribution CDs in hand and to easily specify a different list of RPMs for different nodes. Products that support this method of installation include Rocks, Clusterworx, Scali Manage, xCAT, and CSM. They generally use Kickstart's and AutoYaST's unattended installation features to automate the installation of multiple nodes over the network in parallel.
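
One common way to drive such unattended installs is to generate a per-node Kickstart (or AutoYaST) answer file from a template, so that different nodes can receive different RPM lists. The sketch below is only a simplified illustration; the directives, paths, hostnames, and package groups are assumptions, not the output of any specific product.

```python
#!/usr/bin/env python
"""Sketch of generating per-node Kickstart files from one template so that
different nodes get different RPM lists.  The directives, paths, hostnames,
and package groups are simplified assumptions."""
KICKSTART_TEMPLATE = """\
install
url --url http://{install_server}/rhel/
network --bootproto dhcp --hostname {hostname}
rootpw --iscrypted {crypted_pw}
reboot
%packages
{packages}
"""

# Hypothetical node list: compute nodes and an I/O node with extra packages.
NODES = {
    "node001": ["@base", "openssh-server", "ntp"],
    "ionode01": ["@base", "openssh-server", "ntp", "nfs-utils"],
}

def write_kickstarts(install_server, crypted_pw):
    for hostname, packages in NODES.items():
        ks = KICKSTART_TEMPLATE.format(install_server=install_server,
                                       hostname=hostname,
                                       crypted_pw=crypted_pw,
                                       packages="\n".join(packages))
        with open(f"/tftpboot/ks/{hostname}.ks", "w") as f:   # assumed directory
            f.write(ks)

if __name__ == "__main__":
    write_kickstarts("mgmt01", "$1$placeholder$hash")
```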

While many users like the simplicity of the direct installation method, an equally large camp of users prefers the cloning method. This generally combines the node installation method with the node software maintenance strategy. In this approach a model node (sometimes called a “golden” node) is installed manually and configured exactly how the administrator wants the rest of the nodes to be. Then the software image is captured from the golden node and replicated to the other nodes. When updates or configuration changes are necessary, the golden node is updated and the capture/replicate process is done again. This approach is most effective for clusters in which the nodes in the cluster are almost identical, in terms of both hardware and software. Products that provide cloning capability include OSCAR, HP XC, xCAT, CSM, Clusterworx, and the open source tools Partimage and SystemImager.
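
Stripped to its essentials, the capture/replicate cycle amounts to pulling the golden node's filesystem into an image directory and pushing that image back out to the other nodes, as in the rough sketch below. Real cloning tools also deal with partitioning, bootloaders, and per-node configuration; the node names, image path, rsync options, and exclude list here are assumptions.

```python
#!/usr/bin/env python
"""Illustrative golden-node capture/replicate cycle using rsync over ssh.
Real cloning tools also handle partitioning, bootloaders, and per-node
configuration; the node names, image path, and excludes are assumptions."""
import subprocess

GOLDEN = "goldennode"
TARGETS = ["node002", "node003"]
IMAGE_DIR = "/install/images/compute/"
EXCLUDES = ["/proc", "/sys", "/dev", "/tmp"]   # pseudo and volatile filesystems

def capture():
    # Pull the golden node's root filesystem into the image directory.
    cmd = ["rsync", "-aHx", "--delete"]
    cmd += [f"--exclude={path}" for path in EXCLUDES]
    cmd += [f"{GOLDEN}:/", IMAGE_DIR]
    subprocess.run(cmd, check=True)

def replicate():
    # Push the captured image back out to each target node.
    for node in TARGETS:
        subprocess.run(["rsync", "-aHx", "--delete", IMAGE_DIR, f"{node}:/"],
                       check=True)

if __name__ == "__main__":
    capture()
    replicate()
```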

While installing the operating system locally on each node generally works well (disks are cheap, and the OS files can be loaded more quickly at boot time), some users are moving to diskless nodes. The motivation for this is generally not price (disks are dirt cheap these days) or even easier maintenance (there are both pros and cons in this area). The motivation is usually reliability in large clusters, because the last moving part in the node is eliminated. Diskless nodes are achieved in most cases by sending the kernel and possibly some of the file systems to the node when it boots; the rest of the file systems are then NFS-mounted from a server. Products that support diskless nodes include CIT, Clusterworx, Warewulf, xCAT, CSM, Scyld, and Egenera. (Scyld only loads a very minimal image on the node. Egenera makes a disk available to each node via a SAN.)

Most users have definite requirements about the type of node installation they want to use, since it is central to their entire administration strategy. Therefore, it is important for the cluster administration software to support as many of the presented node installation methods as possible. These methods should also be customizable by supporting the use of user-defined post-installation scripts and by supporting install/image servers to increase scalability and performance.

Extensible Monitoring Capabilities

Just as extensible hardware control is important, extensible monitoring of the cluster is a useful tool for automating responses to cluster events. While there are many enterprise software packages that provide error detection and response, it is useful to have at least some set of customizable and user-defined monitoring capabilities built into a cluster administration product. Common event information across the cluster to which the software may need to respond includes: node down and up events (useful for manipulating workload queues), memory and swap space used, filesystem space used, processor idle time, network adapter throughput, and syslog entries. The following extensibility points are important in event monitoring:

  • User defined “sensors” to monitor additional OS or application specific values in the system, and to monitor standard instrumentation such as SNMP and CIM that may not already be supported by the native monitoring subsystem.
  • User defined response scripts to be run locally or across the cluster in response to events occurring.
  • The ability to forward events to a variety of enterprise monitoring products such as the Tivoli Enterprise Console, CA Unicenter, or HP OpenView.

Products that have an extensible monitoring system include Ganglia, Nagios, Big Brother, Scali Manage, Clusterworx, and CSM.
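
To make the sensor and response-script points concrete, the following sketch samples filesystem usage on a node and runs a response command when a threshold is crossed. The monitored filesystem, the threshold, and the response script path are illustrative assumptions and do not reflect the configuration syntax of any product listed above.

```python
#!/usr/bin/env python
"""Sketch of a user-defined sensor plus response script: sample filesystem
usage and run a response command when a threshold is crossed.  The monitored
path, threshold, and response script are illustrative assumptions."""
import os
import subprocess

FILESYSTEM = "/var"
THRESHOLD = 90                         # percent full that triggers the event
RESPONSE = ["/usr/local/sbin/notify-admin", "filesystem-nearly-full", FILESYSTEM]

def percent_used(path):
    st = os.statvfs(path)
    return 100.0 * (st.f_blocks - st.f_bfree) / float(st.f_blocks)

def sensor():
    value = percent_used(FILESYSTEM)
    print(f"sensor: {FILESYSTEM} is {value:.1f}% full")
    if value >= THRESHOLD:
        # Event condition met: run the user-defined response script.
        subprocess.run(RESPONSE, check=False)

if __name__ == "__main__":
    sensor()
```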

Hierarchical Support

There are several reasons that might motivate running cluster administration software in a hierarchical fashion. The obvious reason is to manage more nodes than the administration software's scaling limit supports. Another is to divide the nodes into smaller sets that can be managed individually, sometimes by different administrators, while still retaining some central control. A third is to handle unusual networking configurations, for example cross-geography clusters. A typical hierarchical cluster uses a three-level hierarchy: sets of nodes, each set managed by a management server (called the First Line Management Server, or FMS), and a top-level management server (the Executive Management Server, or EMS) that manages all of the First Line Management Servers.

Ideally, all management operations could be done from the EMS, but at a minimum the following operations must be possible from the EMS:

  • Install the FMSs, replicate any required OS images, and drive the FMS to install its leaf nodes.
  • Push out updated software to the FMSs and all the leaf nodes.
  • Push out configuration files and data to the FMSs and all the leaf nodes.
  • Run commands on any or all of the FMSs and leaf nodes.
  • Control the leaf node hardware (power on/off, etc.).
  • Configure event monitoring and monitor events from all of the FMSs and leaf nodes.

Hierarchical support is important to allow the cluster administration software to work in more cluster environments. The only products that we know of that support hierarchical clusters as described here are xCAT and CSM. Several products, for example CIT, support a hierarchy for one specific operation, usually for node installation or diskless boot.
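
The command-execution item from the list above gives a feel for how the hierarchy works: the EMS asks each FMS to run the command on its own leaf nodes and simply gathers the results. The host names, the per-FMS node lists, and the ssh transport in the sketch below are assumptions.

```python
#!/usr/bin/env python
"""Sketch of hierarchical command execution: the EMS contacts each FMS, and
each FMS fans the command out to its own leaf nodes over ssh.  The host
names and per-FMS node lists are illustrative assumptions."""
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Which leaf nodes each first-line management server (FMS) is responsible for.
HIERARCHY = {
    "fms01": ["node001", "node002"],
    "fms02": ["node101", "node102"],
}

def run_via_fms(fms, nodes, command):
    # Ask the FMS to run the command on each of its leaf nodes and echo results.
    remote = " ; ".join(f"ssh {node} '{command}'" for node in nodes)
    result = subprocess.run(["ssh", fms, remote], capture_output=True, text=True)
    return fms, result.stdout.strip()

def ems_run(command):
    with ThreadPoolExecutor(max_workers=len(HIERARCHY)) as pool:
        futures = [pool.submit(run_via_fms, fms, nodes, command)
                   for fms, nodes in HIERARCHY.items()]
        for future in futures:
            fms, output = future.result()
            print(f"--- results gathered via {fms} ---\n{output}")

if __name__ == "__main__":
    ems_run("uptime")
```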

Modular and Customizable

We have already mentioned that customers often have established system management processes in their lab prior to using any of the administration products mentioned in this article. It is not normally well received when the product dictates the processes to be used for all the administration tasks (installation, software maintenance, user management, configuration, monitoring, etc.). To avoid this “barrier to entry”, the product must have the following characteristics:

  • A complete command line interface – this lets administrators write scripts around the product's administration commands so that they can be called from the administrator's own processes. To enable this, all the commands must be script-ready: they must not prompt for input, and their output must be unambiguously parsable, even in internationalized environments (a short sketch follows this list).
  • Modular - the administrator must be able to use only parts of the product, while ignoring other parts for which he/she already has a solution. For example, the product should not require that the administrator use its userid management solution to be able to access any of the other functions.
  • Extensible - As mentioned in previous sections, the product needs to be extensible to support different distributions easily (CSM has this), support new hardware for hardware control (xCAT and CSM), monitor user-defined resources (CSM and Ganglia), and support setting up user applications (Rocks, OSCAR, CSM).
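
As a small example of the command line interface point, the sketch below wraps a hypothetical, non-prompting 'nodestatus' command whose delimiter-separated output is easy to parse from an administrator's own scripts. The command name, its flag, and its output format are invented for illustration and do not belong to any product named above.

```python
#!/usr/bin/env python
"""Sketch of scripting around a cluster product's CLI.  The 'nodestatus'
command, its --delimiter flag, and its output format are hypothetical; the
point is that non-prompting commands with delimiter-separated output can be
called and parsed from an administrator's own scripts."""
import subprocess

def offline_nodes():
    # Hypothetical command printing one "hostname:status" line per node.
    result = subprocess.run(["nodestatus", "--delimiter", ":"],
                            capture_output=True, text=True, check=True)
    down = []
    for line in result.stdout.splitlines():
        hostname, status = line.split(":", 1)
        if status.strip() != "up":
            down.append(hostname)
    return down

if __name__ == "__main__":
    for node in offline_nodes():
        print(f"{node} is not up; feed this into the site's own ticketing process")
```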

Frequent Updates and User Contributions

As we all know, Linux software and its associated hardware do not stand still. The many components of a typical Linux cluster continue to evolve with new versions, usually several times a year, with all the components on different release schedules. And new technology continually appears. As a result, the administration software needs to continually adapt to its changing environment. This requires the ability to put out frequent updates to the product. Open source solutions (e.g. Rocks, CIT, OSCAR) generally have an easier time of this due to their iterative development style and less regression testing done by the development team (and more by the user community). But even vendor products need to find ways to release updates often. CSM uses a combination of traditional product releases and early updates on the xCSM site. User contributions can also help tremendously in keeping up with all the changing components. This is business as usual for open source solutions, but can be difficult for vendor products due to legal restrictions. This issue must be resolved in order for vendor products to be able to keep up with the ever changing environment.

Summary

In Linux clusters, there are so many open source administration utilities and so many home-grown solutions that there is very little need for a one-size-fits-all cluster administration product. The administration software must be extremely flexible to adapt to a variety of environments and to complement, but not conflict with, the utilities already in use.

—–

Bruce Potter

Bruce is a Senior Technical Staff Member at IBM.  He has been working in the area of systems management of clusters since 1989, including AIX clusters based on the IBM SP2, Windows clusters with IBM Director, and Linux & AIX clusters with CSM and open source.

Jennifer Cranfill

Jennifer has been working with clusters for the past seven years. She currently spends her days at Sony Pictures Imageworks in sunny southern California.

—–

Editor's note: Rick Friedman, Scali's VP of Marketing and Product Management, wished to clarify some of the Scali Manage capabilities referenced in the article:

“With regard to the section 'Variety of Node Installation Methods', Scali Manage, in fact, supports both image / cloning installation and RPM methods, enabling our users to use RPMs to create a master image for initial cluster setups, and then leveraging RPMs for updates without requiring the re-imaging of the existing systems.

“With regard to the section 'Hierarchical Support', Scali Manage provides this functionality, supporting multiple clusters, multiple networks, and multiple geographies in a hierarchical fashion, enabling cluster administrators to coordinate and manage HPC efforts throughout an organization.

“Finally, with regard to support for diskless nodes, this functionality will be added in our next release coming next month.”
