Flexibility is Key to Cluster Administration Software

By Bruce Potter and Jennifer Cranfill

February 2, 2007

Introduction

When companies purchase a significant number of machines and cluster them together to solve their computing needs, their site environment often drives specific requirements for their clusters. These requirements vary and often include specific networking configurations, specific applications that need managing, specific approaches to software installation and maintenance, and existing management software and procedures that must be accommodated. The key to successful cluster administration software is that it be flexible enough to accommodate many of these environments. For optimum flexibility, the systems management software must have the following characteristics:

  • It must have some fundamental capabilities that can be used to accomplish a wide variety of tasks. These include capabilities like parallel command execution, configuration file management, and software maintenance.
  • The out-of-band hardware control must be extensible to support a wide variety of hardware.
  • It must support a variety of node installation methodologies, for example: direct installation using the native installer, cloning nodes, or running diskless nodes.
  • It must support a variety of networking configurations, including routers, firewalls, low bandwidth networks, and high security environments.
  • The monitoring capabilities must be configurable, extensible, and support standards.
  • The management software must have the proper APIs and command line interfaces necessary to support running it in a hierarchical fashion for very large clusters or subdivided clusters.
  • It must be modular and customizable so that it can be fit into companies' existing structures and processes (CLI, extensibility, use of isolated parts, etc.)
  • It must have mechanisms for allowing frequent updates and user contributions.

This article will discuss each of these characteristics in turn and give examples of cluster administration software that possesses these qualities.

Flexible Fundamental Capabilities

Cluster administration encompasses a wide variety of tasks that are often unique to the cluster or to the cluster's purpose. Therefore cluster management tools need to provide ways to accomplish many different tasks with simple tools. The more inherent flexibility in these tools the better. Basic functionality that is needed for cluster management includes:

  • Support for multiple distributions: Tools that work across multiple operating systems and architectures allow for greater use. While Red Hat Enterprise Linux (RHEL) and SuSE Linux Enterprise Server (SLES) are two of the main distributions for enterprise clusters, support for free distributions like Fedora, CentOS, Scientific Linux, and Debian is also desired by many cluster users.
  • Distributed command execution: A distributed shell is an essential clustering component, as it allows the administrator to quickly perform command line tasks across the entire cluster or a subset of nodes. This capability is a catch-all, because it allows the administrator to perform tasks that are not specifically supported by the rest of the administration software. Required flexibility includes timeout values (for nodes that are not responding), skipping of offline nodes, and the ability to use any underlying remote shell.
  • Distribution of files: Distribution of files comes in as a close second as an essential clustering capability. There are two modes of file distribution: one time copy, and a repository of files kept common throughout the cluster. The latter mode is useful for maintaining configuration files throughout the cluster or on a subset of nodes, and it can have increased flexibility by allowing different versions of files for different node groups, and by automatically running user-defined scripts before and after files are copied to the nodes.
  • Software maintenance: Software maintenance – the ability to upgrade and install software after a node is installed – is also important to enable administrators to install or upgrade individual applications without reinstalling the node. This feature must also automatically install prerequisite RPMs.

With the basic tools above it is possible to accomplish a large number of complex cluster tasks including installation and startup of the high performance computing (HPC) application stack, cluster wide user management, configuration and startup of services, and additions of nodes to workload queues. For instance, install and startup of HPC software can be done with software maintenance and the distributed shell. Configuration and startup of services like NTP and automounter, as well as user management, can be configured mainly through the distribution of configuration files from the management server.

Examples of cluster administration tools that include forms of the above functionality are: xCAT, the C3 tools in Oscar, Scali Manage, and CSM. Other tools can do just one of the tasks above. For example, Red Hat Network (up2date) and YUM provide software maintenance capabilities.
 
Extensible Hardware Control

Hardware Control provides the key capability of managing the cluster hardware (powering on/off, query, console, firmware flash) without having to be physically present with the hardware. Most cluster hardware provides native mechanisms to accomplish these tasks, but often the mechanisms vary between hardware types. This provides a challenging environment for remote hardware control software, since many clusters consist of heterogeneous hardware. Even if all the compute nodes are the same machine type there are still I/O nodes and non-node devices such as switches, SAN controllers, and terminal servers to be managed.

To support the ever growing number of power methods, the administration software must support user-defined power methods that can be plugged into the main power commands. A pluggable method allows the software to more easily support new hardware, and allows the user to run the same command to all the nodes and devices, despite their different control methods. It also allows other software components, such as installation, to drive the power control to the various hardware.

In addition to power control of the cluster hardware, remote console is another area that requires pluggable methods. There are a wide variety of terminal servers and serial over LAN (SOL) support on the market, and each of these has its own intricacies for establishing a remote console session to the node.

The method for flashing the firmware of the nodes is usually very hardware specific, but at least some flexibility can be achieved by allowing additional drivers to be added to the flashing environment, and to allow flashing to be done either in-band (while the node is running) or pre-OS (before the operating system is installed on the node).

In addition to writing your own console method for new terminal servers, “in house” development of power and console methods can allow more flexibility when upgrading cluster hardware: instead of being required to wait for and upgrade to the latest version of the management software to support new hardware, you can script your own solution. Examples of simple hardware control methods that cluster administrators can easily develop are: power on through Wake On LAN, and power off through a distributed shell, and power control via a power switch like APC or Baytech. Cluster products that provide extensible power control include xCAT, Scali Manage, and CSM.

Variety of Node Installation Methods

Installing the operating system and applications on nodes is one of the most important functions of cluster administration software, because it can take so long to do manually. Because the method of installation affects the other administration processes, it is important for the software to support a variety of installation methods.

For clusters in which the nodes are not all identical and for which there exists a separate software maintenance procedure, the approach of directly installing the RPMs from the distribution media (over the network) is generally the most useful. This allows the administrator to initiate an install with just the distribution CDs in hand, and they can easily specify a different list of RPMs for different nodes. Products that support this method of installation include Rocks, Clusterworx, Scali Manage, xCAT, and CSM. They generally use kickstart's and autoyast's unattended installation features to automate the installation of multiple nodes over the network in parallel.

While many users like the simplicity of the direct installation method, an equally large camp of users prefer the cloning method. This generally combines the node installation method with the node software maintenance strategy. In this approach a model node (sometimes called a “golden” node) is installed manually and configured exactly how the administrator wants the rest of the nodes to be. Then the software image is captured from the golden node and replicated to the other nodes. When updates or configuration changes are necessary, the golden node is updated and the capture/replicate process is done again. This approach is most effective for clusters in which the nodes in the cluster are almost identical, in terms of both hardware and software. Products that provide cloning capability include OSCAR, HP XC, xCAT, CSM, Clusterworx, and the open source tools Partimage and System Imager.

While installing the operating system locally on each node generally works well (disks are cheap, and the OS files can be loaded more quickly at boot time), some users are moving to diskless nodes. The motivation for this is generally not price (disks are dirt cheap these days) or even easier maintenance (there are both pros and cons in this area). The motivation is usually reliability in large clusters, because the last moving part in the node is eliminated. Diskless nodes are achieved in most cases by sending the kernel and possibly some of the file systems to the node when it boots, and then the rest of the files systems are NFS mounted from a server. Products that support diskless nodes include CIT, Clusterworx, warewulf, xCAT, CSM, Scyld, and Egenera. (Scyld only loads a very minimal image on the node. Egenera makes a disk available to each node via a SAN.)

Most users have definite requirements about the type of node installation they want to use, since it is central to their entire administration strategy. Therefore, it is important for the cluster administration software to support as many of the presented node installation methods as possible. These methods should also be customizable by supporting the use of user defined post installation scripts and by supporting install/image servers to increase scalability and performance.

Extensible Monitoring Capabilities

Just as extensible hardware control is important, extensible monitoring of the cluster is a useful tool for automation of cluster events. While there are many enterprise software packages that provide error detection and response, it is useful to have at least some set of customizable and user defined monitoring capabilities built into a cluster administration product. Common event information across the cluster to which the software may need to respond include: node down and up events (useful for manipulating workload queues), memory and swap space used, filesystem space used, processor idle time, network adapter throughput, and syslog entries. The following extensibility points are important in event monitoring:

  • User defined “sensors” to monitor additional OS or application specific values in the system, and to monitor standard instrumentation such as SNMP and CIM that may not already be supported by the native monitoring subsystem.
  • User defined response scripts to be run locally or across the cluster in response to events occurring.
  • The ability to forward events to a variety of enterprise monitoring products such as the Tivoli Enterprise Console, CA Unicenter, or HP OpenView.

Products that have an extensible monitoring system include Ganglia, Nagios, Big Brother, Scali Manage, Clusterworx, and CSM.

Hierarchical Support

There are several possible reasons which might motivate the use of cluster administration in a hierarchical fashion. The obvious reason is to be able to manage more nodes than supported by the current scaling limit of the administration software. Another reason is to divide up the nodes into smaller sets that can be managed individually, sometimes by different administrators, but still retain some central control. A third reason is to handle unusual networking configurations, for example cross-geography clusters. A typical hierarchical cluster consists of a 3 level hierarchy, in which there are sets of nodes, with each set being managed by a management server (called the First Line Management Server, or FMS). A top level management server (Executive Management Server or EMS) manages all of the First Line Management Servers.

Ideally, all management operations could be done from the EMS, but at least the following are critical to be done from the EMS:

  • Install the FMSs, replicate any required OS images, and drive the FMS to install its leaf nodes.
  • Push out updated software to the FMSs and all the leaf nodes.
  • Push out configuration files and data to the FMSs and all the leaf nodes.
  • Run commands on any or all of the FMSs and leaf nodes.
  • Control the leaf node hardware (power on/off, etc.)
  • Configure event monitoring and monitor events from all of the FMSs and leaf nodes.

Hierarchical support is important to allow the cluster administration software to work in more cluster environments. The only products that we know of that support hierarchical clusters as described here are xCAT and CSM. Several products, for example CIT, support a hierarchy for one specific operation, usually for node installation or diskless boot.

Modular and Customizable

We have already mentioned that customers often have established system management processes in their lab prior to using any of the administration products mentioned in this article. It is not normally well received when the product dictates the processes to be used for all the administration tasks (installation, software maintenance, user management, configuration, monitoring, etc.) To avoid this “barrier to entry”, the product must have the following characteristics:

  • A complete command line interface – this facilitates administrators writing scripts around the product's administration commands so that they can be called from the administrator's own processes. To enable this, all the commands must be script-ready by not prompting for input, and by making the output unambiguously parsable, even in internationalized environments.
  • Modular - the administrator must be able to use only parts of the product, while ignoring other parts for which he/she already has a solution. For example, the product should not require that the administrator use its userid management solution to be able to access any of the other functions.
  • Extensible - As mentioned in previous sections, the product needs to be extensible to support different distributions easily (CSM has this), support new hardware for hardware control (xCAT and CSM), monitor user-defined resources (CSM and Ganglia), and support setting up user applications (Rocks, OSCAR, CSM).

Frequent Updates and User Contributions

As we all know, Linux software and its associated hardware does not stand still. The many components of a typical Linux cluster continue to evolve with new versions, usually several times a year, with all the components on different release schedules. And new technology continually appears. As a result, the administration software needs to continually adapt to its changing environment. This requires the ability to put out frequent updates to the product. Open source solutions (e.g. Rocks, CIT, OSCAR) generally have an easier time of this due to their iterative development style and less regression testing done by the development team (and more by the user community). But even vendor products need to find ways to release updates often. CSM uses a combination of traditional product releases and early updates on the xCSM site. User contributions can also help tremendously in keeping up with all the changing components. This is business as usual for open source solutions, but can be difficult for vendor products due to legal restrictions. This issue must be resolved in order for vendor products to be able to keep up with the ever changing environment.

Summary

In Linux clusters, there are so many open source administration utilities and so many home-grown solutions that there is very little need for a one-size-fits-all cluster administration product. The administration software must be extremely flexible to adapt to a variety of environments and to complement, but not conflict with, the utilities already in use.

—–

Bruce Potter

Bruce is a Senior Technical Staff Member at IBM.  He has been working in the area of systems management of clusters since 1989, including AIX clusters based on the IBM SP2, Windows clusters with IBM Director, and Linux & AIX clusters with CSM and open source.

Jennifer Cranfill

Jennifer has been working with clusters for the past seven years. She currently spends her days at Sony Pictures Imageworks in sunny southern California.

—–

For a list of the products referenced in this article click here.

—–

Editors note: Rick Friedman, Scali's VP of Marketing and Product Management wished to clarify some of the Scali Manage capabilities referenced in the article:

“With regard to the section 'Variety of Node Installation Methods', Scali Manage, in fact, supports both image / cloning installation and RPM methods, enabling our users to use RPMs to create a master image for initial cluster setups, and then leveraging RPMs for updates without requiring the re-imaging of the existing systems.

“With regard to the section 'Hierarchical Support', Scali Manage provides this functionality, supporting multiple clusters, multiple networks, and multiple geographies in a hierarchical fashion, enabling cluster administrators to coordinate and manage HPC efforts throughout an organization.

“Finally, with regard to support for diskless nodes, this functionality will be added in our next release coming next month.”

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

Australian Researchers Break All-Time Internet Speed Record

May 26, 2020

If you’ve been stuck at home for the last few months, you’ve probably become more attuned to the quality (or lack thereof) of your internet connection. Even in the U.S. (which has a reasonably fast average broadband Read more…

By Oliver Peckham

Hats Over Hearts: Remembering Rich Brueckner

May 26, 2020

It is with great sadness that we announce the death of Rich Brueckner. His passing is an unexpected and enormous blow to both his family and our HPC family. Rich was born in Milwaukee, Wisconsin on April 12, 1962. His Read more…

Supercomputer Simulations Reveal the Fate of the Neanderthals

May 25, 2020

For hundreds of thousands of years, neanderthals roamed the planet, eventually (almost 50,000 years ago) giving way to homo sapiens, which quickly became the dominant primate species, with the neanderthals disappearing b Read more…

By Oliver Peckham

Discovering Alternative Solar Panel Materials with Supercomputing

May 23, 2020

Solar power is quickly growing in the world’s energy mix, but silicon – a crucial material in the construction of photovoltaic solar panels – remains expensive, hindering solar’s expansion and competitiveness wit Read more…

By Oliver Peckham

Nvidia Q1 Earnings Top Expectations, Datacenter Revenue Breaks $1B

May 22, 2020

Nvidia’s seemingly endless roll continued in the first quarter with the company announcing blockbuster earnings that exceeded Wall Street expectations. Nvidia said revenues for the period ended April 26 were up 39 perc Read more…

By Doug Black

AWS Solution Channel

Computational Fluid Dynamics on AWS

Over the past 30 years Computational Fluid Dynamics (CFD) has grown to become a key part of many engineering design processes. From aircraft design to modelling the blood flow in our bodies, the ability to understand the behaviour of fluids has enabled countless innovations and improved the time to market for many products. Read more…

TACC Supercomputers Delve into COVID-19’s Spike Protein

May 22, 2020

If you’ve been following COVID-19 research, by now, you’ve probably heard of the spike protein (or S-protein). The spike protein – which gives COVID-19 its namesake crown-like shape – is the virus’ crowbar into Read more…

By Oliver Peckham

Microsoft’s Massive AI Supercomputer on Azure: 285k CPU Cores, 10k GPUs

May 20, 2020

Microsoft has unveiled a supercomputing monster – among the world’s five most powerful, according to the company – aimed at what is known in scientific an Read more…

By Doug Black

HPC in Life Sciences 2020 Part 1: Rise of AMD, Data Management’s Wild West, More 

May 20, 2020

Given the disruption caused by the COVID-19 pandemic and the massive enlistment of major HPC resources to fight the pandemic, it is especially appropriate to re Read more…

By John Russell

AMD Epyc Rome Picked for New Nvidia DGX, but HGX Preserves Intel Option

May 19, 2020

AMD continues to make inroads into the datacenter with its second-generation Epyc "Rome" processor, which last week scored a win with Nvidia's announcement that Read more…

By Tiffany Trader

Hacking Streak Forces European Supercomputers Offline in Midst of COVID-19 Research Effort

May 18, 2020

This week, a number of European supercomputers discovered intrusive malware hosted on their systems. Now, in the midst of a massive supercomputing research effo Read more…

By Oliver Peckham

Nvidia’s Ampere A100 GPU: Up to 2.5X the HPC, 20X the AI

May 14, 2020

Nvidia's first Ampere-based graphics card, the A100 GPU, packs a whopping 54 billion transistors on 826mm2 of silicon, making it the world's largest seven-nanom Read more…

By Tiffany Trader

Wafer-Scale Engine AI Supercomputer Is Fighting COVID-19

May 13, 2020

Seemingly every supercomputer in the world is allied in the fight against the coronavirus pandemic – but not many of them are fresh out of the box. Cerebras S Read more…

By Oliver Peckham

Startup MemVerge on Memory-centric Mission

May 12, 2020

Memory situated at the center of the computing universe, replacing processing, has long been envisioned as instrumental to radically improved datacenter systems Read more…

By Doug Black

In Australia, HPC Illuminates the Early Universe

May 11, 2020

Many billions of years ago, the universe was a swirling pool of gas. Unraveling the story of how we got from there to here isn’t an easy task, with many simul Read more…

By Oliver Peckham

Supercomputer Modeling Tests How COVID-19 Spreads in Grocery Stores

April 8, 2020

In the COVID-19 era, many people are treating simple activities like getting gas or groceries with caution as they try to heed social distancing mandates and protect their own health. Still, significant uncertainty surrounds the relative risk of different activities, and conflicting information is prevalent. A team of Finnish researchers set out to address some of these uncertainties by... Read more…

By Oliver Peckham

[email protected]home Turns Its Massive Crowdsourced Computer Network Against COVID-19

March 16, 2020

For gamers, fighting against a global crisis is usually pure fantasy – but now, it’s looking more like a reality. As supercomputers around the world spin up Read more…

By Oliver Peckham

[email protected] Rallies a Legion of Computers Against the Coronavirus

March 24, 2020

Last week, we highlighted [email protected], a massive, crowdsourced computer network that has turned its resources against the coronavirus pandemic sweeping the globe – but [email protected] isn’t the only game in town. The internet is buzzing with crowdsourced computing... Read more…

By Oliver Peckham

Global Supercomputing Is Mobilizing Against COVID-19

March 12, 2020

Tech has been taking some heavy losses from the coronavirus pandemic. Global supply chains have been disrupted, virtually every major tech conference taking place over the next few months has been canceled... Read more…

By Oliver Peckham

DoE Expands on Role of COVID-19 Supercomputing Consortium

March 25, 2020

After announcing the launch of the COVID-19 High Performance Computing Consortium on Sunday, the Department of Energy yesterday provided more details on its sco Read more…

By John Russell

Steve Scott Lays Out HPE-Cray Blended Product Roadmap

March 11, 2020

Last week, the day before the El Capitan processor disclosures were made at HPE's new headquarters in San Jose, Steve Scott (CTO for HPC & AI at HPE, and former Cray CTO) was on-hand at the Rice Oil & Gas HPC conference in Houston. He was there to discuss the HPE-Cray transition and blended roadmap, as well as his favorite topic, Cray's eighth-gen networking technology, Slingshot. Read more…

By Tiffany Trader

Honeywell’s Big Bet on Trapped Ion Quantum Computing

April 7, 2020

Honeywell doesn’t spring to mind when thinking of quantum computing pioneers, but a decade ago the high-tech conglomerate better known for its control systems waded deliberately into the then calmer quantum computing (QC) waters. Fast forward to March when Honeywell announced plans to introduce an ion trap-based quantum computer whose ‘performance’ would... Read more…

By John Russell

Fujitsu A64FX Supercomputer to Be Deployed at Nagoya University This Summer

February 3, 2020

Japanese tech giant Fujitsu announced today that it will supply Nagoya University Information Technology Center with the first commercial supercomputer powered Read more…

By Tiffany Trader

Leading Solution Providers

SC 2019 Virtual Booth Video Tour

AMD
AMD
ASROCK RACK
ASROCK RACK
AWS
AWS
CEJN
CJEN
CRAY
CRAY
DDN
DDN
DELL EMC
DELL EMC
IBM
IBM
MELLANOX
MELLANOX
ONE STOP SYSTEMS
ONE STOP SYSTEMS
PANASAS
PANASAS
SIX NINES IT
SIX NINES IT
VERNE GLOBAL
VERNE GLOBAL
WEKAIO
WEKAIO

Contributors

Tech Conferences Are Being Canceled Due to Coronavirus

March 3, 2020

Several conferences scheduled to take place in the coming weeks, including Nvidia’s GPU Technology Conference (GTC) and the Strata Data + AI conference, have Read more…

By Alex Woodie

Exascale Watch: El Capitan Will Use AMD CPUs & GPUs to Reach 2 Exaflops

March 4, 2020

HPE and its collaborators reported today that El Capitan, the forthcoming exascale supercomputer to be sited at Lawrence Livermore National Laboratory and serve Read more…

By John Russell

‘Billion Molecules Against COVID-19’ Challenge to Launch with Massive Supercomputing Support

April 22, 2020

Around the world, supercomputing centers have spun up and opened their doors for COVID-19 research in what may be the most unified supercomputing effort in hist Read more…

By Oliver Peckham

Cray to Provide NOAA with Two AMD-Powered Supercomputers

February 24, 2020

The United States’ National Oceanic and Atmospheric Administration (NOAA) last week announced plans for a major refresh of its operational weather forecasting supercomputers, part of a 10-year, $505.2 million program, which will secure two HPE-Cray systems for NOAA’s National Weather Service to be fielded later this year and put into production in early 2022. Read more…

By Tiffany Trader

Summit Supercomputer is Already Making its Mark on Science

September 20, 2018

Summit, now the fastest supercomputer in the world, is quickly making its mark in science – five of the six finalists just announced for the prestigious 2018 Read more…

By John Russell

Supercomputer Simulations Reveal the Fate of the Neanderthals

May 25, 2020

For hundreds of thousands of years, neanderthals roamed the planet, eventually (almost 50,000 years ago) giving way to homo sapiens, which quickly became the do Read more…

By Oliver Peckham

15 Slides on Programming Aurora and Exascale Systems

May 7, 2020

Sometime in 2021, Aurora, the first planned U.S. exascale system, is scheduled to be fired up at Argonne National Laboratory. Cray (now HPE) and Intel are the k Read more…

By John Russell

TACC Supercomputers Run Simulations Illuminating COVID-19, DNA Replication

March 19, 2020

As supercomputers around the world spin up to combat the coronavirus, the Texas Advanced Computing Center (TACC) is announcing results that may help to illumina Read more…

By Staff report

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This