Flexibility is Key to Cluster Administration Software

By Bruce Potter and Jennifer Cranfill

February 2, 2007

Introduction

When companies purchase a significant number of machines and cluster them together to meet their computing needs, their site environment often drives specific requirements for their clusters. These requirements vary and often include specific networking configurations, specific applications that need managing, specific approaches to software installation and maintenance, and existing management software and procedures that must be accommodated. The key to successful cluster administration software is that it be flexible enough to accommodate many of these environments. For optimum flexibility, the systems management software must have the following characteristics:

  • It must have some fundamental capabilities that can be used to accomplish a wide variety of tasks. These include capabilities like parallel command execution, configuration file management, and software maintenance.
  • The out-of-band hardware control must be extensible to support a wide variety of hardware.
  • It must support a variety of node installation methodologies, for example: direct installation using the native installer, cloning nodes, or running diskless nodes.
  • It must support a variety of networking configurations, including routers, firewalls, low bandwidth networks, and high security environments.
  • The monitoring capabilities must be configurable, extensible, and support standards.
  • The management software must have the proper APIs and command line interfaces necessary to support running it in a hierarchical fashion for very large clusters or subdivided clusters.
  • It must be modular and customizable so that it can fit into companies' existing structures and processes (CLI, extensibility, use of isolated parts, etc.).
  • It must have mechanisms for allowing frequent updates and user contributions.

This article will discuss each of these characteristics in turn and give examples of cluster administration software that possesses these qualities.

Flexible Fundamental Capabilities

Cluster administration encompasses a wide variety of tasks that are often unique to the cluster or to the cluster's purpose. Therefore cluster management software needs to provide simple, flexible tools that can be combined to accomplish many different tasks; the more inherent flexibility in these tools, the better. Basic functionality needed for cluster management includes:

  • Support for multiple distributions: Tools that work across multiple operating systems and architectures allow for greater use. While Red Hat Enterprise Linux (RHEL) and SuSE Linux Enterprise Server (SLES) are two of the main distributions for enterprise clusters, support for free distributions like Fedora, CentOS, Scientific Linux, and Debian is also desired by many cluster users.
  • Distributed command execution: A distributed shell is an essential clustering component, as it allows the administrator to quickly perform command line tasks across the entire cluster or a subset of nodes. This capability is a catch-all, because it allows the administrator to perform tasks that are not specifically supported by the rest of the administration software. Required flexibility includes timeout values (for nodes that are not responding), skipping of offline nodes, and the ability to use any underlying remote shell (a minimal sketch of such a runner follows this list).
  • Distribution of files: File distribution is a close second among essential clustering capabilities. There are two modes of file distribution: a one-time copy, and a repository of files kept consistent throughout the cluster. The latter mode is useful for maintaining configuration files throughout the cluster or on a subset of nodes, and it gains flexibility by allowing different versions of files for different node groups and by automatically running user-defined scripts before and after files are copied to the nodes.
  • Software maintenance: Software maintenance – the ability to install and upgrade software after a node has been deployed – lets administrators update individual applications without reinstalling the node. This feature must also automatically resolve and install prerequisite RPMs.
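
To make the distributed-shell requirement concrete, below is a minimal sketch of a parallel command runner with the flexibility points mentioned above: a per-node timeout, skipping of offline nodes, and a swappable remote shell. The node names are placeholders, and password-less ssh to each node is assumed.

```python
#!/usr/bin/env python3
"""Minimal parallel command runner (illustrative sketch, not a product tool)."""
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = ["node001", "node002", "node003"]   # placeholder node names
REMOTE_SHELL = "ssh"                        # any underlying remote shell could be used
TIMEOUT = 15                                # seconds before a node is given up on


def node_is_up(node):
    """Skip nodes that do not answer a single ping (offline nodes)."""
    return subprocess.run(["ping", "-c", "1", "-W", "2", node],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def run_on_node(node, command):
    """Run one command on one node via the remote shell, honoring the timeout."""
    if not node_is_up(node):
        return node, "skipped (offline)"
    try:
        result = subprocess.run([REMOTE_SHELL, node, command],
                                capture_output=True, text=True, timeout=TIMEOUT)
        return node, result.stdout.strip() or result.stderr.strip()
    except subprocess.TimeoutExpired:
        return node, "timed out"


def dsh(command):
    """Fan the command out to all nodes in parallel, printing node-prefixed output."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        for node, output in pool.map(lambda n: run_on_node(n, command), NODES):
            print(f"{node}: {output}")


if __name__ == "__main__":
    dsh("uptime")
```

Feeding the node list from a node-group database instead of a hard-coded list, or substituting a different remote shell, is exactly the kind of flexibility described in the bullet above.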

With the basic tools above it is possible to accomplish a large number of complex cluster tasks, including installation and startup of the high performance computing (HPC) application stack, cluster-wide user management, configuration and startup of services, and addition of nodes to workload queues. For instance, installation and startup of HPC software can be done with software maintenance and the distributed shell. Configuration and startup of services like NTP and the automounter, as well as user management, can be handled mainly through the distribution of configuration files from the management server.
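
As a concrete illustration of the paragraph above, the following sketch pushes an NTP configuration file from the management server to every node and then restarts the service over the remote shell. The source path, destination path, and restart command are assumptions that will differ by product and distribution.

```python
#!/usr/bin/env python3
"""Illustrative sketch: distribute a configuration file, then restart the service."""
import subprocess

NODES = ["node001", "node002", "node003"]    # placeholder node names
SOURCE = "/cfmroot/etc/ntp.conf"             # master copy on the management server (assumed layout)
DEST = "/etc/ntp.conf"                       # location on each node


def push_file(node, source, dest):
    """One-time copy of the master file to a node (scp assumed available)."""
    subprocess.run(["scp", source, f"{node}:{dest}"], check=True)


def post_copy_script(node):
    """User-defined post-copy action: restart the affected service on the node."""
    subprocess.run(["ssh", node, "service ntpd restart"], check=True)


for node in NODES:
    push_file(node, SOURCE, DEST)
    post_copy_script(node)
```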

Examples of cluster administration tools that include forms of the above functionality are xCAT, the C3 tools in OSCAR, Scali Manage, and CSM. Other tools can do just one of the tasks above. For example, Red Hat Network (up2date) and YUM provide software maintenance capabilities.
 
Extensible Hardware Control

Hardware control provides the key capability of managing the cluster hardware (powering on/off, query, console, firmware flash) without having to be physically present with the hardware. Most cluster hardware provides native mechanisms to accomplish these tasks, but the mechanisms often vary between hardware types. This creates a challenging environment for remote hardware control software, since many clusters consist of heterogeneous hardware. Even if all the compute nodes are the same machine type, there are still I/O nodes and non-node devices such as switches, SAN controllers, and terminal servers to be managed.

To support the ever growing number of power methods, the administration software must support user-defined power methods that can be plugged into the main power commands. A pluggable method allows the software to more easily support new hardware, and allows the user to run the same command to all the nodes and devices, despite their different control methods. It also allows other software components, such as installation, to drive the power control to the various hardware.
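
A minimal sketch of what "pluggable" can mean in practice is shown below: each hardware type registers its own power function, and a single front-end command dispatches to the appropriate method per node. The hardware types, node-to-hardware mapping, and helper commands are assumptions for illustration only.

```python
#!/usr/bin/env python3
"""Illustrative sketch of pluggable power-control dispatch (not any product's API)."""
import subprocess

POWER_METHODS = {}   # hardware type -> callable(node, action)


def power_method(hw_type):
    """Decorator that registers a power method for a hardware type."""
    def register(func):
        POWER_METHODS[hw_type] = func
        return func
    return register


@power_method("bmc")
def bmc_power(node, action):
    # Out-of-band control via IPMI; credentials and the BMC naming scheme are assumptions.
    subprocess.run(["ipmitool", "-H", f"{node}-bmc", "power", action], check=True)


@power_method("pdu")
def pdu_power(node, action):
    # Placeholder for a switched power unit (APC, Baytech, ...); details vary by vendor.
    print(f"(would switch the PDU outlet for {node}: {action})")


# Hypothetical node database mapping each node to its hardware control type.
NODE_HW = {"node001": "bmc", "node002": "pdu"}


def rpower(nodes, action):
    """One front-end command that works across mixed hardware."""
    for node in nodes:
        POWER_METHODS[NODE_HW[node]](node, action)


if __name__ == "__main__":
    rpower(["node001", "node002"], "on")
```

Because the dispatch table is just data, supporting a new hardware type means registering one more function, which is also what lets other components such as installation drive power control without caring about the underlying method.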

In addition to power control of the cluster hardware, remote console is another area that requires pluggable methods. There are a wide variety of terminal servers and serial over LAN (SOL) support on the market, and each of these has its own intricacies for establishing a remote console session to the node.

The method for flashing the firmware of the nodes is usually very hardware specific, but at least some flexibility can be achieved by allowing additional drivers to be added to the flashing environment, and by allowing flashing to be done either in-band (while the node is running) or pre-OS (before the operating system is installed on the node).

Beyond writing your own console method for new terminal servers, “in house” development of power and console methods allows more flexibility when upgrading cluster hardware: instead of waiting for, and upgrading to, the latest version of the management software to support new hardware, you can script your own solution. Examples of simple hardware control methods that cluster administrators can easily develop are power on through Wake-on-LAN, power off through a distributed shell, and power control via a switched power unit such as those from APC or Baytech (a Wake-on-LAN sketch follows). Cluster products that provide extensible power control include xCAT, Scali Manage, and CSM.
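
As an example of how simple an "in house" power method can be, this sketch broadcasts a Wake-on-LAN magic packet (six 0xFF bytes followed by the target MAC address repeated sixteen times). The MAC address shown is a placeholder, and the node's NIC and BIOS must have Wake-on-LAN enabled.

```python
#!/usr/bin/env python3
"""Send a Wake-on-LAN magic packet (illustrative 'in house' power-on method)."""
import socket


def wake_on_lan(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the WOL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Placeholder MAC address; substitute the address of the node to power on.
wake_on_lan("00:11:22:33:44:55")
```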

Variety of Node Installation Methods

Installing the operating system and applications on nodes is one of the most important functions of cluster administration software, because it can take so long to do manually. Because the method of installation affects the other administration processes, it is important for the software to support a variety of installation methods.

For clusters in which the nodes are not all identical, and for which there exists a separate software maintenance procedure, the approach of directly installing the RPMs from the distribution media (over the network) is generally the most useful. This allows the administrator to initiate an install with just the distribution CDs in hand and easily specify a different list of RPMs for different nodes. Products that support this method of installation include Rocks, Clusterworx, Scali Manage, xCAT, and CSM. They generally use the unattended installation features of Kickstart and AutoYaST to automate the installation of multiple nodes over the network in parallel.
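
One way the "different list of RPMs for different nodes" idea might look in practice is sketched below: a small Kickstart package-list fragment is rendered per node group from a template. The template text, group names, package lists, and output location are all illustrative assumptions, not any product's actual format.

```python
#!/usr/bin/env python3
"""Illustrative sketch: render per-node-group Kickstart package lists from a template."""

# A real unattended install needs many more directives (partitioning, network, ...).
KICKSTART_TEMPLATE = """\
# Generated Kickstart fragment for the {group} node group (illustrative only)
%packages
{packages}
"""

# Hypothetical node groups with different RPM lists.
GROUPS = {
    "compute": ["openmpi", "gcc", "lapack"],
    "io":      ["nfs-utils", "xfsprogs"],
}

for group, packages in GROUPS.items():
    body = KICKSTART_TEMPLATE.format(group=group, packages="\n".join(packages))
    with open(f"ks-{group}.cfg", "w") as outfile:   # output location is an assumption
        outfile.write(body)
```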

While many users like the simplicity of the direct installation method, an equally large camp of users prefers the cloning method, which generally combines the node installation method with the node software maintenance strategy. In this approach a model node (sometimes called a “golden” node) is installed manually and configured exactly how the administrator wants the rest of the nodes to be. Then the software image is captured from the golden node and replicated to the other nodes. When updates or configuration changes are necessary, the golden node is updated and the capture/replicate process is repeated. This approach is most effective for clusters in which the nodes are almost identical, in terms of both hardware and software. Products that provide cloning capability include OSCAR, HP XC, xCAT, CSM, Clusterworx, and the open source tools Partimage and SystemImager.

While installing the operating system locally on each node generally works well (disks are cheap, and the OS files can be loaded more quickly at boot time), some users are moving to diskless nodes. The motivation for this is generally not price (disks are dirt cheap these days) or even easier maintenance (there are both pros and cons in this area). The motivation is usually reliability in large clusters, because the last moving part in the node is eliminated. Diskless nodes are achieved in most cases by sending the kernel and possibly some of the file systems to the node when it boots, with the rest of the file systems NFS mounted from a server. Products that support diskless nodes include CIT, Clusterworx, Warewulf, xCAT, CSM, Scyld, and Egenera. (Scyld only loads a very minimal image on the node. Egenera makes a disk available to each node via a SAN.)
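
To make the diskless boot flow concrete, here is a sketch that writes a PXELINUX boot entry telling a node to load its kernel over the network and mount its root file system from an NFS server. The server address, paths, and kernel arguments are assumptions; real diskless setups vary considerably between products and distributions.

```python
#!/usr/bin/env python3
"""Illustrative sketch: emit a PXELINUX entry for an NFS-root diskless node."""

PXE_ENTRY = """\
DEFAULT diskless
LABEL diskless
  KERNEL vmlinuz-diskless
  APPEND initrd=initrd-diskless.img root=/dev/nfs nfsroot={server}:{rootdir} ip=dhcp ro
"""

# Hypothetical NFS server and exported read-only root file system.
entry = PXE_ENTRY.format(server="192.168.0.1", rootdir="/exports/node-root")

with open("pxelinux.cfg-default.example", "w") as outfile:   # output path is an assumption
    outfile.write(entry)
```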

Most users have definite requirements about the type of node installation they want to use, since it is central to their entire administration strategy. Therefore, it is important for the cluster administration software to support as many of the presented node installation methods as possible. These methods should also be customizable by supporting user-defined post-installation scripts and by supporting install/image servers to increase scalability and performance.

Extensible Monitoring Capabilities

Just as extensible hardware control is important, extensible monitoring of the cluster is a useful tool for automating responses to cluster events. While there are many enterprise software packages that provide error detection and response, it is useful to have at least some set of customizable and user-defined monitoring capabilities built into a cluster administration product. Common event information across the cluster to which the software may need to respond includes: node down and up events (useful for manipulating workload queues), memory and swap space used, filesystem space used, processor idle time, network adapter throughput, and syslog entries. The following extensibility points are important in event monitoring:

  • User-defined “sensors” to monitor additional OS or application specific values in the system, and to monitor standard instrumentation such as SNMP and CIM that may not already be supported by the native monitoring subsystem.
  • User-defined response scripts to be run locally or across the cluster in response to events occurring (a combined sensor/response sketch follows this list).
  • The ability to forward events to a variety of enterprise monitoring products such as the Tivoli Enterprise Console, CA Unicenter, or HP OpenView.
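
A minimal sketch of the sensor-plus-response idea follows: a user-defined sensor samples file system usage and, when it crosses a threshold, a user-defined response notifies the administrator. The mount point, threshold, and response action are assumptions; a real monitoring subsystem would register both pieces as plug-ins.

```python
#!/usr/bin/env python3
"""Illustrative sketch of a user-defined sensor paired with a user-defined response."""
import shutil
import subprocess

MOUNT_POINT = "/scratch"   # hypothetical file system to watch
THRESHOLD = 0.90           # trigger a response above 90% used


def filesystem_used_fraction(path):
    """User-defined sensor: fraction of the file system currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def respond(path, fraction):
    """User-defined response: here, just log a warning for the administrator."""
    message = f"cluster-monitor: {path} is {fraction:.0%} full"
    subprocess.run(["logger", "-p", "user.warning", message], check=True)


if __name__ == "__main__":
    used = filesystem_used_fraction(MOUNT_POINT)
    if used > THRESHOLD:
        respond(MOUNT_POINT, used)
```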

Products that have an extensible monitoring system include Ganglia, Nagios, Big Brother, Scali Manage, Clusterworx, and CSM.

Hierarchical Support

There are several possible reasons that might motivate the use of cluster administration in a hierarchical fashion. The obvious reason is to manage more nodes than the current scaling limit of the administration software supports. Another reason is to divide the nodes into smaller sets that can be managed individually, sometimes by different administrators, while still retaining some central control. A third reason is to handle unusual networking configurations, for example cross-geography clusters. A typical hierarchical cluster consists of a three-level hierarchy, in which there are sets of nodes, with each set managed by a management server (called the First Line Management Server, or FMS). A top level management server (Executive Management Server, or EMS) manages all of the First Line Management Servers.

Ideally, all management operations could be performed from the EMS, but at a minimum the following must be (a sketch of hierarchical command fan-out follows the list):

  • Install the FMSs, replicate any required OS images, and drive the FMS to install its leaf nodes.
  • Push out updated software to the FMSs and all the leaf nodes.
  • Push out configuration files and data to the FMSs and all the leaf nodes.
  • Run commands on any or all of the FMSs and leaf nodes.
  • Control the leaf node hardware (power on/off, etc.)
  • Configure event monitoring and monitor events from all of the FMSs and leaf nodes.
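
The sketch below illustrates the command-execution item in the list: the EMS asks each FMS to fan a command out to its own leaf nodes. The host names are placeholders, and each FMS is assumed to provide a dsh-style command (the flag shown is made up) that reaches its leaf nodes.

```python
#!/usr/bin/env python3
"""Illustrative sketch of hierarchical command fan-out (EMS -> FMS -> leaf nodes)."""
import subprocess

# Placeholder first-line management servers and the leaf nodes each one manages.
FMS_NODES = {
    "fms1": ["node001", "node002"],
    "fms2": ["node101", "node102"],
}


def run_everywhere(command):
    """From the EMS, ask each FMS to run the command on its leaf nodes."""
    for fms, leaves in FMS_NODES.items():
        node_list = ",".join(leaves)
        # The dsh command and its node-list flag on the FMS are assumptions.
        subprocess.run(["ssh", fms, f"dsh -n {node_list} '{command}'"], check=True)


if __name__ == "__main__":
    run_everywhere("uptime")
```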

Hierarchical support is important to allow the cluster administration software to work in more cluster environments. The only products that we know of that support hierarchical clusters as described here are xCAT and CSM. Several products, for example CIT, support a hierarchy for one specific operation, usually for node installation or diskless boot.

Modular and Customizable

We have already mentioned that customers often have established system management processes in their lab prior to using any of the administration products mentioned in this article. It is not normally well received when the product dictates the processes to be used for all the administration tasks (installation, software maintenance, user management, configuration, monitoring, etc.). To avoid this “barrier to entry,” the product must have the following characteristics:

  • A complete command line interface – this lets administrators write scripts around the product's administration commands so that they can be called from the administrator's own processes. To enable this, all the commands must be script-ready by not prompting for input and by making the output unambiguously parsable, even in internationalized environments (see the output-parsing sketch after this list).
  • Modular - the administrator must be able to use only parts of the product, while ignoring other parts for which he/she already has a solution. For example, the product should not require that the administrator use its userid management solution to be able to access any of the other functions.
  • Extensible - As mentioned in previous sections, the product needs to be extensible to support different distributions easily (CSM has this), support new hardware for hardware control (xCAT and CSM), monitor user-defined resources (CSM and Ganglia), and support setting up user applications (Rocks, OSCAR, CSM).
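
As an illustration of "unambiguously parsable" output, the sketch below wraps a hypothetical node-listing command that can emit delimiter-separated records. The command name, flag, and field layout are made up for the example; the point is that fixed, delimited fields can be consumed by scripts without screen-scraping.

```python
#!/usr/bin/env python3
"""Illustrative sketch: scripting around a CLI that emits parsable output."""
import subprocess

# 'lsclusternode' and its '--delim' flag are hypothetical; assume it prints
# one colon-delimited record per node: name:status:hwtype
result = subprocess.run(["lsclusternode", "--delim", ":"],
                        capture_output=True, text=True, check=True)

for line in result.stdout.splitlines():
    name, status, hwtype = line.split(":")
    if status != "online":
        print(f"{name} ({hwtype}) needs attention")
```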

Frequent Updates and User Contributions

As we all know, Linux software and its associated hardware do not stand still. The many components of a typical Linux cluster continue to evolve with new versions, usually several times a year, with all the components on different release schedules. And new technology continually appears. As a result, the administration software needs to continually adapt to its changing environment. This requires the ability to put out frequent updates to the product. Open source solutions (e.g., Rocks, CIT, OSCAR) generally have an easier time of this due to their iterative development style and less regression testing done by the development team (and more by the user community). But even vendor products need to find ways to release updates often. CSM uses a combination of traditional product releases and early updates on the xCSM site. User contributions can also help tremendously in keeping up with all the changing components. This is business as usual for open source solutions, but can be difficult for vendor products due to legal restrictions. This issue must be resolved in order for vendor products to keep up with the ever changing environment.

Summary

In Linux clusters, there are so many open source administration utilities and so many home-grown solutions that there is very little need for a one-size-fits-all cluster administration product. The administration software must be extremely flexible to adapt to a variety of environments and to complement, but not conflict with, the utilities already in use.

—–

Bruce Potter

Bruce is a Senior Technical Staff Member at IBM.  He has been working in the area of systems management of clusters since 1989, including AIX clusters based on the IBM SP2, Windows clusters with IBM Director, and Linux & AIX clusters with CSM and open source.

Jennifer Cranfill

Jennifer has been working with clusters for the past seven years. She currently spends her days at Sony Pictures Imageworks in sunny southern California.

—–

For a list of the products referenced in this article click here.

—–

Editor's note: Rick Friedman, Scali's VP of Marketing and Product Management, wished to clarify some of the Scali Manage capabilities referenced in the article:

“With regard to the section 'Variety of Node Installation Methods', Scali Manage, in fact, supports both image / cloning installation and RPM methods, enabling our users to use RPMs to create a master image for initial cluster setups, and then leveraging RPMs for updates without requiring the re-imaging of the existing systems.

“With regard to the section 'Hierarchical Support', Scali Manage provides this functionality, supporting multiple clusters, multiple networks, and multiple geographies in a hierarchical fashion, enabling cluster administrators to coordinate and manage HPC efforts throughout an organization.

“Finally, with regard to support for diskless nodes, this functionality will be added in our next release coming next month.”
