Microsoft and SchedMD Partner to Bring Slurm into Azure CycleCloud

May 21, 2020 -- Microsoft Azure announced a partnership with SchedMD, the company that develops the open-source Slurm workload manager. In the blog post below, Andy Howard, program manager for HPC & Big Compute at Microsoft, details how the Slurm workload manager will help enhance the Azure HPC experience for customers.

Microsoft Azure is committed to providing a world-class HPC platform for our customers. Over the last year, we have demonstrated this commitment with the rollout of new HPC hardware offerings and storage options that rival those in any supercomputing center. Azure CycleCloud is designed to help our HPC customers orchestrate these HPC VMs and build cloud clusters in a way that mirrors the on-premises systems they are familiar with, yet provides the elasticity to right-size clusters based on their workloads.

Azure CycleCloud running a Slurm cluster. Image courtesy of Microsoft.

One of the benefits CycleCloud brings to users is that they get to keep working with the scheduling environment they’ve been using for years, sometimes decades. One scheduler we have seen increasing demand for over the last year is Slurm, an open-source workload manager that is developed and maintained by SchedMD and is capable of scaling to meet the demands of even the largest HPC workloads.

We have partnered with SchedMD to deliver the best user experience for Azure HPC customers. Using Slurm’s elastic compute capability and its topology awareness, CycleCloud can orchestrate VMs for a Slurm cluster so that jobs are scheduled on the appropriate VMs according to their resource requirements. For example, tightly coupled MPI tasks land on partitions whose nodes share the same InfiniBand fabric, while non-MPI tasks can use a separate partition designed to scale across multiple VM families. This is particularly helpful for multi-stage workloads or shared “community” clusters with multiple user groups.
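
To make the routing idea concrete, here is a minimal Python sketch of how a submission wrapper might send MPI jobs to one partition and everything else to another. It is an illustration under stated assumptions, not CycleCloud's or Slurm's own logic: the partition names "hpc" and "htc", the Job class, and the helper functions are all hypothetical. The only real interface used is Slurm's sbatch command with its --partition and --ntasks options.

```python
from dataclasses import dataclass
from typing import List

# Assumed partition names for illustration; a real cluster may name them differently.
MPI_PARTITION = "hpc"
THROUGHPUT_PARTITION = "htc"


@dataclass
class Job:
    """Hypothetical description of a job to submit."""
    script: str      # path to the batch script
    ntasks: int      # number of tasks the job needs
    uses_mpi: bool   # True for tightly coupled MPI jobs


def choose_partition(job: Job) -> str:
    """Send MPI jobs to the InfiniBand-backed partition, everything else to the other."""
    return MPI_PARTITION if job.uses_mpi else THROUGHPUT_PARTITION


def sbatch_command(job: Job) -> List[str]:
    """Build the sbatch argument list without running it."""
    return [
        "sbatch",
        f"--partition={choose_partition(job)}",
        f"--ntasks={job.ntasks}",
        job.script,
    ]


if __name__ == "__main__":
    # Print the commands that would be submitted for one job of each kind.
    for job in (Job("solver.sh", 240, True), Job("sweep.sh", 1, False)):
        print(" ".join(sbatch_command(job)))
```

On a real cluster the list returned by sbatch_command could be passed to subprocess.run; the sketch only builds the command so it can be inspected without a scheduler.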

With that in mind, every CycleCloud installation includes a Slurm cluster template with two partitions pre-defined: one for MPI workloads (high performance, or “HPC”) and one for distributed, high-throughput (“HTC”) workloads. This template is an initial configuration and can be modified to include any number of partitions, VM types, and autoscale limits.
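
As a rough mental model of what such a template captures, the Python sketch below represents the two partitions as plain data and shows how an autoscale limit caps the number of nodes started. It is not the CycleCloud template format: the Partition class, the field names, the VM sizes, and the node limits are illustrative assumptions rather than the template's actual defaults.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Partition:
    """Hypothetical model of one Slurm partition in a cluster template."""
    name: str
    vm_size: str       # Azure VM size backing the partition (illustrative choice)
    max_nodes: int     # autoscale limit: never start more nodes than this
    same_fabric: bool  # True if nodes must share one InfiniBand fabric


def default_partitions() -> Dict[str, Partition]:
    """Two partitions mirroring the HPC/HTC split; all values are assumptions."""
    return {
        "hpc": Partition("hpc", "Standard_HB60rs", max_nodes=16, same_fabric=True),
        "htc": Partition("htc", "Standard_F2s_v2", max_nodes=100, same_fabric=False),
    }


def nodes_to_start(partition: Partition, requested: int) -> int:
    """Clamp a node request to the range [0, partition.max_nodes]."""
    return max(0, min(requested, partition.max_nodes))


if __name__ == "__main__":
    parts = default_partitions()
    for name, requested in (("hpc", 40), ("htc", 3)):
        granted = nodes_to_start(parts[name], requested)
        print(f"{name}: requested {requested}, would start {granted}")
```

Adding a third entry to the dictionary, or changing a vm_size or max_nodes value, corresponds to the kind of modification the template is meant to allow.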

For more information on deploying Slurm clusters with CycleCloud, visit the CycleCloud documentation or contact your Azure account team. For help customizing and configuring Slurm, or enterprise Slurm support, please contact SchedMD.

About Microsoft

Microsoft enables digital transformation for the era of an intelligent cloud and an intelligent edge. Its mission is to empower every person and every organization on the planet to achieve more.

Source: Andy Howard, Microsoft