An introduction to Fuzzball, which combines the best of enterprise tooling with the latest in modern HPC technology.
The standard architecture of high performance computing (HPC) cluster environments has remained largely static since the mid-1990s. Before this, supercomputers were usually built as one-off machines, each with a highly specific architecture designed for that single system, an approach that couldn’t be massively scaled. While these machines were incredibly innovative, advancing needs in science and industry led to the development of the “Message Passing Interface” (MPI) standard around 1994, and this changed the face of HPC at the time.
MPI allows for direct core-to-core communication between the CPU cores of a single computer or across multiple networked computers at once, which essentially allows for the massive pooling of computational resources. Suddenly, thousands of compute nodes (and thus many thousands of CPU cores) could be wired together and work in tandem on one simulation. This architecture, called the “Beowulf cluster” (a flat set of compute nodes managed by a controller head node), became the standard way supercomputers have been designed ever since MPI arrived in the mid-1990s.
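To make this concrete, here is a minimal MPI sketch, shown with the Python mpi4py bindings purely for brevity (the MPI standard itself targets C and Fortran, and this example is an illustration rather than anything specific to Fuzzball). Each process, or “rank,” may sit on a different core or on a different networked node entirely, and all of them cooperate on a single result:

```python
# Minimal MPI example using the mpi4py bindings (illustrative only).
# Every rank runs this same script; MPI handles the communication,
# whether the ranks share one machine or span many networked nodes.
from mpi4py import MPI

comm = MPI.COMM_WORLD    # communicator spanning every rank in the job
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of ranks across all nodes

# Each rank computes a partial result locally...
partial = rank * rank

# ...and MPI pools the partial results down to rank 0.
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"{size} ranks contributed; sum of squares = {total}")
```

Launched with something like `mpiexec -n 1000 python sum_of_squares.py`, the same script runs unchanged whether those 1,000 ranks live on one large machine or are spread across hundreds of compute nodes in a cluster.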
Once Islands, HPC and Enterprise Computing Are Merging Territory
With advancing needs in computing, a new revolution has come to HPC. The divide between enterprise computing and HPC is breaking down as enterprises increasingly need HPC-like resources for their workloads and, conversely, HPC workloads increasingly need enterprise-like tooling to continue to scale effectively. Enterprises today need massive GPU resources, large-scale compute clusters with thousands of nodes, and scalable HPC to manage websites, databases, SaaS deployments, AI and ML workloads, and more. HPC needs all the benefits of containers and container orchestration, along with CI/CD (Continuous Integration/Continuous Delivery/Deployment), automated builds, software supply chain validation and security, and the other capabilities that have been ubiquitous in enterprise computing for some time. The problem is that enterprises don’t want to build clusters on a Beowulf architecture that is “outclassed” by their own best practices and tools. Moreover, the HPC community generally doesn’t have the time or resources to learn the complexities of all these enterprise tools and how they could be leveraged to improve HPC architectures. In effect, the two worlds don’t speak the same language and desperately need a translator between them.
This has all led to a gradual merger of the two spaces in the past few years, culminating in the beginning of an arms race in HPC around who can implement this next generation of HPC in silicon. Much of this arms race is about achieving an effective integration between HPC and Kubernetes (k8s), a tool so ubiquitous in enterprise computing that most major cloud platforms (AWS, Azure, GCP, etc.) offer a dedicated way to deploy k8s clusters. One hurdle is that trying to run HPC work inside a k8s pod (a deployable collection of containers) usually results in a flat 10-20% performance hit to the HPC application being run, which is highly undesirable. While there have been previous attempts to make batch computing work effectively in k8s, none have seen much market adoption, and k8s remains largely unknown in HPC despite being used ubiquitously in enterprise architecture alongside all kinds of other highly useful tooling.
Introducing Fuzzball Orchestrate: How to Run HPC Workloads in a K8s Environment
CIQ’s recently launched product, Fuzzball Orchestrate, is an “HPC/enterprise computing translator” that solves the k8s problem by integrating k8s with HPC the way k8s was intended to be used: as an orchestration stack for containers running microservices, rather than as a platform for running batch computing work in k8s pods. Fuzzball Orchestrate is a stack of microservices that runs on top of Kubernetes and provides everything a Fuzzball cluster needs to run.
There are two sides to a given Fuzzball cluster: the compute side and the management side.
On the compute side, we begin by provisioning with Warewulf, an open-source tool that CIQ provides support for. It has been around for about 20 years and lets you serve a single image to the hundreds or thousands of compute nodes in your cluster all at once over iPXE, so you can efficiently bring up a whole cluster’s compute nodes with one (or several) node configurations. For the OS layer we prefer Rocky Enterprise Linux, a drop-in replacement for CentOS or RHEL (CIQ is a founding support partner). On top of this sits Fuzzball Substrate, a custom container runtime built at CIQ that runs Docker (OCI) or Apptainer containers.
On the management side of the cluster, we once again choose Rocky Enterprise Linux, with some type of Kubernetes distribution running on top of it. The specific distribution doesn’t particularly matter: it could be Rancher, OpenShift, vanilla Kubernetes, etc. If you’re up in the cloud, you can use something like EKS on AWS or the similar managed k8s offerings from the other major cloud providers. Ultimately, the underlying Kubernetes doesn’t matter too much.
That said, if you’re deploying an on-prem Fuzzball cluster, we have an option called IQube, which is a turnkey Fuzzball cluster. It’s basically Kubernetes plus the Fuzzball Orchestrate stack packed into a container that’s bootstrapped on top of Substrate. In this case, you put Substrate on your management nodes and then use it to run the containerized Kubernetes and Fuzzball Orchestrate code at bootstrap, bringing up an Orchestrate cluster.
On top of Kubernetes, we run the Fuzzball Orchestrate stack of microservices. This stack, in turn, handles the HPC-specific cluster and workload management tasks and includes services such as:
- A workflow engine that parses the YAML-based documents Fuzzball uses to codify different types of HPC work as workflows
- A volume manager that sets up storage volumes that can be attached to jobs so jobs can persist and share data with one another
- A data mover that reaches out to S3 API-compliant object storage or the internet to ingress/egress data to and from storage volumes (see the sketch after this list)
- An image service that manages the pulling and caching of containers, since every workflow job in Fuzzball runs from a container
- An instance provisioner that reaches out to cloud platforms on the fly to request and spin up instances to serve as compute resources for a workflow
- A job scheduler that places HPC jobs on available resources
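As a rough illustration of the kind of S3 API traffic a data mover performs, the sketch below pulls an input file from an S3-compatible object store into a volume path before a job runs and pushes results back out afterward. The endpoint, bucket, object keys, and local paths are hypothetical; this shows the underlying access pattern, not Fuzzball’s actual implementation:

```python
# Illustrative sketch of S3-compatible ingress/egress (not Fuzzball code).
# The endpoint, bucket, object keys, and local paths are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.com",  # any S3 API-compliant object store
)

# Ingress: stage an input dataset from object storage onto a job's volume.
s3.download_file("example-bucket", "inputs/genome.fastq",
                 "/volumes/scratch/genome.fastq")

# ... the workflow's jobs run against /volumes/scratch ...

# Egress: push the results back out to object storage when the jobs finish.
s3.upload_file("/volumes/scratch/results.vcf",
               "example-bucket", "outputs/results.vcf")
```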
This allows the end user to codify their HPC workflows in a comprehensive way, submit them to a Fuzzball cluster, and have Fuzzball take care of the rest. The whole process can happen from the web GUI Fuzzball provides or from the Fuzzball CLI; unlike previous generations of HPC, no SSH sessions or other manual Linux system administration tasks are required by default to use Fuzzball. Everything is API-driven, so CI/CD systems can interact with it very robustly.
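Fuzzball’s actual API surface isn’t reproduced here, but to illustrate what API-driven submission enables, a CI/CD step could submit a workflow over HTTP and gate the pipeline on its outcome. The base URL, endpoint paths, payload handling, and status values below are hypothetical placeholders, not Fuzzball’s documented interface:

```python
# Hypothetical sketch of API-driven workflow submission from a CI/CD job.
# Endpoints, headers, and status strings are illustrative placeholders only.
import os
import time

import requests

API = "https://fuzzball.example.com/api"  # hypothetical cluster endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['FUZZBALL_TOKEN']}"}

# Submit the codified workflow document produced earlier in the pipeline.
with open("workflow.yaml") as f:
    resp = requests.post(f"{API}/workflows", headers=HEADERS, data=f.read())
resp.raise_for_status()
workflow_id = resp.json()["id"]

# Poll until the workflow finishes, then pass or fail the CI stage with it.
while True:
    status = requests.get(f"{API}/workflows/{workflow_id}",
                          headers=HEADERS).json()["status"]
    if status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(30)

raise SystemExit(0 if status == "SUCCEEDED" else 1)
```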
So, in summary, Fuzzball Orchestrate allows users to codify their HPC work, whether that’s genomics sequencing, a weather simulation, or some type of financial calculation. The workflow can then be submitted to a Fuzzball cluster, where it will run in a Kubernetes-based HPC environment.
To learn more, check out this demo of a Fuzzball Orchestrate cluster running on Amazon Elastic Kubernetes Service, or contact us here.
Forrest Burt is a solutions architect at CIQ, where he works in-depth with containerized HPC and the Fuzzball platform. He was previously an HPC system administrator while a student at Boise State University, supporting campus and national lab researchers on the R2 and Borah clusters while obtaining a B.S. in computer science.