Launched in late 2015 and transitioned to a Linux Foundation Project in 2016, OpenHPC has marched quietly but steadily forward. Its goal “to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools” was always greeted with enthusiasm, although there was wariness about Intel as the early driver. Since then OpenHPC has fared well by sticking to the open source road (while still enjoying Intel’s support).
Earlier this month OpenHPC released version 2.0 targeting new Linux operating system distributions and including new support for cloud and Arm. SC20 would have been v2.0’s coming out party had the pandemic not converted HPC’s annual extravaganza into a digital gathering. OpenHPC is still planning to offer SC20 activities. The older 1.3 branch (now v1.3.9) is likely to get another minor update and then move into maintenance mode.
Karl Schulz, the project lead for OpenHPC since its start and currently a research professor (Oden Institute) at UT Austin, provided HPCwire with an update on OpenHPC activities and plans. Among other things Schulz touched on growing traction in the cloud and rising demand for Arm builds; why it’s tough to tightly integrate GPU tech; efforts to expand the number of tutorials offered; and thoughts on including processor-specific recipes down the road.
Presented here is a lightly-edited portion of Schulz’s conversation with HPCwire.
HPCwire: It’s been quite a while since we’ve talked. I know v2.0 was just released and I am thinking the last release, 1.3.9, was well before that, maybe a full year ago. Can you briefly bring us up to speed?
Karl Schulz: That’s right, the last full release would have been right before Supercomputing in 2019, and then we sort of made a commitment to try to work on the 2.0 release. The 1.3x branch was targeting older distro versions. It supported RHEL 7 (Red Hat Enterprise Linux) or CentOS 7 and SLES 12 (SUSE Linux Enterprise Server). We have been basically working since then to put out a 2.0 release against newer distro versions. It did take a little while for us to get that out the door.
HPCwire: It looks like v2.0 is not backward compatible; maybe talk about the thinking there and what are some of the major changes?
Karl Schulz: It’s not intended to be backwards compatible. The primary reason for that is because the OSs themselves are not exactly intended to be upgradeable, meaning it’s pretty difficult and not really a supported path to go from RHEL 7 to RHEL 8, for example. SLES has a little more support, they say, but even they get kind of nervous anytime you want to take a major distro version and try to upgrade it. So that’s the real reason 2.0 is not backwards compatible. We also took the opportunity to make some significant changes. The big part is 2.0 targets the new distros. We’re still sticking with CentOS (open source), which is by far the most popular of our recipes that are downloaded, but we did switch from SLES to Leap (non-commercial version of SLES).
I don’t know how closely you follow that world. SUSE has always had its enterprise edition and openSUSE, but openSUSE was not exactly 100% compatible with the enterprise distribution. They now have a version of openSUSE called Leap. So, for example, there’s a Leap 15.1 which roughly maps to SLES 15 Service Pack 1, and they are in fact binary compatible. We took the opportunity to sort of switch, being an open source project, to build against openSUSE Leap 15 as opposed to SLES 15, even though you can use OpenHPC with either one.
HPCwire: What other significant changes are there in v2.0?
Karl Schulz: Well, there’s a lot of stuff happening in the HPC space around network interfaces and small things on the MPI stack. We have adopted the newer CH4 interface in MPICH which is coming down the pipe. As you may know, a lot of the commercial MPI installs start from MPICH as a base. This is a newer interface coming out of Argonne (National Laboratory) that we have adopted.
At the same time that gives us the flexibility to take advantage of newer fabric transport interfaces. OpenHPC 2.0 introduces two new fabric interfaces, Libfabric and UCX. We are trying to support both as best we can; that means for MPICH builds we have versions of both. The same thing goes for Open MPI, which supports both of those transport layers. Those are pretty significant changes in 2.0. From the end-user perspective it shouldn’t matter too much, but from an administrator perspective, we’re sort of assuming that people are going to want to be using Libfabric and potentially UCX as well.
HPCwire: OpenHPC has come a long way since its start in 2015 with Intel as the driving force. The worry was Intel would exert undue influence over it. Has it?
Karl Schulz: I have been involved since the beginning and we were concerned upfront with trying to make sure the project got going as a true community project. There’s been a couple of things that have really helped along the way. Getting multiple vendors who are in sort of the same space, if you will, to be part of the project has been very positive, helping spur growth and adoption. We were very pleased Arm joined and we started doing builds against Arm processors and adding recipes for that. That was an important milestone for the project to show that it really intended to support multiple architectures.
Same thing with multiple distros. We’ve had multiple distro folks involved since the get go, but maintaining that and growing the number of recipes within OpenHPC has been important. When we started back in 2016, we had one installation recipe; it was for CentOS and it was for Slurm and used one provisioner. With 2.0, we have something like 10 recipes, which span two architectures, two distros, two provisioners, and multiple types of recipes using those provisioners whether you want stateless or stateful. I think that’s another important growth point for the project.
HPCwire: Who is the target user? A key message at the start of the project was the notion of making it easier to deploy HPC capabilities, which implied adoption of HPC by less experienced users.
Karl Schulz: One of the things we’ve always been sensitive to is providing building blocks for HPC, and there’s always this Catch 22 between, are you targeting the highest end of folks, the DOE labs and really big supercomputing centers who have a lot of expertise, or are you targeting people who are maybe in smaller shops, who are building their first cluster. We wanted to do a little bit of both, which is certainly difficult, but I think the way we’ve organized the project and the way that we’ve organized the packaging does allow people to sort of pick and choose what they’d like to use.
We’ve also been very happy to see continued growth in the academic space. You see a lot of academic institutions who are using OpenHPC pretty much straight up or just customizing it a little bit. That’s the important part: we didn’t want to prohibit that customization. It’s the same for OEMs. We have some OEMs who are taking OpenHPC packages, rebuilding them, and providing a version to their customers with support, which we always thought was important because that’s a way to keep the OEMs engaged in the project and actually to help fund the project, frankly.
HPCwire: Who are examples of OEMs and universities working with OpenHPC?
Karl Schulz: Lenovo is an example. QCT is a member organization that has some of that as well. Those two come to mind. I believe you can buy a cluster from Dell and have them pre-install OpenHPC. Those are a few examples. In terms of academia, it’s a huge number of universities, and I can send you a link to our cluster registry.
HPCwire: What’s OpenHPC doing with regard to growing demand for AI compute capability and the infusion of machine learning and frameworks into HPC?
Karl Schulz: We’ve seen this certainly. One thing I’ll add is we have seen the desire to not just do on-premise type of installations, but also spinning up HPC environments in the cloud and on top of that running different kinds of workloads, and machine learning is certainly one of those. That’s something in the last year we have spent sort of more time on.
OpenHPC definitely started focusing on on-premise types of installations and for use in containerization. The last time we talked, I was big on containerization and certainly still am, that hasn’t gone anywhere. But I think you mix all these things together, and you have this desire for common HPC software running in the cloud, using containers to run workloads. That’s really what we’ve seen. We’ve done some recent work, having tutorials – we’re trying to grow our tutorial efforts – and had a tutorial at the PEARC (Practice and Experience in Advanced Research Computing) conference this summer. It was focused on using OpenHPC packaging, but installing it in the cloud. We had everybody work through building up a dynamic cluster that would fire up compute nodes automatically when you submit a job to the resource manager and doing all that through AWS in that case.
We’re expanding on that and will have another tutorial at Supercomputing; it’s again going to walk people through how to use OpenHPC packages in the cloud, but then we will [also] do a hands-on tutorial, now that we have this environment spun up, on how to use containerization and run some machine learning workloads like TensorFlow. We’re definitely seeing more and more of that sort of use case and we’ve been trying to put together documentation and tutorial efforts to help people with at least using bits and pieces from OpenHPC.
HPCwire: Are you getting help from some of the big cloud providers as well? Are they offering OpenHPC as a way in which you could spin up a cluster at AWS using their tools?
Karl Schulz: Not yet. We are fortunate [in that] one of our committee members is at AWS and we have good traction getting technical expertise to help us with their tools. In fact, this was part of the tutorial at PEARC; we had help using some AWS tools to do the cluster installation. At the moment, we’re really targeting administrators who want to leverage cloud resources to do that. I could imagine in the future, perhaps it becomes a little bit more of a push button type of activity. We are making images available, which are pre-built images that people can access in the cloud to make it a little easier, but they still need to walk through the process of tying cluster nodes together with a head node and a login node and all that kind of stuff.
HPCwire: Now that 2.0 is out, will you try and move back to a quarterly release schedule?
Karl Schulz: I do think it [release schedule] will become more frequent again. Before we came out with 2.0 we realized we had to set expectations for the previous branch and the new branch. I will say I’ve been sort of shocked at how fast 2.0 has been picked up. We put out a release candidate in June, because we knew when anybody’s installing a new system, [such as] RHEL 8, you want to go with the latest possible [HPC stack]. In about three months, we saw 2.0 packages being used as much as the 1.3 packages. Now it’s surpassed that. So in four months, we already have more use of this new branch. We did have some [1.3 branch] requests. We’ll probably put out one more release in the 1.3 series to fix a few things and update a few packages people have asked for. Then the 1.3 series will go into a maintenance mode [and] really the only things that we push out [then] are security fixes. Seeing the quick uptake of 2.0 also helps justify that decision, but we will hopefully have another 1.3 release by the end of the year.
HPCwire: Can you provide some numbers around OpenHPC users overall? How many people are using it now and what’s the growth been?
Karl Schulz: That’s a hard question to answer. What I’ve been doing to have some metric for being able to watch growth is look at how many sites hit our repository every month. It’s just something that should be consistent or at least measurable. We’re averaging about 10,000 IPs per month hitting our repository, and folks are downloading a little over five terabytes of packages every month. Just to put it in perspective, at the end of 2016, we had maybe 1000 IPs a month hitting the site. So it’s about 10x growth.
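As a rough illustration of the metric Schulz describes (distinct IP addresses hitting a repository each month), here is a minimal Python sketch. The record format of (date string, IP) pairs and the function name `unique_ips_per_month` are assumptions made for illustration, not OpenHPC's actual tooling, and the sample addresses are reserved documentation addresses rather than real data.

```python
from collections import defaultdict


def unique_ips_per_month(records):
    """Count distinct IPs per calendar month.

    `records` is an iterable of (date, ip) pairs, where date is an
    ISO string such as "2020-10-03". This input shape is hypothetical.
    """
    seen = defaultdict(set)
    for date, ip in records:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        seen[month].add(ip)
    return {month: len(ips) for month, ips in sorted(seen.items())}


# Made-up sample records using reserved documentation IP ranges.
sample = [
    ("2016-12-01", "192.0.2.1"),
    ("2016-12-15", "192.0.2.1"),  # repeat visitor, counted once
    ("2016-12-20", "192.0.2.2"),
    ("2020-10-03", "198.51.100.7"),
]
counts = unique_ips_per_month(sample)
```

A count like this tracks sites rather than people, which is consistent with Schulz's caveat that the number is a consistent, measurable proxy and not a true user count.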
HPCwire: You’re pleased with the traction and how OpenHPC has become accepted within the community?
Karl Schulz: I am very pleased. I’m happy it has sort of transitioned from a single company project to a true community effort, and we have a great group of folks who participate on our technical steering committee, we have a good governing board, everybody seems to be involved with it for the right reasons. Thus far, I’m gonna knock on my wood table here, we haven’t encountered politics.
HPCwire: Does Intel still play a leadership role?
Karl Schulz: They do have a leadership role. The governing board member from Intel at the moment is serving as our chair and they’ve continued to be active. Intel has participants on the technical steering committee from their open source organization within Intel.
HPCwire: One reason I ask is we’re following Intel’s efforts with oneAPI, watching to see if it blooms into a true open source activity.
Karl Schulz: We’ve been very appreciative of their support and, as I said, it has been consistent throughout the project. On the oneAPI stuff, it’s hard to say how that will go. Obviously we understand the importance of vendor compilers, in particular, with the HPC market, which is why even though OpenHPC is focused on open source we have some compatibility with the vendor compilers. From the beginning we’ve had that with the Intel compiler, the Parallel Studio suite, where OpenHPC provides a compatibility shim layer where people can go acquire the Intel compiler separately and then enable third party builds from OpenHPC that link against that compiler.
That was an important design decision for us because if we didn’t do that I think OpenHPC would have always been perceived as just sort of a nice project but only providing builds with GCC, for example. We really want to use the vendor compiler for whatever architecture we’re building on. It was important for us to design that in from the beginning. Now, the other thing that’s important about 2.0 is we’re starting to introduce that same type of capability for the Arm Allinea compiler. I would say over the last year we’ve seen a steady growth in downloads for all the Arm builds we’ve done. Certainly, Intel has the lion’s share, but we’ve seen steady growth in Arm interest from OpenHPC’s perspective.
HPCwire: What about the whole sort of emergence of heterogeneous architecture and growing use of GPUs or some sort of accelerator? How does that, if at all, figure into OpenHPC plans?
Karl Schulz: That is a tough one for us at the moment. GPUs are obviously very popular and are continuing to grow in popularity and that is one place that is difficult to sort of include in the same functionality. You know it’s not terribly hard to do what we’ve done with the vendor compilers, because that’s really sort of an add-on. You can do that after the system is instantiated. But for something like GPU drivers, that’s a little more complicated because you really need to have those at the time when you are provisioning a system. Because that’s not open source, it does make it difficult for us to be able to integrate that.
We have seen other people put stuff on top of OpenHPC to do that, and certainly, many users are running OpenHPC with GPU systems; what they’re doing is grabbing the drivers from Nvidia and adding them themselves. We will always want to support that type of operation, but we don’t have a handle for how to sort of integrate [GPU drivers] more directly at the moment due to the licensing.
HPCwire: Looking out six months to a year, what are the plans and priorities for OpenHPC?
Karl Schulz: Getting 2.0 out has taken up most of our time and thinking. One of the things we did was try to make it a little bit easier for people to customize their builds. OpenHPC is focused on providing binary builds, so people can get up and running quickly and use them in a container and all that stuff. But you can imagine situations where maybe an OEM wanted to take those packages and say “I want to maximize every last bit of performance for my particular architecture.” That’s a situation that’s a little bit different than OpenHPC. We don’t know in advance specific processor details, which means we have to be pretty generic in our builds. We have tried to make it easier now, so that people can take the builds from OpenHPC and easily add more optimization to them, and also so that they can co-install their customized packages alongside the OpenHPC packages. That was a request from the community we did get into 2.0.
Actually, that was a request from both sides (vendor and user). The first time that particular discussion came up was through interaction with DOE which was standing up an Arm system; they were starting from the OpenHPC packages and wanted to test what they could get if they turned on all the bells and whistles from the compiler. It’s definitely something we wanted to support. So we put a little effort into that.
One thing you could imagine perhaps farther down the road as OpenHPC continues to grow and gain traction is we’ll have enough resources to provide a subset of packages that have optimized builds for a particular architecture. We know it doesn’t make sense, for example, for the resource manager to turn on all the bells and whistles from your compiler for that, but [it might make sense] for something like BLAS or some of the other linear algebra libraries. Our thinking is farther down the road we might have a generic OpenHPC repository, and then, perhaps, processor-specific repositories that have a very small number of packages that are pre-built. Our guess is it’s probably something like 5%-to-10% of the packages that are really mission critical that are used in a lot of scientific applications that would benefit from extra levels of optimization.
HPCwire: Karl, thanks very much for your time.