OpenHPC Progress Report – v2.0, More Recipes, Cloud and Arm Support, Says Schulz

By John Russell

October 26, 2020

Launched in late 2015 and transitioned to a Linux Foundation Project in 2016, OpenHPC has marched quietly but steadily forward. Its goal “to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools” was always greeted with enthusiasm, although there was wariness about Intel as the early driver. Since then OpenHPC has fared well by sticking to the open-source road (while still enjoying Intel’s support).

Earlier this month OpenHPC released version 2.0, targeting new Linux operating system distributions and including new support for cloud and Arm. SC20 would have been v2.0’s coming-out party had the pandemic not converted HPC’s annual extravaganza into a digital gathering. OpenHPC is still planning to offer SC20 activities. The older 1.3 branch (now at v1.3.9) is likely to get another minor update and then move into maintenance mode.

(Photo: Karl Schulz, OpenHPC)

Karl Schulz, the project lead for OpenHPC since its start and currently a research professor (Oden Institute) at UT Austin, provided HPCwire with an update on OpenHPC activities and plans. Among other things Schulz touched on growing traction in the cloud and rising demand for Arm builds; why it’s tough to tightly integrate GPU tech; efforts to expand the number of tutorials offered; and thoughts on including processor-specific recipes down the road.

Presented here is a lightly-edited portion of Schulz’s conversation with HPCwire.

HPCwire: It’s been quite a while since we’ve talked. I know v2.0 was just released, and I am thinking the last release, v1.3.9, was well before that, maybe a full year ago. Can you briefly bring us up to speed?

Karl Schulz: That’s right, the last full release would have been right before Supercomputing in 2019, and then we sort of made a commitment to try to work on the 2.0 release. The 1.3.x branch was targeting older distro versions. It supported RHEL 7 (Red Hat Enterprise Linux) or CentOS 7 and SLES 12 (SUSE Linux Enterprise Server). We have been basically working since then to put out a 2.0 release against newer distro versions. It did take a little while for us to get that out the door.

HPCwire: It looks like v2.0 is not backward compatible; maybe talk about the thinking there and some of the major changes?

Karl Schulz: It’s not intended to be backwards compatible. The primary reason for that is because the OSs themselves are not exactly intended to be upgradeable, meaning it’s pretty difficult and not really a supported path to go from RHEL 7 to RHEL 8, for example. SLES has a little more support, they say, but even they get kind of nervous anytime you want to go from a major distro version and try to upgrade it. So that’s the real reason 2.0 is not backwards compatible. We also took the opportunity to make some significant changes. The big part is 2.0 targets the new distros. We’re still sticking with CentOS (open source), which is by far the most popular of the recipes that are downloaded, but we did switch from SLES to Leap (the non-commercial version of SLES).

I don’t know how closely you follow that world. SUSE has always had its enterprise edition and openSUSE, but openSUSE was not exactly 100% compatible with the enterprise distribution. They now have a version of openSUSE called Leap. So, for example, there’s a Leap 15.1 which roughly maps to SLES 15 Service Pack 1, and they are in fact binary compatible. Being an open source project, we took the opportunity to switch to building against openSUSE Leap 15 as opposed to SLES 15, even though you can use OpenHPC with either one.
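For readers who want to experiment, a minimal sketch of enabling the 2.x repositories follows. The release-RPM URLs follow the project’s published naming pattern but are assumptions here; check them against the current OpenHPC install guide.

    # Minimal sketch; verify URLs/versions against the OpenHPC 2.x install guide.
    # CentOS 8:
    dnf install http://repos.openhpc.community/OpenHPC/2/CentOS_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm
    dnf install ohpc-base    # common base packages
    # openSUSE Leap 15:
    zypper install http://repos.openhpc.community/OpenHPC/2/Leap_15/x86_64/ohpc-release-2-1.leap15.x86_64.rpm
    zypper install ohpc-base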

HPCwire: What other significant changes are there in v2.0?

Karl Schulz: Well, there’s a lot of stuff happening in the HPC space around network interfaces and, following from that, the MPI stack. We have adopted the newer CH4 interface in MPICH, which is coming down the pipe. As you may know, a lot of the commercial MPI installs start from MPICH as a base. This is a newer interface coming out of Argonne (National Laboratory) that we have adopted.

At the same time that gives us the flexibility to take advantage of newer fabric transport interfaces. OpenHPC 2.0 introduces two new fabric interfaces, libfabric and UCX. We are trying to support both as best we can; that means for MPICH we have builds against both. The same goes for Open MPI, which supports both of those transport layers. Those are pretty significant changes in 2.0. From the end-user perspective it shouldn’t matter too much, but from an administrator perspective, we’re assuming that people are going to want to be using libfabric and potentially UCX as well.
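As a concrete illustration, an administrator might install and switch between the two transport variants through OpenHPC’s Lmod module hierarchy. The package and module names below are assumptions based on OpenHPC 2.x naming conventions; verify them against the release’s package manifest.

    # Sketch only: names assumed from OpenHPC 2.x conventions.
    dnf install mpich-ofi-gnu9-ohpc   # MPICH (CH4) built on libfabric
    dnf install mpich-ucx-gnu9-ohpc   # MPICH (CH4) built on UCX
    dnf install openmpi4-gnu9-ohpc    # Open MPI, which can drive either transport
    # End users then pick a toolchain via modules and build as usual:
    module load gnu9 mpich
    mpicc -O2 hello_mpi.c -o hello_mpi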

HPCwire: OpenHPC has come a long way since its start in 2015 with Intel as the driving force. The worry was Intel would exert undue influence over it. Has it?

Karl Schulz: I have been involved since the beginning, and we were concerned upfront with trying to make sure the project got going as a true community project. There have been a couple of things that have really helped along the way. Getting multiple vendors who are in sort of the same space, if you will, to be part of the project has been very positive, helping spur growth and adoption. We were very pleased Arm joined and we started doing builds against Arm processors and adding recipes for that. That was an important milestone for the project to show that it really intended to support multiple architectures.

Same thing with multiple distros. We’ve had multiple distro folks involved since the get-go, but maintaining that and growing the number of recipes within OpenHPC has been important. When we started back in 2016, we had one installation recipe; it was for CentOS, it was for Slurm, and it used one provisioner. With 2.0, we have something like 10 recipes, which span two architectures, two distros, two provisioners, and multiple types of recipes using those provisioners, whether you want stateless or stateful. I think that’s another important growth point for the project.

HPCwire: Who is the target user? A key message at the start of the project was the notion of making it easier to deploy HPC capabilities, which implied adoption of HPC by less experienced users.

Karl Schulz: One of the things we’ve always been sensitive to is providing building blocks for HPC, and there’s always this Catch-22: are you targeting the highest-end folks, the DOE labs and really big supercomputing centers who have a lot of expertise, or are you targeting people who are maybe in smaller shops, building their first cluster? We wanted to do a little bit of both, which is certainly difficult, but I think the way we’ve organized the project and the way that we’ve organized the packaging does allow people to pick and choose what they’d like to use.

We’ve also been very happy to see continued growth in the academic space. You see a lot of academic institutions who are using OpenHPC pretty much straight up or just customizing it a little bit. That’s the important part: we didn’t want to prohibit that customization. It’s the same for OEMs. We have some OEMs who are taking OpenHPC packages, rebuilding them, and providing a version to their customers with support, which we always thought was important because that’s a way to keep the OEMs engaged in the project and, frankly, to help fund it.

HPCwire: Who are examples of OEMs and universities working with OpenHPC?

Karl Schulz: Lenovo is an example. QCT is a member organization that has some of that as well. Those two come to mind. I believe you can buy a cluster from Dell and have them pre-install OpenHPC. Those are a few examples. In terms of academia, it’s a huge number of universities, and I can send you a link to our cluster registry.

HPCwire: What’s OpenHPC doing with regard to growing demand for AI compute capability and the infusion of machine learning and frameworks into HPC?

Karl Schulz: We’ve seen this, certainly. One thing I’ll add is we have seen the desire to not just do on-premise types of installations, but also to spin up HPC environments in the cloud and, on top of that, run different kinds of workloads, and machine learning is certainly one of those. That’s something we have spent more time on in the last year.

OpenHPC definitely started out focused on on-premise types of installations and on use in containerization. The last time we talked, I was big on containerization, and I certainly still am; that hasn’t gone anywhere. But I think you mix all these things together, and you have this desire for common HPC software running in the cloud, using containers to run workloads. That’s really what we’ve seen. We’ve done some recent work having tutorials – we’re trying to grow our tutorial efforts – and had a tutorial at the PEARC (Practice and Experience in Advanced Research Computing) conference this summer. It was focused on using OpenHPC packaging, but installing it in the cloud. We had everybody work through building up a dynamic cluster that would fire up compute nodes automatically when you submit a job to the resource manager, doing all of that through AWS in that case.
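The dynamic-cluster behavior Schulz describes maps naturally onto Slurm’s power-saving hooks. The slurm.conf excerpt below is a hedged sketch of that idea rather than the tutorial’s actual configuration; the script paths and node names are hypothetical, and the PEARC tutorial relied on AWS-specific tooling.

    # Hypothetical slurm.conf excerpt: CLOUD nodes exist only on paper until a
    # job needs them, at which point ResumeProgram launches instances.
    SuspendProgram=/opt/cluster/bin/terminate-ec2-nodes.sh   # tear down idle nodes
    ResumeProgram=/opt/cluster/bin/launch-ec2-nodes.sh       # boot nodes for queued jobs
    SuspendTime=300                                          # idle seconds before suspend
    NodeName=compute[1-8] State=CLOUD
    PartitionName=normal Nodes=compute[1-8] Default=YES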

We’re expanding on that and will have another tutorial at Supercomputing; it’s again going to walk people through how to use OpenHPC packages in the cloud, but then we will [also] do a hands-on tutorial, now that we have this environment spun up, on how to use containerization and run some machine learning workloads like TensorFlow. We’re definitely seeing more and more of that sort of use case, and we’ve been trying to put together documentation and tutorial efforts to help people with at least using bits and pieces from OpenHPC.
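To make that concrete, here is a hedged sketch of the kind of containerized job such a tutorial might walk through, using Singularity (which ships in the OpenHPC repositories) to run stock TensorFlow under Slurm. The image tag and training script are illustrative placeholders.

    # Illustrative batch job; train.py and the image tag are placeholders.
    cat > tf_job.sh <<'EOF'
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -t 00:30:00
    singularity exec docker://tensorflow/tensorflow:latest python3 train.py
    EOF
    sbatch tf_job.sh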

HPCwire: Are you getting help from some of the big cloud providers as well? Are they offering OpenHPC as a way in which you could spin up a cluster at AWS using their tools?

Karl Schulz: Not yet. We are fortunate [in that] one of our committee members is at AWS, and we have good traction getting technical expertise to help us with their tools. In fact, this was part of the tutorial at PEARC; we had help using some AWS tools to do the cluster installation. At the moment, we’re really targeting administrators who want to leverage cloud resources to do that. I could imagine in the future, perhaps, it becomes a little bit more of a push-button type of activity. We are making images available, which are pre-built images that people can access in the cloud to make it a little easier, but they still need to walk through the process of tying cluster nodes together with a head node and a login node and all that kind of stuff.

HPCwire: Now that 2.0 is out, will you try and move back to a quarterly release schedule?

Karl Schulz: I do think it [the release schedule] will become more frequent again. Before we came out with 2.0, we realized we had to set expectations for the previous branch and the new branch. I will say I’ve been sort of shocked at how fast 2.0 has been picked up. We put out a release candidate in June, because we knew when anybody’s installing a new system, [such as] RHEL 8, you want to go with the latest possible [HPC stack]. In about three months, we saw 2.0 packages being used as much as the 1.3 packages. Now it’s surpassed that. So in four months, we already have more use of this new branch. We did have some [1.3 branch] requests. We’ll probably put out one more release in the 1.3 series to fix a few things and update a few packages people have asked for. Then the 1.3 series will go into maintenance mode, [and] really the only thing that we push out [then] is security fixes. Seeing the quick uptake of 2.0 also helps justify that decision, but we will hopefully have another 1.3 release by the end of the year.

HPCwire: Can you provide some numbers around OpenHPC users overall? How many people are using it now and what’s the growth been?

Karl Schulz: That’s a hard question to answer. What I’ve been doing to have some metric for being able to watch growth is look at how many sites hit our repository every month. It’s just something that should be consistent or at least measurable. We’re averaging about 10,000 IPs per month hitting our repository, and folks are downloading a little over five terabytes of packages every month. Just to put it in perspective, at the end of 2016, we had maybe 1000 IPs a month hitting the site. So it’s about 10x growth.

HPCwire: You’re pleased with the traction and how OpenHPC has become accepted within the community?

Karl Schulz: I am very pleased. I’m happy it has transitioned from a single-company project to a true community effort. We have a great group of folks who participate on our technical steering committee, we have a good governing board, and everybody seems to be involved with it for the right reasons. Thus far – I’m gonna knock on my wood table here – we haven’t encountered politics.

HPCwire: Does Intel still play a leadership role?

Karl Schulz: They do have a leadership role. The governing board member from Intel at the moment is serving as our chair, and they’ve continued to be active. Intel also has participants on the technical steering committee from its open source organization.

HPCwire: One reason I ask is we’re following Intel’s efforts with oneAPI, watching to see if it blooms into a true open source activity.

Karl Schulz: We’ve been very appreciative of their support and, as I said, it has been consistent throughout the project. On the oneAPI stuff, it’s hard to say how that will go. Obviously we understand the importance of vendor compilers, in particular, in the HPC market, which is why, even though OpenHPC is focused on open source, we have some compatibility with the vendor compilers. From the beginning we’ve had that with the Intel compiler, the Parallel Studio suite, where OpenHPC provides a compatibility shim layer: people can acquire the Intel compiler separately and then enable third-party builds from OpenHPC that link against that compiler.
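In practice the shim workflow looks roughly like the sketch below. The package name follows OpenHPC’s documented convention, but treat it as an assumption to confirm for your release; the Intel compiler itself is licensed and installed separately.

    # Sketch: Parallel Studio is obtained from Intel; OpenHPC's shim then
    # exposes it to the module hierarchy alongside builds that link against it.
    dnf install intel-compilers-devel-ohpc   # compatibility package from OpenHPC
    module load intel mpich                  # Intel toolchain + an MPI built against it
    icc -O2 app.c -o app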

That was an important design decision for us, because if we didn’t do that I think OpenHPC would have always been perceived as just sort of a nice project, but one only providing builds with GCC, for example. We really want to be able to use the vendor compiler for whatever architecture we’re building on. It was important for us to design that in from the beginning. Now, the other thing that’s important about 2.0 is we’re starting to introduce that same type of capability for the Arm Allinea compiler. I would say over the last year we’ve seen steady growth in downloads for all the Arm builds we’ve done. Certainly, Intel has the lion’s share, but we’ve seen steady growth in Arm interest from OpenHPC’s perspective.

HPCwire: What about the emergence of heterogeneous architectures and the growing use of GPUs or other accelerators? How does that, if at all, figure into OpenHPC plans?

Karl Schulz: That is a tough one for us at the moment. GPUs are obviously very popular and are continuing to grow in popularity, and that is one place that is difficult to include with the same functionality. You know, it’s not terribly hard to do what we’ve done with the vendor compilers, because that’s really sort of an add-on. You can do that after the system is instantiated. But for something like GPU drivers, that’s a little more complicated because you really need to have those at the time when you are provisioning a system. Because that’s not open source, it does make it difficult for us to be able to integrate that.

We have seen other people put stuff on top of OpenHPC to do that, and certainly, many users are running OpenHPC with GPU systems; what they’re doing is grabbing the drivers from Nvidia and adding them themselves. We will always want to support that type of operation, but we don’t have a handle on how to integrate [GPU drivers] more directly at the moment due to the licensing.
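For illustration, here is a hedged sketch of that add-it-yourself pattern on a Warewulf-provisioned system. $CHROOT is the node-image root used in OpenHPC’s recipes; the repository URL and package name come from Nvidia’s documentation, not from OpenHPC, and should be checked there.

    # Sketch: inject Nvidia's driver stack into the node image before provisioning.
    wget https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo \
         -O $CHROOT/etc/yum.repos.d/cuda-rhel8.repo
    dnf --installroot=$CHROOT install cuda-drivers   # drivers inside the image
    # ...then rebuild and re-deploy the Warewulf image as usual.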

HPCwire: Looking out six months to a year, what are the plans and priorities for OpenHPC?

Karl Schulz: Getting 2.0 out has taken up most of our time and thinking. One of the things we did was try to make it a little bit easier for people to customize their builds. OpenHPC is focused on providing binary builds, so people can get up and running quickly and use them in a container and all that stuff. But you can imagine situations where maybe an OEM wanted to take those packages and say, “I want to maximize every last bit of performance for my particular architecture.” That’s a situation that’s a little bit different from OpenHPC’s: we don’t know specific processor details in advance, which means we have to be pretty generic in our builds. We have now made it easier for people to take the builds from OpenHPC and add more optimization to them, and also made it so that they can co-install their customized packages alongside the OpenHPC packages. That was a request from the community we did get into 2.0.

Actually, that was a request from both sides (vendor and user). The first time that particular discussion came up was through interaction with DOE, which was standing up an Arm system; they were starting from the OpenHPC packages and wanted to test what they could get if they turned on all the bells and whistles from the compiler. It’s definitely something we wanted to support, so we put a little effort into that.
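A hedged sketch of what that rebuild workflow could look like is below. The rpmbuild macro is a hypothetical stand-in for the customization hooks added in 2.0; consult the project documentation for the actual supported knobs.

    # Hypothetical rebuild of an OpenHPC package with extra optimization.
    # (Requires dnf-plugins-core and the OpenHPC source repository enabled.)
    dnf download --source openblas-gnu9-ohpc           # fetch the source RPM
    rpmbuild --rebuild \
        --define "OHPC_CFLAGS -O3 -march=armv8.2-a" \  # hypothetical flag override
        openblas-gnu9-ohpc-*.src.rpm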

One thing you could imagine perhaps farther down the road, as OpenHPC continues to grow and gain traction, is that we’ll have enough resources to provide a subset of packages that have optimized builds for a particular architecture. We know it doesn’t make sense, for example, for the resource manager to turn on all the bells and whistles from your compiler, but [it might make sense] for something like BLAS or some of the other linear algebra libraries. Our thinking is farther down the road we might have a generic OpenHPC repository and then, perhaps, processor-specific repositories that have a very small number of packages that are pre-built. Our guess is it’s probably something like 5%-to-10% of the packages – the ones that are really mission critical and used in a lot of scientific applications – that would benefit from extra levels of optimization.

HPCwire: Karl, thanks very much for your time.
