OpenHPC Progress Report – v2.0, More Recipes, Cloud and Arm Support, Says Schulz

By John Russell

October 26, 2020

Launched in late 2015 and transitioned to a Linux Foundation Project in 2016, OpenHPC has marched quietly but steadily forward. Its goal “to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools” was always greeted with enthusiasm, although there was wariness about Intel as the early driver. Since then OpenHPC has fared well by sticking to the open source road (while still enjoying Intel’s support).

Earlier this month OpenHPC released version 2.0, targeting new Linux operating system distributions and expanding cloud and Arm support. SC20 would have been v2.0’s coming-out party had the pandemic not converted HPC’s annual extravaganza into a digital gathering. OpenHPC is still planning to offer SC20 activities. The older 1.3 branch (now v1.3.9) is likely to get another minor update and then move into maintenance mode.

Karl Schulz, OpenHPC

Karl Schulz, the project lead for OpenHPC since its start and currently a research professor (Oden Institute) at UT Austin, provided HPCwire with an update on OpenHPC activities and plans. Among other things, Schulz touched on growing traction in the cloud and rising demand for Arm builds; why it’s tough to tightly integrate GPU technology; efforts to expand the number of tutorials offered; and thoughts on including processor-specific recipes down the road.

Presented here is a lightly edited portion of Schulz’s conversation with HPCwire.

HPCwire: It’s been quite a while since we’ve talked. I know v2.0 was just released, and I am thinking the last release, v1.3.9, came well before that, maybe a full year ago. Can you briefly bring us up to speed?

Karl Schulz: That’s right, the last full release would have been right before Supercomputing in 2019, and then we sort of made a commitment to work on the 2.0 release. The 1.3.x branch was targeting older distro versions; it supported RHEL 7 (Red Hat Enterprise Linux) or CentOS 7 and SLES 12 (SUSE Linux Enterprise Server). We have basically been working since then to put out a 2.0 release against newer distro versions. It did take a little while for us to get that out the door.

HPCwire: It looks like v2.0 is not backward compatible; maybe talk about the thinking there and some of the major changes?

Karl Schulz: It’s not intended to be backwards compatible. The primary reason for that is because the OSs themselves are not exactly intended to be upgradeable, meaning it’s pretty difficult and not really a supported path to go from RHEL 7 to RHEL 8, for example. SLES has a little more support, they say, but even they get kind of nervous anytime you want to take a major distro version and try to upgrade it. So that’s the real reason 2.0 is not backwards compatible. We also took the opportunity to make some significant changes. The big part is 2.0 targets the new distros. We’re still sticking with CentOS (open source), which is by far the most popular of the recipes that are downloaded, but we did switch from SLES to Leap (the non-commercial version of SLES).

I don’t know how closely you follow that world. SUSE has always had its enterprise edition, and openSUSE, but it was not exactly 100% compatible with the enterprise distribution. They now have a version of openSUSE called Leap. So, for example, there’s a Leap 15.1 which roughly maps to SLES 15 Service Pack 1, and they are in fact binary compatible. Being an open source project, we took the opportunity to switch to building against openSUSE Leap 15 as opposed to SLES 15, even though you can use OpenHPC with either one.

HPCwire: What other significant changes are there in v2.0?

Karl Schulz: Well, there’s a lot happening in the HPC space around network interfaces, and related changes in the MPI stack. We have adopted the newer CH4 interface in MPICH, which is coming down the pipe. As you may know, a lot of the commercial MPI installs start from MPICH as a base. This is a newer interface coming out of Argonne (National Laboratory) that we have adopted.

At the same time, that gives us the flexibility to take advantage of newer fabric transport interfaces. OpenHPC 2.0 introduces two new fabric interfaces, libfabric and UCX. We are trying to support both as best we can; that means for MPICH we have builds against both. The same thing for Open MPI, which supports both of those transport layers. Those are pretty significant changes in 2.0. From the end-user perspective it shouldn’t matter too much, but from an administrator perspective, we’re assuming that people are going to want to be using libfabric and potentially UCX as well.
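[Editor’s note: for readers curious what that choice looks like in practice, the sketch below shows an administrator installing both MPICH transport variants on a 2.x system and a user switching between them with the module system. Package and module names follow the naming pattern in the 2.x install guides but should be treated as illustrative; check the release notes for the exact names in a given release.]

    # Install MPICH twice from the OpenHPC repo: one build against
    # libfabric ("ofi") and one against UCX (names illustrative).
    dnf -y install mpich-ofi-gnu9-ohpc mpich-ucx-gnu9-ohpc

    # Open MPI supports both transports within a single build.
    dnf -y install openmpi4-gnu9-ohpc

    # Users then pick a stack at runtime via the module hierarchy,
    # swapping between the two MPICH transport variants as needed.
    module load gnu9 mpich
    module swap mpich mpich/3.3.2-ucx    # version string illustrative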

HPCwire: OpenHPC has come a long way since its start in 2015 with Intel as the driving force. The worry was Intel would exert undue influence on it. Has it?

Karl Schulz: I have been involved since the beginning, and we were concerned up front with trying to make sure the project got going as a true community project. There have been a couple of things that have really helped along the way. Getting multiple vendors who are in sort of the same space, if you will, to be part of the project has been very positive, helping spur growth and adoption. We were very pleased when Arm joined and we started doing builds for Arm processors and adding recipes for that. That was an important milestone for the project to show that it really intended to support multiple architectures.

Same thing with multiple distros. We’ve had multiple distro folks involved since the get-go, but maintaining that and growing the number of recipes within OpenHPC has been important. When we started back in 2016, we had one installation recipe; it was for CentOS, it was for Slurm, and it used one provisioner. With 2.0, we have something like 10 recipes, which span two architectures, two distros, two provisioners, and multiple types of recipes using those provisioners, whether you want stateless or stateful. I think that’s another important growth point for the project.

HPCwire: Who is the target user? A key message at the start of the project was the notion of making it easier to deploy HPC capabilities, which implied adoption of HPC by less experienced users.

Karl Schulz: One of the things we’ve always been sensitive to is providing building blocks for HPC, and there’s always this Catch-22: are you targeting the highest-end folks, the DOE labs and really big supercomputing centers who have a lot of expertise, or are you targeting people who are maybe in smaller shops, who are building their first cluster? We wanted to do a little bit of both, which is certainly difficult, but I think the way we’ve organized the project and the way that we’ve organized the packaging does allow people to pick and choose what they’d like to use.

We’ve also been very happy to see continued growth in the academic space. You see a lot of academic institutions who are using OpenHPC pretty much straight up or just customizing it a little bit. That’s the important part: we didn’t want to prohibit that customization. It’s the same for OEMs. We have some OEMs who are taking OpenHPC packages, rebuilding them, and providing a version to their customers with support, which we always thought was important because that’s a way to keep the OEMs engaged in the project and actually to help fund the project, frankly.

HPCwire: What are some examples of OEMs and universities working with OpenHPC?

Karl Schulz: Lenovo is an example. QCT is a member organization that has some of that as well. Those two come to mind. I believe you can buy a cluster from Dell and have them pre-install OpenHPC. Those are a few examples. In terms of academia, it’s a huge number of universities, and I can send you a link to our cluster registry.

HPCwire: What’s OpenHPC doing with regard to growing demand for AI compute capability and the infusion of machine learning frameworks into HPC?

Karl Schulz: We’ve certainly seen this. One thing I’ll add is we have seen the desire to not just do on-premises installations, but also to spin up HPC environments in the cloud and, on top of that, run different kinds of workloads, and machine learning is certainly one of those. That’s something we have spent more time on in the last year.

OpenHPC definitely started out focusing on on-premises installations and on use in containerization. The last time we talked, I was big on containerization and certainly still am; that hasn’t gone anywhere. But I think you mix all these things together, and you have this desire for common HPC software running in the cloud, using containers to run workloads. That’s really what we’ve seen. We’ve done some recent work on tutorials – we’re trying to grow our tutorial efforts – and had a tutorial at the PEARC (Practice and Experience in Advanced Research Computing) conference this summer. It was focused on using OpenHPC packaging, but installing it in the cloud. We had everybody work through building up a dynamic cluster that would fire up compute nodes automatically when you submit a job to the resource manager, doing all of that through AWS in that case.
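[Editor’s note: to a user, the elastic cluster Schulz describes behaves like any ordinary Slurm cluster; the scale-up is triggered by a plain batch submission. A minimal sketch, assuming a cloud cluster whose partition is configured for elastic (power-save) nodes:]

    #!/bin/bash
    # hello.sbatch -- a plain Slurm batch script; on an elastic cloud
    # cluster the scheduler boots compute instances on demand to run it.
    #SBATCH --job-name=hello
    #SBATCH --nodes=2
    #SBATCH --time=00:05:00

    module load gnu9 mpich    # toolchain and MPI from OpenHPC packaging
    srun hostname             # first run may wait while nodes boot

[Submitted with “sbatch hello.sbatch”, the job queues while the cloud nodes come up, then runs as usual.]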

We’re expanding on that and will have another tutorial at Supercomputing; it’s again going to walk people through how to use OpenHPC packages in the cloud, but then we will [also] do a hands-on tutorial, now that we have this environment spun up, on how to use containerization and run some machine learning workloads like TensorFlow. We’re definitely seeing more and more of that sort of use case, and we’ve been trying to put together documentation and tutorial efforts to help people with at least using bits and pieces from OpenHPC.

HPCwire: Are you getting help from some of the big cloud providers as well? Are they offering OpenHPC as a way in which you could spin up a cluster at AWS using their tools?

Karl Schulz: Not yet. We are fortunate [in that] one of our committee members is at AWS, and we have good traction getting technical expertise to help us with their tools. In fact, this was part of the tutorial at PEARC; we had help using some AWS tools to do the cluster installation. At the moment, we’re really targeting administrators who want to leverage cloud resources. I could imagine in the future, perhaps, it becomes a little bit more of a push-button type of activity. We are making pre-built images available that people can access in the cloud to make it a little easier, but they still need to walk through the process of tying cluster nodes together with a head node and a login node and all that kind of stuff.

HPCwire: Now that 2.0 is out, will you try and move back to a quarterly release schedule?

Karl Schulz: I do think it [the release schedule] will become more frequent again. Before we came out with 2.0, we realized we had to set expectations for the previous branch and the new branch. I will say I’ve been sort of shocked at how fast 2.0 has been picked up. We put out a release candidate in June, because we knew that when anybody’s installing a new system, [such as] RHEL 8, you want to go with the latest possible [HPC stack]. In about three months, we saw 2.0 packages being used as much as the 1.3 packages. Now it’s surpassed that. So in four months, we already have more use of this new branch. We did have some [1.3 branch] requests. We’ll probably put out one more release in the 1.3 series to fix a few things and update a few packages people have asked for. Then the 1.3 series will go into maintenance mode [and] really the only thing that we push out [then] is security fixes. Seeing the quick uptake of 2.0 also helps justify that decision, but we will hopefully have another 1.3 release by the end of the year.

HPCwire: Can you provide some numbers around OpenHPC users overall? How many people are using it now and what’s the growth been?

Karl Schulz: That’s a hard question to answer. What I’ve been doing to have some metric for watching growth is to look at how many sites hit our repository every month. It’s just something that should be consistent, or at least measurable. We’re averaging about 10,000 IPs per month hitting our repository, and folks are downloading a little over five terabytes of packages every month. Just to put it in perspective, at the end of 2016, we had maybe 1,000 IPs a month hitting the site. So it’s about 10x growth.

HPCwire: You’re pleased with the traction and how OpenHPC has become accepted within the community?

Karl Schulz: I am very pleased. I’m happy it has transitioned from a single-company project to a true community effort. We have a great group of folks who participate on our technical steering committee, we have a good governing board, and everybody seems to be involved with it for the right reasons. Thus far – I’m gonna knock on my wood table here – we haven’t encountered politics.

HPCwire: Does Intel still play a leadership role?

Karl Schulz: They do have a leadership role. The governing board member from Intel is at the moment serving as our chair, and they’ve continued to be active. Intel also has participants on the technical steering committee from their open source organization.

HPCwire: One reason I ask is we’re following Intel’s efforts with oneAPI, watching to see if it blooms into a true open source activity.

Karl Schulz: We’ve been very appreciative of their support and, as I said, it has been consistent throughout the project. On the oneAPI stuff, it’s hard to say how that will go. Obviously we understand the importance of vendor compilers, in particular, in the HPC market, which is why, even though OpenHPC is focused on open source, we have some compatibility with the vendor compilers. From the beginning we’ve had that with the Intel compiler, the Parallel Studio suite, where OpenHPC provides a compatibility shim layer: people can acquire the Intel compiler separately and then enable third-party builds from OpenHPC that link against that compiler.
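[Editor’s note: a minimal sketch of how that shim layer is used. The compatibility package name below matches the pattern OpenHPC has published for Parallel Studio support, but all names here are illustrative:]

    # 1. Install Intel Parallel Studio directly from Intel, outside of
    #    OpenHPC -- the compiler itself is not redistributed.

    # 2. Install OpenHPC's compatibility package, which generates module
    #    files so the separately installed compiler slots into the
    #    OpenHPC module hierarchy (package name illustrative).
    dnf -y install intel-compilers-devel-ohpc

    # 3. Third-party OpenHPC builds compiled with the Intel toolchain
    #    then load alongside the GCC-based ones.
    module load intel mpich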

That was an important design decision for us, because if we didn’t do that I think OpenHPC would have always been perceived as just sort of a nice project, but one only providing builds with GCC, for example. We really want people to be able to use the vendor compiler for whatever architecture they’re building on. It was important for us to design that in from the beginning. Now, the other thing that’s important about 2.0 is we’re starting to introduce that same type of capability for the Arm Allinea compiler. I would say over the last year we’ve seen steady growth in downloads of all the Arm builds we’ve done. Certainly, Intel has the lion’s share, but we’ve seen steady growth in Arm interest from OpenHPC’s perspective.

HPCwire: What about the emergence of heterogeneous architectures and the growing use of GPUs or other accelerators? How does that, if at all, figure into OpenHPC plans?

Karl Schulz: That is a tough one for us at the moment. GPUs are obviously very popular and continuing to grow in popularity, and that is one place that is difficult to include with the same functionality. You know, it’s not terribly hard to do what we’ve done with the vendor compilers, because that’s really sort of an add-on; you can do that after the system is instantiated. But something like GPU drivers is a little more complicated, because you really need to have those at the time you are provisioning a system. Because that’s not open source, it does make it difficult for us to integrate.

We have seen other people put stuff on top of OpenHPC to do that, and certainly, many users are running OpenHPC with GPU systems; what they’re doing is grabbing the drivers from Nvidia and adding them themselves. We will always want to support that type of operation, but we don’t have a handle on how to integrate [GPU drivers] more directly at the moment due to the licensing.

HPCwire: Looking out six months to a year, what are the plans and priorities for OpenHPC?

Karl Schulz: Getting 2.0 out has taken up most of our time and thinking. One of the things we did was try to make it a little bit easier for people to customize their builds. OpenHPC is focused on providing binary builds, so people can get up and running quickly, use them in a container, and all that stuff. But you can imagine situations where maybe an OEM wanted to take those packages and say, “I want to maximize every last bit of performance for my particular architecture.” That’s a situation that’s a little bit different for OpenHPC: we don’t know specific processor details in advance, which means we have to be pretty generic in our builds. We have now made it easier for people to take the builds from OpenHPC and add more optimization to them, and also to co-install their customized packages alongside the OpenHPC packages. That was a request from the community we did get into 2.0.
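[Editor’s note: a hedged sketch of that rebuild-and-co-install workflow. The package name and build macro below are hypothetical; the flow – pull the source RPM, rebuild with architecture-specific flags, install beside the generic build – is the capability 2.0 targets:]

    # Fetch the source RPM for a library worth tuning (name illustrative).
    dnf download --source openblas-gnu9-ohpc

    # Rebuild with aggressive, architecture-specific optimization; the
    # actual knobs vary per package -- inspect the spec file first.
    rpmbuild --rebuild --define 'OHPC_CFLAGS -O3 -march=native' \
             openblas-gnu9-ohpc-*.src.rpm    # macro name hypothetical

    # 2.0's versioned install paths and module files let the tuned build
    # be co-installed next to the stock one and selected with 'module swap'.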

Actually, that was a request from both sides (vendor and user). The first time that particular discussion came up was through interaction with DOE, which was standing up an Arm system; they were starting from the OpenHPC packages and wanted to test what they could get if they turned on all the bells and whistles from the compiler. It’s definitely something we wanted to support, so we put a little effort into that.

One thing you could imagine farther down the road, as OpenHPC continues to grow and gain traction, is that we’ll have enough resources to provide a subset of packages that have optimized builds for a particular architecture. We know it doesn’t make sense, for example, to turn on all the bells and whistles from your compiler for the resource manager, but [it might make sense] for something like BLAS or some of the other linear algebra libraries. Our thinking is that farther down the road we might have a generic OpenHPC repository and then, perhaps, processor-specific repositories that have a very small number of pre-built packages. Our guess is it’s probably something like 5% to 10% of the packages – the really mission-critical ones used in a lot of scientific applications – that would benefit from extra levels of optimization.

HPCwire: Karl, thanks very much for your time.
