OpenHPC Progress Report – v2.0, More Recipes, Cloud and Arm Support, Says Schulz

By John Russell

October 26, 2020

Launched in late 2015 and transitioned to a Linux Foundation Project in 2016, OpenHPC has marched quietly but steadily forward. Its goal “to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools” was always greeted with enthusiasm, although there was wariness about Intel as the early driver. Since then OpenHPC has fared well by sticking to the open source road (while still enjoying Intel’s support).

Earlier this month OpenHPC released version 2.0, targeting new Linux operating system distributions and adding support for cloud and Arm. SC20 would have been v2.0’s coming-out party had the pandemic not converted HPC’s annual extravaganza into a digital gathering. OpenHPC is still planning to offer SC20 activities. The older 1.3 branch (now v1.3.9) is likely to get another minor update and then move into maintenance mode.


Karl Schulz, the project lead for OpenHPC since its start and currently a research professor (Oden Institute) at UT Austin, provided HPCwire with an update on OpenHPC activities and plans. Among other things Schulz touched on growing traction in the cloud and rising demand for Arm builds; why it’s tough to tightly integrate GPU tech; efforts to expand the number of tutorials offered; and thoughts on including processor-specific recipes down the road.

Presented here is a lightly edited portion of Schulz’s conversation with HPCwire.

HPCwire: It’s been quite a while since we’ve talked. I know v2.0 was just released, and I am thinking the last release, v1.3.9, was well before that, maybe a full year ago. Can you briefly bring us up to speed?

Karl Schulz: That’s right, the last full release would have been right before Supercomputing in 2019, and then we sort of made a commitment to try to work on the 2.0 release. The 1.3.x branch was targeting older distro versions. It supported RHEL 7 (Red Hat Enterprise Linux) or CentOS 7 and SLES 12 (SUSE Linux Enterprise Server). We have basically been working since then to put out a 2.0 release against newer distro versions. It did take a little while for us to get that out the door.

HPCwire: It looks like v2.0 is not backward compatible; maybe talk about the thinking there and what are some of the major changes?

Karl Schulz: It’s not intended to be backwards compatible. The primary reason for that is because the OSs themselves are not exactly intended to be upgradeable, meaning it’s pretty difficult and not really a supported path to go from RHEL 7 to RHEL 8, for example. SLES has a little more support, they say, but even they get kind of nervous anytime you want to take a major distro version and try to upgrade it. So that’s the real reason 2.0 is not backwards compatible. We also took the opportunity to make some significant changes. The big part is 2.0 targets the new distros. We’re still sticking with CentOS (open source), which is by far the most popular of the recipes that are downloaded, but we did switch from SLES to Leap (the non-commercial version of SLES).

I don’t know how closely you follow that world. SUSE has always had its enterprise edition and openSUSE, but openSUSE was not exactly 100% compatible with the enterprise distribution. They now have a version of openSUSE called Leap. So, for example, there’s a Leap 15.1 which roughly maps to SLES 15 Service Pack 1, and they are in fact binary compatible. We took the opportunity, being an open source project, to switch to building against openSUSE Leap 15 as opposed to SLES 15, even though you can use OpenHPC with either one.

HPCwire: What other significant changes are there in v2.0?

Karl Schulz: Well, there’s a lot of stuff happening in the HPC space around network interfaces and small things on the MPI stack. We have adopted the newer CH4 interface in MPICH, which is coming down the pipe. As you may know, a lot of the commercial MPI installs start from MPICH as a base. This is a newer interface coming out of Argonne (National Laboratory) that we have adopted.

At the same time that gives us the flexibility to take advantage of newer fabric transport interfaces. OpenHPC 2.0 introduces two new fabric interfaces, Libfabric and UCX. We are trying to support both as best we can; that means for MPICH builds we have versions of both. The same thing for Open MPI, which supports both of those transport layers. Those are pretty significant changes in 2.0. From the end-user perspective it shouldn’t matter too much, but from an administrator perspective, we’re sort of assuming that people are going to want to be using Libfabric and potentially UCX as well.

HPCwire: OpenHPC has come a long way since its start in 2015 with Intel as the driving force. The worry was Intel would exert undue influence over it. Has it?

Karl Schulz: I have been involved since the beginning, and we were concerned upfront with trying to make sure the project got going as a true community project. There have been a couple of things that have really helped along the way. Getting multiple vendors who are in sort of the same space, if you will, to be part of the project has been very positive, helping spur growth and adoption. We were very pleased Arm joined and we started doing builds against Arm processors and adding recipes for that. That was an important milestone for the project to show that it really intended to support multiple architectures.

Same thing with multiple distros. We’ve had multiple distro folks involved since the get-go, but maintaining that and growing the number of recipes within OpenHPC has been important. When we started back in 2016, we had one installation recipe; it was for CentOS, it was for Slurm, and it used one provisioner. With 2.0, we have something like 10 recipes, which span two architectures, two distros, two provisioners, and multiple types of recipes using those provisioners, whether you want stateless or stateful. I think that’s another important growth point for the project.
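As a rough illustration of how one of those recipes begins, the sketch below follows the general pattern of the OpenHPC 2.x install guides on a CentOS 8 head node. The repository URL and meta-package names here are assumptions and should be checked against the official recipe for your distro, provisioner, and release.

```shell
# Enable the OpenHPC 2.x repository on a CentOS 8 head node.
# (URL pattern follows the published OpenHPC repos; verify against the
# install guide for your release -- treated as an assumption here.)
dnf -y install http://repos.openhpc.community/OpenHPC/2/CentOS_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm

# Base OpenHPC tools plus the Slurm server meta-package used in the
# CentOS 8 + Slurm recipes.
dnf -y install ohpc-base ohpc-slurm-server

# Provisioning pieces (Warewulf is one of the supported provisioners).
dnf -y install ohpc-warewulf
```

The remaining steps in a recipe (building a compute node image, registering nodes, starting the resource manager) vary with the provisioner and the stateless/stateful choice, which is exactly the matrix of recipes Schulz describes.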

HPCwire: Who is the target user? A key message at the start of the project was the notion of making it easier to deploy HPC capabilities, which implied adoption of HPC by less experienced users.

Karl Schulz: One of the things we’ve always been sensitive to is providing building blocks for HPC, and there’s always this Catch-22 between, are you targeting the highest-end folks, the DOE labs and really big supercomputing centers who have a lot of expertise, or are you targeting people who are maybe in smaller shops, who are building their first cluster. We wanted to do a little bit of both, which is certainly difficult, but I think the way we’ve organized the project and the way that we’ve organized the packaging does allow people to sort of pick and choose what they’d like to use.

We’ve also been very happy to see continued growth in the academic space. You see a lot of academic institutions who are using OpenHPC pretty much straight up or just customizing a little bit. That’s the important part: we didn’t want to prohibit that customization. It’s the same for OEMs. We have some OEMs who are taking OpenHPC packages, rebuilding them and providing a version to their customers with support, which we always thought was important because that’s a way to keep the OEMs engaged in the project and actually to help fund the project, frankly.

HPCwire: Who are examples of OEMs and universities working with OpenHPC?

Karl Schulz: Lenovo is an example. QCT is a member organization that has some of that as well. Those two come to mind. I believe you can buy a cluster from Dell and have them pre-install OpenHPC. Those are a few examples. In terms of academia, it’s a huge number of universities, and I can send you a link to our cluster registry.

HPCwire: What’s OpenHPC doing with regard to growing demand for AI compute capability and the infusion of machine learning and frameworks into HPC?

Karl Schulz: We’ve seen this certainly. One thing I’ll add is we have seen the desire to not just do on-premise type of installations, but also spinning up HPC environments in the cloud and on top of that running different kinds of workloads, and machine learning is certainly one of those. That’s something in the last year we have spent sort of more time on.

OpenHPC definitely started out focusing on on-premise types of installations and on containerization. The last time we talked, I was big on containerization, and I certainly still am; that hasn’t gone anywhere. But you mix all these things together, and you have this desire for common HPC software running in the cloud, using containers to run workloads. That’s really what we’ve seen. We’ve done some recent work, having tutorials – we’re trying to grow our tutorial efforts – and had a tutorial at the PEARC (Practice and Experience in Advanced Research Computing) conference this summer. It was focused on using OpenHPC packaging, but installing it in the cloud. We had everybody work through building up a dynamic cluster that would fire up compute nodes automatically when you submit a job to the resource manager, and doing all that through AWS in that case.

We’re expanding on that and will have another tutorial at Supercomputing; it’s again going to walk people through how to use OpenHPC packages in the cloud, but then we will [also] do a hands-on tutorial, now that we have this environment spun up, on how to use containerization and run some machine learning workloads like TensorFlow. We’re definitely seeing more and more of that sort of use case, and we’ve been trying to put together documentation and tutorial efforts to help people with at least using bits and pieces from OpenHPC.
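A common way to run a containerized TensorFlow workload on a Slurm-based OpenHPC cluster, cloud-hosted or not, is a batch script that pulls the image via Singularity/Apptainer. The job name, resource limits, and image tag below are illustrative placeholders, not something prescribed by OpenHPC.

```shell
#!/bin/bash
#SBATCH --job-name=tf-demo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Pull and run a TensorFlow container from Docker Hub via Singularity.
# On a dynamic cloud cluster like the one described above, submitting
# this job is what triggers a compute node to be spun up automatically.
singularity exec docker://tensorflow/tensorflow:latest \
    python3 -c "import tensorflow as tf; print(tf.__version__)"
```

Submitted with `sbatch tf-demo.sh`, the script runs once the resource manager has a node available, so the same workflow works on-premise and in the cloud.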

HPCwire: Are you getting help from some of the big cloud providers as well? Are they offering OpenHPC as a way in which you could spin up a cluster at AWS using their tools?

Karl Schulz: Not yet. We are fortunate [in that] one of our committee members is at AWS, and we have good traction getting technical expertise to help us with their tools. In fact, this was part of the tutorial at PEARC; we had help using some AWS tools to do the cluster installation. At the moment, we’re really targeting administrators who want to leverage cloud resources to do that. I could imagine in the future, perhaps it becomes a little bit more of a push-button type of activity. We are making pre-built images available that people can access in the cloud to make it a little easier, but they still need to walk through the process of tying cluster nodes together with a head node and a login node and all that kind of stuff.

HPCwire: Now that 2.0 is out, will you try and move back to a quarterly release schedule?

Karl Schulz: I do think it [release schedule] will become more frequent again. Before we came out with 2.0 we realized we had to set expectations for the previous branch and the new branch. I will say I’ve been sort of shocked at how fast 2.0 has been picked up. We put out a release candidate in June, because we knew when anybody’s installing a new system, [such as] RHEL 8, you want to go with the latest possible [HPC stack]. In about three months, we saw 2.0 packages being used as much as the 1.3 packages. Now it’s surpassed that. So in four months, we already have more use of the new branch. We did have some [1.3 branch] requests. We’ll probably put out one more release in the 1.3 series to fix a few things and update a few packages people have asked for. Then the 1.3 series will go into a maintenance mode, [and] really the only thing that we push out [then] is security fixes. Seeing the quick uptake of 2.0 also helps justify that decision, but we will hopefully have another 1.3 release by the end of the year.

HPCwire: Can you provide some numbers around OpenHPC users overall? How many people are using it now and what’s the growth been?

Karl Schulz: That’s a hard question to answer. What I’ve been doing to have some metric for being able to watch growth is look at how many sites hit our repository every month. It’s just something that should be consistent or at least measurable. We’re averaging about 10,000 IPs per month hitting our repository, and folks are downloading a little over five terabytes of packages every month. Just to put it in perspective, at the end of 2016, we had maybe 1,000 IPs a month hitting the site. So it’s about 10x growth.

HPCwire: You’re pleased with the traction and how OpenHPC has become accepted within the community?

Karl Schulz: I am very pleased. I’m happy it has transitioned from a single-company project to a true community effort. We have a great group of folks who participate on our technical steering committee, and we have a good governing board; everybody seems to be involved with it for the right reasons. Thus far, I’m gonna knock on my wood table here, we haven’t encountered politics.

HPCwire: Does Intel still play a leadership role?

Karl Schulz: They do have a leadership role. The governing board member from Intel at the moment is serving as our chair, and they’ve continued to be active. Intel has participants on the technical steering committee from its open source organization.

HPCwire: One reason I ask is we’re following Intel’s efforts with oneAPI, watching to see if it blooms into a true open source activity.

Karl Schulz: We’ve been very appreciative of their support and, as I said, it has been consistent throughout the project. On the oneAPI stuff, it’s hard to say how that will go. Obviously we understand the importance of vendor compilers, in particular, with the HPC market, which is why even though OpenHPC is focused on open source we have some compatibility with the vendor compilers. From the beginning we’ve had that with the Intel compiler, the Parallel Studio suite, where OpenHPC provides a compatibility shim layer where people can go acquire the Intel compiler separately and then enable third party builds from OpenHPC that link against that compiler.

That was an important design decision for us, because if we didn’t do that I think OpenHPC would have always been perceived as a nice project but one only providing builds with GCC, for example. We really want to use the vendor compiler for whatever architecture we’re building on. It was important for us to design that in from the beginning. Now, the other thing that’s important about 2.0 is we’re starting to introduce that same type of capability for the Arm Allinea compiler. I would say over the last year we’ve seen steady growth in downloads for all the Arm builds we’ve done. Certainly, Intel has the lion’s share, but we’ve seen steady growth in Arm interest from OpenHPC’s perspective.
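In practice, the compatibility shim Schulz describes shows up through the Lmod module hierarchy: OpenHPC ships toolchain modules, and once the separately licensed vendor compiler is installed, a small compatibility package hooks it into the same hierarchy. The module and package names below follow OpenHPC 2.x conventions but should be treated as illustrative assumptions.

```shell
# Default open source toolchain shipped by OpenHPC 2.x.
module load gnu9 openmpi4

# After acquiring and installing Intel Parallel Studio separately,
# a compatibility package (intel-compilers-devel-ohpc in OpenHPC)
# exposes it as a compiler family; swap toolchains instead of
# rebuilding anything:
module swap gnu9 intel
module swap openmpi4 impi

# Downstream OpenHPC library builds (e.g., via Lmod's hierarchy)
# then resolve against whichever compiler/MPI pair is loaded.
```

The same pattern is what 2.0 begins to extend to the Arm Allinea compiler, so third-party builds can link against a vendor toolchain without OpenHPC redistributing it.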

HPCwire: What about the emergence of heterogeneous architectures and the growing use of GPUs or other accelerators? How does that, if at all, figure into OpenHPC plans?

Karl Schulz: That is a tough one for us at the moment. GPUs are obviously very popular and are continuing to grow in popularity, and that is one place that is difficult to include with the same functionality. You know, it’s not terribly hard to do what we’ve done with the vendor compilers, because that’s really sort of an add-on. You can do that after the system is instantiated. But for something like GPU drivers, that’s a little more complicated, because you really need to have those at the time when you are provisioning a system. Because that’s not open source, it does make it difficult for us to integrate it.

We have seen other people put stuff on top of OpenHPC to do that, and certainly, many users are running OpenHPC with GPU systems; what they’re doing is grabbing the drivers from Nvidia and adding them themselves. We will always want to support that type of operation, but we don’t have a handle on how to integrate [GPU drivers] more directly at the moment due to the licensing.

HPCwire: Looking out six months to a year, what are the plans and priorities for OpenHPC?

Karl Schulz: Getting 2.0 out has taken up most of our time and thinking. One of the things we did was try to make it a little bit easier for people to customize their builds. OpenHPC is focused on providing binary builds, so people can get up and running quickly and use them in a container and all that stuff. But you can imagine situations where maybe an OEM wanted to take those packages and say, “I want to maximize every last bit of performance for my particular architecture.” That’s a situation that’s a little bit different for OpenHPC. We don’t know in advance specific processor details, which means we have to be pretty generic in our builds. We have now made it easier for people to take the builds from OpenHPC and add more optimization to them, and also to co-install their customized packages alongside the OpenHPC packages. That was a request from the community we did get into 2.0.

Actually, that was a request from both sides (vendor and user). The first time that particular discussion came up was through interaction with DOE, which was standing up an Arm system; they were starting from the OpenHPC packages and wanted to test what they could get if they turned on all the bells and whistles from the compiler. It’s definitely something we wanted to support. So we put a little effort into that.

One thing you could imagine perhaps farther down the road, as OpenHPC continues to grow and gain traction, is that we’ll have enough resources to provide a subset of packages that have optimized builds for a particular architecture. We know it doesn’t make sense, for example, for the resource manager to turn on all the bells and whistles from your compiler, but [it might make sense] for something like BLAS or some of the other linear algebra libraries. Our thinking is farther down the road we might have a generic OpenHPC repository and then, perhaps, processor-specific repositories that have a very small number of pre-built packages. Our guess is it’s probably something like 5%-to-10% of the packages, the ones that are really mission critical and used in a lot of scientific applications, that would benefit from extra levels of optimization.

HPCwire: Karl, thanks very much for your time.
