AI-Accelerated Technology Investment Guidelines for HPC

By Rob Farber

October 25, 2023

When planning an AI or HPC investment, applications are where the rubber meets the road and ultimately determine the benefits of any hardware investment. In addition, anyone concerned about computational performance must now consider the ubiquity of hardware accelerators and the necessity of using them to realize high application performance.

Given the challenges of this heterogeneous environment, there are certain guidelines (summarized in Figure 1) to consider for any AI and HPC technology investment so you can stay on top of the application performance curve. The basic idea is to “follow the leaders in your workload domain”. These are technology leaders such as DOE, NOAA, the Exascale Computing Project (ECP), Ansys, and others, who have provided both the funding and the resources to create or modify workflows that can run reliably and portably across many heterogeneous, multivendor, multiarchitecture computing devices.

First, some background to set the stage for more specific recommendations:

  • We are now in the day of reckoning foreseen by Gordon Moore (of Moore’s law), when we will need to build larger systems out of smaller functions, combining heterogeneous and customized solutions.[1] Experts now believe we are entering the 5th epoch of distributed computing, in which the heterogeneous design of modern systems has been driven by numerous technology advances, many of which are based on building blocks that accelerate AI-based workloads. Both these AI building blocks and the AI technology they support are now being broadly used by the HPC community.
  • There are differing software approaches to addressing the exponentially hard problem of application support for ubiquitous multiarchitecture and multivendor accelerator deployments in datacenters and the cloud. Competition is our performance friend given the cornucopia of vendor-specific accelerators now available. Entire families of solutions from Intel, AMD, NVIDIA, Xilinx, Habana, Google, and others mean the market has outgrown piecemeal optimization for individual accelerators and single-vendor software ecosystems. The breadth of deployments and diversity of workloads is simply too big. No single company, however large, can meet all customer needs, and neither can bespoke hand customization. Instead, community development ecosystems and the extensive, multiyear effort of the DOE-funded Exascale Computing Project (ECP) now offer practical solutions for running on current and future accelerator devices via standards-based libraries and languages.

Follow the Leaders In Your Workload Domain

We are all confronting the same challenges in transitioning to the 5th epoch of accelerated computing. Many thought leaders have offered guidelines on how to capitalize on these hardware solutions.

The recent virtual workshop held by NASA and the DOE Exascale Computing Project provided an excellent example of how leaders at two premier HPC organizations are addressing the needs of their user base. The participants discussed issues surrounding the current state of computing and how to capitalize on both existing and future hardware advances. A brief synopsis of the guidance from these and other thought leaders is reflected in the following guidelines:

  • Look to the established technology leaders (NNSA, NASA, ECP, Argonne, Google, etc.). These organizations can (and must) address computing at scale in a portable, sustainable fashion. Individual projects simply don’t have their resources or comparable access to the latest hardware.
  • Address the mindsets that frequently present hurdles to change: “not invented here”, “we must control it”, “what if they change or drop support?” and other concerns.
  • Look beyond the technology to questions that occur at the organizational level regarding training, documentation, deployment, and more. The barrier to adoption must be low enough that the software technology can be adopted without undue effort or cost, including the cost of source-to-source translation tools. Organizations where data can only flow into the organization (e.g., financial institutions, DoD, national security) can be an excellent exemplar: such organizations cannot, or are not willing to, seek external assistance, so their success or failure in adopting a public solution is a good indicator of its ease of adoption.

Focus on General-Purpose Ecosystems with Demonstrated Multivendor, Multiarchitecture Support

The degree of generality required in the framework depends upon your workload domain. Most workload domains fit into one of three categories: general-purpose, domain-specific, and application-specific. Most HPC users require a general-purpose framework that gives them the ability to address evolving and often extreme user needs.

Various programming models and runtimes are also working to offer broad multiarchitecture HPC platform support; Kokkos is one example. More focused application- and domain-specific solutions are also available. Data scientists, for example, generally look only to PyTorch and TensorFlow for broad accelerator support for their AI workloads. Commercial users likewise look to industry leaders and application providers such as Ansys to realize broad hardware support.[2][3]
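
As a minimal sketch of what that broad accelerator support looks like in user code, the PyTorch snippet below writes the model once and lets a small helper choose whichever backend the local build exposes. The helper name is illustrative, and backends beyond CUDA, Apple MPS, and CPU depend on the PyTorch build and any vendor extensions installed.

```python
# Minimal sketch: device-agnostic PyTorch code that runs on whichever
# accelerator backend is present (CUDA GPU, Apple MPS, or CPU fallback).
# Additional backend names (e.g., Intel XPU) depend on the PyTorch build
# and vendor extensions installed -- treat them as assumptions.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():          # NVIDIA GPUs (ROCm builds also expose this API)
        return torch.device("cuda")
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return torch.device("mps")         # Apple silicon
    return torch.device("cpu")             # portable fallback

device = pick_device()
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)                               # same code path on any backend
print(f"ran forward pass on {device}")
```

The same pattern underlies framework-level portability: the application code never names a vendor, so swapping hardware does not mean rewriting the model.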

The two most general-purpose solutions that provide a production-quality framework at this time appear to be the oneAPI ecosystem and the ECP Extreme-scale Scientific Software Stack (E4S), available via the Spack package manager.

The ECP and E4S projects were created to benefit HPC and AI users running on systems ranging from laptops to exascale supercomputers, giving them the ability to use the latest accelerator hardware at scale and even in the cloud. The effort has been a highly visible and impactful project for the US and global HPC communities, and it has provided many “lessons learned”.[4]

Mike Heroux (director of software technology for the ECP and a senior scientist at Sandia National Laboratories) explained this further in his presentation “100X: Leveraging the Future Potential of US Exascale Computing Project Investments”. Briefly, Heroux noted that the ECP:

  • Spent 7 years building an accelerated, cloud-ready software ecosystem.
  • Positioned users so they can utilize accelerators from multiple vendors.
  • Emphasized software quality: testing, documentation, design, and more.
  • Prioritized community engagement: Webinars, BOFs, tutorials, and more.
  • Established that DOE portability layers are the credible way to (a minimal single-source sketch follows this list):
    • Build codes that are sustainable across multiple GPUs,
    • Avoid vendor lock-in, and
    • Avoid growing divergence and hand tuning in individual code bases.
  • Provided software that can lower costs and increase performance for accelerated platforms.
  • Enabled proof-points for many classes of applications and capabilities that others can use and emulate for commercial, AI, and HPC applications.
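
The DOE portability layers Heroux refers to (Kokkos, RAJA, and similar) are C++ template libraries; as a loose Python analogue of the same single-source idea, the hedged sketch below writes one kernel against a backend-neutral module and dispatches to CuPy on a GPU or NumPy on the CPU. The backend probe and function names are illustrative and are not part of any DOE layer.

```python
# Loose Python analogue of the single-source portability-layer pattern.
# The real DOE layers (Kokkos, RAJA) are C++; here one kernel is written
# against a backend-neutral module "xp" and runs on whichever backend exists.
import numpy as np

try:
    import cupy as xp            # GPU backend, if a CuPy build and device are present
    xp.zeros(1)                  # touch the device to confirm it is usable
    BACKEND = "cupy (GPU)"
except Exception:                # no CuPy or no usable device: fall back to the CPU
    xp = np
    BACKEND = "numpy (CPU)"

def axpy(alpha, x, y):
    """Return alpha*x + y, written once against the backend-neutral module xp."""
    return alpha * x + y

x = xp.arange(1_000_000, dtype=xp.float64)
y = xp.ones_like(x)
z = axpy(2.0, x, y)
z_host = z if xp is np else xp.asnumpy(z)   # copy back to host only if it lives on the GPU
print(f"backend: {BACKEND}, checksum of first 3 elements: {z_host[:3].sum()}")
```

The value of a production portability layer is that this dispatch, along with memory placement and kernel-launch details, is handled once in the layer rather than scattered through every code base.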

For those considering technology investments, the ECP effort provides a unique source of vendor-neutral, third-party performance evaluations that:

  • Include expert evaluation of accelerator performance on the current generation of US supercomputers.
  • Evaluate the performance of oneAPI on individual nodes and accelerators, as well as its viability (e.g., ability to pass continuous integration tests) and accuracy.
  • Provide a third-party evaluation of oneAPI against native, single-vendor software portability and performance when stressed in an extreme-scale, heterogeneous, multiarchitecture, distributed computing environment.

Easy Deployment is a Must

Ease of deployment tops the list of must-have capabilities because if you cannot deploy it, you cannot use it. Easy deployment also ensures easy updates, so a project does not get mired in “stale software”.

The E4S software stack makes the extensive ECP software investment available for easy deployment as both source and binary-only packages. These software packages encapsulate seven years of work by expert teams at numerous US national labs. The software stack is accelerator friendly, performance portable, and verified to run on many systems. (Please see the E4S dashboard for a current snapshot.) The ECP software has been made available to the worldwide scientific community as open-source software with permissive licensing.
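
As a rough, hedged illustration of that deployment path, the sketch below drives the Spack command line from Python to create an environment and install a package. It assumes spack is already installed and on the PATH; the package name is purely illustrative, and E4S itself is typically consumed through curated Spack environments, buildcaches, or containers rather than one-off installs.

```python
# Rough illustration only: driving the Spack CLI from Python. Assumes `spack`
# is installed and on PATH; "hdf5" is an illustrative package name, and E4S is
# normally consumed via curated Spack environments, buildcaches, or containers.
import subprocess

def spack(*args: str) -> None:
    """Run one spack command, echo it, and fail loudly on a non-zero exit."""
    cmd = ["spack", *args]
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

spack("env", "create", "demo-env")       # create a named Spack environment
spack("-e", "demo-env", "add", "hdf5")   # add a spec to the environment
spack("-e", "demo-env", "install")       # concretize and build/install it
spack("-e", "demo-env", "find")          # list what the environment now provides
```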

Work Together

None of these concepts are new or unique to any one individual or organization. We are all confronting common challenges that are addressed by community ecosystems, driven by a common need to come together to run our applications better, faster, and cheaper.

Todd Gamblin (distinguished member of technical staff in the Livermore Computing division at Lawrence Livermore National Laboratory and creator of Spack) summarized this: “We have to rely on, build, and run code developed by others. Otherwise, we have to spend time reimplementing. Ensuring that our software stacks work correctly is a difficult task. OS updates can consume hundreds of person hours to rebuild software.”

The dependency graph for the LLNL ARES project, shown in Figure 2 (Source), exemplifies how the leaders in the field have already addressed the need to work together to exploit the accelerated performance of current and future hardware systems, both portably and at scale.

(Editor’s note: To understand Figure 2, focus on the types and large mix of packages: LLNL internal, LLNL open source, and external open source. The details are not important.)

Figure 2: Package dependency graph for the LLNL multi-block structured ALE-AMR (ARES) multiphysics code. It comprises 31 internal proprietary packages and 13 open-source packages developed at LLNL, which together rely on 72 external open-source packages.

Ensure That the Overall Workflow Can Capitalize on All Available Parallelism

The expansion in hardware performance, with the corresponding increase in data volume and workload complexity, has proven challenging for existing workflow systems. The ECP-funded ExaWorks project demonstrates that it is possible for scientists to create award-winning workflows by giving them access to hardened and tested workflow components. Projects use the ExaWorks software development kit (SDK) to implement big-data workflows that can utilize multiple levels of parallelism in the datacenter.
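
Parsl is one of the workflow components distributed with the ExaWorks SDK; the minimal, hedged sketch below shows the kind of task-level parallelism such components expose. The local-threads configuration keeps the example self-contained, a production workflow would instead load an executor configuration that targets the site scheduler (Slurm, PBS, Flux, and so on), and the task body here is a stand-in rather than ExaWorks code.

```python
# Minimal task-parallel sketch with Parsl (one component of the ExaWorks SDK).
# The local-threads config keeps it self-contained; production runs would use
# an executor configuration aimed at the site's scheduler instead.
import parsl
from parsl import python_app
from parsl.configs.local_threads import config

parsl.load(config)

@python_app
def simulate(sample: int) -> float:
    """Stand-in for an expensive simulation or analysis task."""
    return sample ** 0.5

# Fan out independent tasks; Parsl returns futures immediately and executes
# the tasks concurrently up to the executor's available parallelism.
futures = [simulate(i) for i in range(16)]
results = [f.result() for f in futures]   # block until every task completes
print(f"completed {len(results)} tasks; last result = {results[-1]:.3f}")
```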

Leverage the Cloud to Evaluate and Run on the Latest Hardware

Finding the right technology for a given software investment in the current AI and HPC marketplace is challenging. The key to staying on the leading edge of this innovation tsunami is twofold: (1) stay agile by avoiding vendor and software lock-in, and (2) always consider the overall workflow. For many AI and HPC workloads, memory bandwidth and capacity, not peak performance, are the gating metrics.[5]

Of course, the software framework must be deployable in a cloud computing environment. Both the oneAPI and the ECP software can be deployed in the cloud.

This opens the door to testing on the latest hardware. Many Internet service platforms (ISPs) such as AWS and Google compete by providing the latest hardware from Intel, AMD, and NVIDIA, which makes them excellent test platforms for evaluating new hardware. Further, cloud computing is becoming an increasingly viable AI and HPC platform for production computing.

There is No Substitute for Actual Performance Data

Look to the ECP and other third parties for performance data. Following are several possible sources, including opportunities for hands-on evaluations (a small timing sketch for such hands-on runs follows the list):

  1. The cloud is a generally available platform where everyone can obtain actual performance data on the latest hardware.
    • For example, many ISPs now offer access to the newest 4th Gen Intel Xeon processors with built-in accelerators. These new processors are causing many to rethink their GPU processing requirements and can be the desired platform for many multiphysics and AI workloads.[6][7][8] Along with internal accelerators, these are among the first CPUs to support high-bandwidth memory (HBM). Benchmarks show that these CPUs can deliver GPU levels of performance on some traditionally “GPU-only” AI workloads. They have also been shown to deliver across-the-board 2× to 3× faster HPC performance compared to previous-generation processors.[9]
    • E4S containers now include Intel compilers and MPI libraries along with various NVIDIA and AMD tools. This gives users the ability to explore oneAPI, the ECP software stack, and other vendor offerings using these containers.
  2. HPC groups can contact Sameer Shende (Research professor and director of the Performance Research Laboratory at the Oregon Advanced Computing Institute for Science and Society) to gain access to the Frank cluster. This cluster is used for E4S verification and can provide access to recent hardware not covered under NDA.
  3. For more exotic hardware, the Argonne Leadership Computing Facility (ALCF) AI Testbed provides an infrastructure for testing the next generation of AI-accelerator machines for scientific research. Users can request time or review expert evaluations of the performance of these devices, which include dedicated deep-learning hardware such as the Habana Gaudi.
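
For the hands-on runs mentioned above, a tiny, hedged timing sketch like the one below is often enough for a first look: it times a batched matrix multiply on the CPU and, if one is visible, on a CUDA device. The sizes and repetition counts are arbitrary assumptions, and a micro-benchmark like this only hints at behavior; decisions should rest on the full workflow, as argued throughout this article.

```python
# Tiny micro-benchmark sketch for hands-on evaluation: time a matrix multiply
# on the CPU and, if available, on a CUDA device. Sizes and repetition counts
# are arbitrary; real evaluations should exercise the full workflow.
import time
import torch

def time_matmul(device: torch.device, n: int = 2048, reps: int = 10) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)                    # warm-up: exclude one-time setup costs
    if device.type == "cuda":
        torch.cuda.synchronize()          # GPU work is asynchronous; wait for it
    start = time.perf_counter()
    for _ in range(reps):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / reps

devices = [torch.device("cpu")]
if torch.cuda.is_available():
    devices.append(torch.device("cuda"))

for dev in devices:
    print(f"{dev}: {time_matmul(dev) * 1e3:.2f} ms per 2048x2048 matmul")
```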

Summary

Look to the leaders in your problem domain; they are the ones making the investments that will demonstrate the efficacy and portability of the new accelerators inside a supportable software framework suited to your needs. Whenever possible, find third-party sources of information about the hardware devices that might be worthy of investment and, if possible, run your own tests on the hardware. At this time, the ECP software stack and the oneAPI software ecosystem are the most general, readily available, and extensively evaluated frontrunner approaches for the HPC and AI communities.

References

[1] https://www.theregister.com/2022/09/28/intel_chiplets_advanced_packaging/

[2] https://insidehpc.com/2022/11/recent-results-show-hbm-can-make-cpus-the-desired-platform-for-ai-and-hpc/

[3] https://medium.com/@rmfarber/balancing-high-bandwidth-memory-and-faster-time-to-solution-for-manufacturing-bd2b4ff7f74e

[4] https://www.exascaleproject.org/reports/

[5] https://medium.com/@rmfarber/balancing-high-bandwidth-memory-and-faster-time-to-solution-for-manufacturing-bd2b4ff7f74e

[6] https://community.intel.com/t5/Blogs/Products-and-Solutions/HPC/High-Bandwidth-Memory-Can-Make-CPUs-the-Desired-Platform-for-AI/post/1434192

[7] https://medium.com/@rmfarber/balancing-high-bandwidth-memory-and-faster-time-to-solution-for-manufacturing-bd2b4ff7f74e

[8] https://www.datasciencecentral.com/internal-cpu-accelerators-and-hbm-enable-faster-and-smarter-hpc-and-ai-applications/

[9] https://community.intel.com/t5/Blogs/Products-and-Solutions/HPC/High-Bandwidth-Memory-Can-Make-CPUs-the-Desired-Platform-for-AI/post/1434192

Rob Farber is a global technology consultant and author with an extensive background in HPC and machine learning technology.

This article was produced as part of Intel’s editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC and AI communities through advanced technology. The publisher of the content has final editing rights and determines what articles are published.
