Summer Reading: “High-Performance Computing Is at an Inflection Point”

By John Russell

July 21, 2021

At last month’s 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART), a group of researchers led by Martin Schulz of the Leibniz Supercomputing Centre (Munich) presented a “position paper” in which they argue the architectural landscape of high-performance computing (HPC) is undergoing a seismic shift.

“Future architectures,” they contend, “will have to provide a range of specialized architectures enabling a broad range of workloads, all under a strict energy cap. These architectures will have to be integrated within each node—as already seen in mobile and embedded systems—to avoid data movements across nodes or even worse, across system modules when switching between accelerator types.”

That HPC is transforming is hardly in dispute, and the authors – Martin Schulz, Dieter Kranzlmüller, Laura Brandon Schulz, Carsten Trinitis, and Josef Weidendorfer – acknowledge many familiar pressures (the end of Dennard scaling, the slowing of Moore’s Law, etc.) and propose four guiding principles for the future of HPC architecture:

  • Energy consumption is no longer merely a cost factor but also a hard feasibility constraint for facilities.
  • Specialization is key to further increase performance despite stagnating frequencies and within limited energy bands.
  • A significant portion of the energy budget is spent moving data and future architectures must be designed to minimize such data movements.
  • Large-scale computing centers must provide optimal computing resources for increasingly differentiated workloads.

Their paper, On the Inevitability of Integrated HPC Systems and How they will Change HPC System Operations, digs into each of the four areas. They note that integrated heterogeneous systems (interesting turn of phrase) “are a promising alternative, which integrate multiple specialized architectures on a single node while keeping the overall system architecture a homogeneous collection of mostly identical nodes. This allows applications to switch quickly between accelerator modules at a fine-grained scale, while minimizing the energy cost and performance overhead, enabling truly heterogeneous applications.”

A core ingredient in achieving this kind of integrated heterogeneity is the use of chiplets.

“Simple integrated systems with one or two specialized processing elements (e.g., with GPUs or with GPUs and tensor units) are already used in many systems. Research projects, like ExaNoDe, are currently investigating integration with promising results. Also, several commercial chip manufacturers are rumored to be headed in this direction,” write the researchers. “Currently and most prominently, the European Processor Initiative (EPI) is looking at a customizable chip design combining ARM cores with different accelerator modules (Figure 1). Additionally, several groups are experimenting with clusters that combine GPUs and FPGAs within nodes, either for alternative workloads directed at the appropriate architecture or for solving large parallel problems with algorithms mapped to both architectures. Future systems are likely to push this even further, aiming at a closer integration and a larger diversity of architectures, leading to systems with more heterogeneity and flexibility in their usage.”

This integrated approach is not without challenges, the researchers acknowledge: “[W]hile it is easy to run a single application across the entire system—since the same type of node is available everywhere—a single application is likely not going to use all specialized compute elements at the same time, leading to idle processing elements. Therefore, the choice of the best-suited accelerator mix is an important design criterion during procurement, which can only be achieved via co-design between the computer center and its users on one side and the system vendor on the other. Further, at runtime, it will be important to dynamically schedule and power the respective compute resources. Using power overprovisioning, i.e., planning for a TDP and maximal node power that is reached with a subset of dynamically chosen accelerated processing elements, this can be easily achieved, but requires novel software approaches in system and resource management.”
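
To make the overprovisioning idea concrete, here is a minimal sketch in C of how a node-level scheduler might choose which accelerators to power up so that the active subset stays within the node’s power cap. The device names, TDP figures, and the greedy benefit-per-watt heuristic are all hypothetical illustrations, not taken from the paper:

```c
#include <stdio.h>

/* Hypothetical per-accelerator descriptor: TDP in watts and an
 * application-supplied benefit score (e.g., expected speedup). */
typedef struct {
    const char *name;
    double tdp_watts;
    double benefit;
    int    active;
} accel_t;

/* Greedily power up accelerators by benefit-per-watt until the node
 * power cap is exhausted. A real resource manager would use measured
 * power and a better optimizer; this only illustrates choosing a
 * subset of processing elements under a cap. */
static void choose_subset(accel_t *devs, int n, double cap_watts)
{
    double used = 0.0;
    for (;;) {
        int best = -1;
        double best_ratio = 0.0;
        for (int i = 0; i < n; i++) {
            if (devs[i].active || used + devs[i].tdp_watts > cap_watts)
                continue;
            double ratio = devs[i].benefit / devs[i].tdp_watts;
            if (ratio > best_ratio) { best_ratio = ratio; best = i; }
        }
        if (best < 0) break;           /* nothing else fits under the cap */
        devs[best].active = 1;
        used += devs[best].tdp_watts;
    }
    printf("powered subset uses %.0f of %.0f W\n", used, cap_watts);
}

int main(void)
{
    accel_t devs[] = {                 /* made-up devices and numbers */
        { "gpu0",    300.0, 9.0, 0 },
        { "gpu1",    300.0, 9.0, 0 },
        { "fpga0",    75.0, 3.5, 0 },
        { "tensor0", 150.0, 6.0, 0 },
    };
    choose_subset(devs, 4, 500.0);
    for (int i = 0; i < 4; i++)
        if (devs[i].active) printf("  on: %s\n", devs[i].name);
    return 0;
}
```

The point of the made-up numbers is that the devices’ combined TDP (825 W) deliberately exceeds the 500 W cap (the overprovisioned case the authors describe), so the scheduler must settle for a dynamically chosen subset.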

They note the need for programming environments and abstractions to exploit the different on-node accelerators. “For widespread use, such support must be readily available and, in the best case, in a unified manner in one programming environment. OpenMP, with its architecture-agnostic target concept, is a good match for this. Domain-specific frameworks, as they are, e.g., common in AI, ML or HPDA (e.g., Tensorflow, Pytorch or Spark), will further help to hide this heterogeneity and help make integrated platforms accessible to a wide range of users.”
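
To give a flavor of that architecture-agnostic style, here is a minimal SAXPY kernel using OpenMP’s standard target offload directives; the same source can be compiled for a GPU or another supported accelerator, or fall back to the host CPU, with the device choice left to the compiler and runtime. (This is an illustration of the OpenMP target concept the authors cite, not code from the paper.)

```c
#include <stdio.h>

/* y = a*x + y, offloaded to whatever device the OpenMP runtime
 * selects; the map() clauses describe data movement explicitly. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    float x[4] = {1, 2, 3, 4}, y[4] = {1, 1, 1, 1};
    saxpy(4, 2.0f, x, y);
    printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]); /* 3 5 7 9 */
    return 0;
}
```

Compiled without offload support, the same code simply runs on the host, which is exactly the portability argument for a unified programming environment.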

To cope with intra-node device diversity and inevitable idle periods among the various devices, the researchers propose developing a “new level of adaptivity coupled with dynamic scheduling of compute and energy resources to exploit an integrated system fully.” The core of this adaptive management approach, they suggest, is a feedback loop (Figure 2 in the paper).

This adaptive approach is being investigated in the EU research project REGALE, launched this spring. REGALE gathers measurement data across all system layers and uses that information to adaptively drive the entire stack (a minimal sketch of the resulting control loop follows the list):

  • Application Level. Changing application resources in terms of number and type of processing elements dynamically.
  • Node Level. Changing node settings, e.g., power/energy consumption via techniques like DVFS or power capping, as well as node-level partitioning of memory, caches, etc.
  • System Level. Adjusting system operation based on workloads or external inputs, e.g., energy prices or supply levels.
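
Here is a minimal sketch of the measure-decide-actuate loop this implies, at the node level. The telemetry and actuation functions are simulated stand-ins for vendor interfaces (such as RAPL for CPUs or NVML for GPUs), and the proportional adjustment is an arbitrary placeholder rather than REGALE’s actual policy:

```c
#include <stdio.h>

/* Simulated stand-ins for vendor telemetry/actuation interfaces
 * (e.g., RAPL for CPUs, NVML for GPUs): the "node" draws power
 * proportional to its cap, with load-dependent bursts. */
static double current_cap = 400.0;                /* watts */
static void set_power_cap(double watts) { current_cap = watts; }
static double read_node_power(int tick)
{
    double load = (tick % 3 == 0) ? 1.10 : 0.85;  /* bursty workload */
    return current_cap * load;
}

int main(void)
{
    const double budget = 450.0;  /* handed down from the system level */
    double cap = current_cap;

    /* Measure-decide-actuate: compare measured power to the budget
     * and nudge the cap with crude proportional control. */
    for (int tick = 0; tick < 8; tick++) {
        double measured = read_node_power(tick);
        cap += 0.5 * (budget - measured);
        if (cap > budget) cap = budget;
        if (cap < 0.0)    cap = 0.0;
        set_power_cap(cap);
        printf("t=%d measured=%.0f W cap=%.0f W\n", tick, measured, cap);
    }
    return 0;
}
```

In a real system the same loop would also feed the application level (resizing or re-targeting work across processing elements) and the system level (reacting to energy prices or supply), which is the cross-layer adaptivity REGALE is exploring.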

The position paper is a quick read and is best read directly. While the level and type of integration may vary, in their conclusion the researchers “argue that such integration has to be on-node or even on-chip in order to: minimize and shorten expensive data transfers; enable fine-grained shifting between different processing elements running within a node; and to allow applications to utilize the entire machine for scale-out experiments rather than only individual modules or sub-clusters of a particular technology.”

Only such an approach, they contend, will enable design and deployment of large-scale compute resources capable of providing a diversified portfolio of computing, at scale and at optimal energy efficiency. Time will tell.

Link to paper: https://dl.acm.org/doi/10.1145/3468044.3468046
