What’s New with Chapel? Nine Questions for the Development Team

By Doug Eadline

September 4, 2024

HPC news headlines often highlight the latest hardware speeds and feeds. While advances on the hardware front are important, improving the ability to write software for advanced systems is equally important. Indeed, HPC software has always been challenging to create and manage.

Traditionally, writing HPC software is accomplished with libraries like MPI, OpenMP, or CUDA in conjunction with C, C++, or Fortran. The choice of library is often dictated by the underlying hardware and often limits portability. While this low-level approach has pushed the HPC community to high performance levels, it has also served to inhibit new non-HPC users from taking advantage of advanced hardware.

The relatively young Chapel language offers a high-level approach to HPC programming that includes both hardware independence and high performance. Version 2.1 of Chapel was recently released (June 2024) to the community. HPCwire reached out to the Chapel development team with nine questions about the current capabilities of the Chapel HPC language.

1. What is your “elevator speech” about why someone should be considering Chapel for HPC applications?

Chapel supports writing applications that can target the distributed multicore processors and GPUs of HPC systems using a single, consistent set of features that express parallelism and locality. This sharply contrasts with the status quo, in which each level of hardware parallelism tends to come with its own programming dialect, often involving changes or extensions to existing languages, libraries, or vendor-specific approaches. As a compiled language, Chapel benefits from HPC-aware optimizations, typically resulting in performance that matches or beats standard approaches like MPI, SHMEM, OpenMP, or CUDA. In practice, Chapel applications have scaled to thousands of compute nodes and over a million cores. Best of all, these performance and scalability benefits are all contained within a general-purpose language whose design supports writing clear, concise code.

2. Does Chapel support GPUs? Is it possible to easily create an application that can recognize GPUs and use them if available and otherwise use the available CPUs (cores)?

Yes, Chapel supports vendor-neutral GPU programming, and our recent releases support NVIDIA and AMD GPUs. Intel GPUs are also of interest but currently remain future work.

Thanks to Chapel’s locales, it is possible and reasonably easy to write applications that work with or without GPUs. The ‘locale’ type is how we represent system resources within a Chapel program. A Chapel application running on n compute nodes has an n-element array of locale values representing those nodes, which permits users to make use of them and reason about them. For example, locales support queries that return the amount of memory or parallelism available on a given node. Each top-level locale also contains an array of GPU sub-locales, which represent the node’s GPUs and can be used to target GPU processors and memories. When running on a compute node that has no GPUs, this is simply an array of size zero, permitting code to query and respond to that fact or to be written in a way that works whether the array is populated or empty.
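As a minimal sketch of this pattern (the array size and computation here are hypothetical, not from the interview), code that uses each node's GPUs when present and falls back to its CPU cores might look like:

```chapel
// Run on every compute node; use its GPUs if it has any,
// otherwise fall back to the node's CPU cores.
coforall loc in Locales do on loc {
  if here.gpus.size > 0 {
    // One task per GPU sub-locale; eligible loops become GPU kernels.
    coforall gpu in here.gpus do on gpu {
      var A: [1..1_000_000] real;
      foreach i in A.domain do A[i] = 2.0 * i;
    }
  } else {
    // No GPUs on this node: the same style of loop uses the CPU cores.
    var A: [1..1_000_000] real;
    forall i in A.domain do A[i] = 2.0 * i;
  }
}
```

The key point is that the GPU and CPU branches use the same loop forms; only the target locale differs.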

For HPCwire readers interested in learning more about Chapel’s GPU support, I’d recommend watching this talk+demo by Engin Kayraklioglu and Jade Abraham or reading this ongoing blog series by Daniel Fedorin and Engin.

3. In a similar vein, depending on the underlying hardware, can Chapel programs be written to use IB and/or Ethernet? (i.e., How hard is it to be portable across the interconnect?)

Yes, Chapel programs port trivially across HPC interconnects, including HPE Slingshot, InfiniBand, AWS’s Elastic Fabric Adapter (EFA), and Ethernet. Chapel supports a global namespace in which variables that are visible through traditional lexical scoping rules can be accessed, whether they are stored in local memory or on a remote node. Data transfers between nodes are implemented and optimized by Chapel’s compiler and runtime, removing the need to explicitly implement communication using libraries like MPI, Libfabric, or a network-specific API.
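To illustrate the global namespace described above, here is a minimal sketch (the variable names are hypothetical):

```chapel
var x = 42;                   // stored in locale 0's memory
on Locales[numLocales-1] {    // migrate execution to the last node
  // 'x' is still in lexical scope here; this read becomes a remote
  // get generated by the compiler and runtime, with no explicit
  // MPI- or libfabric-style communication calls in the source.
  var y = x + 1;
  writeln("on locale ", here.id, ": y = ", y);
}
```

The same source runs unchanged whether the underlying network is Slingshot, InfiniBand, EFA, or Ethernet; only the performance characteristics differ.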

Of course, performance can vary depending on the target network’s capabilities. For example, a Chapel program that relies on lots of remote atomic operations will run great on a system with HPE Slingshot, where such operations enjoy native support in the interconnect, but it may bog down on an Ethernet system, where remote computation would be needed to implement the atomicity. Note that this issue isn’t specific to Chapel, though—it’s the classic performance-portability question of whether to use system features that enhance performance yet are not universally available. That said, using consistent features—like atomic operations—without needing to worry about whether they’re supported in the hardware of a given platform is a great starting point for portability and performance, compared to manually mapping down to network-specific features.

I should note that Chapel’s portability across networks benefits greatly from LBNL’s GASNet-EX middleware, thanks to its support for RMA (remote memory access), active messages, and atomic operations—the three types of communication that Chapel needs to run on a given network.

4. In terms of CPU acceleration, does Chapel support things like AVX-512 vector instructions?

Chapel does support vectorized computations, and our compiler benefits greatly from LLVM in this regard (LLVM also plays a key role in our GPU support). Generally speaking, Chapel programs are compiled down to C-level operations that are then translated into LLVM IR. We then have LLVM compile the code down to the ISAs of the target processors, optimizing along the way. When compiling Chapel’s data-parallel constructs—like ‘forall’ loops or whole-array operations—our compiler uses LLVM metadata to mark the operations’ serial inner loops as order-independent, making them candidates for vectorization. In practice, this results in well-tuned serial code for the target CPU without any additional effort from the user.
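A simple sketch of the kind of data-parallel loop described, whose serial inner iterations get marked order-independent for LLVM's vectorizer (the array names and sizes here are hypothetical):

```chapel
config const n = 1_000_000;
const alpha = 0.5;
var A, B, C: [1..n] real;

// The compiler marks the per-task serial chunks of this forall as
// order-independent, making them candidates for vectorization
// (e.g., AVX-512 on CPUs that support it).
forall i in 1..n do
  C[i] = A[i] + alpha * B[i];

// Equivalent whole-array form, likewise a vectorization candidate:
C = A + alpha * B;
```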

We also make features of target CPUs available as standard library routines to handle cases where LLVM can’t be expected to automatically make use of a specific feature or where the user doesn’t want to rely on automated optimization and instruction selection. Examples include computing a fused multiply-add or the ‘popcount’ of an integer.

5. Chapel can be installed across a number of systems, from laptops to clusters. Is it possible to maintain a single version of an application across all these hardware environments? That is, does Chapel allow me to avoid the “two version” problem in HPC?

Definitely. Chapel programs begin by running ‘main()’ on a single core and then introduce parallelism dynamically as the program executes, whether locally, on GPUs, or across compute nodes. This design means the first prototype code you sketch out on your laptop can be incrementally evolved into a scalable, distributed-memory code, often with only modest changes. For example, a Chapel array’s declaration can easily be updated to specify that its elements should be distributed across some or all of the program’s locales. With this change, the parallel loops and operations over the array automatically switch from local, multicore operations to distributed computations that use the cores of the target locales, with no other source code changes required. Basically, the declarations undergo modest adjustments, yet the science of the computation can remain unchanged.
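A minimal sketch of the kind of declaration change described, assuming the ‘blockDist’ standard distribution from Chapel 2.x (the array name and loop body are hypothetical):

```chapel
use BlockDist;

config const n = 1_000_000;

// Shared-memory version: a purely local array.
//   var A: [1..n] real;

// Distributed version: only the declaration changes; the elements
// are block-distributed across all of the program's locales.
var A: [blockDist.createDomain(1..n)] real;

// This loop is unchanged from the laptop version: it now runs in
// parallel across the cores of every locale that owns a piece of A.
forall a in A do
  a += 1.0;
```

Swapping between the commented-out local declaration and the distributed one is the only source change needed to move from one node to many.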

Contrast this with other HPC technologies where the user has to code in a Single Program, Multiple Data (SPMD) programming model. In such approaches, ‘main()’ changes from being called just once to once per node or core. While that change is conceptually simple and relatively easy to implement, it has a dramatic impact on how a local laptop program needs to be restructured: data structures have to be manually decomposed into per-image chunks, control flow has to be updated to ensure each image performs its local piece of the work, and (typically) explicit communication needs to be added to coordinate and transfer data between the program images.

We consider these SPMD-induced code changes the primary source of the “two version” problem of conventional HPC programming you mention. They also represent a huge barrier to having more laptop programmers and applications make the transition to HPC systems, because SPMD is such a different way of viewing code—one that we’ve simply become numb to in the HPC community. Our team believes that Chapel’s post-SPMD execution model, scope-based global namespace, and built-in support for parallelism are crucial remedies for these issues.

6. Is there a programming language that Chapel is “most similar to” (i.e., How hard is it to learn Chapel?)

Chapel isn’t an extension of an existing language, though in designing it, we certainly took inspiration from a number of languages—as well as lessons in what to avoid. It’s difficult for me to say that Chapel is most similar to any one language, but a rough characterization would be that it’s fairly Python-like in terms of code clarity and level of expression, yet with syntactic elements from C (curly brackets and semicolons rather than being whitespace-sensitive) and Modula-3 (left-to-right, keyword-based declarations). Chapel also has rich support for multidimensional arrays, as in Fortran 90. Something that pleases me is that we often hear programmers talk about positive resonances they see between Chapel and their favorite language, whether that’s Fortran, C++, or Python. In practice, my sense is that programmers from diverse backgrounds like these resonances and find Chapel easy to get started with.

For those interested in learning about Chapel, we have a number of resources available on our website as well as community support forums for getting answers to questions, either online or in live sessions.

7. How easy is it to use existing libraries with Chapel? (i.e., I have a sequential C++ code I don’t want to rewrite.)

Chapel supports interoperability with other languages, permitting existing libraries to be called from Chapel or for Chapel libraries to be created and invoked from other languages. Calling between Chapel and C is certainly the most exercised and mature path, and since C acts as a lingua franca, this tends to provide a path to any other language. That said, we also have support for more native/direct interoperability with Python and Fortran, such as the ability to pass multidimensional arrays between Fortran and Chapel in a copy-free manner.

Since you asked specifically about C++: in practice, our team does a lot of C++ interoperability given its broad usage in libraries, but this is almost always done by creating C wrappers around the library, primarily due to challenges like name mangling and differences in OOP semantics. Most Chapel users and developers would like to see more native support for C++ interoperability, but it’s a sufficiently heavy lift that we haven’t had the opportunity to prioritize it yet.

8. What about performance? Can you point to some recent benchmarks that show parallel performance?

Here are three recent performance results that I’m particularly proud of for different reasons:

The first is a serial benchmark from the Computer Language Benchmarks Game that computes an n-body interaction between the five largest bodies in our solar system. In the current standings, a single Chapel implementation (Chapel #3) is the fastest entry that doesn’t use hand-written vector instructions or “unsafe” operations, while simultaneously being the most compact entry in terms of compressed code size and a very clear implementation. Notably, this benchmark’s performance improved significantly over the past year, with no changes to its source code, due to improvements in our integration with LLVM, as mentioned above.

The second result is my favorite scalability run from last year, in which we sorted 256 TiB of data in 30 seconds using 8192 nodes of an HPE Cray EX running Slingshot-11. This was an exciting result for three major reasons. The first is simply the scale of the run, which exceeds anything I could’ve imagined doing when we first started the Chapel project. The second is that we got this result on our first run at this scale, despite it using an order of magnitude more nodes than our previous largest run—and as you surely know, this virtually never happens when running at new scales in HPC. The third is that this is not simply a benchmark but a crucial part of Arkouda—a flagship Chapel framework that provides Python programmers with interactive data science capabilities at HPC scales.

Chapel sorted 256 TiB of data in 30 seconds using 8192 nodes of an HPE Cray EX running Slingshot-11 (Source: Chapel Team)

The third performance result I’d like to highlight is actually a pair of (unrelated) papers presented at SC23 (at the PAW-ATM workshop), each of which uses Chapel effectively in its field: satellite image analysis for coral reefs and exact diagonalization for quantum many-body physics. Though the scientific areas, approaches, and codes are completely different, each application achieved a significant performance improvement relative to prior art while also benefitting significantly from Chapel’s productivity features.

9. Anything else you want our readers to know about Chapel?

I think we’ve covered a lot of good ground here, thanks to your excellent questions! Two final things that occurred to me to mention are:

Last month, we held our annual flagship Chapel event, which we revamped and rebranded this year from CHIUW (the Chapel Implementers and Users Workshop) to ChapelCon. For those interested in a recap, Engin Kayraklioglu, ChapelCon’s general chair, wrote a great summary of the event for our blog. If you were to watch or browse one talk from ChapelCon, it should be Paul Sathre’s keynote, A Case for Parallel-First Languages in a Post-Serial, Accelerated World, which was an excellent testimony to the value of languages like Chapel from an external perspective.

Finally, if you’re interested in keeping up with the latest highlights from the Chapel project, be sure to keep an eye on our blog and social media accounts.
