SC21 Panel on Programming Models – Tackling Data Movement, DSLs, More

By John Russell

January 6, 2022

How will programming future systems differ from current practice? This is an ever-present question in computing. Yet it has, perhaps, never been more pressing given the rise of heterogeneous architectures and diverse hardware, the steady incorporation of AI technology, and the proliferation of new programming languages and models.

At SC21, a distinguished panel tackled this broad question. Higher levels of abstraction, a clearer focus on data movement – not compute functions – and the rise of domain-specific languages as important tools were among the dominant points of discussion, which touched on topics as diverse programming Cerebras’s wafer-scale chip to FPGAs.

Moderated by Hal Finkel (DOE), the panelists included Kathy Yelick (UC Berkeley), Saman Amarasinghe (MIT), Torsten Hoefler (ETH Zürich), Maya Gokhale (LLBNL) and Justin Gottschlich (Intel). Capturing the full discussion is too daunting, but each panelist made an opening statement that captures (at least directionally) much of their thinking. Presented here are brief portions (lightly edited) of panelists’ opening remarks.

Kathy Yelick. Image courtesy of Berkeley Lab Computing Sciences.

Yelick, who just assumed her new role as vice chancellor of research at LBNL, kicked off the panel saying, “[In] scientific computing, in general, I think we should think about how people are programming at much higher level of abstraction than we’re used to. I think if you look at machine learning, and the packages that people have built for machine learning, they’ve really shown that you can, with a lot of work in terms of how you implement some of those underlying algorithms, get very good performance out of those.

“That opens up HPC-type of access to a much broader community of people if they can program at the level of something like TensorFlow. And I’d like people to also think a little bit about systems like Julia and Jupyter notebooks as really the interface to the computers, rather than thinking about programming and languages based on things like C/C++ or Fortran. So really, I’m going to be advocating for a much higher level of abstraction, which is not to say that some of us won’t still be programming at a much lower level.”

Next up was Amarasinghe, who leads the compiler research group in MIT’s Computer Science & Artificial Intelligence Laboratory (CSAIL). A leader in the field of high-performance domain-specific languages, Amarasinghe’s group developed the Halide, TACO, Simit, and many other domain-specific languages and compilers,

Saman Amarasinghe, MIT

“If you think about domain-specific languages, [it’s] not too much of a stretch – even if you say you are a C programmer, or Fortran programmer or Python programmer – to say nobody writes loops and arrays and low level things in these languages. We all use libraries. All the systems are based on libraries and that means you’re already programming in higher level abstraction with one caveat. These libraries don’t have understanding of how the entire thing is connected together. So, when you call a library function, it’s a standalone thing; it will do what’s asked and return,” he said.

“What a domain-specific language or domain-specific compiler does is, it can figure out the control flow between these library calls, understand how these things get stitched together and use that to begin to optimize performance. This is especially important now and for the future, because memory systems and data movement are becoming a really important issue,” said Amarasinghe.

Perhaps the most forceful champion for focusing on data movement in future programming development was Hoefler, who directs the Scalable Parallel Computing Laboratory (SPCL) at ETH Zurich. He argued counting FLOPS, as is done in ranking the Top500, misses the point in modern computing.

Commenting on the use of new large models such as GPT-3, he said, “Many companies are spending 10s of millions of dollars to train these models, and these are real HPC problems. They are the largest models people have trained [and] very much [what] we care about. We actually analyzed the workload a little bit more in detail. We found that the 99.8 percent of the floating-point operations in this workload is actually comprised of Tensor contractions [and] Tensor contractions are all expressed as matrix multiplication.

Torsten Hoefler, ETH Zurich

“So, this is wonderful, isn’t it? 99.8 percent of this workload is matrix multiplication. But if you actually look at the remaining 0.2 percent of operations in this workload, [it] turns out those are taking about 40 percent of the runtime. [That’s] because these Tensor contractions have been super highly-optimized over the years. The problem now [that] dominates everything else is data movement. We did some optimizations that I don’t want to go into detail about that show that you can actually speed this up quite significantly, and you can save millions of dollars by just looking at data movement,” said Hoefler.

Gottschlich, who is a principal AI scientist at Intel Labs and the director and founder of the machine programming research group at Intel, noted how Intel’s perspective on programing models has changed.

“When I joined back in 2010, Intel was very much a monolithic computing company, it was just a CPU. As I suspect everyone in the audience knows, we now consider ourselves to be very heterogeneous,” he said. “One of the core challenges we see today is not so much in the compute, but in the data movement. So, I just wanted to quickly acknowledge that I think the data movement, and figuring out how to deal with that, especially as we grow into deeper stochastic systems that tend to be improving their accuracy, as you have more IID data (independent and identically distributed data), that it becomes even more important that we figure out how to handle that that data movement problem.”

Justin Gottschlich, Intel

“Back in 2018, we published this paper, actually jointly with Saman (Amarasinghe) and some others, on the three pillars of machine programming. Machine programming is principally this idea that we are going to try to automate the development of software, and a byproduct of that is the automation of development of hardware given that much of hardware is developed through software. The three pillars are intention, invention and adaptation. Intention is principally concerned with trying to identify novel ways or improve the existing ways for programmers to specify their ideas to the machine. So, going back to, I think, both Kathy and Saman’s comments about higher order abstractions, and DSLs. In fact, I fully agree with this. I think that as we move forward, I suspect that to get outstanding performance, we really need to have this separation of intention from invention and adaptation. Once the intention is understood by the machine, then we can start to invent the algorithms and data structures that are necessary to fulfill that intention.”

Last to deliver intro remarks was Gokhale, distinguished member of technical staff at LLNL and an expert in reconfigurable computing and data intensive architectures.

Maya Gokhale

“I feel as if we’re in a fix right now with a fusion of programming models and it’s because of scaling laws, which we all know very well, between the feature size and the power. What we’ve done is build specialized widgets, that do a smaller thing, but do it very well rather than a general-purpose thing. That is a cause of a lot of problems. [It’s] one factor that is leading us to a lot of new ideas in programming models, this idea of specialization and putting heterogeneous pieces together,” said Gokhale.

“To me, the future is system-on-chip (OSC) like environments. So, heterogeneous compute models, data and or control-driven, tightly or loosely-coupled. [For example,] if you’ve worked for Apple or worked on cell phones, that SOC environment. I have a background in reconfigurable computing with FPGAs that is the combination of SOC-like environment and higher level programming. It’s a difficult environment to work in, but I see that’s where we’re going. On the other side, I see workflows for programming, [with] model interfacing and mapping. [Often] you think of your favorite DSL; it’s just so elegant and so mathematical. But it has to talk to other pieces of things and how do you make it do that? How do you interoperate? [L]arge HPC workflows have embodied some of those ideas of being able to interface with [DSLs],” she said.

A rich discussion followed the introductory comments and the SC21 video was still posted as of this writing and accessible by SC21 registrants.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industry updates delivered to you every week!

AI Saves the Planet this Earth Day

April 22, 2024

Earth Day was originally conceived as a day of reflection. Our planet’s life-sustaining properties are unlike any other celestial body that we’ve observed, and this day of contemplation is meant to provide all of us Read more…

Intel Announces Hala Point – World’s Largest Neuromorphic System for Sustainable AI

April 22, 2024

As we find ourselves on the brink of a technological revolution, the need for efficient and sustainable computing solutions has never been more critical.  A computer system that can mimic the way humans process and s Read more…

Empowering High-Performance Computing for Artificial Intelligence

April 19, 2024

Artificial intelligence (AI) presents some of the most challenging demands in information technology, especially concerning computing power and data movement. As a result of these challenges, high-performance computing Read more…

Kathy Yelick on Post-Exascale Challenges

April 18, 2024

With the exascale era underway, the HPC community is already turning its attention to zettascale computing, the next of the 1,000-fold performance leaps that have occurred about once a decade. With this in mind, the ISC Read more…

2024 Winter Classic: Texas Two Step

April 18, 2024

Texas Tech University. Their middle name is ‘tech’, so it’s no surprise that they’ve been fielding not one, but two teams in the last three Winter Classic cluster competitions. Their teams, dubbed Matador and Red Read more…

2024 Winter Classic: The Return of Team Fayetteville

April 18, 2024

Hailing from Fayetteville, NC, Fayetteville State University stayed under the radar in their first Winter Classic competition in 2022. Solid students for sure, but not a lot of HPC experience. All good. They didn’t Read more…

AI Saves the Planet this Earth Day

April 22, 2024

Earth Day was originally conceived as a day of reflection. Our planet’s life-sustaining properties are unlike any other celestial body that we’ve observed, Read more…

Kathy Yelick on Post-Exascale Challenges

April 18, 2024

With the exascale era underway, the HPC community is already turning its attention to zettascale computing, the next of the 1,000-fold performance leaps that ha Read more…

Software Specialist Horizon Quantum to Build First-of-a-Kind Hardware Testbed

April 18, 2024

Horizon Quantum Computing, a Singapore-based quantum software start-up, announced today it would build its own testbed of quantum computers, starting with use o Read more…

MLCommons Launches New AI Safety Benchmark Initiative

April 16, 2024

MLCommons, organizer of the popular MLPerf benchmarking exercises (training and inference), is starting a new effort to benchmark AI Safety, one of the most pre Read more…

Exciting Updates From Stanford HAI’s Seventh Annual AI Index Report

April 15, 2024

As the AI revolution marches on, it is vital to continually reassess how this technology is reshaping our world. To that end, researchers at Stanford’s Instit Read more…

Intel’s Vision Advantage: Chips Are Available Off-the-Shelf

April 11, 2024

The chip market is facing a crisis: chip development is now concentrated in the hands of the few. A confluence of events this week reminded us how few chips Read more…

The VC View: Quantonation’s Deep Dive into Funding Quantum Start-ups

April 11, 2024

Yesterday Quantonation — which promotes itself as a one-of-a-kind venture capital (VC) company specializing in quantum science and deep physics  — announce Read more…

Nvidia’s GTC Is the New Intel IDF

April 9, 2024

After many years, Nvidia's GPU Technology Conference (GTC) was back in person and has become the conference for those who care about semiconductors and AI. I Read more…

Nvidia H100: Are 550,000 GPUs Enough for This Year?

August 17, 2023

The GPU Squeeze continues to place a premium on Nvidia H100 GPUs. In a recent Financial Times article, Nvidia reports that it expects to ship 550,000 of its lat Read more…

Synopsys Eats Ansys: Does HPC Get Indigestion?

February 8, 2024

Recently, it was announced that Synopsys is buying HPC tool developer Ansys. Started in Pittsburgh, Pa., in 1970 as Swanson Analysis Systems, Inc. (SASI) by John Swanson (and eventually renamed), Ansys serves the CAE (Computer Aided Engineering)/multiphysics engineering simulation market. Read more…

Intel’s Server and PC Chip Development Will Blur After 2025

January 15, 2024

Intel's dealing with much more than chip rivals breathing down its neck; it is simultaneously integrating a bevy of new technologies such as chiplets, artificia Read more…

Choosing the Right GPU for LLM Inference and Training

December 11, 2023

Accelerating the training and inference processes of deep learning models is crucial for unleashing their true potential and NVIDIA GPUs have emerged as a game- Read more…

Baidu Exits Quantum, Closely Following Alibaba’s Earlier Move

January 5, 2024

Reuters reported this week that Baidu, China’s giant e-commerce and services provider, is exiting the quantum computing development arena. Reuters reported � Read more…

Comparing NVIDIA A100 and NVIDIA L40S: Which GPU is Ideal for AI and Graphics-Intensive Workloads?

October 30, 2023

With long lead times for the NVIDIA H100 and A100 GPUs, many organizations are looking at the new NVIDIA L40S GPU, which it’s a new GPU optimized for AI and g Read more…

Shutterstock 1179408610

Google Addresses the Mysteries of Its Hypercomputer 

December 28, 2023

When Google launched its Hypercomputer earlier this month (December 2023), the first reaction was, "Say what?" It turns out that the Hypercomputer is Google's t Read more…


How AMD May Get Across the CUDA Moat

October 5, 2023

When discussing GenAI, the term "GPU" almost always enters the conversation and the topic often moves toward performance and access. Interestingly, the word "GPU" is assumed to mean "Nvidia" products. (As an aside, the popular Nvidia hardware used in GenAI are not technically... Read more…

Leading Solution Providers


Shutterstock 1606064203

Meta’s Zuckerberg Puts Its AI Future in the Hands of 600,000 GPUs

January 25, 2024

In under two minutes, Meta's CEO, Mark Zuckerberg, laid out the company's AI plans, which included a plan to build an artificial intelligence system with the eq Read more…

China Is All In on a RISC-V Future

January 8, 2024

The state of RISC-V in China was discussed in a recent report released by the Jamestown Foundation, a Washington, D.C.-based think tank. The report, entitled "E Read more…

Shutterstock 1285747942

AMD’s Horsepower-packed MI300X GPU Beats Nvidia’s Upcoming H200

December 7, 2023

AMD and Nvidia are locked in an AI performance battle – much like the gaming GPU performance clash the companies have waged for decades. AMD has claimed it Read more…

Nvidia’s New Blackwell GPU Can Train AI Models with Trillions of Parameters

March 18, 2024

Nvidia's latest and fastest GPU, codenamed Blackwell, is here and will underpin the company's AI plans this year. The chip offers performance improvements from Read more…

Eyes on the Quantum Prize – D-Wave Says its Time is Now

January 30, 2024

Early quantum computing pioneer D-Wave again asserted – that at least for D-Wave – the commercial quantum era has begun. Speaking at its first in-person Ana Read more…

GenAI Having Major Impact on Data Culture, Survey Says

February 21, 2024

While 2023 was the year of GenAI, the adoption rates for GenAI did not match expectations. Most organizations are continuing to invest in GenAI but are yet to Read more…

The GenAI Datacenter Squeeze Is Here

February 1, 2024

The immediate effect of the GenAI GPU Squeeze was to reduce availability, either direct purchase or cloud access, increase cost, and push demand through the roof. A secondary issue has been developing over the last several years. Even though your organization secured several racks... Read more…

Intel’s Xeon General Manager Talks about Server Chips 

January 2, 2024

Intel is talking data-center growth and is done digging graves for its dead enterprise products, including GPUs, storage, and networking products, which fell to Read more…

  • arrow
  • Click Here for More Headlines
  • arrow