Lessons from LLVM: An SC21 Fireside Chat with Chris Lattner

By John Russell

December 27, 2021

Today, the LLVM compiler infrastructure is essentially inescapable in HPC. But around 2000, LLVM (low level virtual machine) was just getting its start as a new way of thinking about how to overcome shortcomings in the Java Virtual Machine. At the time, Chris Lattner was a graduate student of Vikram Adve at the University of Illinois.

“Java was taking over the world. It was really exciting. Nobody knew the boundaries of Java. Some of us had some concerns about the sort of workloads that maybe wouldn’t fit well with it. But the compilation story was still quite early. Just-in-time compilers were just coming on,” recalled Lattner.

Participating in a Fireside Chat at SC21 last month, Lattner strolled down memory lane and talked about how LLVM grew from his master’s thesis project at the University of Illinois Urbana-Champaign in 2000 into a broad community effort used by, and contributed to by, nearly every major company producing compilers and programming-language tools. He also discussed LLVM’s future, his work on Swift and MLIR, and the rewards and challenges of working in open source communities. Hal Finkel of the DOE Office of Advanced Scientific Computing Research was the interviewer.

Chris Lattner, SiFive

“Vikram and I had this idea that if we took this just-in-time compiler technology, but did more ahead-of-time compilation, we could get better trade-offs in terms of whole program optimization analysis, [and] be able to build analysis tools, and get better performance. A lot of the name LLVM, low level virtual machine, comes from the idea of taking the Java Virtual Machine and building something that is underneath it, a platform that you could then do whole program optimization for,” said Lattner.

“After building a whole bunch of infrastructure and learning all this compiler stuff, which I was just eating up and loved learning by doing, we actually ended up saying, well, how about we build a code generator? And how about we go integrate with GCC (GNU Compiler Collection)? Very early on, it started as a Java thing that ended up being a C-oriented and statically compiled tooling language as the initial focus. So, a lot of that early genesis kind of got derailed. But it became a very useful platform for research and for applications in a large number of different domains.”

Lattner is, of course, no stranger to the programming world. Much of his work on LLVM, Clang, and Swift took place while he was at Apple. Lattner also worked briefly at Tesla, leading its Autopilot team. He is currently senior vice president of platform engineering at SiFive, which develops RISC-V processors.

Presented here are a few of Lattner’s comments (lightly edited) on his work with the LLVM community and its future. The SC21 video of the session is available here (for registered attendees).

My Time at Apple – “You don’t understand…nothing’s ever going to replace GCC”

When Lattner graduated from the University of Illinois in 2005, LLVM was still an advanced research project. “The quality of the generated code wasn’t perfect but it was promising,” he recalled. An Apple engineer was working with LLVM and talked it up to an Apple VP. At the time, Lattner was collaborating with the engineer over mailing lists.

“The time was right. Apple had been investing a lot in GCC, and I don’t know if it was the GCC technology or the GCC team at Apple at the time, but management was very frustrated with the lack of progress. I got to talk with this VP who thought compilers were interesting and he decided to give me a chance. He hired me and said, ‘Yeah, you can work on this LLVM thing. Show that it wasn’t a bad idea.’ [Not long after] he motivated me, saying, ‘You can have a year or so to work on this. And worst case, you’re a smart guy, we can make you work on GCC.’”

A couple of weeks into the job, Lattner remembers being asked “why are you here” by an experienced Apple engineer. After explaining his LLVM project, the colleague said, “You don’t understand. GCC has been around for 20 years, it’s had hundreds of people working on it, nothing’s ever going to replace GCC, you’re wasting your time.” Lattner said, “Well, I don’t know, I’m having fun.”

It turned out there was a huge need for just-in-time compilers in the graphics space, and LLVM was a good solution.

Lattner said, “The OpenGL team was struggling because Apple [was] coming out with 64-bit Mac and moving from PowerPC to Intel, and a bunch of these things. They were using hand-rolled just-in-time compilers, and we were able to use LLVM to solve a bunch of their problems, like enabling new hardware, [which was] not something that GCC was ever designed to do.”

“So [pieces of LLVM] shipped with the 10.4 Tiger release, improving graphics performance. That showed some value and justified a little bit of investment. I got another person to work with, and we went from that to another little thing and to another little thing, one little step at a time,” recounted Lattner. “It gained momentum and eventually started replacing parts of GCC. Another thing along the way was that the GPU team was trying to make a shading language for general-purpose GPU compute, [and that] turned into what we know now as OpenCL, which became the first user of Clang.”

The rest, of course, is a very rich LLVM history of community development and collaboration.

Collaboration’s Risk and Reward – “It’s time for you to go.”

Not surprisingly, it’s challenging to build an open source development community in which commercial competitors collaborate. This isn’t unique to LLVM, but given its endurance and growth, there may be lessons for others.

Lattner said, “Look at the LLVM community and you have Intel and AMD and Apple and Google and Sony and all these folks that are collaborating. One of the ways we made [it work] was by being very driven by technical excellence and by shared values and a shared understanding of what success looks like.”

“As a community, we always worked engineer-to-engineer to solve problems. For example, when I was at Apple or whatever affiliation, I’d have my LLVM hat on when working with the community, but I’d have my Apple hat on when I was solving an internal problem for hardware that’s not shipped, right. We decided that the corporate hats that many of us wore would not be part of the LLVM community. It was not about bringing up a topic like, I need to get this patch in now to hit a release,” he said.

The shared understanding helped inform LLVM community growth by attracting similarly minded collaborators, said Lattner. “I’m proud of the fact that we have people who are harsh industrial enemies, fighting with each other on the business landscape, but who can still work on and agree on the best way to model some kernel in a GPU or whatever it is,” he said.

Things don’t always work out.

“Over the years, not often, we have had to eject people out of the community. It’s when people have decided that they do not align with the value system [or] they’re not willing to collaborate with people or they’re not aligned with where the community is going. That is super difficult, because some of them are prolific contributors, and there’s real pain, but maintaining that community cohesion [and] value system is so important,” said Lattner.

LLVM Warts & Redo – Would Starting from Scratch Be a Good Idea?

“I am the biggest critic of LLVM because I know all the problems,” said Lattner, half in jest, noting that LLVM is over 20 years old now. “LLVM is definitely a good thing, but it is not a perfect thing by any stretch of the imagination. I’m really happy we’ve been able to continually upgrade and iterate and improve on LLVM over the years. But it’s to the point now where certain changes are architectural and very difficult to make.

“One example of this is that the LLVM compiler itself is not internally multi-threaded. I don’t know about you, but I think that multicore is no longer the future. There are also certain design decisions, which I’m not going to go into in detail, that I regret. Many of those, only nerds like me care about, and they’re not the strategic kind of problem that faces the community, but others really are,” said Lattner.

“[Among] things that LLVM has never been super great at are loop transformations, HPC-style transformations, auto parallelization, OpenMP support. LLVM works and it’s very useful, but it could be a lot better. Those [weaknesses] all go back to design decisions in LLVM where the LLVM view of the world is really kind of a C-with-vectors view of the world. That original design premise is holding back certain kinds of evolution,” he said.

Today, noted Lattner, the LLVM project overall has many sub-projects, including MLIR and others that are breaking down these barriers and solving some of these problems. “But typically, when people ask about LLVM, they’re thinking about Clang and the standard C/C++ pipeline, and it hasn’t quite adopted all the new technology in the space,” said Lattner.

Finkel asked if Lattner would recommend starting over from scratch.

“Yes, I did. This is what MLIR (multi-level intermediate representation) is, right? All kidding aside, LLVM is slow when you’re using it in ways it wasn’t really designed to be used. For example, the Rust community is well known for pushing on the boundaries of LLVM performance because their compilation model instantiates tons and tons and tons of stuff, and then specializes and specializes and specializes it all the way. This puts a huge amount of pressure and weight on the compiler that C, for example, or simpler lower-level languages don’t have. It leads to amazing things in the Rust community, but it’s asking the compiler to do all this work that is implicit in this programming model,” he said.
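The pressure Lattner describes comes from monomorphization: every distinct instantiation of a generic becomes a separate copy of code that the optimizer and code generator must process. Rust leans on this heavily; C++ templates behave the same way, so a minimal C++ sketch (illustrative only, not from the talk) makes the effect concrete:

```cpp
#include <string>
#include <vector>

// Each distinct type argument forces the compiler to instantiate, optimize,
// and code-generate a separate copy of sum<T> ("monomorphization").
template <typename T>
T sum(const std::vector<T>& xs) {
  T total{};
  for (const T& x : xs)
    total += x;  // resolved independently in every instantiation
  return total;
}

int main() {
  // Three instantiations: sum<int>, sum<double>, and sum<std::string>,
  // each of which the compiler middle-end sees as its own function.
  std::vector<int> a{1, 2, 3};
  std::vector<double> b{1.5, 2.5};
  std::vector<std::string> c{"foo", "bar"};
  return static_cast<int>(sum(a) + sum(b)) + static_cast<int>(sum(c).size());
}
```

Generics-heavy Rust code does this at a far larger scale, which is why, as Lattner notes, the same LLVM middle-end ends up churning through much more IR than it would for equivalent C.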

“Starting all over from scratch, you have to decide what problems you want to fix. The problems that I’m interested in fixing with LLVM come down to the fact that it doesn’t model higher-level abstractions like loops very well, and things like this. I think the constant time performance of any individual pass is generally okay. The other challenge I see with LLVM is that it’s a complicated set of technologies and therefore a difficult tool to wield unless you know all the different pieces. Sometimes people are writing lots of passes that shouldn’t be run. So, I’m not religiously attached to LLVM being the perfect answer.”

Making LLVM Better – While at Google, Lattner Tackled MLIR

MLIR is a sub-project within LLVM intended to give it more modern capabilities. Lattner went from Apple to Google, where he worked on MLIR.

“I’ll start from the problem statement, [which] comes back to the earlier question of what’s wrong with LLVM. So LLVM is interested in tackling the C-with-vectors part of the design space, but there are a lot of other interesting parts of the design space where LLVM may be helpful in small ways but doesn’t really help the inherent problem. If you talk about distributing computation to a cluster, LLVM doesn’t do any of that. If you talk about machine learning, and I have parallel workloads that are represented as tensors, LLVM doesn’t help. If you look at other spaces, for example hardware design, LLVM has some features you can use, [but they] are really not great,” said Lattner.

“The other context was within Google and the TensorFlow team. [Although] TensorFlow itself is not widely seen as this, it’s really a set of compiler technologies. It has TensorFlow graphs. It has the XLA compiler framework with its HLO graphs. It has code generation for CPUs and GPUs. It has many other technology components like TensorFlow Lite, which is a completely separate machine learning framework with converters back and forth,” he said.

What had happened, said Lattner, is that TensorFlow had this massive amount of infrastructure, an ecosystem with “seven or eight different IRs” floating around. “Nobody had built them like a compiler IR. People think of TensorFlow graphs as a protocol buffer, not as an IR representation. As a consequence, the quality around that was not very great. Nothing was really integrated. There were all these different technology islands between the different systems. People weren’t able to talk with each other because they didn’t understand that they were all working on the same problems in different parts of the space,” recalled Lattner.

MLIR, said Lattner, arose from the idea of “saying, how do we integrate these completely different worlds, where you’re working on a massive multi-thousand-node machine learning accelerator, like GPUs, versus I’m working on an Arm TensorFlow Lite mobile deployment scenario. There’s no commonality between those.”

Lattner said, “There’s a hard part to building compilers which has nothing to do with the domain. If you look at a compiler like LLVM, a big part of LLVM is all this infrastructure for testing, for debug info, for walking the graph, for building a control flow graph, for defining call graphs, for doing analyses and pass managers – all of this kind of stuff is common regardless of whether you’re building a CPU JIT compiler or building a TensorFlow graph-style representation. That part of the compiler infrastructure is invariant to the domain you’re targeting.”
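A small illustration of that reuse (a hypothetical sketch using LLVM’s new pass manager C++ API, not something from the talk): an analysis-style pass needs only a handful of lines, because pass scheduling, IR traversal, and diagnostics all come from the shared infrastructure Lattner is describing.

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Support/raw_ostream.h"

// A minimal function pass: walking the CFG and plugging into the pass
// manager are provided by generic, domain-independent LLVM infrastructure.
struct CountBlocksPass : llvm::PassInfoMixin<CountBlocksPass> {
  llvm::PreservedAnalyses run(llvm::Function &F,
                              llvm::FunctionAnalysisManager &) {
    unsigned Blocks = 0, Insts = 0;
    for (llvm::BasicBlock &BB : F) {  // iterate the function's basic blocks
      ++Blocks;
      Insts += BB.size();             // instructions in this block
    }
    llvm::errs() << F.getName() << ": " << Blocks << " blocks, "
                 << Insts << " instructions\n";
    return llvm::PreservedAnalyses::all();  // nothing was modified
  }
};
```

Registering such a pass with a PassBuilder pipeline takes only a few more lines; the point is that none of the traversal or scheduling machinery cares what domain the IR came from.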

What MLIR evolved into “was taking the notion of a compiler infrastructure and taking the domain out of it. MLIR is a domain-independent compiler infrastructure that allows you to build domain-specific verticals on top. It provides the ability to define your IR, your representation: what are your adds, subtracts, multiplies, divides, stores? What are the core abstractions you have? For example, in software, you have functions. In hardware, you have Verilog modules. MLIR can do both of those,” he said.

“Building all of this useful functionality out of the box allowed us in the Google lab to say, ‘We have seven different compilers; let’s start unifying them at the bottom and pull them onto the same technology stack. We can start sharing code and breaking down these barriers.’ Also, because you have one thing, and it’s been used by lots of people, you can invest in making it really, really good. Investing in infrastructure like that is something you often don’t get a chance to do.”

Lattner said he’s excited to see MLIR being adopted across the industry, notably for machine learning applications but also in new arenas such as quantum computing. “At SiFive, we use it for hardware design and chip design kinds of problems – any place you can benefit from having the compiler be able to represent a design,” he said.

 

(Presented below is an excerpt from LLVM.org that showcases the wide scope of the project)

LLVM OVERVIEW EXCERPTED FROM LLVM.ORG

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines. The name “LLVM” itself is not an acronym; it is the full name of the project.

LLVM began as a research project at the University of Illinois, with the goal of providing a modern, SSA-based compilation strategy capable of supporting both static and dynamic compilation of arbitrary programming languages. Since then, LLVM has grown to be an umbrella project consisting of a number of subprojects, many of which are being used in production by a wide variety of commercial and open source projects as well as being widely used in academic research. Code in the LLVM project is licensed under the “Apache 2.0 License with LLVM exceptions.”

The primary sub-projects of LLVM are:

  1. The LLVM Core libraries provide a modern source- and target-independent optimizer, along with code generation support for many popular CPUs (as well as some less common ones!). These libraries are built around a well-specified code representation known as the LLVM intermediate representation (“LLVM IR”). The LLVM Core libraries are well documented, and it is particularly easy to invent your own language (or port an existing compiler) to use LLVM as an optimizer and code generator (see the short sketch after this list).
  2. Clang is an “LLVM native” C/C++/Objective-C compiler, which aims to deliver amazingly fast compiles, extremely useful error and warning messages and to provide a platform for building great source level tools. The Clang Static Analyzer and clang-tidy are tools that automatically find bugs in your code, and are great examples of the sort of tools that can be built using the Clang frontend as a library to parse C/C++ code.
  3. The LLDB project builds on libraries provided by LLVM and Clang to provide a great native debugger. It uses the Clang ASTs and expression parser, LLVM JIT, LLVM disassembler, etc., so that it provides an experience that “just works”. It is also blazing fast and much more memory efficient than GDB at loading symbols.
  4. The libc++ and libc++ ABI projects provide a standard conformant and high-performance implementation of the C++ Standard Library, including full support for C++11 and C++14.
  5. The compiler-rt project provides highly tuned implementations of the low-level code generator support routines like “__fixunsdfdi” and other calls generated when a target doesn’t have a short sequence of native instructions to implement a core IR operation. It also provides implementations of run-time libraries for dynamic testing tools such as AddressSanitizer, ThreadSanitizer, MemorySanitizer, and DataFlowSanitizer.
  6. The MLIR subproject is a novel approach to building reusable and extensible compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building domain specific compilers, and aid in connecting existing compilers together.
  7. The OpenMP subproject provides an OpenMP runtime for use with the OpenMP implementation in Clang.
  8. The polly project implements a suite of cache-locality optimizations as well as auto-parallelism and vectorization using a polyhedral model.
  9. The libclc project aims to implement the OpenCL standard library.
  10. The klee project implements a “symbolic virtual machine” which uses a theorem prover to try to evaluate all dynamic paths through a program in an effort to find bugs and to prove properties of functions. A major feature of klee is that it can produce a testcase in the event that it detects a bug.
  11. The LLD project is a new linker that is a drop-in replacement for system linkers and runs much faster.
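As a concrete illustration of the first item in the list above, the C++ sketch below (a minimal, hypothetical example, not taken from the LLVM documentation) uses the LLVM Core libraries’ IRBuilder API to construct and print a tiny function in LLVM IR, which is essentially what a new language frontend does:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  llvm::LLVMContext Ctx;
  llvm::Module M("demo", Ctx);
  llvm::IRBuilder<> B(Ctx);

  // Create the function i32 @add(i32, i32) in the module.
  auto *FT = llvm::FunctionType::get(
      B.getInt32Ty(), {B.getInt32Ty(), B.getInt32Ty()}, /*isVarArg=*/false);
  auto *F = llvm::Function::Create(FT, llvm::Function::ExternalLinkage, "add", M);

  // Emit a single basic block: %sum = add i32 %0, %1 ; ret i32 %sum
  auto *Entry = llvm::BasicBlock::Create(Ctx, "entry", F);
  B.SetInsertPoint(Entry);
  llvm::Value *Sum = B.CreateAdd(F->getArg(0), F->getArg(1), "sum");
  B.CreateRet(Sum);

  llvm::verifyFunction(*F, &llvm::errs());  // sanity-check the generated IR
  M.print(llvm::outs(), nullptr);           // dump the module as textual LLVM IR
  return 0;
}
```

Built against the LLVM libraries (for example via llvm-config --cxxflags --ldflags --libs core), this prints the textual IR for an i32 @add function that LLVM’s optimizer and code generators can then consume.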

In addition to official subprojects of LLVM, there are a broad variety of other projects that use components of LLVM for various tasks. Through these external projects you can use LLVM to compile Ruby, Python, Haskell, Rust, D, PHP, Pure, Lua, and a number of other languages. A major strength of LLVM is its versatility, flexibility, and reusability, which is why it is being used for such a wide variety of different tasks: everything from doing lightweight JIT compiles of embedded languages like Lua to compiling Fortran code for massive supercomputers.

As much as everything else, LLVM has a broad and friendly community of people who are interested in building great low-level tools. If you are interested in getting involved, a good place to start is to skim the LLVM Blog and to sign up for the LLVM Developer mailing list. For information on how to send in a patch, get commit access, and copyright and license topics, please see the LLVM Developer Policy.
