Talking Carbon-free HPC with Lancium at SC22

By Oliver Peckham

December 12, 2022

With climate concerns intensifying and the war in Ukraine destabilizing energy prices, the conversation around the need for low-cost, renewable energy has evolved rapidly this year. It came to a head at SC22, where scores of panels stressed the need for a transformation in how HPC’s accelerating energy and carbon footprints are measured and ameliorated. One of the champions of this conversation has been a scrappy, Texas-based startup named Lancium. The company – which first came to our attention at a panel at SC21 – is working on setting up large, air-cooled HPC datacenters in west Texas, where renewable energy is cheap and plentiful but often congested on the grid. Lancium intends to spin up workloads when energy is cheap or negative-cost (a frequent occurrence) and spin them down when demand (and energy prices) begin to rise again. Clients, the company wagers, will be able and willing to trade a little bit of uptime for substantially reduced energy costs and a zero-carbon footprint. (To learn a lot more about Lancium, click here.)

Right at the start of SC22 – which Lancium called their true “debut” – I spoke with Lancium’s president, Andrew Grimshaw, and its chief revenue officer, Sanjai Marimadaiah, at their booth. The video of that interview can be viewed below and is also included (beneath the video) as a lightly edited transcript.


Describing Lancium’s value proposition and initial offerings

Andrew Grimshaw: Lancium is a company that’s really different from a lot of other companies. We’re focusing on renewable-powered computing. Specifically, a lot of people aren’t aware of the fact that renewables right now – specifically, wind and solar – are the least expensive way to produce electricity in the world, and renewables are exploding throughout the world, but particularly in places like Texas. In fact, by 2025 there will be 88 gigawatts of installed renewable capacity in Texas. There are two big challenges with renewables that have to be overcome, at least in the short term, if we’re going to utilize them: one of them is that renewables are intermittent, and we all know this – the sun’s not up all the time, the wind doesn’t always blow. The second thing is that they tend to become congested because they get built in places where there’s lots of wind and lots of sun, but then you have to get the energy out. And that’s the challenge. So what Lancium does is it builds datacenters in these congested regions where the energy prices are extremely low because they’re being powered by renewables and, secondly, the energy that we’re using is basically carbon-free. So we’re building datacenters there so we can essentially decarbonize high-performance computing. That’s sort of my goal and the next part of my career is to decarbonize this huge amount of computing that I’ve done in my life – and other people’s, as well. So that’s our primary benefit, beyond cost and obviously other things, which Sanjai will probably talk about.

Sanjai Marimadaiah: Primarily in our debut, we are offering our platform as a service to customers. We are opening up our platform. … We are more like a public cloud dedicated to high-performance computing. So the key value is that because we are dedicated to high-performance computing, it enables us to provide a software stack specifically designed to solve customers’ problems on the platform. So there are three key value propositions of our high-performance computing platform-as-a-service. Number one is: shorten the time to value. That means you just bring your workload and the data, and the rest is taken care of by our platform. The second value is – as Andrew pointed out – we are decarbonizing HPC by using 100% renewable power. That’s where we will be in Abilene [with the launch of Lancium’s first Clean Campus]; today we are in our Houston datacenter as the launch site. The third, most important value is: renewable energy is not costly. We offer the lowest total cost of ownership – one, by reducing the cost of power, operations and compute. [We eliminate] all the devops activities that a customer needs to do before they can use a public cloud. So the initial customers … do not need any IT professionals involved to run their workloads.

How HPC’s properties enable Lancium to move past flops-per-watt

Grimshaw: We really focus this on the high-performance computing community for a couple of reasons. One of them is that HPC applications tend to be batch, and because we need to be able to modulate our load, that gives us the ability to start them and stop them without affecting the correctness of the application. The second thing – and the way we’re really focused on HPC – I’ve been doing HPC since the original Crays, and in my entire career you would submit a job, you could check the status of a job and you could kill a job. And our whole interface is built around that: you can submit a job, you can check the status of the job and you can look at the job while it’s running, but you don’t have to set up anything. And then secondly, we have MPI support – we’re really focused on high-throughput computing of single-node jobs and then low-degree parallel jobs up to about 2,000 to 3,000 cores. And that’s also where I come from, so it’s a very easy thing for me to do.

So for the last at least ten years, and probably longer – I don’t remember when the Green500 was formed, but it was some people at Virginia Tech who did it, so I should know that, they’re just down the road from us! – the whole focus has been on flops-per-watt. That presumes that your computation is being done in an environment where electricity is expensive, and more recently, carbon-intense. And we take a completely opposite view of that. We operate in an environment where the electricity is almost free and literally 15% of the time the energy is negatively priced. So I don’t think that flops-per-watt is really the right metric for this kind of new environment where the energy is practically free, and it’s also carbon-free. … I think a more interesting question the HPC community can focus on is flops-per-kilogram of CO2. If what you care about is carbon dioxide, that’s it – or flops-per-dollar, right? So that’s our whole approach – we’re sort of flipping the equation because engineering is all about operating inside of design constraints, and our design constraint, because we’re moving into these energy-rich, low-cost energy regions, our design constraint isn’t the same. My boss literally told me when he hired me, he said ‘consider electricity free.’ That is a very different constraint than I’ve operated under in my entire career.
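Grimshaw’s proposed metric can be sketched with some simple arithmetic. The sketch below is purely illustrative – the job sizes, power draws and grid carbon intensities are made-up values, not Lancium figures – but it shows how a less efficient system on near-zero-carbon power can lose on flops-per-watt while winning handily on flops-per-kilogram of CO2:

```python
# Compare two hypothetical systems on flops-per-watt vs. flops-per-kg-CO2.
# All numbers are illustrative, not measured or vendor-supplied values.

def flops_per_watt(flops_per_s, watts):
    """Classic Green500-style efficiency: sustained flops per watt drawn."""
    return flops_per_s / watts

def flops_per_kg_co2(flops_per_s, watts, kg_co2_per_kwh):
    """Sustained flops divided by the CO2 emitted per second of operation."""
    kw = watts / 1000.0
    kg_co2_per_s = kw * kg_co2_per_kwh / 3600.0  # convert kWh rate to per-second
    return flops_per_s / kg_co2_per_s

# System A: newer, more efficient gear on a carbon-intense grid (~0.4 kg CO2/kWh).
a_eff = flops_per_watt(1e15, 500_000)              # 2 Gflops/W
a_carbon = flops_per_kg_co2(1e15, 500_000, 0.4)

# System B: older, half-as-efficient gear on near-carbon-free power (~0.01 kg CO2/kWh).
b_eff = flops_per_watt(5e14, 500_000)              # 1 Gflops/W
b_carbon = flops_per_kg_co2(5e14, 500_000, 0.01)

print(f"A: {a_eff:.1e} flops/W, {a_carbon:.2e} flops per kg CO2")
print(f"B: {b_eff:.1e} flops/W, {b_carbon:.2e} flops per kg CO2")
# B loses 2x on flops-per-watt but wins ~20x on flops per kg of CO2.
```

Under these assumed numbers, system B emits so little carbon per second that its flops-per-kg-CO2 figure is roughly twenty times better despite the worse flops-per-watt, which is the inversion of constraints Grimshaw describes.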

We use older gear – we don’t have to use older gear, by the way, if a customer wants some new thing, we’ll get it and we’ll put it in – but the advantage to older gear is if you look at the performance per core – that’s the important thing here, not per-socket – the performance per core hasn’t really changed a lot in the last ten years. … In fact, when you put more cores on there, oftentimes you have to reduce the clock frequency. What they’ve gotten better at is energy efficiency per flop and density. But we don’t have a fixed-size room: we have 1,100 acres at our Abilene site alone, and up to 1.2 gigawatts of power. So for us, we can use older gear and it doesn’t really affect the per-core performance – and in the cloud business, you’re selling per-core. Getting back to the chicken sheds comment, one of the things that surprised me when I came here is that there was research on essentially how hot you can get machines before they start to fall apart in bad ways. And the literature basically is: the silicon parts will start to come apart at about 90 degrees Centigrade. Well, you’re not going to put a human in a machine room that’s 90 degrees Centigrade. Now, other components will go faster, it will accelerate the rate of falling-apart of the parts, but basically we operate with ambient outside air and it doesn’t have any real effect.

Where Lancium is headed

Marimadaiah: We have about ten-plus customers actively using our platform. Today, we have a couple of them we’ve announced, [including] Reaction Engineering International. The research scientists are using the product – so that’s the key message for us. Our goal for the next few months is to get more people to start experiencing our platform and deriving value out of it. We’ll be making more announcements: today we started off with Reaction Engineering; we are working with Adaptive Computing to provide a hybrid cloud interface so that Lancium will be a greener, cleaner option [for bursting] to cloud. So that’s the activity for the next few months. We will have 200 megawatts of power available at our Abilene facility in May … that’s where we’ll have the 100% renewably powered Lancium HPC platform. For that, our vision is this: we want to bring in more partners to bring their HPC workloads, and the vision for us is to build a commercially operating HPC cloud with multiparty engagement. We are working with very renowned datacenter designers, chip vendors, server vendors and the software stack – including the one Andrew has built in his career at the University of Virginia, which is Genesis II, and we are using Slurm and Singularity containers – but we’ll provide more options for this middleware stack. And most importantly, we want to bring in ISV [independent software vendor] partners. There are so many of them, we are talking to many of them. … We are focusing on five key verticals where we’ve seen immediate traction. We are working with one pharmaceutical company – again, directly operated by a research scientist. We are [also] working with the Department of Energy – we can’t disclose which project at this time until it comes to fruition. But there’s a lot of interest from academia and research institutes … and whenever we go to any conference, sustainability is the key focus. It’s funny that we are here at SC22 while COP27 is happening in Egypt at the [same] time, and we are delighted to be making our debut at this opportune moment.

Grimshaw: I’m just gonna say one blatantly commercial thing here: we’re hoping that Supercomputing can be an inflection point for us, because we’ve been sort of the best-kept secret in the world. One of the things that we’re doing to encourage people to try it out is [that] they can get 5,000 free hours to try out the platform and see if it works for them. And if you know how to use containers – we use Singularity containers, but Docker containers will work as well – it’s really pretty straightforward to use, and what I encourage people to do is just give it a try, because if you’re facile with containers and you’ve ever used a queueing system in your life, it should be pretty straightforward to get your job running.
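Since the stack described in the interview is built from familiar pieces – Slurm and Singularity, with Docker images accepted as well – the submit/status/kill workflow Grimshaw describes looks roughly like the standard pattern below. This is a generic sketch of that pattern, not Lancium’s actual interface; the job name, core count, image name and script path are all hypothetical:

```shell
#!/bin/bash
# Generic Slurm batch script running a containerized solver.
#SBATCH --job-name=my-sim
#SBATCH --ntasks=64            # low-degree parallel job, well under the ~2,000-core range
#SBATCH --time=04:00:00

# Singularity can run Docker images directly by pulling from a registry,
# so an existing Docker-packaged workload needs no repackaging.
singularity exec docker://myorg/my-solver:latest ./run_simulation.sh
```

In the standard Slurm workflow, this would be submitted with `sbatch job.sh`, its status checked with `squeue`, and cancelled with `scancel <jobid>` – the submit, check and kill operations Grimshaw frames the interface around.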


Later in the week, Grimshaw moderated a panel, “Addressing HPC’s Carbon Footprint,” that went further into the issues surrounding flops-per-watt as a metric for HPC energy efficiency. Read our coverage of that panel here.
