With climate concerns intensifying and the war in Ukraine destabilizing energy prices, the conversation around the need for low-cost, renewable energy has rapidly evolved this year. It came to a head at SC22, where scores of panels stressed the need for a transformation in how HPC’s accelerating energy and carbon footprints are measured and ameliorated. One of the champions of this conversation has been a scrappy, Texas-based startup named Lancium. The company – which first came to our attention at a panel at SC21 – is working on setting up large, air-cooled HPC datacenters in west Texas, where renewable energy is cheap and plentiful but transmission is frequently congested. Lancium intends to spin up workloads when energy is cheap or negative-cost (a frequent occurrence) and spin them down when demand (and energy prices) begin to rise again. Clients, they wager, will be able and willing to trade a little bit of uptime for substantially reduced energy costs and a zero-carbon footprint. (To learn a lot more about Lancium, click here.)
Right at the start of SC22 – which Lancium called their true “debut” – I spoke with Lancium’s president, Andrew Grimshaw, and its chief revenue officer, Sanjai Marimadaiah, at their booth. The video of that interview can be viewed below and is also included (beneath the video) as a lightly edited transcript.
Describing Lancium’s value proposition and initial offerings
Andrew Grimshaw: Lancium is a company that’s really different from a lot of other companies. We’re focusing on renewable-powered computing. Specifically, a lot of people aren’t aware of the fact that renewables right now – specifically, wind and solar – are the least expensive way to produce electricity in the world, and renewables are exploding throughout the world, but particularly in places like Texas. In fact, by 2025 there will be 88 gigawatts of installed renewable capacity in Texas. There are two big challenges with renewables that have to be overcome, at least in the short term, if we’re going to utilize them: one of them is that renewables are intermittent, and we all know this – the sun’s not up all the time, the wind doesn’t always blow. The second thing is that they tend to become congested, because they get built in places where there’s lots of wind and lots of sun, but then you have to get the energy out. And that’s the challenge. So what Lancium does is build datacenters in these congested regions where the energy prices are extremely low because they’re being powered by renewables, and secondly, the energy that we’re using is basically carbon-free. So we’re building datacenters there so we can essentially decarbonize high-performance computing. That’s sort of my goal and the next part of my career: to decarbonize this huge amount of computing that I’ve done in my life – and other people’s, as well. So that’s our primary benefit, beyond cost and obviously other things, which Sanjai will probably talk about.
Sanjai Marimadaiah: Primarily in our debut, we are offering our platform as a service to customers. We are opening up our platform. … We are more like a public cloud dedicated to high-performance computing. So the key value is that because we are dedicated to high-performance computing, we can provide a software stack built specifically to solve customers’ problems in using the platform. So there are three key value propositions of our high-performance computing platform-as-a-service. Number one is: shorten the time to value. That means you just bring your workload and the data, and the rest is taken care of by our platform. The second value is – as Andrew pointed out – we are decarbonizing HPC by using 100% renewable power. That’s where we will be in Abilene [with the launch of Lancium’s first Clean Campus]; today our Houston datacenter is the launch site. The third, most important value is: renewable energy is not costly. We offer the lowest total cost of ownership by reducing the cost of power, operations and compute. [We eliminate] all the devops activities that a customer needs to do before they can use a public cloud. So the initial customers … do not need any IT professionals involved to run their workloads.
How HPC’s properties enable Lancium to move past flops-per-watt
Grimshaw: We really focus this on the high-performance computing community for a couple of reasons. One of them is that HPC applications tend to be batch, and because we need to be able to modulate our load, that gives us the ability to start them and stop them without affecting the correctness of the application. The second thing – and the way we’re really focused on HPC – I’ve been doing HPC since the original Crays, and in my entire career you would submit a job, you could check the status of a job and you could kill a job. And our whole interface is built around that: you can submit a job, you can check the status of the job and you can look at the job while it’s running, but you don’t have to set up anything. We also have MPI support – we’re really focused on high-throughput computing of single-node jobs and then low-degree parallel jobs up to about 2,000 to 3,000 cores. And that’s also where I come from, so it’s a very easy thing for me to do.
So for at least the last ten years, and probably longer – I don’t remember when the Green500 was formed, but it was some people at Virginia Tech who did it, so I should know that, they’re just down the road from us! – the whole focus has been on flops-per-watt. That presumes that your computation is being done in an environment where electricity is expensive, and more recently, carbon-intensive. And we take a completely opposite view of that. We operate in an environment where the electricity is almost free and literally 15% of the time the energy is negatively priced. So I don’t think that flops-per-watt is really the right metric for this kind of new environment where the energy is practically free, and it’s also carbon-free. … I think a more interesting question the HPC community can focus on is flops-per-kilogram of CO2. If what you care about is carbon dioxide, that’s it – or flops-per-dollar, right? So that’s our whole approach – we’re sort of flipping the equation, because engineering is all about operating inside of design constraints, and because we’re moving into these energy-rich, low-cost energy regions, our design constraint isn’t the same. My boss literally told me when he hired me, he said ‘consider electricity free.’ That is a very different constraint than I’ve operated under in my entire career.
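The metric flip Grimshaw describes is easy to see with a toy calculation (all numbers below are hypothetical, chosen only for illustration): a less power-efficient machine running on a nearly carbon-free grid can come out far ahead on flops-per-kilogram-of-CO2 even while losing badly on flops-per-watt.

```python
# Toy comparison of flops-per-watt vs. flops-per-kg-CO2.
# All machine and grid numbers are hypothetical.

def flops_per_watt(gflops, watts):
    """Sustained GFLOPS per watt of power draw."""
    return gflops / watts

def flops_per_kg_co2(gflops, watts, kg_co2_per_kwh, hours=1.0):
    """Total GFLOP completed per kilogram of CO2 emitted over a run."""
    kwh = watts / 1000.0 * hours
    kg_co2 = kwh * kg_co2_per_kwh
    total_gflop = gflops * 3600 * hours  # GFLOPS * seconds
    return total_gflop / kg_co2

# Newer, more efficient machine on a fossil-heavy grid (~0.4 kg CO2/kWh):
new_fpw = flops_per_watt(1000, 300)
new_fpc = flops_per_kg_co2(1000, 300, 0.4)

# Older, less efficient machine on a renewable-dominated grid (~0.02 kg CO2/kWh):
old_fpw = flops_per_watt(600, 400)
old_fpc = flops_per_kg_co2(600, 400, 0.02)

print(new_fpw > old_fpw)  # newer machine wins on flops-per-watt
print(old_fpc > new_fpc)  # older machine wins on flops-per-kg-CO2
```

The ranking reverses because the grid’s carbon intensity, not the chip’s efficiency, dominates the CO2 term.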
We use older gear – we don’t have to use older gear, by the way; if a customer wants some new thing, we’ll get it and we’ll put it in – but the advantage to older gear is if you look at the performance per core – that’s the important thing here, not per-socket – it hasn’t really changed a lot in the last ten years. … In fact, when you put more cores on there, oftentimes you have to reduce the clock frequency. What they’ve gotten better at is energy efficiency per flop and density. But we don’t have a fixed-size room: we have 1,100 acres at our Abilene site alone, and up to 1.2 gigawatts of power. So for us, we can use older gear and it doesn’t really affect the per-core performance – and in the cloud business, you’re selling per-core. Getting back to the chicken sheds comment, one of the things that surprised me when I came here is that there was research on essentially how hot you can get machines before they start to fall apart in bad ways. And the literature basically says the silicon parts will start to come apart at about 90 degrees Centigrade. Well, you’re not going to put a human in a machine room that’s 90 degrees Centigrade. Now, other components will go faster – it will accelerate the rate at which parts wear out – but basically we operate with ambient outside air and it doesn’t have any real effect.
Where Lancium is headed
Marimadaiah: We have about ten-plus customers actively using our platform. Today we announced a couple of them, starting with Reaction Engineering International. The research scientists are using the product – so that’s the key message for us. Our goal for the next few months is to get more people to start experiencing our platform and deriving value out of it. We’ll be making more announcements: today we started off with Reaction Engineering; we are working with Adaptive Computing to provide a hybrid cloud interface so that Lancium will be a greener, cleaner option [for bursting] to cloud. So that’s the activity for the next few months. We will have 200 megawatts of power available at our Abilene facility in May … that’s where we’ll have the 100% renewably powered Lancium HPC platform. For that, our vision is this: we want to bring in more partners to bring their HPC workloads, and the vision for us is to build a commercially operating HPC cloud with multiparty engagement. We are working with very renowned datacenter designers, chip vendors, server vendors and the software stack – including Genesis II, which Andrew built during his career at the University of Virginia, and we are using Slurm and Singularity containers – but we’ll provide more options for this middleware stack. And most importantly, we want to bring in ISVs [independent software vendors]. There are so many of them, and we are talking to many of them. … We are focusing on five key verticals where we’ve seen immediate traction. We are working with one pharmaceutical company – again, directly operated by a research scientist. We are [also] working with the Department of Energy – we can’t disclose which project until it comes to fruition. But there’s a lot of interest from academia and research institutes … and whenever we go to any conference, sustainability is the key focus.
It’s funny that we are here at SC22 while COP27 is happening in Egypt at the [same] time, and we are delighted to be making our debut at such an opportune moment.
Grimshaw: I’m just gonna say one blatantly commercial thing here: we’re hoping that Supercomputing can be an inflection point for us, because we’ve been sort of the best-kept secret in the world. One of the things that we’re doing to encourage people to try it out is [that] they can get 5,000 free hours to try out the platform and see if it works for them. And if you know how to use containers – we use Singularity containers, but Docker containers will work as well – it’s really pretty straightforward to use, and what I encourage people to do is just give it a try, because if you’re facile with containers and you’ve ever used a queueing system in your life, it should be pretty straightforward to get your job running.
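Lancium’s exact submission interface isn’t documented in this interview, but given the Slurm and Singularity stack mentioned above, the “containers plus a queueing system” workflow Grimshaw describes might look like the following generic batch-script sketch. The image name, executable, paths and resource numbers are all hypothetical placeholders, not Lancium specifics.

```shell
#!/bin/bash
# Generic Slurm batch script for a containerized HPC job (illustrative only).
#SBATCH --job-name=demo-solver
#SBATCH --ntasks=64            # a "low-degree parallel" job, per the interview
#SBATCH --time=04:00:00        # batch jobs tolerate being queued/paused

# Run a hypothetical application packaged as a Singularity image.
# Docker images can typically be converted with: singularity pull docker://...
srun singularity exec my_app.sif ./solver --input data/case1
```

The script would be submitted with `sbatch`, monitored with `squeue`, and cancelled with `scancel` – the submit/status/kill model Grimshaw says the interface is built around.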
Later in the week, Grimshaw moderated a panel, “Addressing HPC’s Carbon Footprint,” that went further into the issues surrounding flops-per-watt as a metric for HPC energy efficiency. Read our coverage of that panel here.