U.S. President Joe Biden’s Energy Earthshots initiative, which aims to decarbonize the power grid by 2035 and the U.S. economy by 2050, is getting a major boost from a computing breakthrough at the National Energy Technology Laboratory (NETL).
The lab, which is run by the U.S. Department of Energy, said certain scientific computing tasks that are important to national security and economic interests are now running “several hundred times faster” on new hardware.
The performance gains are coming on a system built with chips from Cerebras Systems, which has emerged as a heavyweight in the high-performance computing space. The company’s second-generation Wafer-Scale Engine (WSE-2) chips were used by the 2022 Gordon Bell Special Prize winner to research variants of Covid-19. The WSE-2 powers Cerebras’ CS-2 machines, which can be networked together to drive large-scale model training and simulations.

Operating the CS-2-based system called Neocortex at the Pittsburgh Supercomputing Center (PSC), NETL simulated a phenomenon called Rayleigh-Bénard convection, the behavior of air when a hot surface below interacts with a cold one above. It can be compared to putting a hot plate on the floor and a giant ice pack above it, with that setup stretched over tens of thousands of meters.
The simulation shows the way air would move, but also how magma and weather systems move. Such simulations are a fundamental tool in designing airplane wings and submarine propellers, and in engineering anything that has to adapt to air or fluid flows.
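For reference, Rayleigh-Bénard convection is commonly modeled with the incompressible Boussinesq equations, in which buoyancy from temperature differences drives the flow. The non-dimensional form below is a standard textbook formulation, not necessarily the exact set of equations NETL solved:

```latex
\nabla \cdot \mathbf{u} = 0, \qquad
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  = -\nabla p + \mathrm{Pr}\,\nabla^{2}\mathbf{u} + \mathrm{Ra}\,\mathrm{Pr}\,T\,\hat{\mathbf{z}}, \qquad
\frac{\partial T}{\partial t} + (\mathbf{u} \cdot \nabla)T = \nabla^{2} T
```

Here the Rayleigh number Ra = gαΔT H³/(νκ) measures how strongly the hot-cold temperature difference drives convection, and the Prandtl number Pr = ν/κ characterizes the fluid (about 0.7 for air).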
The lab is still characterizing the simulation, but it “measured as much as 470 times faster [performance] for problems in this class” compared to its current Joule 2.0 supercomputer, which has CPUs and GPUs, said Dirk Van Essendelft, machine learning and data science engineer at NETL.
The Joule 2.0 at NETL is the 149th fastest supercomputer in the world, according to the Top500 list.
“The main takeaway is that this [the implementation on a CS-2 machine] is several hundred times faster than what even Frontier and these other HPC facilities could do,” Van Essendelft said, adding, “It’s like unobtainium in the HPC world right now.”
The results arrive in real time, and the simulation is a fundamental tool for building things and for understanding the behavior of the world around us, Van Essendelft said.
Van Essendelft was looking for hardware alternatives to conventional HPC clusters because, as he put it, “I was really sick of waiting months for my simulations to get done.”
A chance meeting with Cerebras at the AI for Science town hall got him looking at the company’s chips.
“We recognize that the way the hardware was designed eliminated a couple of really key bottlenecks in computing, mostly related to latency and bandwidth. The entire memory system allows me to have two reads and one write per SIMD lane. That is true, basically, whether I’m looking at my own local memory or that of my neighbor processor on the wafer,” Van Essendelft said.
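To illustrate the access pattern he is describing (only the pattern; this is plain NumPy, not Cerebras’ programming model or SDK), a stencil-style update performs two reads per element, its own value and a neighbor’s, and one write:

```python
import numpy as np

# Illustrative only: a 1D stencil sweep in which each output element is built
# from two reads (a cell's own value and its right-hand neighbor's) and one
# write, mirroring the "two reads and one write per lane" pattern described
# above. This is ordinary NumPy, not the Cerebras SDK.
def stencil_sweep(u, alpha=0.5):
    out = np.empty_like(u)
    out[:-1] = (1.0 - alpha) * u[:-1] + alpha * u[1:]  # self + neighbor reads, one write
    out[-1] = u[-1]                                     # simple copy at the boundary
    return out

print(stencil_sweep(np.linspace(0.0, 1.0, 8)))
```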

The CS-2’s compute architecture is well suited to data-intensive workloads like computational fluid dynamics, he said.
“If we try and solve on GPUs, we actually end up hitting significant bandwidth limitations to our own memory because we have to go through the entire dataset before we see any reuse. We’re not really going to be able to make effective use of memory hierarchies in the loop… so it doesn’t even scale that particularly well on a single device,” Van Essendelft said.
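A back-of-the-envelope roofline estimate shows why such sweeps end up bandwidth-bound. The numbers below are illustrative assumptions, not measurements from NETL’s code or any particular GPU:

```python
# Rough arithmetic-intensity estimate for an explicit stencil update.
# All figures are illustrative assumptions, not measured values.
flops_per_cell = 10            # a handful of adds/multiplies per cell update
bytes_per_cell = 3 * 8         # e.g., two float64 reads plus one float64 write
intensity = flops_per_cell / bytes_per_cell
print(f"arithmetic intensity ~ {intensity:.2f} FLOP/byte")

# A hypothetical GPU with 20 TFLOP/s of peak compute and 2 TB/s of memory
# bandwidth needs roughly 10 FLOP/byte to be compute-bound, far above ~0.4,
# so each pass over the dataset is limited by memory traffic, not math.
peak_flops, peak_bandwidth = 20e12, 2e12
print(f"compute-bound above ~ {peak_flops / peak_bandwidth:.0f} FLOP/byte")
```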
Current HPC network architectures are inefficient for scientific computing, Van Essendelft said, because the latency of moving even a single bit of data across the network is enormous relative to the amount of compute the processors connected to that network can do in the same time.
The argument for running compute closer to the point of problem-solving and reducing the distance data has to travel was a recurring theme at last year’s Supercomputing Conference (SC22). At the show, many argued that a radical change in computing architecture wasn’t necessary, but that more modern approaches to hardware-software codesign were needed to scale scientific computing.
But NETL is now spoiled with the computing riches of the CS-2, which it also uses for probabilistic computing, a capability central to AI.
The current work at NETL centers on solving partial differential equations, which relate gradients in space, computed from neighboring values, to gradients in time, a traditional high-performance computing workload.
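As a generic example of that pattern (the 1D heat equation, not NETL’s actual model), an explicit finite-difference update builds the spatial gradient from a cell’s neighbors and steps the solution forward in time:

```latex
u_i^{\,n+1} = u_i^{\,n} + \frac{\kappa\,\Delta t}{\Delta x^{2}}
              \left( u_{i+1}^{\,n} - 2u_i^{\,n} + u_{i-1}^{\,n} \right)
```

Every cell can be updated independently at each time step, which is why this kind of workload maps naturally onto many small processors with fast access to their neighbors.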
According to NETL lab director Brian Anderson, the computational gains “dramatically accelerate and improve the design process for some really big projects that are vital to mitigate climate change and enable a secure energy future – projects like carbon sequestration and blue hydrogen production.”
NETL is not yet training AI models on the system, but features unique to the CS-2 architecture would allow the lab to do hybrid AI-HPC calculations extremely fast.
“We really want to rely on AI prediction. But we also want to catch when that AI prediction is no good,” Van Essendelft said.
NETL could use its existing HPC software as a fallback that addresses the trustworthiness issues with AI predictions and refines the results.
“What you’re looking at right now is just the HPC piece. What we want to develop in the future is the hybridized case, and we think we can get between 10 and 100 times faster than what you’re seeing here by hybridizing it,” Van Essendelft said.
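In principle, such a hybrid loop could look like the sketch below, with a learned model proposing the next state and the numerical solver stepping in when a consistency check fails. The function names and tolerance are hypothetical, not NETL’s implementation:

```python
# Minimal sketch of a hybrid AI/HPC time loop. All names and the tolerance
# are illustrative assumptions, not NETL's actual code.
def hybrid_step(state, predict_step, solve_step, residual, tol=1e-3):
    guess = predict_step(state)        # fast AI prediction of the next state
    if residual(state, guess) < tol:   # does the prediction satisfy the physics?
        return guess                   # accept the cheap answer
    return solve_step(state)           # otherwise fall back to the trusted HPC solver

def run(state, n_steps, predict_step, solve_step, residual):
    for _ in range(n_steps):
        state = hybrid_step(state, predict_step, solve_step, residual)
    return state
```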
Header image courtesy NETL and Cerebras.