When it’s (ostensibly) ready in early 2023, El Capitan is expected to deliver in excess of two exaflops of peak computing power – around four times the power of Fugaku, the current top-ranked supercomputer in the world. These behemoth capabilities, though, carry with them correspondingly monstrous appetites for energy and water to feed and cool the system. In a new post, Lawrence Livermore National Laboratory (LLNL) – El Capitan’s host – detailed the infrastructure work underway to prepare the lab for an unprecedented system.
The core of this work is the Exascale Computing Facility Modernization (ECFM) project, a ~$100 million initiative to modernize LLNL’s computing center infrastructure that began in 2019, broke ground in June of last year and is scheduled to complete in July 2022. This upgrade will retrofit the 3.5-acre area that hosts the computing center – commissioned in 2004 – to accommodate the needs of El Capitan.
Those needs, by the way, are truly enormous. Previously, LLNL’s computing center – which has hosted systems like the (now decommissioned) Sequoia, which sat atop the Top500 in 2012 – was equipped to deliver 45 megawatts of electricity. The ECFM project is upgrading that infrastructure to deliver 85 megawatts. For context, dozens of wind turbines would be needed to produce that amount of power, which would be sufficient to provide energy to tens of thousands of homes.
Exascale systems will also run hot, necessitating the installation of a half-dozen 3,000-ton cooling towers that, thankfully, use warm-water cooling to mitigate the energy impact. The center’s cooling capacity, previously 10,000 tons, will reach 28,000 tons at the conclusion of the project. “The ECFM project will nearly double the amount of electricity into our classified computing center and nearly triple the amount of cooling into the building,” said Chris Clouse, director of LLNL’s Weapon Simulation & Computing program.
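As a quick sanity check on the "nearly double" and "nearly triple" figures, the ratios can be worked out from the numbers quoted above. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the idea that the six new towers simply add to the existing cooling capacity is an assumption, not something LLNL states.

```python
# Figures as quoted in the article.
old_power_mw = 45
new_power_mw = 85
old_cooling_tons = 10_000
new_cooling_tons = 28_000
new_towers = 6
tons_per_tower = 3_000

# Ratio of upgraded capacity to previous capacity.
power_ratio = new_power_mw / old_power_mw
cooling_ratio = new_cooling_tons / old_cooling_tons

# ASSUMPTION: the new towers are purely additive to the old capacity.
added_cooling_tons = new_towers * tons_per_tower
implied_total_tons = old_cooling_tons + added_cooling_tons

print(f"Power:   {power_ratio:.2f}x the previous capacity")
print(f"Cooling: {cooling_ratio:.2f}x the previous capacity")
print(f"Implied cooling total: {implied_total_tons:,} tons")
```

Under that additive assumption, the six 3,000-ton towers account for exactly the gap between the old and new cooling figures.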
But El Capitan alone isn’t enough to exhaust these massive capabilities: au contraire, the system is expected to have a power footprint around 30 to 35 megawatts. With the enormous infrastructure expansion, LLNL is also thinking ahead – to El Capitan’s successor. “These upgrades were essential in allowing us to site two exascale class computers simultaneously,” Clouse explained, “avoiding any potential downtime in computing cycles as we fully stand up our second exascale system before decommissioning El Capitan in the 2029 timeframe.”
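To see why the upgrade matters for running two machines side by side, here is a rough power-budget sketch. El Capitan's 30 to 35 megawatt range comes from the article; the successor's draw is not given, so the code assumes (purely for illustration) that it falls in the same range. The `headroom` helper is hypothetical.

```python
OLD_FACILITY_MW = 45
NEW_FACILITY_MW = 85

# El Capitan's expected power footprint (low, high), per the article.
el_capitan_mw = (30, 35)

# ASSUMPTION: the successor draws about as much as El Capitan.
# The article gives no figure for the second system.
successor_mw = el_capitan_mw


def headroom(facility_mw, *loads_mw):
    """Return the megawatts left over after subtracting all loads."""
    return facility_mw - sum(loads_mw)


# Both systems at the low end and at the high end of the range.
best_case = headroom(NEW_FACILITY_MW, el_capitan_mw[0], successor_mw[0])
worst_case = headroom(NEW_FACILITY_MW, el_capitan_mw[1], successor_mw[1])

# The same two systems on the pre-upgrade supply, even at the low end.
old_case = headroom(OLD_FACILITY_MW, el_capitan_mw[0], successor_mw[0])

print(f"New facility headroom: {worst_case} to {best_case} MW")
print(f"Old facility headroom: {old_case} MW")
```

Under these assumptions the upgraded facility keeps a margin with both systems running, while the previous 45 megawatts would fall short even at the low end of the range.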
Delivering infrastructure improvements on this scale meant seven years of planning and a sea of permits, regulations and approvals, ranging from environmental impact assessments to close integration with multiple electric utility companies (one of which will directly operate the new switchyard necessary to deliver the facility’s 85 megawatts). “We have to figure out now how we’re impacting the grid,” said Anna Maria Bailey, project manager for ECFM and chief engineer for HPC at LLNL. “It’s become more of a marriage with the utility company as opposed to just a handshake.”
“Exascale is a game changer,” she said. “We’re actually doing utility solutions, and that’s not something you can just snap your fingers and have done. So, when we say exascale, we’re saying ‘it’s a lot of infrastructure.’ It’s no longer something you can do locally, and it doesn’t just happen overnight.”
Despite the hurdles, Bailey said the project is “going great” and “almost done,” with the ECFM now estimated to be over 93 percent complete and, with full operation anticipated for May 2022, actually ahead of schedule. Much remains to be done in that time, of course: the new capacity needs to be connected to the actual datacenter, and engineers are still working out how best to cushion the strain on the grid posed by a 30+ megawatt system rapidly ramping its energy needs up or down. The construction of the necessary connective tissue will begin in the coming months and, the project managers hope, conclude by January, in time for the delivery of El Capitan to begin.
To learn more, read LLNL’s post on the project.