How Lawrence Livermore Is Facing Exascale Power Demands

By Tiffany Trader

June 9, 2016

The old adage “you cannot improve what you do not measure” is fresh again in the age of ubiquitous data. Power sits at the top of the list of exascale computing challenges, and the major leadership-class centers want to make sure they’re doing everything they can to manage power demands, both today, when the largest machines can draw as much as 10 MW at peak, and in the coming exascale era, when that number could triple. At loads of this magnitude, the largest HPC facilities need all the relevant power data within arm’s reach.

Managing power demands is a priority at Lawrence Livermore National Laboratory (LLNL), the Department of Energy (DOE) center entrusted with ensuring nuclear security for the nation. With a peak speed of 20 petaflops, the center’s top supercomputer, Sequoia, draws more than 9 MW of power, equivalent to the energy draw of more than 10,000 average homes.

When tens of megawatts of power are on the line, advanced power management is needed to balance highly variable power demands against power availability. This requires orchestration of resources and real-time insight into the entire operational facility and the energy grid. Even small interruptions during compute cycles can derail a job and disrupt power grid management as well.

Facing the challenge of balancing demands at exascale, LLNL sought out the assistance of OSIsoft, a company with deep roots in data collection, aggregation and storage. OSIsoft helps LLNL track and analyze streams of operational data from computing racks, cooling systems, energy utilities and other equipment, and stores that data at a central control point for the life of the assets. This gives administrators such as Anna Maria Bailey, LLNL’s high performance computing facility manager, the opportunity to spot efficiency gains, determine which data is important, and coordinate forecasted load demands with utility companies in real time.

Since implementing OSIsoft’s software product, the PI system, LLNL has been able to identify troubling anomalies, including inter-hour power swings of several megawatts. The facility has also earned LEED Gold status, and LLNL reports increased operational assurance for its future operations and coming big iron such as Sierra, Livermore’s next advanced technology high-performance computing system, which is spec’d at 120-150 petaflops peak.

OSIsoft has been in business for 35 years, building a software platform that collects, aggregates and stores high-fidelity data for the life of assets. The company connects sensor data that has existed for some time, now commonly referred to as big data or IoT, to enable real-time decision making, historical performance tracking and, ultimately, predictive analytics.

OSIsoft started in the refining industry and then moved into the paper industry, upstream oil and gas, metals and mining. In the last 10 years, it added datacenters to its customer list. “It was a very logical extension because we had been involved with the heavy industry of the previous industrial age, as well as now the heavy industry of the digital age,” said OSIsoft’s Steve Sarnecki, vice president of federal and public sectors. “Datacenters, especially high-performance computing datacenters, are literally the factories of the future and the type of data they generate fits very well in the software we produce, the PI system.”

When OSIsoft expanded into commercial datacenters such as those of eBay, Dell, HP and others, it built interfaces and data collection software to gather data from those unique pieces of equipment and types of systems, with the aim of helping teams make better decisions.

Sarnecki also said that about 80 percent of the power generated in the US runs through a PI system. All of the independent system operators (ISOs) that dispatch power within the US use the PI system, and 78 of the 104 nuclear licensees use it, with all 104 feeding their data up to the Nuclear Regulatory Commission, one of OSIsoft’s federal users, which relies on the PI system for emergency response.

Asked whether the product was modified for Livermore, Sarnecki said it is the same product: his company provides the toolset for the experts who understand the business problem, as well as for solution providers in the space.

“At Livermore, our job is to take the sensors in the field that are spread out all over that campus, different types, and make them intimately close to the intelligent resources be they computer simulations or be they scientists so they have immediate access to that data as if they were standing right in front of this plethora of meters at the same time,” said Sarnecki.

Livermore’s relationship with OSIsoft goes back to 2010. LLNL High Performance Computing Facility Manager Anna Maria Bailey explains that it started with the development of a high-performance computing master plan. “We were looking at how we were going to achieve petascale and exascale computing going forward,” she said. “We had created a master plan that had many core competencies in it, from sustainable HPC solutions, doing computational fluid dynamics, benchmarking, leveraging our existing HPC capabilities, facilitating LEED certifications, free cooling, liquid cooling, innovative electrical distribution and developing gap analysis – and another area was power management.”

Looking across the core competencies in the master plan, Bailey said they all reflected a need for data. Although the data existed within Livermore, not all of it was easily accessible to the HPC facilities. For example, when facilities staff asked for the metering data of particular transformers or the flow rates of particular chillers, they found the data was in different formats, out of date, infrequently read, or downloaded only when needed.

Livermore began looking at different organizations that could help compile this data, and Bailey, an electrical engineer who came from the utility industry, knew about OSIsoft. After determining that the software had the functionality they were looking for, a relationship was forged.

“The PI software allowed us to bring all of the numerous data streams that we had into one area,” said Bailey. “We needed to aggregate the data into a single source – not necessarily to view on a common dashboard but that is the capability – but actually to aggregate the data to manipulate on a common platform and it allowed us to determine what data was significant.”

Before having PI, Livermore was unable to correlate events from the various sources because of their differing time stamps and formats, said Bailey. OSIsoft facilitates a common time stamp and format, and the PI system provides an operational event and real-time data management infrastructure spanning all internal and external data sources.
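To make the time-stamp problem concrete, here is a minimal Python sketch of normalizing readings that arrive in two different formats (epoch seconds and ISO 8601 strings) to UTC, then pairing each power reading with the nearest chiller reading. This is not how the PI system works internally; the two input formats, the 60-second tolerance and the names `to_utc` and `align` are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only, not OSIsoft PI code. Names, formats and
# tolerance are hypothetical assumptions.
from bisect import bisect_left
from datetime import datetime, timezone


def to_utc(ts):
    """Normalize epoch seconds or an ISO 8601 string to an aware UTC datetime."""
    if isinstance(ts, (int, float)):
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Assumption: naive strings are already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def align(reference, other, tolerance_s=60):
    """Pair each reference sample with the nearest `other` sample in time.

    Both inputs are lists of (timestamp, value). Returns a list of
    (utc_time, reference_value, other_value) tuples; other_value is None
    when no sample lies within tolerance_s seconds.
    """
    ref = sorted((to_utc(t), v) for t, v in reference)
    oth = sorted((to_utc(t), v) for t, v in other)
    times = [t for t, _ in oth]
    paired = []
    for t, v in ref:
        i = bisect_left(times, t)
        best = None  # (gap_seconds, value) of the closest candidate so far
        for j in (i - 1, i):  # nearest neighbors on either side of t
            if 0 <= j < len(times):
                gap = abs((times[j] - t).total_seconds())
                if gap <= tolerance_s and (best is None or gap < best[0]):
                    best = (gap, oth[j][1])
        paired.append((t, v, best[1] if best else None))
    return paired
```

For example, `align([(1465430400, 9.6)], [("2016-06-09T00:00:20+00:00", 450.0)])` pairs a power sample logged as epoch seconds with a chiller reading logged as an ISO string 20 seconds later. Nearest-neighbor matching within a tolerance is one simple choice; interpolation would be another.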

PI enabled Livermore to bring in data from the rack level, the equipment level, the metering level, the building level, the management level and the utility level. With interfaces to hundreds of real-time data streams, Bailey and her team were able to gather, manage, evaluate and analyze the large volume of data in real time. The system also lets the team send notifications, triggers and alarms, and provides visualizations to support decision-making.
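Alarms on streaming sensor data are often written with hysteresis so that a value hovering near a limit does not raise and clear the alarm over and over. The sketch below shows the idea in plain Python; it is a generic illustration, not PI’s notification configuration, and the function name, limits and sample values are hypothetical.

```python
# Generic hysteresis alarm sketch; not PI configuration. The function
# name and thresholds are hypothetical.
def hysteresis_alarm(samples, high, clear):
    """Return (timestamp, "RAISE"/"CLEAR") transitions for a high-limit alarm.

    samples: iterable of (timestamp, value). The alarm raises when a value
    exceeds `high` and clears only after a value drops below `clear`.
    """
    if clear >= high:
        raise ValueError("clear level must be below the high limit")
    active = False
    events = []
    for t, value in samples:
        if not active and value > high:
            active = True
            events.append((t, "RAISE"))
        elif active and value < clear:
            active = False
            events.append((t, "CLEAR"))
    return events
```

With a hypothetical supply-temperature limit of 47.0 and a clear level of 45.5, a reading of 46.5 after a raise keeps the alarm active instead of toggling it, which is the point of the gap between the two levels.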

“Our overall goal of doing this was to lower our power utilization and obviously achieve exascale that’s the long term goal because the better we use these resources, we can actually manage our facilities and infrastructure more appropriately,” said Bailey. “When Sierra, the next machine that we’re bringing online in 2017, every rack will be metered just like Sequoia is and the data will come into PI.”

The project started as a facilities operations tool, but the team then brought in resource manager data from SLURM. Several scientists now use it: they migrate the data in PI out to a solver so they can fine-tune the correlated time stamps.
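Putting scheduler data next to facility data makes it possible to attribute power to jobs. A minimal sketch, assuming job records of (job ID, start, end) like those a resource manager such as SLURM keeps, and power samples on the same clock; the record layout and the `mean_power_per_job` helper are hypothetical and do not parse any real SLURM output.

```python
# Hypothetical sketch: the job-record layout and helper name are
# assumptions, not SLURM or PI interfaces.
from statistics import mean


def mean_power_per_job(jobs, samples):
    """Average the power samples that fall inside each job's run window.

    jobs: list of (job_id, start, end), with end exclusive.
    samples: list of (timestamp, kw) on the same clock as the jobs.
    Returns {job_id: mean_kw or None if no samples fell in the window}.
    """
    result = {}
    for job_id, start, end in jobs:
        window = [kw for t, kw in samples if start <= t < end]
        result[job_id] = mean(window) if window else None
    return result
```

This linear scan is fine for a sketch; for long histories, sorting the samples and bisecting each job window would scale better.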

The facilities team uses it for performance but also for spotting anomalies. Bailey shared that while they were bringing up Sequoia, they saw large variations in the load: recurring inter-hour swings exceeding 8 MW, as the machine dropped from 9.6 MW to 180 kW. Maintenance was considered as a cause, but the maintenance staff insisted they were not responsible for dropping the machine. Working with their utility company, Bailey’s team was able to correlate the swings with maintenance periods.
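A swing like Sequoia’s drop from 9.6 MW to 180 kW (0.18 MW) is easy to flag once the power data sits in one place. The sketch below groups samples by clock hour and reports hours whose peak-to-trough swing exceeds a threshold. The 8 MW figure comes from the article; the hourly bucketing, the function name and the sample data are illustrative assumptions.

```python
# Illustrative sketch; bucketing scheme and names are assumptions.
from collections import defaultdict


def hourly_swings(samples, threshold_mw=8.0):
    """Flag clock hours whose max-minus-min power exceeds threshold_mw.

    samples: list of (epoch_seconds, mw). Returns {hour_start_epoch: swing_mw}
    for the flagged hours only, with swings rounded to three decimals.
    """
    by_hour = defaultdict(list)
    for t, mw in samples:
        by_hour[t - t % 3600].append(mw)  # bucket by the start of the hour
    flagged = {}
    for hour, values in sorted(by_hour.items()):
        swing = max(values) - min(values)
        if swing > threshold_mw:
            flagged[hour] = round(swing, 3)
    return flagged
```

A fixed clock-hour bucket can miss a swing that straddles two hours; a sliding 60-minute window is a stricter alternative if that matters for the utility’s metering.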

“PI was able to focus in, pick all these event stamps of the power as well as what was going on with the chilled water plant, what was going on in the condenser water plant and we were able to sync it up to notice that there was a correlation at that given time,” she explained. “It helped us clue in what the problem was and give us a frame to actually shut the maintenance down slower on the machine, so now we drop it from say 7.5 MW to 5.5 MW then we wait a while, then we keep dropping it so we’re not having that large inter-hour variability.”
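Bailey’s fix amounts to a stepped ramp-down: lower the load by a bounded amount, hold, and repeat. A minimal sketch, assuming a maximum step of 2 MW (echoing her 7.5 MW to 5.5 MW example) and a hypothetical 15-minute hold; the actual step sizes and hold times used at LLNL are not given in the article.

```python
# Hypothetical ramp-down planner; the step size and hold time are assumptions.
def ramp_down_schedule(start_mw, end_mw, max_step_mw=2.0, hold_minutes=15):
    """Return (minute_offset, target_mw) pairs that step the load down gradually.

    The first entry is the starting load at minute 0. Each later target is
    at most max_step_mw below the previous one and takes effect
    hold_minutes after it, until end_mw is reached.
    """
    if max_step_mw <= 0 or hold_minutes <= 0:
        raise ValueError("step size and hold time must be positive")
    schedule = [(0, start_mw)]
    level, minute = start_mw, 0
    while level > end_mw:
        level = max(end_mw, level - max_step_mw)  # never undershoot end_mw
        minute += hold_minutes
        schedule.append((minute, level))
    return schedule
```

Under these assumptions, taking a 9.5 MW load down to 0.5 MW yields five steps over 75 minutes instead of a single 9 MW drop.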

There are analytics use cases too. Fellow LLNL’er Ghaleb Abdulla of the Data Science group is working with PI data on a large capacity computing resource called Cab. Bailey shared that her colleague brings the data into a solver, correlates it with the node-level data from the machine and produces visualizations from the results. The work made it possible to pinpoint field sensors that would yield better data if relocated.

Abdulla is also working on a project comparing two machines of the same architecture, one with a liquid cooling solution and one that is air-cooled, working from the facility level down to the rack level and the node level.

“He likes it because he’s got all the data in one location,” said Bailey of her colleague. “The thing that’s really nice about PI is that all of these interfaces are different so the PI interface nodes that you connect to these feeder systems that come in – can be SQL, can be HTML, can be Modbus, can be BACnet, can be any type of open protocol and as long as you have the interface node you are able to bring the data in, where we were finding other systems weren’t that flexible. You were having to bring data in, you were either having to manipulate the data first and then bring it in and then we were finding that there was incompatibility with the data, where this is nice because you bring it in and they can come in the PI server and it works really well that way.”

Bailey said her team expects more use cases and is looking at grid integration, which would provide further assurance of meeting exascale-class power demands. Abdulla and Bailey are working together on strategies for fine-grained and coarse-grained power management, job scheduling, backup scheduling, and shutting down and shedding load.

“This is a big topic for us because as we go to exascale and we have a machine that could be 20-30 MW, the difference between the peak and shutting that unit as it goes offline could be huge to the utility,” said Bailey. “We actually met with one of our power providers who also has PI. One of our goals in the future is to have data that we can share amongst ourselves and them – they are also a DOE entity as well – that is huge for us. We are looking at collaboration with them and that’s a big challenge coming up in 2022 – how do we do grid integration with the utility having an exascale machine on the floor, having 20-30 MW in 20,000 sq ft of space, that’s just crazy. How do we take the environmental monitoring system, how do we integrate it to respond to these demand changes and how does the grid integration implementation require energy transactions to the power management system. We’re really heavily involved in that but it’s going to take some time so we use PI a lot on granular studies.”

Livermore reports real results with PI. Bailey said they’ve seen an improvement in PUE across all of the datacenters in their HPC complex, which was tied to energy savings. In the mechanical system, they found leakage issues through the environmental monitoring data coming into PI. A chiller was going on and off line sporadically, and PI helped reveal that it had a mechanical problem. Bailey noted that the building management system doesn’t store data long enough, so the data coming into PI was what made it possible to determine when the unit was going on and off. They reprogrammed the system so that it would use less chilled water.

So far, Livermore is OSIsoft’s only customer in the HPC facilities space. Asked about the prospect of her colleagues at other centers deploying the PI system, Bailey said there’s a need, but there’s also the matter of organizational alignment.

“I’m not matrixed into HPC, I live in HPC, so my supervisor is the same supervisor as the system administrator, as the facility operations manager as the system engineer and the system architects. We all are very aligned here,” she said. “What happens at other laboratories is that their facility people or their system engineers are matrixed in from another organization so they are not completely aligned with their line managers so it’s difficult to convince your line management that you really need this because the bottom line affects the program manager; we have the support of him which makes it huge.

“If you don’t have that support, it’s difficult. So that’s what I’ve seen with the other laboratories, a lot of them want to do it, but the way that they are structured it doesn’t allow them to have that complete backing so who’s going to pay for it, right? It always comes down to that. At our organization, we all have the same direction and the same focus so when you have everyone in alignment that this is what they need to improve their projections and to get to exascale as a common goal, you have the backing.”
