OLCF Deputy Project Director Organizing Massive Effort to Install Frontier on Schedule

May 6, 2021

May 6, 2021 — The Oak Ridge Leadership Computing Facility’s (OLCF’s) “Pioneering Frontier” series features stories profiling the many talented ORNL employees behind the construction and operation of the OLCF’s incoming exascale supercomputer, Frontier. The HPE Cray system is scheduled for delivery in 2021, with full user operations in 2022.

Matt Sieger has the sort of job that would intimidate ordinary project managers. As deputy project director at the US Department of Energy’s (DOE’s) OLCF, he is on the frontlines of the multiyear effort to install the Frontier supercomputer at Oak Ridge National Laboratory (ORNL). Fortunately, Sieger truly enjoys the nitty-gritty of keeping big projects organized: taking lots of meetings, producing numerous reports, and adhering to a myriad of federal requirements.

On paper, Sieger assists OLCF Program Director Justin Whitt as his right-hand manager. On the ground, he makes sure the Frontier team stays on track to hit its milestones for siting the nation’s first exascale system—which will exceed a quintillion, or 10^18, calculations per second—by the end of 2021. This means Sieger must closely monitor the day-to-day progress of a multimillion-dollar government construction project that will produce one of the world’s most powerful and smartest scientific supercomputers. To do that, he heads the team of project support staff that analyzes data streams from all the departments associated with Frontier’s preparation and finds solutions to problems before they cause delays.

“I see my job as enabling other people to be effective by helping to build good management processes and removing roadblocks from their paths—just keeping our focus on the most important things and not getting too wrapped around the axle for issues that are just distractions,” Sieger said. “We have a lot riding on Frontier, from the National Strategic Computing Initiative to the Exascale Computing Project. So we’re under a tremendous amount of pressure to get this thing in on schedule.”

Despite extensive preplanning for every foreseeable contingency, there will always be unexpected threats to the schedule, from delays in obtaining particular components to workers who must quarantine due to COVID-19. “A lot of project management is setting things up to handle everything that you know about, but there’s going to be 15 things you didn’t expect that are going to come and try to get you,” Sieger said. And it’s those problems that get him thinking.

“When you look at a project or any big enterprise, of course there are going to be problems. I always find myself analyzing them: Where did they come from? Why did we have that problem? How do we change ourselves so that we can prevent that problem from happening in the future?” Sieger said. “I realize a lot of satisfaction from making little improvements to how we do things that prevent future problems or makes something easier. I like organizing things, and I really like it when things run smoothly.”

OLCF Deputy Project Director Matt Sieger overseeing Frontier supercomputer construction. Photo by Carlos Jones / ORNL.

Sieger joined the Frontier project about 3 years ago. He had spent the previous 7 years at ORNL as a quality manager working in a variety of areas, including the Spallation Neutron Source, Consortium for Advanced Simulation of Light Water Reactors, and the Nuclear Science and Engineering Directorate. Although he gained valuable experience in assembling quality assurance plans and structuring projects and processes, his earlier career as a software architect is what really inspired him to find satisfaction in the art of organization. He sees similarities between designing software to efficiently complete a task and creating an action plan to effectively tackle a big job.

But enabling the human effort underlying a $500 million federally funded project demands more than just good organizational skills. It also requires strict adherence to DOE directives for managing projects of this scale, ensuring compliance with applicable laws and regulations, and meeting DOE’s expectations for cost and schedule. Sieger keeps all of those directives in mind with every decision made by the Frontier team.

“Project management within DOE is almost its own subculture. There’s an order from DOE called Order 413.3—it has hundreds of pages and a galaxy of guidance documents associated with it that gives us our marching orders. We have to manage this project to this set of standards.”

Whether the Frontier team has been doing a good job of meeting those standards is put to the test each year when the DOE’s Office of Project Assessment conducts an independent project review (IPR) of the entire Frontier effort. Over the course of 3 days, experts from other DOE facilities and offices receive presentations from Frontier’s project managers about their progress—and then the inspectors essentially interrogate them on every aspect of the project’s status. IPRs often result in a list of recommendations to help improve the project. With its last two IPRs, the Frontier team received no recommendations at all—an achievement that Sieger credits to the project staff and their overall approach to the project.

“We’ve got outstanding people here, and one of the key things about how we manage this project is taking the philosophy of constantly being ‘review ready.’ We’re always working to keep metrics, documents, costs, and schedules up to date,” Sieger said. “It’s discipline, like brushing your teeth, but it really helps us in reviews. Having done our homework and having always tried to do the right thing, we have more confidence that things are going to go smoothly.”

For someone responsible for making sure the nation’s first exascale supercomputer successfully launches on schedule, Sieger is a surprisingly easygoing fellow whose wry humor makes his 25 or so virtual meetings per week go smoothly. “I have to say I actually like virtual meetings because it’s easier to get ahold of people,” he insists.

Perhaps the key to his Zen demeanor lies in his primary hobby: playing music. But not with a musical instrument, per se.

“I’m a house and techno DJ. I’ve done music mixing since the ’80s—a long time ago!” Sieger confesses. “In my basement at home, I’ve got a nightclub with lights and sound and mixing decks. I just enjoy doing that. So I spend a lot of time offline just listening to new music, collecting new music, playing music.”

UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.


Source: Coury Turczyn, OLCF

HPCwire