Momentum Builds for US Exascale

By Alex R. Larzelere

January 9, 2018

2018 looks to be a great year for the U.S. exascale program. The last several months of 2017 revealed a number of important developments that help put the U.S. quest for exascale on a solid foundation. In my last article, I provided a description of the elements of the High Performance Computing (HPC) ecosystem and its importance for advancing and sustaining this strategically important technology. It is good to report that the U.S. exascale program seems to be hitting the full range of ecosystem elements.

As a reminder, the National Strategic Computing Initiative (NSCI) assigned the U.S. Department of Energy (DOE) Office of Science (SC) and the National Nuclear Security Administration (NNSA) to execute a joint program to deliver capable exascale computing that emphasizes sustained performance on relevant applications and analytic computing to support their missions. The overall DOE program is known as the Exascale Computing Initiative (ECI) and is funded by the SC Advanced Scientific Computing Research (ASCR) program and the NNSA Advanced Simulation and Computing (ASC) program. Elements of the ECI include the procurement of exascale class systems and the facility investments in site preparations and non-recurring engineering. Also, ECI includes the Exascale Computing Project (ECP) that will conduct the Research and Development (R&D) in the areas of middleware (software stack), applications, and hardware to ensure that exascale systems will be productively usable to address Office of Science and NNSA missions.

In the area of hardware – the last part of 2017 revealed a number of important developments. First and most visible, is the initial installation of the SC Summit system at Oak Ridge National Laboratory (ORNL) and the NNSA Sierra system at Lawrence Livermore National Laboratory (LLNL). Both systems are being built by IBM using Power9 processors with Nvidia GPU co-processors. The machines will have two Power9 CPUs per system board and will use a Mellenox InfinBand interconnection network.

Beyond that, the architecture of each machine is slightly different. The ORNL Summit machine will use six Nvidia Volta GPUs per two Power9 CPUs on a system board and will use NVLink to connect to 512 GB of memory. The Summit machine will use a combination of air and water cooling. The LLNL Sierra machine will use four Nvidia Voltas and 256 GB of memory connected with the two Power9 CPUs per board. The Sierra machine will use only air cooling. As was reported by HPCwire in November 2017, the peak performance of the Summit machine will be about 200 petaflops and the Sierra machine is expected to be about 125 petaflops.

Installation of both the Summit and Sierra systems is currently underway with about 279 racks (without system boards) and the interconnection network already installed at each lab. Now that IBM has formally released the Power9 processors, the racks will soon start being populated with the boards that contain the CPUs, GPUs and memory. Once that is completed, the labs will start their acceptance testing, which is expected to be finished later in 2018.

Another important piece of news about the DOE exascale program is the clarification of the status of the Argonne National Laboratory (ANL) Aurora machine. This system was part of the collaborative CORAL procurement that also selected the Sierra and Summit machines. The Aurora system is being manufactured by Intel with Cray Inc. acting as the system integrator. The machine was originally scheduled to be an approximately 180 peak petaflops system using the Knights Hill third generation Phi processors. However, during SC17, we learned that Intel is removing the Knights Hill chip from its roadmap. This explains the reason why during the September ASCR Advisory Committee (ASCAC) meeting, Barb Helland, the Associate Director of the ASCR office, announced that the Aurora system would be delayed to 2021 and upgraded to 1,000 petaflops (aka 1 exaflops).

The full details of the revised Aurora system are still under wraps. We have learned that it is going to use “novel” processor technologies, but exactly what that means is unclear. The ASCR program subjected the new Aurora design to an independent outside review. It found, “The hardware choices/design within the node is extremely well thought through. Early projections suggest that the system will support a broad workload.” The review committee even suggested that, “The system as presented is exciting with many novel technology choices that can change the way computing is done.” The Aurora system is in the process of being “re-baselined” by the DOE. Hopefully, once that is complete, we will get a better understanding of the meaning of “novel” technologies. If things go as expected, the changes to Aurora will allow the U.S. to achieve exascale by 2021.

An important, but sometimes overlooked, aspect of the U.S. exascale program is the number of computing systems that are being procured, tested and optimized by the ASCR and ASC programs as part of the buildup to exascale. Other computing systems involved with “pre-exascale” systems include the 8.6 petaflops Mira computer at ANL and the 14 petaflops Cori system at Lawrence Berkeley National Lab (LBNL). The NNSA also has the 14.1 petaflops Trinity system at Los Alamos National Lab (LANL). Up to 20 percent of these precursor machines will serve as testbeds to enable computing science R&D needed to ensure that the U.S. exascale systems will be able to productively address important national security and discovery science objectives.

The last, but certainly not least, bit of hardware news is that the ASCR and ASC programs are expected to start their next computer system procurement processes in early 2018. During her presentation to the U.S. Consortium for the Advancement of Supercomputing (USCAS), Barb Helland told the group that she expects that the Request for Proposals (RFP) will soon be released for the follow-ons to the Summit and Sierra systems. These systems, to be delivered in the 2021-2023 timeframe, are expected to be provide in excess of exaFLOP/s performance. The procurement process to be used will be similar to the CORAL procurement and will be a collaboration between the DOE-SC ASCR and NNSA ASC programs. The ORNL exascale system will be called Frontier and the LLNL system will be known as El Capitan.

2017 also saw significant developments for the people element of the U.S HPC ecosystem. As was previously reported, at last September’s ASCAC meeting, Paul Messina announced that he would be stepping down as the ECP Director on October 1st. Doug Kothe, who was previously the applications development lead, was announced as the new ECP Director. Upon taking the Director job, Kothe with his deputy, Stephen Lee of LANL, instituted a process to review the organization and management of the ECP. At the December ASCAC conference call, Doug reported that the review had been completed and resulted in a number of changes. This included paring down ECP from five to four components (applications development, software technology, hardware and integration, and project management). He also reported that ECP has implemented a more structured management approach that includes a revised work breakdown structure (WBS) and additional milestones, new key performance parameters and risk management approaches. Finally, the new ECP Director reported that they had established an Extended Leadership Team with a number of new faces.

Another important, element of the HPC ecosystem are the people doing the R&D and other work need to keep the ecosystem going. The DOE ECI involves a huge number of people. Last year, there were about 500 researchers who attended the ECP Principle Investigator meeting and there are many more involved in other DOE/NNSA programs and from industry. The ASCR and ASC programs are involved with a number of programs to educate and train future members of the HPC ecosystem. Such programs are the ASCR and ASC co-funded Computational Science Graduate Fellowship (CSGF) and the Early Career Research Program. The NNSA offers similar opportunities. Both the ASCR and ASC programs continue to coordinate with National Science Foundation educational programs to ensure that America’s top computational science talent continues to flow into the ecosystem.

Finally, in addition to people and hardware, the U.S. program continues to develop the software stack (aka middleware) to develop end users’ applications to ensure that exascale will be used productively. Doug Kothe reported that ECP has adopted standard Software Development Kits. These SDKs are designed to support the goal of building a comprehensive, coherent software stack that enables application developers to productively write highly parallel applications that effectively target diverse exascale architectures. Kothe also reported that ECP is making good progress in developing applications software. This includes the implementation of innovative approaches that include Machine Learning to utilize the GPUs that are part of the future exascale computers.

All in all – the last several months of 2017 have set the stage for a very exciting 2018 for the U.S. exascale program. It has been about 5 years since the ORNL Titan supercomputer came onto the stage at #1 on the TOP500 list. Over that time, other more powerful DOE computers have come online (Trinity, Cori, etc.) but they were overshadowed by Chinese and European systems. It remains unclear whether or not the upcoming exascale systems will put the U.S. back on the top of the supercomputing world. However, the recent developments help to reassure the country is not going to give up its computing leadership position without a fight. That is great news because for more than 60 years, the U.S. has sought leadership in high performance computing for the strategic value it provides in the areas of national security, discovery science, energy security, and economic competitiveness.

About the Author

Alex Larzelere is a senior fellow at the U.S. Council on Competitiveness, the president of Larzelere & Associates Consulting and HPCwire’s policy editor. He is currently a technologist, speaker and author on a number of disruptive technologies that include: advanced modeling and simulation; high performance computing; artificial intelligence; the Internet of Things; and additive manufacturing. Alex’s career has included time in federal service (working closely with DOE national labs), private industry, and as founder of a small business. Throughout that time, he led programs that implemented the use of cutting edge advanced computing technologies to enable high resolution, multi-physics simulations of complex physical systems. Alex is the author of “Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI).”

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

At SC18: Humanitarianism Amid Boom Times for HPC

November 14, 2018

At SC18 in Dallas, the feeling on the ground is one of forward-looking buoyancy. Like boom times that cycle through the Texas oil fields, the HPC industry is enjoying a prosperity seen only every few decades, one driven Read more…

By Doug Black

Nvidia’s Jensen Huang Delivers Vision for the New HPC

November 14, 2018

For nearly two hours on Monday at SC18, Jensen Huang, CEO of Nvidia, presented his expansive view of the future of HPC (and computing in general) as only he can do. Animated. Backstopped by a stream of data charts, produ Read more…

By John Russell

New Panasas High Performance Storage Straddles Commercial-Traditional HPC

November 13, 2018

High performance storage vendor Panasas has launched a new version of its ActiveStor product line this morning featuring what the company said is the industry’s first plug-and-play, portable parallel file system that d Read more…

By Doug Black

HPE Extreme Performance Solutions

AI Can Be Scary. But Choosing the Wrong Partners Can Be Mortifying!

As you continue to dive deeper into AI, you will discover it is more than just deep learning. AI is an extremely complex set of machine learning, deep learning, reinforcement, and analytics algorithms with varying compute, storage, memory, and communications needs. Read more…

IBM Accelerated Insights

New Data Management Techniques for Intelligent Simulations

The trend in high performance supercomputer design has evolved – from providing maximum compute capability for complex scalable science applications, to capacity computing utilizing efficient, cost-effective computing power for solving a small number of large problems or a large number of small problems. Read more…

SC18 Student Cluster Competition – Revealing the Field

November 13, 2018

It’s November again and we’re almost ready for the kick-off of one of the greatest computer sports events in the world – the SC Student Cluster Competition. This is the twelfth time that teams of university undergr Read more…

By Dan Olds

At SC18: Humanitarianism Amid Boom Times for HPC

November 14, 2018

At SC18 in Dallas, the feeling on the ground is one of forward-looking buoyancy. Like boom times that cycle through the Texas oil fields, the HPC industry is en Read more…

By Doug Black

Nvidia’s Jensen Huang Delivers Vision for the New HPC

November 14, 2018

For nearly two hours on Monday at SC18, Jensen Huang, CEO of Nvidia, presented his expansive view of the future of HPC (and computing in general) as only he can Read more…

By John Russell

New Panasas High Performance Storage Straddles Commercial-Traditional HPC

November 13, 2018

High performance storage vendor Panasas has launched a new version of its ActiveStor product line this morning featuring what the company said is the industry Read more…

By Doug Black

SC18 Student Cluster Competition – Revealing the Field

November 13, 2018

It’s November again and we’re almost ready for the kick-off of one of the greatest computer sports events in the world – the SC Student Cluster Competitio Read more…

By Dan Olds

US Leads Supercomputing with #1, #2 Systems & Petascale Arm

November 12, 2018

The 31st Supercomputing Conference (SC) - commemorating 30 years since the first Supercomputing in 1988 - kicked off in Dallas yesterday, taking over the Kay Ba Read more…

By Tiffany Trader

OpenACC Talks Up Summit and Community Momentum at SC18

November 12, 2018

OpenACC – the directives-based parallel programing model for optimizing applications on heterogeneous architectures – is showcasing user traction and HPC im Read more…

By John Russell

How ASCI Revolutionized the World of High-Performance Computing and Advanced Modeling and Simulation

November 9, 2018

The 1993 Supercomputing Conference was held in Portland, Oregon. That conference and it’s show floor provided a good snapshot of the uncertainty that U.S. supercomputing was facing in the early 1990s. Many of the companies exhibiting that year would soon be gone, either bankrupt or acquired by somebody else. Read more…

By Alex R. Larzelere

At SC18: GM, Boeing, Deere, BP Talk Enterprise HPC Strategies

November 9, 2018

SC18 in Dallas (Nov.11-16) will feature an impressive series of sessions focused on the enterprise HPC deployments at some of the largest industrial companies: Read more…

By Doug Black

Cray Unveils Shasta, Lands NERSC-9 Contract

October 30, 2018

Cray revealed today the details of its next-gen supercomputing architecture, Shasta, selected to be the next flagship system at NERSC. We've known of the code-name "Shasta" since the Argonne slice of the CORAL project was announced in 2015 and although the details of that plan have changed considerably, Cray didn't slow down its timeline for Shasta. Read more…

By Tiffany Trader

TACC Wins Next NSF-funded Major Supercomputer

July 30, 2018

The Texas Advanced Computing Center (TACC) has won the next NSF-funded big supercomputer beating out rivals including the National Center for Supercomputing Ap Read more…

By John Russell

IBM at Hot Chips: What’s Next for Power

August 23, 2018

With processor, memory and networking technologies all racing to fill in for an ailing Moore’s law, the era of the heterogeneous datacenter is well underway, Read more…

By Tiffany Trader

Requiem for a Phi: Knights Landing Discontinued

July 25, 2018

On Monday, Intel made public its end of life strategy for the Knights Landing "KNL" Phi product set. The announcement makes official what has already been wide Read more…

By Tiffany Trader

House Passes $1.275B National Quantum Initiative

September 17, 2018

Last Thursday the U.S. House of Representatives passed the National Quantum Initiative Act (NQIA) intended to accelerate quantum computing research and developm Read more…

By John Russell

CERN Project Sees Orders-of-Magnitude Speedup with AI Approach

August 14, 2018

An award-winning effort at CERN has demonstrated potential to significantly change how the physics based modeling and simulation communities view machine learni Read more…

By Rob Farber

Summit Supercomputer is Already Making its Mark on Science

September 20, 2018

Summit, now the fastest supercomputer in the world, is quickly making its mark in science – five of the six finalists just announced for the prestigious 2018 Read more…

By John Russell

New Deep Learning Algorithm Solves Rubik’s Cube

July 25, 2018

Solving (and attempting to solve) Rubik’s Cube has delighted millions of puzzle lovers since 1974 when the cube was invented by Hungarian sculptor and archite Read more…

By John Russell

Leading Solution Providers

US Leads Supercomputing with #1, #2 Systems & Petascale Arm

November 12, 2018

The 31st Supercomputing Conference (SC) - commemorating 30 years since the first Supercomputing in 1988 - kicked off in Dallas yesterday, taking over the Kay Ba Read more…

By Tiffany Trader

TACC’s ‘Frontera’ Supercomputer Expands Horizon for Extreme-Scale Science

August 29, 2018

The National Science Foundation and the Texas Advanced Computing Center announced today that a new system, called Frontera, will overtake Stampede 2 as the fast Read more…

By Tiffany Trader

HPE No. 1, IBM Surges, in ‘Bucking Bronco’ High Performance Server Market

September 27, 2018

Riding healthy U.S. and global economies, strong demand for AI-capable hardware and other tailwind trends, the high performance computing server market jumped 28 percent in the second quarter 2018 to $3.7 billion, up from $2.9 billion for the same period last year, according to industry analyst firm Hyperion Research. Read more…

By Doug Black

Intel Announces Cooper Lake, Advances AI Strategy

August 9, 2018

Intel's chief datacenter exec Navin Shenoy kicked off the company's Data-Centric Innovation Summit Wednesday, the day-long program devoted to Intel's datacenter Read more…

By Tiffany Trader

Germany Celebrates Launch of Two Fastest Supercomputers

September 26, 2018

The new high-performance computer SuperMUC-NG at the Leibniz Supercomputing Center (LRZ) in Garching is the fastest computer in Germany and one of the fastest i Read more…

By Tiffany Trader

Houston to Field Massive, ‘Geophysically Configured’ Cloud Supercomputer

October 11, 2018

Based on some news stories out today, one might get the impression that the next system to crack number one on the Top500 would be an industrial oil and gas mon Read more…

By Tiffany Trader

MLPerf – Will New Machine Learning Benchmark Help Propel AI Forward?

May 2, 2018

Let the AI benchmarking wars begin. Today, a diverse group from academia and industry – Google, Baidu, Intel, AMD, Harvard, and Stanford among them – releas Read more…

By John Russell

Google Releases Machine Learning “What-If” Analysis Tool

September 12, 2018

Training machine learning models has long been time-consuming process. Yesterday, Google released a “What-If Tool” for probing how data point changes affect a model’s prediction. The new tool is being launched as a new feature of the open source TensorBoard web application... Read more…

By John Russell

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This