Doug Kothe Delivers Whirlwind ECP Update in 70 (or so) Slides

By John Russell

May 2, 2019

So much attention is given to forthcoming exascale hardware – Aurora is scheduled to be the first U.S. exascale system to go live around 2021/22 – that the U.S. Exascale Computing Project’s (ECP) work to develop a robust software ecosystem to coax the most from these exascale machines often gets short shrift. That’s too bad, because in many ways there is more to talk about with regard to ECP, which has already released many ‘products’.

Doug Kothe, ECP director

On Tuesday, Doug Kothe, midway through his second year as ECP director, provided a high-speed tour of ECP progress in a livestreamed talk for ACM. At the start of the talk, officially entitled The Exascale Computing Project and the Future of HPC, Kothe quipped, “I’ll leave it to the audience to ascertain the future of HPC [and] do my best to get through the depth and breadth of what we’ve been up to.” Good choice. There’s too much to cover.

Quick backgrounder: ECP, as you may know, was formed in 2016 as part of the overall U.S. Exascale Computing Initiative (ECI) being run by DoE. The ECI, among other things, procures the exascale systems. ECP’s charge is to ensure there’s an exascale-ready software ecosystem to get the most from exascale hardware when it arrives. You may not know that ECP has a finite lifetime and is scheduled to end in 2023. Kothe calls ECP a seven-year sprint.

Organizationally, ECP is overseen by a board of directors chaired by Bill Goldstein, director of Lawrence Livermore National Laboratory, with vice chair Thomas Zacharia, director of Oak Ridge National Laboratory. There is also an ECP Industry Council, led by General Electric, which weighs in on functional requirements and acts in an advisory capacity but has no formal review authority. Kothe notes that, perhaps unlike past DoE efforts, ECP is trying to produce hardened, production-quality software that is released regularly, and that ECP has firm milestones to keep it on track.

Much of Kothe’s presentation is likely familiar to close watchers of ECP; nevertheless, the scope of ECP activities presented, along with pointers to sources for more material, was impressive. It was almost too much, but as an ECP resource the ACM recording and slide deck are keepers if you can get them. ECP communication lead Mike Bernhardt says DoE is reviewing the talk now before publicly releasing it, but that should happen soon (update: recording plus slides now available).

Kothe zipped through about 70 slides in under an hour. He made clear throughout his presentation that ECP’s various goals specifically supporting DoE missions are matched by the expectation that ECP-developed technologies will also find broad use within HPC. It turns out ECP has already been busy churning out applications, SDKs, contributions to open source, and an extreme-scale software stack.

It’s probably worth restating DoE’s definition of ‘capable exascale’ for the new systems, since that’s the official goal. Broadly, ECI calls for at least two diverse system architectures. Each should deliver 50x the performance of today’s 20-petaflop systems and 5x the performance of Summit. The systems should function with sufficient resiliency (an average fault rate of ≤1 per week) and include a software stack that meets the needs of a broad spectrum of applications and workloads.
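
The two performance targets are meant to land on the same number, and the fault-rate target translates directly into a minimum time between faults. A quick back-of-the-envelope check, assuming Summit's peak is roughly 200 petaflops (a figure not given in the talk) and reading "exascale" as 1,000 petaflops:

```python
# Back-of-the-envelope check of the 'capable exascale' targets.
# Assumption (not stated in the talk): Summit's peak is taken as ~200 petaflops.
PF_PER_EF = 1000          # 1 exaflop = 1,000 petaflops

today_pf = 20             # "today's 20 petaflop systems"
summit_pf = 200           # assumed Summit peak, in petaflops

target_from_today = 50 * today_pf    # 50x today's systems, in petaflops
target_from_summit = 5 * summit_pf   # 5x Summit, in petaflops

print(target_from_today / PF_PER_EF, "exaflops")
print(target_from_summit / PF_PER_EF, "exaflops")

# Resiliency: an average of at most one fault per week means a mean time
# between faults of at least one week.
HOURS_PER_WEEK = 7 * 24
max_faults_per_week = 1
min_mtbf_hours = HOURS_PER_WEEK / max_faults_per_week
print(min_mtbf_hours, "hours between faults, at minimum")
```

Under that assumption both routes work out to 1,000 petaflops, i.e. 1 exaflop, and the fault-rate target corresponds to a mean time between faults of at least 168 hours.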

Presented here are just a few of Kothe’s slides and his accompanying comments.

There are six application target areas (shown below), which were selected in 2016 in conjunction with DoE sponsors. Each of the roughly 20 applications addresses a strategic problem of interest to a DoE program office. Kothe said, “It wasn’t easy to downselect to those,” and also emphasized ECP “is not waiting for exascale systems [to arrive] but working hard on the current systems (e.g. Summit, Sierra).” So far, he said, performance is exceeding expectations.

Co-design, of course, has been a key component from ECP’s start. One area of focus is motifs. “Typically each application has a small set of motifs, or common patterns of computation. We’ve chosen in terms of co-design to go after motifs and really see if we can make those motifs perform well on the exascale and pre-exascale systems. Here (below) you see the list of six co-design centers and a proxy application [center],” said Kothe. “[They] all have proven their worth with regard to developing, not just best practices and lessons learned, but libraries and components that we view as sort of next-generation middleware that many applications will use.”
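
To make "motif" concrete: one commonly cited example is the structured-grid stencil, in which every point on a mesh is updated from its neighbors. The sketch below is an illustration only; the function name and the choice of a 1D Jacobi-style averaging update are assumptions, not taken from Kothe's slides. The point is that the same small kernel recurs across many applications, so making it fast once pays off broadly.

```python
from typing import List


def jacobi_step(u: List[float]) -> List[float]:
    """One sweep of a 1D three-point stencil (hypothetical illustration).

    Interior points are replaced by the average of their two neighbours;
    the boundary values u[0] and u[-1] are held fixed.
    """
    new = u[:]  # copy so every update reads only values from the old sweep
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new


# Usage: relax a 10-point grid with fixed ends (0 and 1) toward its
# steady state, which is a straight line between the two boundary values.
grid = [0.0] * 9 + [1.0]
for _ in range(500):
    grid = jacobi_step(grid)
print([round(x, 3) for x in grid])
```

Real motif libraries layer multidimensional meshes, vectorization, and GPU offload on top of this basic neighbor-update pattern.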

Interestingly, one of the co-design efforts is not motif-focused. It’s the Co-Design Center for Online Data Analysis and Reduction (CODAR) working on approaches to workflow management and data analysis.

“Here (below) you see the traditional approach. An app runs and dumps its data, and another app runs, picks it up, and does the analysis. We really can’t afford to do that. There is a disparity in the hardware in terms of I/O bandwidth relative to the memory bandwidth. We really want to be able to do online reduction. In other words, the application runs, and we’re doing reduction of data as it runs, and processing the data as it runs, and that analysis may be passive or active back on the application. We may [sometimes] need a couple of applications running on the hardware at the same time that may need to talk back and forth in some sort of consistent way,” he said.
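
The contrast Kothe draws can be sketched in a few lines. In the traditional model the full state of every step is written out and analyzed later; in the online model an analysis routine consumes each step while it is still in memory and keeps only a small summary. The simulation stand-in, the function names, and the choice of min/max/mean as the "reduction" below are hypothetical, chosen only to illustrate the pattern; they are not CODAR's software or API.

```python
import math
from typing import Iterator, List, Tuple


def simulate(n_steps: int, n_points: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for an application producing one field per step.

    Being a generator, it hands over one step at a time rather than
    accumulating every step before analysis begins.
    """
    for step in range(n_steps):
        yield [math.sin(0.1 * step + 0.01 * i) for i in range(n_points)]


def reduce_online(steps: Iterator[List[float]]) -> List[Tuple[float, float, float]]:
    """Reduce each step as it is produced instead of storing it.

    Only three numbers per step (min, max, mean) are kept, so the
    retained data no longer scales with the number of grid points.
    """
    summaries = []
    for field in steps:
        summaries.append((min(field), max(field), sum(field) / len(field)))
    return summaries


n_steps, n_points = 100, 10_000
summaries = reduce_online(simulate(n_steps, n_points))
print(len(summaries), "summaries of 3 values each instead of",
      n_steps * n_points, "raw values")
```

An "active" analysis, in Kothe's terms, would go one step further and feed a decision back into the running simulation, for example changing what it outputs next.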

“This center is essentially releasing an entire workflow management system that a number of applications [in areas such as] fusion, material science, molecular dynamics, and climate, are looking at to leverage,” said Kothe.

Not surprisingly, ECP, like many in HPC, is scrambling to dive into machine learning. The ExaLearn center was created just last fall.

“There are several use cases of interest to ECP. Obviously we’ll be interested in picking up industry frameworks wherever we can, so the goal isn’t to recreate good technologies like TensorFlow that are out there. In particular we are interested in surrogate models for uncertainty quantification and error estimation, control systems, and inverse problems. A good [use case] example for machine learning is looking at our experimental facilities. Here’s a light source (slide below), and I think there are at least five [similar light sources] in the labs,” said Kothe.

“These are multi-million-dollar facilities that are getting great experimental return, but we think we can help even more with everything [from] upstream design of the light source, controlling the beam lines in real time, to interfacing with the data acquisition system to help understand the data in real time [and] being able to do fast analysis onsite, and ultimately sending data to the exascale system. This is potentially a high-return area.”
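
The surrogate-model idea Kothe mentions can be sketched simply: run the expensive model at a handful of points, fit a cheap approximation, then use the cheap version for the many evaluations that uncertainty quantification needs. Everything below is an assumption for illustration: the stand-in "expensive" function, the piecewise-linear interpolant used as the surrogate, and the Gaussian input uncertainty. In an ML setting the surrogate would be a trained model, such as a neural network, rather than a simple interpolant.

```python
import bisect
import math
import random


def expensive_model(x: float) -> float:
    """Hypothetical stand-in for a costly simulation."""
    return math.exp(-x) * math.cos(3 * x)


def build_surrogate(xs, ys):
    """Return a piecewise-linear interpolant through (xs, ys); xs must be sorted."""
    def surrogate(x: float) -> float:
        x = min(max(x, xs[0]), xs[-1])       # clamp to the sampled range
        j = bisect.bisect_right(xs, x)
        j = min(max(j, 1), len(xs) - 1)      # index of the bracketing interval
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return surrogate


# 1) A small number of "expensive" runs on a uniform grid over [0, 2].
train_x = [2.0 * k / 40 for k in range(41)]
train_y = [expensive_model(x) for x in train_x]
surrogate = build_surrogate(train_x, train_y)

# 2) Monte Carlo UQ on the cheap surrogate. The uncertain input is
#    modelled here, by assumption, as Normal(mean=1.0, std=0.2).
rng = random.Random(0)
samples = [surrogate(rng.gauss(1.0, 0.2)) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"output mean ~ {mean:.4f}, std ~ {math.sqrt(var):.4f}")
```

The design trade-off is the usual one: 41 calls to the expensive model buy 100,000 cheap evaluations, at the cost of an approximation error that has to be checked against the real model.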

Clearly, developing a wide range of software technology – tools and the stack – is a key imperative for ECP. “The philosophy here is to be prudent and extend current technologies where possible, build a comprehensive software stack, but leverage frankly the hundreds of man-years of that investment that we have,” said Kothe.

In delivering these capabilities, Kothe said the emphasis is on delivering high-quality, production software – “probably something we haven’t done well in DoE in the past.” Convenient delivery of these tools is also important, “recognizing we can encapsulate these products into a smaller set of development kits that have kind of like products packages that are containerized.” ECP currently supports Docker, Charliecloud, Singularity, and Shifter container technologies.

Performance milestones are also part of ECP delivery requirements, and Kothe points to work with hypre to leverage mixed-precision computation as an example: “We were investigating going from 64-bit to 32-bit integers, trying to take advantage of the accelerated hardware. In this case we gained about a 25 percent performance increase just by investigating where we can go to lower precision and still do the job. I just want to point out we are a project with milestones kind of every three or four months, and this is a good example of a hypre milestone.”
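
One reason narrower integers can help is simple arithmetic on memory traffic: a sparse matrix stores one column index per nonzero, so halving the index width halves that part of the footprint, which matters for bandwidth-bound kernels on memory-limited accelerators. The sketch below illustrates the idea with Python's `array` module for a compressed sparse row (CSR) matrix. The matrix and the helper names are hypothetical, and this is not hypre's code.

```python
from array import array


def narrow_indices(idx64: array) -> array:
    """Convert 64-bit ('q') indices to C-int ('i') indices after a range check.

    Assumes a 4-byte C int, as on mainstream platforms.
    """
    limit = 2**31 - 1
    if idx64 and max(idx64) > limit:
        raise OverflowError("indices do not fit in a signed 32-bit integer")
    return array("i", idx64)


def csr_matvec(row_ptr, col_idx, vals, x):
    """Compute y = A x for a CSR matrix; works with either index width."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(row_ptr) - 1):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += vals[k] * x[col_idx[k]]
    return y


# Hypothetical 1D Laplacian (tridiagonal) with n rows, assembled in CSR form.
n = 1000
row_ptr, cols, vals = [0], [], []
for r in range(n):
    for c, v in ((r - 1, -1.0), (r, 2.0), (r + 1, -1.0)):
        if 0 <= c < n:
            cols.append(c)
            vals.append(v)
    row_ptr.append(len(cols))

col64 = array("q", cols)
col32 = narrow_indices(col64)
print("64-bit index bytes:", col64.itemsize * len(col64))
print("32-bit index bytes:", col32.itemsize * len(col32))

# Same result either way; only the index storage changed.
x = [1.0] * n
assert csr_matvec(row_ptr, col64, vals, x) == csr_matvec(row_ptr, col32, vals, x)
```

The range check is the "can we still do the job" part of the question: narrowing is only safe when every index fits in the smaller type.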

On balance, Kothe’s talk and slides present a reasonably full picture of the scope of ECP activities. HPCwire will provide a link to those resources when they become available. Stay tuned. (update: link to recording and slides now available)

Link to recording: https://event.on24.com/eventRegistration/console/EventConsoleApollo.jsp?&eventid=1982470&sessionid=1&username=&partnerref=&format=fhvideo1&mobile=false&flashsupportedmobiledevice=false&helpcenter=false&key=810AC9D28D7C7F9885FBA59DACE69F85&text_language_id=en&playerwidth=1000&playerheight=650&overwritelobby=y&eventuserid=237553032&contenttype=A&mediametricsessionid=197265780&mediametricid=2793966&usercd=237553032&mode=launch

Link to slides: https://on24static.akamaized.net/event/19/82/47/0/rt/1/documents/resourceList1556627172809/dougkotheexascaletechtalkslides1556627165263.pdf