Doug Kothe on the Race to Build Exascale Applications

By John Russell

May 29, 2017

Ensuring there are applications ready to churn out useful science when the first U.S. exascale computers arrive in the 2021-2023 timeframe is Doug Kothe’s job. No pressure. He’s not alone, of course. The U.S. Exascale Computing Project (ECP) is a complicated effort with many interrelated parts and contributors, all necessary for success. Yet Kothe’s job as director of application development is one of the more visible and daunting, and it is perhaps best described by his boss, Paul Messina, ECP director.

“We think of 50 times [current] performance on applications [as the exascale measure of merit], unfortunately there’s a kink in this,” said Messina. “The kink is people won’t be running today’s jobs in these exascale systems. We want exascale systems to do things we can’t do today and we need to figure out a way to quantify that. In some cases it will be relatively easy – just achieving much greater resolutions – but in many cases it will be enabling additional physics to more faithfully represent the phenomena. We want to focus on measuring every capable exascale system based on full applications tackling real problems compared to what they can do today.”

Doug Kothe, ECP

In this wide-ranging discussion with HPCwire, Kothe touches on ECP application development goals and processes; several technical issues such as efforts to combine data analytics with mod/sim and the need for expanded software frameworks to accommodate exascale applications; and early thoughts for incorporating neuromorphic and quantum computing not currently part of the formal ECP plan. Interestingly, his biggest worry isn’t reaching the goal on schedule – he believes the application teams will get there – but post-ECP staff retention when industry comes calling.

By way of review, ECP is a collaborative effort of two Department of Energy organizations, the Office of Science and the National Nuclear Security Administration. Six application areas have been singled out: national security; energy security; economic security; scientific discovery; earth science; and health care. In terms of app-dev, that’s translated into 21 Science & Energy application projects, 3 NNSA application projects, and 1 DOE / NIH application project (precision medicine for cancer).

It’s not yet clear what the just-released FY2018 U.S. budget proposed by the Trump Administration portends. Funding for science programs was cut nearly across the board, although ECP escaped. Kothe says simply, “It is the beginning of the process for the FY18 budget, and while the overall budget is determined, we will continue working on the applications that are already part of the ECP.”

In keeping with ECP’s broad ambitions, Kothe says, “All of our applications teams are focused on very specific challenge problems and by our definition a challenge problem is one that is intractable today, needs exascale resources, and is a strategic high priority for one of the DOE program offices. We aren’t claiming we are going to solve all the problems but what we are claiming is simulation technology that can address the problem. The point is we have the applications vectored in rather specific directions.”

 

RISE OF DATA ANALYTICS
One of the more exciting and new-to-HPC areas is the incorporation of data analytics into the HPC environment overall and ECP in particular. Indeed, harmonizing, or at least integrating, big data with modeling and simulation is a goal specified by the National Strategic Computing Initiative. Data-driven science isn’t new, nor is researcher familiarity with the underlying statistics. But the sudden rise of machine/deep learning techniques, including many that rely on lower-precision calculations, is somewhat new to the scientific computing community and an area where the commercial world has perhaps taken the lead. Kothe labels the topic “white hot.”

“Not being trained in the data analytics area I’ve been doing a lot of reading and talking [to others]. A large fraction of the area I feel like I know, but I didn’t appreciate the other 20 or 30 percent. The point is by exposing our applications teams to the data analytics community, even just calling libraries, we are going to see some interesting in situ and computational steering use cases. As an example of in situ, think of turbulence. It could be an LES (large eddy simulation) whose parameters could have been tuned a priori by machine learning or chosen on the fly by machine learning. That kind of work is already going on at some universities,” Kothe says.

Climate modeling is a case in point. “A big challenge is subgrid models for clouds. Right now and even at exascale we probably cannot do one km or less resolution everywhere. We may be able to do regional coupled simulations that way, but if we try to do five or ten kilometers everywhere – of course it will vary whether over ocean or land ice, sea ice, or atmosphere – you will still have many clouds lost in one cell. You need a subgrid model. Maybe machine learning could be used to select the parameters. Think of a bunch of little LES models running in a 10 km x 10 km cell holding lots of clouds that are then scaled into the higher level physics. I think subgrid models are potentially a poster child for machine learning.”

Steering simulations is another emerging use case. “There’s a couple of labs, Lawrence Livermore in particular, that are already using machine learning to make decisions, to automate decisions about mesh quality for fluid and structure simulations where the mesh is just flowing with the moving material and the mesh may start to contort in a way that will cause the numerical solution to break down or errors to increase. You could do quality checks on the fly and correct the mesh with machine learning.”
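The on-the-fly quality check Kothe describes can be pictured with a minimal Python sketch. It is an illustration only, not the Livermore approach: a fixed threshold on a standard triangle shape metric stands in for the machine-learning decision, and the function names and the 0.3 cutoff are hypothetical.

```python
import math


def triangle_quality(a, b, c):
    """Shape quality in [0, 1]: 1 for an equilateral triangle, 0 for a degenerate one."""
    # Triangle area from the 2D cross product of two edge vectors.
    area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    # Sum of the squared edge lengths.
    edges_sq = sum(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        for p, q in ((a, b), (b, c), (c, a))
    )
    if edges_sq == 0.0:
        return 0.0
    # Normalized so that an equilateral triangle scores exactly 1.
    return 4.0 * math.sqrt(3.0) * area / edges_sq


def flag_distorted_cells(points, triangles, threshold=0.3):
    """Return indices of cells whose quality has dropped below the (assumed) threshold."""
    return [
        index
        for index, (i0, i1, i2) in enumerate(triangles)
        if triangle_quality(points[i0], points[i1], points[i2]) < threshold
    ]


# A well-shaped cell and a sliver that a moving mesh might produce.
points = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0), (0.5, 0.01)]
triangles = [(0, 1, 2), (0, 1, 3)]
distorted = flag_distorted_cells(points, triangles)
```

In a steering loop, the flagged cells would be handed to whatever remeshing or smoothing step the code uses before the next time step; a learned model could replace the threshold test without changing that structure.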

One interesting use is being explored as part of the Exascale CANcer Distributed Learning Environment (CANDLE) project (see HPCwire article, Enlisting Deep Learning in the War on Cancer). Part of the project is clarifying the RAS (gene) network activity. The RAS network is implicated in very many cancers. “You have machine learning orchestrating ensembles of molecular dynamics simulations [looking at docking scenarios with the RAS protein] and examining factors that are involved in docking,” says Kothe. Machine learning can recognize already known areas and reduce the need for computationally intensive simulation in those areas while zeroing in on lesser known areas for intense quantum chemistry simulations. Think of it as zooming in and out as needed.

 

FRAMEWORKS REVISITED
Clearly there’s no shortage of challenges for ECP application development. Kothe cites optimizing node performance and memory management among the especially thorny ones: “We now have many levels of memory exposed to us. We don’t really quite know how best to use it.” Data structure choices can also be problematic, and Kothe suggests frameworks may undergo a revival.

One of the application teams (astrophysics), recalls Kothe, came to him and said, “I am afraid to make a choice for a data structure that would be pervasive in my whole code because it might be the wrong one and I’m stuck with it.” “The point is I think what we are seeing with the applications is a kind of ‘going back to the future’ to the late ’80s, when you saw lots of heavyweight frameworks where an application would call out to a black box and say register this array for me and hand me back the pointer.

“That’s good and it’s bad. The bad part is you’re losing control and now you have to schlep around this black box and you don’t know if it is going to do what you want it to do. The good part is if you are on a KNL system or an NVIDIA system, you are on different nodes, and that black box memory manager would have been tuned for that hardware. [In] dealing with memory hierarchy risks, I think we are probably seeing applications move more towards frameworks, which I think is a good idea. We’ve learned kind of what I call the big F or little f frameworks. I think we’re learning how to balance the two so applications can be portable and not have to rely on an army of people but still do something that’s more agile than just choose one data structure and hope it works.”
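The “register this array for me and hand me back the pointer” pattern can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any ECP framework: `ArrayRegistry`, its methods, and the `backend` label are hypothetical, and a real framework would decide where and how to place the memory for the node it runs on.

```python
from array import array


class ArrayRegistry:
    """Toy 'black box' memory manager: the application names an array and gets a handle back."""

    def __init__(self, backend="host"):
        # Label only (hypothetical); a real framework would tune allocation per backend.
        self.backend = backend
        self._arrays = {}

    def register(self, name, length, typecode="d"):
        """Allocate zero-initialized storage and return a writable view onto it."""
        if name in self._arrays:
            raise KeyError(f"array {name!r} is already registered")
        storage = array(typecode, [0]) * length
        self._arrays[name] = storage
        return memoryview(storage)

    def get(self, name):
        """Return a view onto storage that was registered earlier."""
        return memoryview(self._arrays[name])


# The application never allocates directly; it works through the handle it is given.
registry = ArrayRegistry()
density = registry.register("density", 4)
density[2] = 1.5
```

The trade-off Kothe describes shows up even here: the application no longer controls layout or placement, but the registry is the single place where a hardware-specific allocation strategy could be swapped in.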

Performance portability is naturally a major consideration. Historically, says Kothe of application developers (and he includes himself in the category), “We chose portability over performance because we want to make sure our science can be done anywhere. Performance can’t be an afterthought but it often is. Portability in my mind has several dimensions. So the new system shows up and it is probably not something out of left field, you know something about it, but what’s a reasonable amount of effort that you think should be required to port your code? How much of the code base do you think should change? What is correctness in terms of the problem and getting the answer?

“I would claim that a 64-bit comparison is probably not realistic. I mean it’s probably not even appropriate. What set of problems would you run? You need to run real problems. We’re asking each app team to define what they think portability means and hope that collectively we’ll move towards a good definition and a good target for all the apps but I think it will end up being fairly app specific.”
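Why a bit-for-bit 64-bit comparison is a shaky portability test can be shown with a short Python sketch. The helper name `results_match` and the tolerance values are assumptions for illustration; as Kothe notes, each application team would need to define its own acceptance criteria on real problems.

```python
import math


def results_match(reference, candidate, rel_tol=1e-9, abs_tol=0.0):
    """Compare two result vectors within a tolerance instead of bit for bit."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# The same three numbers summed in two orders, as different hardware,
# compilers, or parallel reductions might do.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

bitwise_equal = left_to_right == right_to_left
within_tolerance = results_match([left_to_right], [right_to_left])
```

Here the two sums differ in their last bit, so an exact comparison fails even though both answers are equally valid; the tolerance-based check accepts them.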

THE CO-DESIGN IMPERATIVE
The necessity of co-design has become a given throughout HPC as well as with the ECP. Advancing hardware and new systems architectures must be taken into account not merely to push application performance but to get them to run at all. However coupling software too tightly to a specific machine or architecture is limiting. Currently ECP has established six co-design centers to help deal with specific challenges. Kothe believes use of motifs may help.

“Every application team at some level will be doing some vertically integrated co-design and there is probably more software co-design going on – the interplay with the compilers and runtime systems and that kind of thing – than anything else. By having the co-design centers identify a small number of motifs that applications are using, I think we can leverage a deep dive co-design on the motifs as opposed to doing kind of an extensive co-design vertically integrated within every application. This is new and there are some risks. But long term, my dream would be we [develop] community libraries that are co-designed around motifs that are used broadly among the applications.

“The poster child is probably [handling] particles. Almost every application has a discrete particle model for something and that’s good and it’s a challenge. So how do you encapsulate the particle [model] in a way that it can be co-designed not as a separate activity that’s not thinking about the [specific] consumer of that motif, but just thinking about making that motif rock and roll. That’s the challenge, to co-design motifs so they can be broadly used and I have high hopes there.”

 

 

STAY ON TARGET
“A big challenge with application developers is everything sounds cool and looks good, so we want to keep them focused. Year by year the applications have laid out a number of milestones and for the most part the milestones are step by step progression towards that challenge problem. The progression has many dimensions: is the science capability improving, better physics, better algorithms; is the team utilizing the hardware efficiently [such as] state of the art test beds, the latest systems on the floor; are they integrating software technologies; and probably one of the most important is, are they using co-design efforts,” says Kothe.

One ECP-wide tool is a comprehensive project database where “all the R&D projects and applications and software technology, all their plans and milestones are in one place.” A key aspect of ECP, says Kothe, is that everyone can see what everyone else is doing and how they are progressing.

Think of a milestone as a handful of things, says Kothe, that are generally tangible, such as a software release or a demonstration simulation. “It could be a report or a presentation. It can even be a small write up that says I tried this algorithm and it didn’t work. A milestone is a decision point.

“It’s not always a huge success. Failure can be just as valuable. Sometimes we can force a sense of urgency. We can review this seven-year plan and say, alright you can’t bring in a technology that doesn’t have a line of sight in this timeframe, or you’ve got algorithm A and B going along [and] at this point you have to make a decision and choose one and go with it. I like that. I think it imparts a sense of urgency,” says Kothe.

Kothe, of course, has his own milestones. One is an annual application assessment report due every September.

“I am hearing I am a slave driver and I didn’t really think I had that personality,” says Kothe. One area where he is inflexible is on scheduled releases. “We want you to release on the scheduled date, that date is gospel. What’s in the release may float. So the team and budget, we like to be pretty rigid, but what’s in the release floats based on what you have learned. You have this bag of tasks and try to get as many tasks done as you can but you still must have the release.”

Currently, the comprehensive database of projects isn’t publicly available (would be interesting reading) but Kothe says individual PIs are encouraged to share information widely.

SOFTWARE TECHNOLOGY SHARING
Not surprisingly, close collaboration with the software technology team is emphasized. “Right now we have this incredible opportunity because applications teams are exposed to a lot of software technologies they’ve never seen or heard of.” It’s a bit like kids in a candy store, says Kothe: “They are looking at this technology and saying I want to do that, to do that, to do that, and so the challenge for integration is on managing the interfaces and doing it in a scalable way.”

There are a couple of technology projects that everyone wants to integrate, he says, and that’s a big bandwidth worry when you have 20-plus application projects lined up saying “let me try your stuff because chances are there will be new APIs and new functionalities and bugs and features [too]. The software technology people are saying, ‘Doug be careful. Let’s come up with a scalable process.’” Conversely, says Kothe, it is also true there’s a fair amount of great “software technology the application teams are not exploring which they should be.”

“We have defined a number of integration milestones which are basically milestones that require deliverables from two or three areas. We call that shared fate. [I know] it sounds like we are jumping off a cliff together. A good example is an application project looks at a linear solver and says ‘you don’t have the functionality I need, let’s negotiate requirements.’ So the solver negotiates a new API, a new functionality, and the application team will have a milestone that says it will have integrated and tested the new technology [by a given date] and the software technology team has to have its release say two or three months before. These things tend to be daisy chained like that. You have a release, then an integration assessment, and we might have another release to basically deal with any issues.

“Right now, early on in ECP, we’re having a lot of point-to-point interaction where there’s lots of apps that want to do lots of same or different things with lots of software projects. I think once we settle down on the requirements the software technologies will be kind of one to all [having] settled on a base functionality and a base API. An obvious example is MPI but even with MPI there’s new features and functionalities that certain aspects. We can’t take it for granted that some of these tremendous technologies like MPI are going to be there working the way we need for exascale,” says Kothe.

 

ECP FUTURE WATCH
Even as ECP pushes forward it remains rooted in CMOS technology yet there are several newer technologies – not least neuromorphic and quantum computing – which have made great strides recently and seem on the cusp of practical application.

“One of the things I have been thinking about is even if we don’t have access to a neuromorphic chip what is its behavior like from a hardware simulator point of view. The same thing with quantum computing. Our mindset has to change with regards to the algorithms we lay out for neuromorphic or quantum. The applications teams need to start thinking about different types of algorithms. As Paul [Messina] has pointed out it’s possible quantum computing could fairly soon become an accelerator on a traditional node. Making sure applications are compartmentalized is important to make that possible. It would allow us to be more flexible and extensible and perhaps exploit something like a quantum accelerator.”

Looking ahead, says Kothe, he worries most about the unknown unknowns – there will be surprises. “I feel like right now in apps space we kind of have known unknowns and we’ll hit some unknown unknowns, but I believe we are going to have a number of applications ready to go. We’ll have trips along the way and we may not do some things we plan now. I think we have an aggressive but not naive set of metrics. It’s really the people. We have some unbelievable people,” he says.

One can understand today’s attraction. Kothe points out this is likely to be a once-in-a-career opportunity and the mix of experience among the application team members is significant. “What we see is millennials sitting at the table showing people new ways of doing software with gray-haired guys like me who have been to the school of hard knocks. There’s a tremendous cross fertilization. I’m confident. I saw it when we selected these teams. We had teams with rosters that looked like the all star team, but I am worried about retention. We are training people to be some of the best, especially the early career folks, so I am worried that they will be in high demand, very marketable.”

Kothe Bio from ECP website:
Douglas B. Kothe (Doug) has over three decades of experience in conducting and leading applied R&D in computational applications designed to simulate complex physical phenomena in the energy, defense, and manufacturing sectors. Kothe is currently the Deputy Associate Laboratory Director of the Computing and Computational Sciences Directorate (CCSD) at Oak Ridge National Laboratory (ORNL). Prior positions for Kothe at ORNL, where he has been since 2006, were Director of the Consortium for Advanced Simulation of Light Water Reactors, DOE’s first Energy Innovation Hub (2010-2015), and Director of Science at the National Center for Computational Sciences (2006-2010).

Feature Caption:
The Transforming Additive Manufacturing through Exascale Simulation project (ExaAM) is building a new multi-physics modeling and simulation platform for 3D printing of metals to provide an up-front assessment of the manufacturability and performance of additively manufactured parts. Pictured: simulation of laser melting of metal powder in a 3D printing process (LLNL) and a fully functional lightweight robotic hand (ORNL).
