Doug Kothe on the Race to Build Exascale Applications

By John Russell

May 29, 2017

Ensuring there are applications ready to churn out useful science when the first U.S. exascale computers arrive in the 2021-2023 timeframe is Doug Kothe’s job. No pressure. He’s not alone, of course. The U.S. Exascale Computing Project (ECP) is a complicated effort with many interrelated parts and contributors, all necessary for success. Yet Kothe’s job as director of application development is one of the more visible and daunting, and it is perhaps best described by his boss, Paul Messina, ECP director.

“We think of 50 times [current] performance on applications [as the exascale measure of merit], unfortunately there’s a kink in this,” said Messina. “The kink is people won’t be running today’s jobs in these exascale systems. We want exascale systems to do things we can’t do today and we need to figure out a way to quantify that. In some cases it will be relatively easy – just achieving much greater resolutions – but in many cases it will be enabling additional physics to more faithfully represent the phenomena. We want to focus on measuring every capable exascale system based on full applications tackling real problems compared to what they can do today.”

Doug Kothe, ECP

In this wide-ranging discussion with HPCwire, Kothe touches on ECP application development goals and processes; several technical issues such as efforts to combine data analytics with mod/sim and the need for expanded software frameworks to accommodate exascale applications; and early thoughts for incorporating neuromorphic and quantum computing not currently part of the formal ECP plan. Interestingly, his biggest worry isn’t reaching the goal on schedule – he believes the application teams will get there – but post-ECP staff retention when industry comes calling.

By way of review, ECP is a collaborative effort of two Department of Energy organizations: the Office of Science and the National Nuclear Security Administration. Six application areas have been singled out: national security; energy security; economic security; scientific discovery; earth science; and health care. In terms of app-dev, that translates into 21 Science & Energy application projects, 3 NNSA application projects, and 1 DOE/NIH application project (precision medicine for cancer).

It’s not yet clear what the just-released FY2018 U.S. budget proposed by the Trump Administration portends. Funding for science programs was cut nearly across the board, although ECP escaped. Kothe says simply, “It is the beginning of the process for the FY18 budget, and while the overall budget is [being] determined, we will continue working on the applications that are already part of the ECP.”

In keeping with ECP’s broad ambitions, Kothe says, “All of our applications teams are focused on very specific challenge problems, and by our definition a challenge problem is one that is intractable today, needs exascale resources, and is a strategic high priority for one of the DOE program offices. We aren’t claiming we are going to solve all the problems, but what we are claiming is [we will deliver] simulation technology that can address the problem. The point is we have the applications vectored in rather specific directions.”

 

RISE OF DATA ANALYTICS
One of the more exciting and new-to-HPC areas is the incorporation of data analytics into the HPC environment overall and into ECP in particular. Indeed, harmonizing, or at least integrating, big data with modeling and simulation is a goal specified by the National Strategic Computing Initiative. Data-driven science isn’t new, nor is researcher familiarity with the underlying statistics. But the sudden rise of machine/deep learning techniques, including many that rely on lower-precision calculations, is somewhat new to the scientific computing community and an area where the commercial world has perhaps taken the lead. Kothe labels the topic “white hot.”

“Not being trained in the data analytics area, I’ve been doing a lot of reading and talking [to others]. A large fraction of the area I feel like I know, but I didn’t appreciate the other 20 or 30 percent. The point is, by exposing our applications teams to the data analytics community, even just calling libraries, we are going to see some interesting in situ and computational steering use cases. As an example of in situ, think of turbulence. It could be an LES (large eddy simulation) whose parameters could have been tuned a priori by machine learning or chosen on the fly by machine learning. That kind of work is already going on at some universities,” Kothe says.

Climate modeling is a case in point. “A big challenge is subgrid models for clouds. Right now, and even at exascale, we probably cannot do one kilometer or less resolution everywhere. We may be able to do regional coupled simulations that way, but if we try to do five or ten kilometers everywhere – of course it will vary whether over ocean or land ice, sea ice, or atmosphere – you will still have many clouds lost in one cell. You need a subgrid model. Maybe machine learning could be used to select the parameters. Think of a bunch of little LES models running in a 10 km x 10 km cell holding lots of clouds that are then scaled into the higher-level physics. I think subgrid models are potentially a poster child for machine learning.”
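To make that pattern concrete, here is a minimal sketch of an ML-chosen subgrid parameter, assuming a surrogate fitted offline to high-resolution LES output and queried once per coarse cell. The feature names, the stand-in linear model, and the synthetic data are all illustrative assumptions, not ECP code.

import numpy as np

rng = np.random.default_rng(0)

# Pretend training data: coarse-cell features (say, humidity, temperature,
# vertical velocity) paired with a subgrid cloud parameter extracted from
# small high-resolution LES runs of the same columns.
features = rng.random((500, 3))
target = 0.3 * features[:, 0] - 0.1 * features[:, 1] + 0.05  # stand-in labels

# Fit a linear surrogate by least squares (a real effort would likely use a
# nonlinear model; the plumbing, not the model, is the point here).
X = np.column_stack([features, np.ones(len(features))])
coef, *_ = np.linalg.lstsq(X, target, rcond=None)

def subgrid_parameter(cell_state):
    """Return the ML-chosen subgrid parameter for one coarse cell."""
    return float(np.append(cell_state, 1.0) @ coef)

# Inside the climate time step: query the surrogate per 10 km cell instead
# of running an embedded LES everywhere.
for cell_state in rng.random((4, 3)):
    print(f"subgrid parameter: {subgrid_parameter(cell_state):.4f}")

The design point is the query cost: the coarse climate loop pays for one cheap surrogate evaluation per cell rather than an embedded high-resolution simulation.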

Steering simulations is another emerging use case. “There’s a couple of labs, Lawrence Livermore in particular, that are already using machine learning to make decisions, to automate decisions about mesh quality for fluid and structure simulations where the mesh is just flowing with the moving material and the mesh may start to contort in a way that will cause the numerical solution to break down or errors to increase. You could do quality checks on the fly and correct the mesh with machine learning.”
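A hedged sketch of that steering loop follows, with an invented per-element quality metric and a fitted threshold standing in for a trained classifier; none of this reflects the actual Livermore tooling.

import numpy as np

rng = np.random.default_rng(1)

def learned_remesh_decision(quality):
    # A real system would train a classifier on histories of mesh quality
    # versus solver breakdowns; a fitted threshold stands in for it here.
    trained_threshold = 0.2
    return bool(np.min(quality) < trained_threshold)

# Per-element quality scores (1.0 = pristine) that degrade as the mesh
# flows with the moving material, as in the simulations Kothe describes.
quality = rng.uniform(0.25, 1.0, size=1000)
for step in range(10):
    quality *= rng.uniform(0.9, 1.0, size=quality.shape)  # mesh contorts
    if learned_remesh_decision(quality):
        quality = np.clip(quality, 0.6, 1.0)  # stand-in for mesh correction
        print(f"step {step}: corrected mesh before the solver broke down")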

One interesting use is being explored as part of the Exascale CANcer Distributed Learning Environment (CANDLE) project (see HPCwire article, Enlisting Deep Learning in the War on Cancer). Part of the project is clarifying the RAS (gene) network activity. The RAS network is implicated in many cancers. “You have machine learning orchestrating ensembles of molecular dynamics simulations [looking at docking scenarios with the RAS protein] and examining factors that are involved in docking,” says Kothe. Machine learning can recognize already known areas and reduce the need for computationally intensive simulation in those areas while zeroing in on lesser known areas for intense quantum chemistry simulations. Think of it as zooming in and out as needed.
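The orchestration pattern itself is easy to sketch, assuming (purely for illustration) a learned uncertainty score and a placeholder for the expensive simulation; CANDLE’s real models and codes are far more elaborate.

import numpy as np

rng = np.random.default_rng(2)

def learned_score(config):
    """Stand-in for a trained model scoring how uncertain/novel a candidate
    docking configuration is (high = worth simulating)."""
    return float(np.std(config))

def expensive_simulation(config):
    """Placeholder for a molecular dynamics or quantum chemistry job."""
    return float(np.sum(config ** 2))

candidates = rng.random((100, 8))  # candidate docking configurations
scores = np.array([learned_score(c) for c in candidates])

# "Zoom in": send only the ten candidates the model knows least about to
# the costly simulations; well-understood regions are skipped.
worth_running = np.argsort(scores)[-10:]
results = {int(i): expensive_simulation(candidates[i]) for i in worth_running}
print(f"simulated {len(results)} of {len(candidates)} candidates")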

 

FRAMEWORKS REVISITED
Clearly there’s no shortage of challenges for ECP application development. Kothe cites optimizing node performance and memory management as among the especially thorny ones: “We now have many levels of memory exposed to us. We don’t really quite know how best to use it.” Data structure choices can also be problematic, and Kothe suggests frameworks may undergo a revival.

One of the application teams (astrophysics), recalls Kothe, came to him and said, “I am afraid to make a choice for a data structure that would be pervasive in my whole code, because it might be the wrong one and I’m stuck with it.” The point, says Kothe, is that the applications are in a kind of ‘going back to the future’ to the late 80s, when you saw lots of heavyweight frameworks where an application would call out to a black box and say, ‘register this array for me and hand me back the pointer.’

“That’s good and it’s bad. The bad part is you’re losing control, and now you have to schlep around this black box and you don’t know if it is going to do what you want it to do. The good part is if you are on a KNL system or an NVIDIA system, you are on different nodes, and that black box memory manager would have been tuned for that hardware. [In] dealing with memory hierarchy risks, I think we are probably seeing applications move more towards frameworks, which I think is a good idea. We’ve learned what I call big-F versus little-f frameworks. I think we’re learning how to balance the two so applications can be portable and not have to rely on an army of people, but still do something that’s more agile than just choosing one data structure and hoping it works.”
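A toy version of that “register this array, hand me back the pointer” contract, with an entirely invented API, shows why the trade-off cuts both ways: the application gives up placement control in exchange for platform-tuned allocation behind an opaque handle.

import numpy as np

class MemoryManager:
    """Toy stand-in for a platform-tuned, black-box allocator."""

    def __init__(self, platform):
        self.platform = platform  # e.g., "knl" or "nvidia"
        self._arrays = {}

    def register(self, name, shape, dtype=np.float64):
        # A real framework would pick HBM, DDR, or device memory here based
        # on the platform; this toy just allocates host memory.
        self._arrays[name] = np.empty(shape, dtype=dtype)
        return name  # the opaque handle the application schleps around

    def get(self, handle):
        return self._arrays[handle]

# The application code is identical on either platform; only the manager's
# placement policy would differ.
mm = MemoryManager(platform="knl")
h = mm.register("pressure", shape=(1024, 1024))
mm.get(h)[:] = 0.0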

Performance portability is naturally a major consideration. Historically, says Kothe, application developers (himself included) have leaned one way: “We chose portability over performance because we want to make sure our science can be done anywhere. Performance can’t be an afterthought but it often is. Portability in my mind has several dimensions. So the new system shows up and it is probably not something out of left field, you know something about it, but what’s a reasonable amount of effort that you think should be required to port your code? How much of the code base do you think should change? What is correctness in terms of the problem and getting the answer?

“I would claim that a 64-bit comparison is probably not realistic. I mean, it’s probably not even appropriate. What set of problems would you run? You need to run real problems. We’re asking each app team to define what they think portability means and hope that collectively we’ll move towards a good definition and a good target for all the apps, but I think it will end up being fairly app-specific.”

THE CO-DESIGN IMPERATIVE
The necessity of co-design has become a given throughout HPC as well as within ECP. Advancing hardware and new system architectures must be taken into account not merely to push application performance but to get applications to run at all. However, coupling software too tightly to a specific machine or architecture is limiting. ECP has currently established six co-design centers to help deal with specific challenges. Kothe believes the use of motifs may help.

“Every application team at some level will be doing some vertically integrated co-design and there is probably more software co-design going on – the interplay with the compilers and runtime systems and that kind of thing – than anything else. By having the co-design centers identify a small number of motifs that applications are using, I think we can leverage a deep dive co-design on the motifs as opposed to doing kind of an extensive co-design vertically integrated within every application. This is new and there are some risks. But long term, my dream would be we [develop] community libraries that are co-designed around motifs that are used broadly among the applications.

“The poster child is probably [handling] particles. Almost every application has a discrete particle model for something and that’s good and it’s a challenge. So how do you encapsulate the particle [model] in a way that it can be co-designed not as a separate activity that’s not thinking about the [specific] consumer of that motif, but just thinking about making that motif rock and roll. That’s the challenge, to co-design motifs so they can be broadly used and I have high hopes there.”
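As a rough illustration of what a co-designed particle motif could look like as a community library, here is a structure-of-arrays container with a single optimized kernel; the interface is hypothetical, not an ECP design.

import numpy as np

class ParticleSet:
    """Structure-of-arrays particle container: the co-designed motif."""

    def __init__(self, n):
        # SoA layout keeps each field contiguous for vectorization; a GPU
        # build of the motif could swap in device allocations here without
        # changing any application code.
        self.x = np.zeros(n)  # positions
        self.v = np.zeros(n)  # velocities
        self.m = np.ones(n)   # masses

    def push(self, force, dt):
        """One velocity/position update: the kernel the motif optimizes."""
        self.v += dt * force / self.m
        self.x += dt * self.v

# Any application with a discrete particle model reuses the same container.
p = ParticleSet(1_000_000)
p.push(force=np.full(1_000_000, 9.8), dt=1e-3)

Because every application calls the same push kernel, retuning the motif’s data layout for a new machine would benefit all of them at once, which is the leverage Kothe is after.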

 

 

STAY ON TARGET
“A big challenge with application developers is everything sounds cool and looks good, so we want to keep them focused. Year by year the applications have laid out a number of milestones, and for the most part the milestones are a step-by-step progression towards that challenge problem. The progression has many dimensions: is the science capability improving – better physics, better algorithms; is the team utilizing the hardware efficiently, [such as] state-of-the-art test beds, the latest systems on the floor; are they integrating software technologies; and probably one of the most important is whether they are using co-design efforts,” says Kothe.

One ECP-wide tool is a comprehensive project database where “all the R&D projects and applications and software technology, all their plans and milestones are in one place.” A key aspect of ECP, says Kothe, is that everyone can see what everyone else is doing and how they are progressing.

A milestone can be any of a handful of things, says Kothe, generally something tangible such as a software release or a demonstration simulation. “It could be a report or a presentation. It can even be a small write-up that says I tried this algorithm and it didn’t work. A milestone is a decision point.

“It’s not always a huge success. Failure can be just as valuable. Sometimes we can force a sense of urgency. We can review this seven-year plan and say, alright, you can’t bring in a technology that doesn’t have a line of sight in this timeframe, or you’ve got algorithm A and B going along [and] at this point you have to make a decision and choose one and go with it. I like that. I think it imparts a sense of urgency,” says Kothe.

Kothe, of course, has his own milestones. One is an annual application assessment report due every September.

“I am hearing I am a slave driver and I didn’t really think I had that personality,” says Kothe. One area where he is inflexible is scheduled releases. “We want you to release on the scheduled date; that date is gospel. What’s in the release may float. So on the team and budget we like to be pretty rigid, but what’s in the release floats based on what you have learned. You have this bag of tasks and try to get as many tasks done as you can, but you still must have the release.”

Currently, the comprehensive database of projects isn’t publicly available (it would be interesting reading), but Kothe says individual PIs are encouraged to share information widely.

SOFTWARE TECHNOLOGY SHARING
Not surprisingly, close collaboration with the software technology team is emphasized. “Right now we have this incredible opportunity because applications teams are exposed to a lot of software technologies they’ve never seen or heard of.” It’s a bit like kids in a candy store, says Kothe: “They are looking at this technology and saying, I want to do that, to do that, to do that, and so the challenge for integration is managing the interfaces and doing it in a scalable way.”

There are a couple of technology projects that everyone wants to integrate, he says, and that’s a big bandwidth worry when you have 20-plus application projects lined up saying, “let me try your stuff, because chances are there will be new APIs and new functionalities and bugs and features [too]. The software technology people are saying, ‘Doug, be careful. Let’s come up with a scalable process.’” Conversely, says Kothe, it is also true there’s a fair amount of great “software technology the application teams are not exploring which they should be.”

“We have defined a number of integration milestones, which are basically milestones that require deliverables from two or three areas. We call that shared fate. [I know] it sounds like we are jumping off a cliff together. A good example is an application project looks at a linear solver and says, ‘you don’t have the functionality I need, let’s negotiate requirements.’ So the solver team negotiates a new API, a new functionality, and the application team will have a milestone that says it will have integrated and tested the new technology [by a given date], and the software technology team has to have its release, say, two or three months before. These things tend to be daisy-chained like that. You have a release, then an integration assessment, and we might have another release to basically deal with any issues.

“Right now, early on in ECP, we’re having a lot of point-to-point interaction, where there’s lots of apps that want to do lots of same or different things with lots of software projects. I think once we settle down on the requirements, the software technologies will be kind of one-to-all, [having] settled on a base functionality and a base API. An obvious example is MPI, but even with MPI there’s new features and functionalities [that certain applications need]. We can’t take it for granted that some of these tremendous technologies like MPI are going to be there working the way we need for exascale,” says Kothe.

 

ECP FUTURE WATCH
Even as ECP pushes forward it remains rooted in CMOS technology, yet there are several newer technologies – not least neuromorphic and quantum computing – that have made great strides recently and seem on the cusp of practical application.

“One of the things I have been thinking about is, even if we don’t have access to a neuromorphic chip, what is its behavior like from a hardware simulator point of view? The same thing with quantum computing. Our mindset has to change with regard to the algorithms we lay out for neuromorphic or quantum. The applications teams need to start thinking about different types of algorithms. As Paul [Messina] has pointed out, it’s possible quantum computing could fairly soon become an accelerator on a traditional node. Making sure applications are compartmentalized is important to make that possible. It would allow us to be more flexible and extensible and perhaps exploit something like a quantum accelerator.”
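One speculative reading of that compartmentalization, sketched below with invented backend names: if the application reaches an algorithm only through a narrow interface, a quantum accelerator could later be slotted in as one more backend without restructuring the application.

import numpy as np

class ClassicalSolver:
    """Today's backend: a dense eigensolver on the CPU."""

    def ground_state_energy(self, hamiltonian):
        return float(np.linalg.eigvalsh(hamiltonian)[0])

class QuantumAcceleratorSolver:
    """Placeholder for a future device-backed solver with the same contract."""

    def ground_state_energy(self, hamiltonian):
        raise NotImplementedError("offload to the accelerator here")

def chemistry_step(solver, hamiltonian):
    # The application depends only on the solver contract, not the device,
    # which is the compartmentalization being argued for.
    return solver.ground_state_energy(hamiltonian)

h = np.array([[1.0, 0.2], [0.2, -1.0]])
print(chemistry_step(ClassicalSolver(), h))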

Looking ahead, says Kothe, he worries most about the unknown unknowns – there will be surprises. “I feel like right now in the apps space we kind of have known unknowns and we’ll hit some unknown unknowns, but I believe we are going to have a number of applications ready to go. We’ll have trips along the way and we may not do some things we plan now. I think we have an aggressive but not naive set of metrics. It’s really the people. We have some unbelievable people,” he says.

One can understand today’s attraction. Kothe points out this is likely to be a once-in-a-career opportunity, and the mix of experience among the application team members is significant. “What we see is millennials sitting at the table showing people new ways of doing software with gray-haired guys like me who have been to the school of hard knocks. There’s a tremendous cross-fertilization. I’m confident. I saw it when we selected these teams. We had teams with rosters that looked like the all-star team, but I am worried about retention. We are training people to be some of the best, especially the early-career folks, so I am worried that they will be in high demand, very marketable.”

Kothe Bio from ECP website:
Douglas B. Kothe (Doug) has over three decades of experience in conducting and leading applied R&D in computational applications designed to simulate complex physical phenomena in the energy, defense, and manufacturing sectors. Kothe is currently the Deputy Associate Laboratory Director of the Computing and Computational Sciences Directorate (CCSD) at Oak Ridge National Laboratory (ORNL). Prior positions for Kothe at ORNL, where he has been since 2006, were Director of the Consortium for Advanced Simulation of Light Water Reactors, DOE’s first Energy Innovation Hub (2010-2015), and Director of Science at the National Center for Computational Sciences (2006-2010).

Feature Caption:
The Transforming Additive Manufacturing through Exascale Simulation project (ExaAM) is building a new multi-physics modeling and simulation platform for 3D printing of metals to provide an up-front assessment of the manufacturability and performance of additively manufactured parts. Pictured: simulation of laser melting of metal powder in a 3D printing process (LLNL) and a fully functional lightweight robotic hand (ORNL).
