Nov. 13, 2019 — In the main banquet room of Knoxville, Tennessee’s downtown Hilton Hotel, more than 150 scientists from around the world got their first peek at the exascale computing power that will become available for their research projects in two short years. The US Department of Energy’s (DOE’s) Oak Ridge Leadership Computing Facility (OLCF)—a DOE Office of Science User Facility at Oak Ridge National Laboratory (ORNL)—presented its first Frontier Application Readiness Kick-Off Workshop from October 8 through 10.

The OLCF invited teams from its own Center for Accelerated Application Readiness (CAAR) and the Exascale Computing Project (ECP) to attend presentations and talk with the supercomputer’s designers—Cray (a Hewlett Packard Enterprise company) and CPU/GPU-maker AMD—to get a head start on preparing their codes for Frontier’s launch in 2021. Considering that some of these codes have been in use for decades, 2 years is a comparatively short amount of time to revamp them for a whole new system.

Bronson Messer, an ORNL computational scientist who helped organize the event, said the first wave of application teams selected by the OLCF will need to hit the ground running to make sure their research projects can perform efficiently on Frontier by start-up.

“We have had what some might call this obsession with application readiness ever since we first fielded Titan back in 2012—and we actually started preparing that platform well in advance of 2012,” Messer told the crowd in his introductory pep talk, mic in hand. “We want scalable, accelerated scientific applications as soon as Frontier hits the floor.”

The assembled scientists didn’t seem to mind this sense of urgency about the tasks ahead of them. The OLCF’s Frontier represents a big generational leap in computing speed and accuracy, given that it will be able to perform calculations up to 10 times faster than the current leader among supercomputers, the OLCF’s IBM AC922 Summit. As the first supercomputer expected to attain 1.5 exaflops (more than a quintillion calculations per second), Frontier promises to greatly accelerate scientific discovery.

James McClure, a computational scientist at Virginia Tech, has been working on OLCF computers since the 27-petaflops Titan went online in 2012. His team’s LBPM application is a Lattice Boltzmann-based code that simulates fluid flows in complex geometries, whether they’re geological (such as groundwater) or engineered (such as fuel cells). This was not his first readiness workshop. In fact, he said he always looks forward to refashioning his code to run on more powerful architectures.

“Getting high performance is essential for productivity in science,” McClure said. “I think it’s certainly true that applications that are able to fully leverage the new supercomputers being built are critical to addressing some of the scientific bottlenecks that exist today. To me, it’s pretty exciting to be part of that, and I enjoy doing it.”

The application teams will be supported in their preparation efforts by the Frontier Center of Excellence (CoE), a joint organization led by Cray in partnership with AMD and the OLCF. At the workshop, Frontier CoE personnel trained the CAAR and ECP teams on the current programming environment for early-access systems. They also presented details of AMD’s Radeon Instinct GPU hardware and programming as well as Cray’s Shasta architecture, including its new Slingshot interconnect. But CoE instructors benefited from the workshop as well, said Noah Reddell, Cray’s Frontier CoE manager.

“The application readiness kickoff workshop was incredibly helpful for me and the rest of the CoE staff,” Reddell said. “We used the event to make contact with all of the application teams and begin discussions for future assistance to ready their applications for exascale and the Frontier architecture. This kicked off close collaborative relationships that will continue for the next several years.”

In addition to attending the presentations, scientists had the opportunity to confer with other researchers, breaking off into discussion groups on October 8 and continuing their conversations over the next two days. For Sunita Chandrasekaran, assistant professor of Computer and Information Sciences at the University of Delaware, the chance to meet other researchers and discuss their work was energizing.

“I believe this was the most beneficial outcome of the workshop,” said Chandrasekaran, whose team’s application code, PIConGPU, is used to study plasma-based accelerators that produce high-energy particles, which can be used in radiation therapy for cancer. “We made connections with teams and members that we know we are going to find to be extremely helpful resources going forward. Dialogues between domain and computer scientists are key to the success of an interdisciplinary project.”

Likewise, McClure said he enjoys interacting with his peers at the workshop, not only to socialize but also for the very practical purpose of learning new things.

“You have the opportunity to see how other groups are solving problems and learn from that, and sometimes there are things you can pick up that you might not find somewhere else,” McClure said. “Also, I think it’s pretty cool to be around so many different kinds of science—you’re exposed to a lot of different kinds of advancements—because, at the end of the day, all of the groups that are doing this kind of work are tied to very high-impact science-use cases and working with some of the best people in the world.”

The application teams will have a lot of work ahead of them in the next 2 years, from formulating quarterly technical plans to adapting their projects to the hardware and software changes that will occur as the Frontier system nears its launch date. Despite the weight of those responsibilities, Messer said he was most impressed by the teams’ palpable excitement at this initial readiness workshop.

“I think people were very enthusiastic about learning something about this brand-new machine,” Messer said. “It really is going to be one of the world’s first exascale machines, so after having heard about exascale for so many years, practitioners in these disciplines are very anxious to get their hands on an exascale machine.”

About Oak Ridge National Laboratory

ORNL is managed by UT-Battelle LLC for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov.


Source: Coury Turczyn, Oak Ridge National Laboratory (ORNL)