Practically speaking, achieving exascale computing requires enabling HPC software to effectively use accelerators – mostly GPUs at present – and that remains something of a challenge. Consider Summit, the U.S. supercomputer at ORNL, which captured the top spot on the Top500 list in June. Summit has 4,356 nodes, each with two IBM 22-core Power9 CPUs and six Nvidia Tesla V100 GPUs. It’s the GPUs that provide most of the performance speedup, and math libraries, in particular, must be able to take advantage of them to speed up HPC applications.
The SLATE project – Software for Linear Algebra Targeting Exascale – is intended to help solve the accelerator-readiness problem. Last week the U.S. Exascale Computing Project (ECP) posted a video interview with Jakub Kurzak, co-PI on SLATE, updating progress. It’s brief, breezy and worth watching given how foundational math libraries are for HPC applications. SLATE is intended to replace the 20-plus year-old Scalable Linear Algebra PACKage (ScaLAPACK) library, currently the industry standard for dense linear algebra operations in distributed memory environments.
“The main motivation for rewriting ScaLAPACK [is] it is very hard to imagine an accelerated ScaLAPACK,” says Kurzak. “If you look at where HPC is going, if you look at the big machine here, Summit, you see immediately the need. To put a number on it, something like 98 percent of the Summit’s performance is in its GPUs.” If codes are not GPU-accelerated, “you won’t reach exascale,” he says.
As described on the SLATE website:
“SLATE aims to extract the full performance potential and maximum scalability from modern, many-node HPC machines with large numbers of cores and multiple hardware accelerators per node. For typical dense linear algebra workloads, this means getting close to the theoretical peak performance and scaling to the full size of the machine (i.e., thousands to tens of thousands of nodes). This is to be accomplished in a portable manner by relying on standards like MPI and OpenMP.
“SLATE functionalities will first be delivered to the ECP applications that most urgently require SLATE capabilities (e.g., EXascale Atomistics with Accuracy, Length, and Time [EXAALT], NorthWest computational Chemistry for Exascale [NWChemEx], Quantum Monte Carlo PACKage [QMCPACK], General Atomic and Molecular Electronic Structure System [GAMESS], CANcer Distributed Learning Environment [CANDLE]) and to other software libraries that rely on underlying dense linear algebra services (e.g., Factorization Based Sparse Solvers and Preconditioners [FBSS]). SLATE will also fill the void left by ScaLAPACK’s inability to utilize hardware accelerators, and it will ease the difficulties associated with ScaLAPACK’s legacy matrix layout and Fortran API.”
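The “legacy matrix layout” presumably refers to ScaLAPACK’s two-dimensional block-cyclic distribution. In this layout a global matrix is cut into mb x nb blocks, and those blocks are dealt round-robin over a p x q grid of MPI processes, so every application has to translate global indices into an owning process and a local offset. The sketch below illustrates that bookkeeping under stated assumptions: 0-based indices and a distribution that starts at process (0,0). The helper names `block_cyclic_owner` and `local_index` are hypothetical and are not part of ScaLAPACK’s or SLATE’s API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating a 2D block-cyclic layout (not a real library API).
// Assumes 0-based global indices and that block (0,0) lives on process (0,0).
struct Owner {
    int prow;  // process row in the p x q grid
    int pcol;  // process column in the p x q grid
};

// Which process owns global element (i, j) for mb x nb blocks on a p x q grid.
Owner block_cyclic_owner(std::int64_t i, std::int64_t j,
                         std::int64_t mb, std::int64_t nb, int p, int q) {
    assert(mb > 0 && nb > 0 && p > 0 && q > 0 && i >= 0 && j >= 0);
    // Row blocks are dealt round-robin over process rows,
    // column blocks round-robin over process columns.
    return Owner{static_cast<int>((i / mb) % p), static_cast<int>((j / nb) % q)};
}

// Local (0-based) row index of global row i on the process row that owns it.
// The same formula applies to columns with nb and q.
std::int64_t local_index(std::int64_t i, std::int64_t mb, int p) {
    assert(mb > 0 && p > 0 && i >= 0);
    std::int64_t block = i / mb;        // global block index
    std::int64_t local_block = block / p;  // earlier blocks already stored on that process
    return local_block * mb + i % mb;   // offset within the local array
}
```

For example, with 2 x 2 blocks on a 2 x 3 grid, process row 0 holds global rows 0, 1, 4, 5, 8, 9, and so on, which is why global row 5 lands at local row 3. Every ScaLAPACK caller must manage this mapping (plus the array descriptors that encode it), which gives some idea of the burden the quoted passage describes.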
These are ambitious goals. Kurzak and co-PI Jack Dongarra, both of the University of Tennessee’s Innovative Computing Laboratory (ICL), lead a group of roughly eight researchers dedicated to the ECP project. In the video, Kurzak is interviewed by Mike Bernhardt, ECP communications manager, and they discuss what’s been accomplished, what’s expected in the next year or so, and some of the challenges.
Presented here, slightly edited, are a few of Kurzak’s comments.
“We’ve spent a lot of time laying out the foundations, making sure the architecture is solid. In terms of functionality we haven’t released all that much, but we have released some routines for basic linear algebra operations. If you want to multiply two really large matrices right now and get GPU acceleration, SLATE has these kinds of routines. We [also] released a batch of matrix norms routines. Now we’re working on a really exciting batch of routines for solving linear systems. I think our user base should explode when we release the linear solvers at the end of this quarter,” he says.
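The interview doesn’t show SLATE’s norm routines or their signatures. To give a sense of what such routines compute, here is a hypothetical single-node sketch of the four norms that LAPACK’s xLANGE family provides: the one-norm (largest absolute column sum), the infinity-norm (largest absolute row sum), the max-abs entry, and the Frobenius norm. It works on a column-major matrix with leading dimension `lda`. The names `Norm` and `matrix_norm` are made up for illustration. A distributed, GPU-accelerated library would presumably perform the same reductions per tile and then combine the partial results across nodes, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration; not SLATE's or LAPACK's actual interface.
enum class Norm { One, Inf, Max, Fro };

// Norm of a column-major m x n matrix stored in `a` with leading dimension lda >= m.
double matrix_norm(Norm kind, std::size_t m, std::size_t n,
                   const std::vector<double>& a, std::size_t lda) {
    assert(lda >= m && (n == 0 || a.size() >= lda * (n - 1) + m));
    double result = 0.0;
    switch (kind) {
    case Norm::One:  // largest absolute column sum
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < m; ++i) sum += std::abs(a[i + j * lda]);
            result = std::max(result, sum);
        }
        break;
    case Norm::Inf: {  // largest absolute row sum; accumulate column by column for locality
        std::vector<double> row_sums(m, 0.0);
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i) row_sums[i] += std::abs(a[i + j * lda]);
        for (double s : row_sums) result = std::max(result, s);
        break;
    }
    case Norm::Max:  // largest absolute entry
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i)
                result = std::max(result, std::abs(a[i + j * lda]));
        break;
    case Norm::Fro: {  // sqrt of sum of squares, kept as scale^2 * ssq to avoid overflow
        double scale = 0.0, ssq = 1.0;
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i) {
                double x = std::abs(a[i + j * lda]);
                if (x == 0.0) continue;
                if (scale < x) {
                    ssq = 1.0 + ssq * (scale / x) * (scale / x);
                    scale = x;
                } else {
                    ssq += (x / scale) * (x / scale);
                }
            }
        result = scale * std::sqrt(ssq);
        break;
    }
    }
    return result;
}
```

For the 2 x 2 matrix [[1, -2], [3, 4]], stored column-major as {1, 3, -2, 4}, the one-norm is 6, the infinity-norm is 7, the max-abs entry is 4, and the Frobenius norm is sqrt(30). The scaled sum-of-squares update follows the approach used in LAPACK’s xLASSQ. It trades a few divisions for protection against intermediate overflow on very large entries.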
“[By] the end of 2019 SLATE should be a solid replacement for ScaLAPACK. At least for the most important parts of ScaLAPACK. It should offer a viable replacement for GPU acceleration. That being said we designed the package to be much more flexible than ScaLAPACK so we should be able to go way beyond [its] capabilities as we go beyond 2019. There’s a lot of exciting things I think we can do algorithmically in SLATE and cater to many more applications in terms of what kinds of problems we can solve, what sizes, what types of matrices.”
Kurzak notes SLATE is the first major project at ICL to be implemented in C++. “That’s a bit of a barrier to adoption initially, but I have to say it’s been a blessing [because] I think the choice of the C++ language, the shift from C, is probably going to be one of the key technologies that will contribute to SLATE’s success.”
Perhaps not surprisingly, recruitment and retention are among SLATE’s most difficult challenges.
“You want somebody who does know C++ well, somebody who definitely knows MPI, and, oh yes, knows multithreading too, and yes, knows GPU programming too, and yes, knows linear algebra. That is a long list of requirements. The assumption is we’ll hire somebody who does not know everything but will pick it up on the job. Nevertheless, the barrier to entry is pretty high.”
Interestingly, enthusiasm is the number one factor he is looking for.
Link to SLATE poster: https://www.exascaleproject.org/wp-content/uploads/2018/01/ECP-Meeting-Poster-SLATE.pdf
Link to video: https://www.youtube.com/watch?v=wS5aPAcaNbY