by Uwe Harms
Stuttgart, GERMANY — NEC’s European HPC Technology Center (EHPCTC) licensed the parallel direct PARDISO solver for its mathematical library MathKeisan.
The solver has previously been used in INPRO GmbH’s sheet metal forming simulation package INDEED on NEC SX Series supercomputers. INPRO is funded by a consortium of German automotive companies. A set of benchmarks shows that the NEC-tuned version of PARDISO can achieve an overall performance of more than 110 Gflop/s, out of a peak of 160 Gflop/s, on a 16-CPU NEC SX-5. This figure is for sparse 3D problems with 300,000 degrees of freedom.
The PARDISO package is a mathematical library of Fortran 90 OpenMP routines for solving large sparse linear systems of equations in parallel on shared-memory multiprocessors. Its main strength is the efficiency with which it solves these systems in parallel. To improve both sequential and parallel sparse numerical factorisation performance, the direct methods are based on Level-3 BLAS fundamental matrix operations, and parallelism is exploited with left-right looking supernode techniques. The pivoting method allows pivoting within supernodes in order to balance numerical stability against scalability during the factorisation process. The algorithm delivers substantial speedup even for moderate problem sizes. PARDISO is implemented as a thread-safe library. The sparse matrices can be symmetric (positive definite, indefinite, Hermitian or complex symmetric) or structurally symmetric (real or complex). The sparse direct method can also be combined automatically, as a preconditioner, with a CG or CGS iteration to solve parameter-dependent sequences of linear equations more efficiently.
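The idea of reusing a direct factorisation as a preconditioner for a sequence of parameter-dependent systems can be illustrated with a small sketch. This is not PARDISO code and does not use its interface. It is a dense, pure-Python toy in which the test matrix (`system`), the parameter values and all function names are assumptions made purely for illustration. A matrix is factorised once for one parameter value, and that factor is then reused as the preconditioner in a CG iteration for a nearby parameter value.

```python
import math

def cholesky(a):
    """Return lower-triangular l with l * l^T = a (a must be symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def chol_solve(l, b):
    """Solve (l * l^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg(a, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = b[:]
    z = apply_prec(r)
    p = z[:]
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def system(n, p):
    """Hypothetical parameter-dependent SPD matrix: 1D Laplacian plus p on the diagonal."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 2.0 + p
        if i > 0:
            a[i][i - 1] = -1.0
        if i < n - 1:
            a[i][i + 1] = -1.0
    return a

n = 50
b = [1.0] * n
factor = cholesky(system(n, 0.10))   # factorise once, for parameter value 0.10
a_new = system(n, 0.12)              # next system in the sequence, parameter 0.12

# CG on the new system, preconditioned with the old factorisation.
x_prec, iters_prec = pcg(a_new, b, lambda r: chol_solve(factor, r))
# Plain CG (identity preconditioner) for comparison.
x_plain, iters_plain = pcg(a_new, b, lambda r: r[:])

print("iterations with reused factorisation:", iters_prec)
print("iterations without preconditioner:   ", iters_plain)
```

Because the two matrices are close, the preconditioned iteration needs far fewer steps than plain CG, and no new factorisation is required. CG is used here because the toy matrix is symmetric positive definite.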
The performance figures below have been measured on an NEC SX-5. The problem set-up resulted from a semiconductor laser problem solved by the Integrated Systems Laboratory at ETH Zurich with the DESSIS-ISE simulation package.
NEC recently announced a new version of the SX-5 with an increased clock speed: the cycle time is now 3.2 ns instead of 4 ns. This raises the peak performance per CPU to 10 Gflop/s. The second line in the table gives reliable estimates of its performance.
The whole system (memory and CPU boards with all their chips) can now support this higher frequency, so performance will simply be 1.25 times that measured on the former 8 Gflop/s version, as long as no I/O needs to be taken into account.
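The scaling argument is plain arithmetic on the cycle times and can be checked in a few lines. The sketch below uses only the figures quoted in the text. The variable names and the helper `estimate_on_new_clock` are illustrative, and the estimate assumes, as stated above, that I/O plays no role.

```python
old_cycle_ns = 4.0       # cycle time of the earlier SX-5
new_cycle_ns = 3.2       # cycle time of the faster SX-5
old_peak_gflops = 8.0    # per-CPU peak of the earlier version

# Clock frequency is the inverse of cycle time, so the speed ratio is old/new.
scale = old_cycle_ns / new_cycle_ns
new_peak_gflops = old_peak_gflops * scale

def estimate_on_new_clock(measured_gflops):
    """Scale a rate measured on the 4 ns machine to the 3.2 ns machine (no I/O)."""
    return measured_gflops * scale

print(f"clock scaling factor: {scale:.2f}")
print(f"new per-CPU peak:     {new_peak_gflops:.1f} Gflop/s")
print(f"new 16-CPU peak:      {16 * new_peak_gflops:.0f} Gflop/s")
```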
The “older” machine reaches nearly 83% of peak performance on one processor and a speedup by a factor of 14.2 on 16 processors. The figure of 8.2 Gflop/s for the numerical factorisation represents about 80% of peak on a one-CPU SX-5 (10 Gflop/s), with a speedup of 14 on 16 CPUs, a rate that is only possible with the uniquely high memory bandwidth of the NEC SX Series systems.
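For readers who want to relate these figures to each other, the following sketch computes the fraction of peak and the parallel efficiency from the numbers quoted in this article. Parallel efficiency (speedup divided by CPU count) is a standard derived metric rather than a figure reported in the benchmarks, and the function names are illustrative.

```python
def fraction_of_peak(sustained_gflops, peak_gflops):
    """Sustained rate as a fraction of the theoretical peak rate."""
    return sustained_gflops / peak_gflops

def parallel_efficiency(speedup, cpus):
    """Speedup relative to the ideal speedup, which equals the CPU count."""
    return speedup / cpus

one_cpu_new = fraction_of_peak(8.2, 10.0)        # factorisation on one 10 Gflop/s CPU
sixteen_cpu_new = fraction_of_peak(110.0, 160.0)  # overall rate on 16 CPUs
eff_old = parallel_efficiency(14.2, 16)           # earlier 8 Gflop/s machine
eff_new = parallel_efficiency(14.0, 16)           # faster 10 Gflop/s machine

print(f"one CPU, fraction of peak:  {one_cpu_new:.1%}")
print(f"16 CPUs, fraction of peak:  {sixteen_cpu_new:.1%}")
print(f"parallel efficiency (old):  {eff_old:.1%}")
print(f"parallel efficiency (new):  {eff_new:.1%}")
```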
——-
Uwe Harms is a supercomputing consultant and owner of
Harms-Supercomputing-Consulting in Munich, Germany.