One of the most significant differentiators between HPC systems and less performant systems is the interconnect technology deployed. Late last month PRACE (Partnership for Advanced Computing in Europe) released its brief 2019 Best Practice Guide – Modern Interconnects, which offers advice on technology selection and deployment issues.
The latest guide “gives an overview of the most common types of interconnects in the current generation of HPC systems. It introduce[s] the key features of each interconnect type and the most common network topologies. The selected interconnect type within the current generation of PRACE Tier-0 systems [are] listed and the final section give[s] some hints concerning network benchmarking.”
As shown here, at least among the November 2018 Top500 systems, Ethernet and InfiniBand (in their various flavors) dominate interconnect choices. The guide notes, however, that “Omni-Path (OPA, Omni-Path architecture) was officially started by Intel in 2015 and is one of the youngest HPC interconnects. Within the last four years more and more systems have been built using Omni-Path technology. The November 2018 TOP500 list [1] contains 8.6% Omni-Path systems.”
The guide gives concise descriptions of various interconnect technologies (OPA, InfiniBand, Aries, NVLink, Ethernet, etc.), along with a discussion of common HPC network topologies (e.g., fat tree, torus, hypercube, dragonfly) and their distinguishing attributes.
In describing fat trees, for example, the guide notes, “The idea of the fat tree is instead of using the same wire ‘thickness’ for all connections, to have ‘thicker’ wires closer to the top of the tree and provide a higher bandwidth over the corresponding switches. In an ideal fat tree the aggregated bandwidth of all connections from a lower tree level must be combined in the connection to the next higher tree level. This setup is not always achieved, in such a situation the tree is called a pruned fat tree.”
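The bandwidth rule in that passage can be made concrete with a small calculation. The Python sketch below is illustrative only and is not taken from the guide: the function names, the uniform fan-out per switch, and the 100 Gb/s link speed are all assumptions chosen for the example. It computes the uplink bandwidth each switch level would need in an ideal fat tree and labels a design as pruned when any level falls short.

```python
def ideal_uplinks(leaf_bw, fanout, levels):
    """Uplink bandwidth needed per switch at each level of an ideal fat tree.

    Assumes (hypothetically) that every switch has the same number of
    downlinks (`fanout`) and every node link has bandwidth `leaf_bw`.
    Level 1 is the switch layer directly above the compute nodes.
    """
    # Each level must carry the combined bandwidth of everything below it.
    return [leaf_bw * fanout ** k for k in range(1, levels + 1)]


def classify(actual_uplinks, leaf_bw, fanout):
    """Label a tree 'pruned fat tree' if any level's uplink is below the ideal."""
    ideal = ideal_uplinks(leaf_bw, fanout, len(actual_uplinks))
    full = all(a >= i for a, i in zip(actual_uplinks, ideal))
    return "full fat tree" if full else "pruned fat tree"


# Illustrative numbers: 100 Gb/s node links, 4 downlinks per switch, 2 levels.
print(ideal_uplinks(100, 4, 2))        # ideal uplinks per level, in Gb/s
print(classify([400, 1600], 100, 4))   # uplinks match the ideal
print(classify([400, 800], 100, 4))    # top level has half the ideal bandwidth
```

With these assumed numbers, the second design provides only 800 Gb/s where 1,600 Gb/s would be needed at the top level, so it would count as a pruned fat tree in the guide's terminology.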
The guide is freely downloadable as a PDF. There are also brief descriptions of the interconnects deployed by several of Europe’s largest systems, including CURIE (CEA), Hazel Hen (HLRS), JUWELS (Jülich Supercomputing Centre), Marconi (Cineca), MareNostrum (Barcelona Supercomputing Center), Piz Daint (CSCS), and SuperMUC (LRZ).
Figures via PRACE report