There’s a lot going on in the networks of HPC clusters, and selecting the right network fabric, equipment, and topology is important to ensuring good performance for given applications. A “one size fits all” approach rarely works, and architects will do well to tailor the network to the needs of the application.
In a recent report, titled “The Role of High-Bandwidth, Low-Latency Interconnects in High Performance Clusters,” Sebastian Kalcher, lead HPC architect at Adtech Global and a former HPC and high-speed interconnect engineer at CERN, discusses the important role of the network in an HPC cluster and the various design considerations that should be taken into account.
Today’s HPC applications depend heavily on high-bandwidth, low-latency interconnects to move data among the nodes of a cluster, Kalcher says. As clusters get bigger, efficiency becomes a bigger concern, and low-overhead protocols that help avoid wasting compute power become even more important.
When selecting the main fabric to be used for an HPC cluster network, the choice often comes down to two options: QDR/FDR InfiniBand or 1G/10G-Ethernet. Users should look at the communication patterns of the HPC application at hand–including the size of messages being sent and the level of latency that is acceptable–to make the best decision, Kalcher says.
For example, are the parallel processes communicating large chunks of data with their peers? And if so, are they communicating with all others or maybe only with their neighbors? “This can have a direct effect on a suitable network topology (and with that, on the overall cost of the fabric),” Kalcher says in his paper.
On the other hand, some communication patterns are dominated by the exchange of smaller control messages, in which case, latency might be the more important issue. “The size of the actual messages that are exchanged can have an effect on the overall performance,” he writes.
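The trade-off between message size and latency can be illustrated with a simple first-order model in which transfer time is a fixed latency plus message size divided by bandwidth. The sketch below is not taken from Kalcher's report; the function names and the link parameters are hypothetical values chosen only to show how the balance shifts.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbps):
    """Estimate one-way transfer time in microseconds as latency + size / bandwidth."""
    bits = message_bytes * 8
    # 1 Gbps moves 1000 bits per microsecond.
    return latency_us + bits / (bandwidth_gbps * 1000.0)


def latency_share(message_bytes, latency_us, bandwidth_gbps):
    """Fraction of the total transfer time that is spent on fixed latency."""
    total = transfer_time_us(message_bytes, latency_us, bandwidth_gbps)
    return latency_us / total


# Assumed, illustrative link: 1 microsecond latency, 50 Gbps bandwidth.
ASSUMED_LATENCY_US = 1.0
ASSUMED_BANDWIDTH_GBPS = 50.0

if __name__ == "__main__":
    for size in (64, 4096, 1048576):  # small control message up to a 1 MiB chunk
        share = latency_share(size, ASSUMED_LATENCY_US, ASSUMED_BANDWIDTH_GBPS)
        print(f"{size:>8} bytes: {share:.1%} of transfer time is latency")
```

Under these assumed numbers, small control messages spend almost all of their time on latency, while megabyte-sized transfers are limited almost entirely by bandwidth. That is why the message-size profile of the application matters when choosing a fabric.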
InfiniBand is the choice for many general-purpose HPC clusters, thanks to its high throughput and low latency. And thanks to the low-overhead 64b/66b encoding scheme used in Fourteen Data Rate (FDR) InfiniBand, very high data rates (up to 54.55 Gbps) can be achieved, while spending fewer CPU cycles on message copying, protocol handling, or checksum calculation than QDR or DDR, which use an 8b/10b encoding scheme.
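The 54.55 Gbps figure can be reproduced with a little arithmetic: multiply the per-lane signaling rate by the number of lanes, then by the fraction of transmitted bits that carry payload under the encoding scheme. The sketch below assumes the common 4x link width and the per-lane signaling rates of 5, 10, and 14.0625 Gbps for DDR, QDR, and FDR. Those rates are not stated in the article, and the names used are illustrative.

```python
LANES = 4  # assumption: the common 4x link width

# name: (per-lane signaling rate in Gbps, payload bits, encoded bits)
GENERATIONS = {
    "DDR": (5.0, 8, 10),       # 8b/10b encoding
    "QDR": (10.0, 8, 10),      # 8b/10b encoding
    "FDR": (14.0625, 64, 66),  # 64b/66b encoding
}


def effective_rate_gbps(name, lanes=LANES):
    """Data rate left over after subtracting line-encoding overhead."""
    lane_rate, payload_bits, encoded_bits = GENERATIONS[name]
    return lane_rate * lanes * payload_bits / encoded_bits


if __name__ == "__main__":
    for name in GENERATIONS:
        print(f"{name}: {effective_rate_gbps(name):.2f} Gbps")
```

With 8b/10b encoding, 20 percent of the signaling rate goes to encoding overhead; with 64b/66b, the overhead drops to about 3 percent.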
InfiniBand also delivers flexibility in the network topology. Most clusters use a fat-tree topology, with 36-port switches arranged in a tree structure as the building block, according to Kalcher’s paper. Depending on whether flexibility or cost is the main goal, the HPC network architect can choose different topologies.
When the edge switches in an HPC cluster have an equal number of InfiniBand links going to the core switches and to the processor nodes, the cluster is considered to have 1:1 bisectional bandwidth. This is the most flexible and fault-tolerant topology, but it is also the most expensive. Depending on the application, a bisectional bandwidth ratio of 1:2 (twice as many links to compute nodes as to core switches) may deliver the required performance at a lower cost.
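To make the cost trade-off concrete, the following sketch splits the ports of a 36-port edge switch for a few ratios of core links to node links. It also estimates how many nodes a two-level fat tree could hold. The helper names are hypothetical, and the node estimate assumes every switch has the same port count and each edge switch has exactly one link to each core switch.

```python
def split_ports(total_ports, uplink_share, downlink_share):
    """Split an edge switch's ports into (uplinks, downlinks) for a given ratio."""
    parts = uplink_share + downlink_share
    if total_ports % parts != 0:
        raise ValueError("ratio does not divide the port count evenly")
    unit = total_ports // parts
    return unit * uplink_share, unit * downlink_share


def max_nodes_two_level(total_ports, uplink_share, downlink_share):
    """Upper bound on nodes in a two-level fat tree.

    Assumes one link per edge/core switch pair, so a core switch with
    `total_ports` ports can serve at most `total_ports` edge switches.
    """
    _, downlinks = split_ports(total_ports, uplink_share, downlink_share)
    return total_ports * downlinks


if __name__ == "__main__":
    for up, down in ((1, 1), (1, 2), (1, 3)):
        uplinks, downlinks = split_ports(36, up, down)
        nodes = max_nodes_two_level(36, up, down)
        print(f"{up}:{down} -> {uplinks} uplinks, {downlinks} node links, "
              f"up to {nodes} nodes")
```

Under these assumptions, moving away from 1:1 frees edge ports for compute nodes and reduces the number of core switches and cables, at the price of less bandwidth between edge switches.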