May 25, 2023 — Recently, the International Supercomputing Conference 2023 (ISC 2023) was held in Hamburg, Germany. The annual conference focuses on a host of hot topics such as system architectures, parallel programming models and performance modeling, applications and algorithms, machine learning, and quantum computing. It gathers experts from around the world to collectively clarify the challenges facing high-performance computing (HPC), machine learning, data analysis, and quantum computing, as well as discuss how the boundaries of conventional HPC are extended with emerging technologies and applications.
Huawei shared its insights into HPC development trends and showcased its HPC strategies and innovations. At the conference, Zheng Tong, Huawei senior network architect in Europe, expressed great pleasure at seeing so many industry players demonstrate brand-new products and solutions at the first in-person ISC exhibition since the pandemic. Europe has formulated policies conducive to HPC development, injecting new impetus into the industry and further clarifying its development objectives. Underpinned by years of technical expertise and innovation, Huawei brings many ground-breaking, high-performance network solutions, such as its lossless Ethernet-based HPC network solution.
At the conference, the data center network (DCN) team of Huawei’s data communication product line demonstrated its future-oriented lossless Ethernet-based HPC network solution. The solution stands out with four highlights. First, it provides the intelligent lossless (iLossless) algorithm to enable a lossless network based on Remote Direct Memory Access over Converged Ethernet (RoCE), improving network performance by 30% within a data center (DC) and achieving lossless data transmission over distances of up to 200 km between DCs.
Second, the solution uses network scale load balancing (NSLB) technology. This technology eliminates network load imbalance, raising network throughput to as high as 90% and improving AI training efficiency by 20%. Third, the solution features data plane failover recovery (DPFR) technology. Implemented in data-plane hardware, this technology provides rapid fault detection, remote notification, and fast link switching, achieving sub-millisecond fault convergence and minimizing the impact on service performance. Fourth, the solution is equipped with an intelligent operations and maintenance (O&M) system, which monitors and visualizes RoCE key performance indicators (KPIs) from multiple dimensions. Huawei’s lossless Ethernet-based HPC network solution can reduce the total cost of ownership (TCO) by 36% and is ideal for the computing era.
So far, Huawei’s lossless Ethernet-based HPC network solution has been widely adopted in various industries, including education, energy, and manufacturing. A notable example is the Wuhan AI Computing Center. This project, a collaboration between Huawei and the city of Wuhan, sets a standard for CPU usage with the combined AI Fabric, Atlas, and FusionPoD solution. The solution is set to be promoted and replicated in next-generation AI innovation and development zones in more than ten countries.
As HPC and AI applications gain momentum, they place higher requirements on network bandwidth and throughput. To keep up, Ethernet is rapidly evolving from 100GE to 200GE, 400GE, and even 800GE. Higher transmission speeds in turn demand better Ethernet performance. In the foreseeable future, the low-latency, high-throughput, lossless Ethernet ecosystem will mature around the world, and higher-speed Ethernet technologies (400GE, 800GE, and even faster) will see wider application. As part of this trend, key technologies such as more efficient coding and modulation methods and more advanced switching chips will keep emerging to further improve lossless Ethernet performance.
Huawei will continue working closely with industry partners to accelerate the move toward intelligent HPC and deliver the best possible network experience for customers.