Nov. 10, 2017 — As supercomputers achieve petascale and reach toward exascale, efficient communication among thousands of nodes becomes an important question. One pioneering solution is the Silicon Switch (an OPA-based Torus topology switch) by Sugon, China’s high-performance computing leader. A demo of the switch will take place at SC17 in Denver.
“Large-scale supercomputers, especially quasi-exascale and exascale systems, face severe challenges in system scale, scalability, cost, energy consumption, and reliability. The Silicon Switch released by Sugon adopts the Torus architecture and state-of-the-art OPA technology, which gives it competitive features including advanced performance, almost infinite scalability, and excellent fault tolerance. It should be a wise choice for exascale supercomputers,” said Dr. Li Bin, General Manager of the Business Department for HPC Products at Sugon.
Compared with the traditional fat-tree network topology, the Torus direct network, which emphasizes interconnection between neighboring nodes, has clear advantages in scalability and cost/performance, because its network cost grows only linearly with system scale. In addition, its many redundant data paths and its dynamic routing give it inherent strength in fault tolerance. These features meet the requirements of exascale supercomputers well and point to a new direction in high-speed network technology.
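The linear relationship between cost and scale can be illustrated by counting links. The following is a minimal sketch, not Sugon's design: the function name `torus_link_count` and the example sizes are assumptions chosen for illustration, and it treats link count as a rough stand-in for network cost.

```python
from itertools import product

def torus_link_count(dims):
    """Count the unique bidirectional links in a torus with the given dimension sizes."""
    links = set()
    for node in product(*(range(k) for k in dims)):
        for axis, k in enumerate(dims):
            if k < 2:
                continue  # a dimension of size 1 contributes no links
            neighbor = list(node)
            neighbor[axis] = (node[axis] + 1) % k  # one step forward, with wraparound
            # frozenset makes the link undirected, so each link is stored once
            links.add(frozenset((node, tuple(neighbor))))
    return len(links)

# Hypothetical system sizes: the number of links per node stays constant.
for dims in [(4, 4, 4), (8, 8, 8), (16, 16, 16)]:
    nodes = 1
    for k in dims:
        nodes *= k
    links = torus_link_count(dims)
    print(dims, nodes, links, links / nodes)
```

In these examples every 3D torus has exactly three links per node regardless of size, so the total number of links grows in direct proportion to the number of nodes.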
Dr. Li Bin further remarked that Sugon had realized a 3D-Torus network in 2015 as a solution for its Earth System Numerical Simulator. Recently, Sugon’s research in Torus network technology has made bigger breakthroughs. The dimension of the Torus network has evolved from 3D to 6D, which can effectively reduce the maximum hop count in large-scale systems. At the software level, deadlock-free dynamic routing algorithms supporting 6D-Torus have been verified and tested in a real environment. At the hardware level, the newly released Silicon Switch is an important example of the hardware implementation.
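The effect of higher dimensionality on the longest path can be checked with a short calculation. This is a sketch under stated assumptions: the two shapes below are hypothetical (both hold 4,096 nodes) and are not Sugon's configurations, and the helper names `torus_hops` and `torus_diameter` are illustrative. It measures minimal path length only and does not model the dynamic routing algorithms mentioned above.

```python
def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b in a torus with the given sizes."""
    total = 0
    for x, y, k in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, k - d)  # take the shorter way around the ring
    return total

def torus_diameter(dims):
    """Longest minimal path: halfway around the ring in every dimension."""
    return sum(k // 2 for k in dims)

shape_3d = (16, 16, 16)        # hypothetical 3D torus, 4,096 nodes
shape_6d = (4, 4, 4, 4, 4, 4)  # hypothetical 6D torus, 4,096 nodes

print(torus_diameter(shape_3d))  # 24
print(torus_diameter(shape_6d))  # 12
```

For the same node count, the 6D arrangement in this example halves the worst-case hop count, at the price of more links per node.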
The “Silicon” mentioned above refers to a unit in a high-dimensional Torus direct network. With the 3D-Torus topology adopted inside a silicon unit, multiple silicon units can be combined into a higher-dimensional 4D/5D/6D-Torus direct network. Integrating a 3D-Torus silicon unit into a modular switch brings many benefits, such as greatly improving the integration and density of the system, simplifying the network cabling, and reducing deployment complexity and cost. The released Silicon Switch supports up to 192 ports (100Gb each). Different Silicon Switches can be connected through a dedicated 400Gb interface.
The integrated Silicon Switch could also greatly broaden the adoption of Torus high-speed network technology. Because it requires almost no change on the computing node side, small and medium-scale high-performance computing systems can adopt the Torus topology smoothly.
It is worth mentioning that the Silicon Switch released by Sugon also supports cold-plate direct liquid cooling. This marks the extension of Sugon’s liquid cooling technology from computing devices to the network system. Liquid cooling has played a key role in improving the integration and reliability of large-scale network systems and in reducing their energy consumption.
The flourishing development of high-performance computing and artificial intelligence relies not only on powerful computing components but also on efficient communication components. Sugon aims to blaze new trails in computing, storage, networking and other core technologies.