Nvidia today introduced its Mellanox NDR 400 gigabit-per-second InfiniBand family of interconnect products, which are expected to be available in Q2 2021. The new lineup includes adapters, data processing units (DPUs, Nvidia's version of smart NICs), switches, and cables. Pricing was not disclosed. Besides the obvious 2x jump in throughput over the HDR 200 Gbps InfiniBand devices available now, Nvidia promises improved TCO, beefed-up in-network computing features, and increased scaling capabilities.
During a media/analyst pre-briefing, Gilad Shainer, now Nvidia SVP of networking and a long-time Mellanox exec, noted that he joined Mellanox 20 years ago to design the first InfiniBand adapter, which back then ran at 10 Gbps. Indeed, InfiniBand and Mellanox have had a productive journey since and have consistently been at the forefront of networking in HPC. The introduction of NDR 400 Gbps InfiniBand is perhaps an indication that InfiniBand's momentum will continue with Mellanox now part of Nvidia.
Next on the InfiniBand roadmap are XDR (800 Gbps) and GDR (1.6 terabits per second), along with more extensive use of in-network computing.
It's noteworthy that the NDR 400 Gbps InfiniBand product family supports passive copper cabling, leveraging its low cost and reliability.
“We’re very proud that even at a speed of 400 gigabits per second, we’re still able to run a copper cable. It looks like we can do 1.5-meter [and] potentially even more for passive copper cable. That means that within a rack you can run on copper. We will also provide the active copper cables that extend that to several meters. Beyond that, optical transceivers that can go tens and hundreds of meters,” Shainer said.
There is a growing consensus that co-packaging with optical interconnect will be required in the not-so-distant future. "Co-packaging of [copper and optical] is going to happen because you need to reduce power consumption and you need to enable density, [but] as long as you can run copper it's much better," he said.
The NDR generation is both backward and forward compatible with the InfiniBand standard, said Shainer, adding, "To run 400 gigabits per second you will need either 16 lanes of PCIe Gen5 or 32 lanes of PCIe Gen4. Our adapters are capable of both."
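The PCIe arithmetic behind that statement checks out with a quick back-of-the-envelope calculation. The sketch below is our own illustration (not from Nvidia): it uses the published PCIe per-lane signaling rates and the 128b/130b line encoding used by Gen3 and later to show why x16 Gen4 falls short of 400 Gbps while x32 Gen4 and x16 Gen5 clear it.

```python
# Rough usable-bandwidth estimate per PCIe configuration (illustrative only).
# Per-lane rates: Gen4 = 16 GT/s, Gen5 = 32 GT/s; 128b/130b encoding overhead.

def pcie_usable_gbps(lanes: int, gt_per_s: float) -> float:
    """Approximate usable bandwidth in Gbps after 128b/130b line encoding."""
    return lanes * gt_per_s * (128 / 130)

gen4_x16 = pcie_usable_gbps(16, 16.0)   # ~252 Gbps: not enough for NDR
gen4_x32 = pcie_usable_gbps(32, 16.0)   # ~504 Gbps: sufficient
gen5_x16 = pcie_usable_gbps(16, 32.0)   # ~504 Gbps: sufficient

for name, bw in [("Gen4 x16", gen4_x16), ("Gen4 x32", gen4_x32), ("Gen5 x16", gen5_x16)]:
    verdict = "OK" if bw >= 400 else "insufficient"
    print(f"{name}: {bw:.0f} Gbps usable -> {verdict} for 400 Gbps NDR")
```

This ignores protocol overheads above the physical layer, which shave off a bit more, but the lane-count conclusion is the same.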
Systems with NDR 400 InfiniBand technology are expected in the second quarter of 2021. Nvidia cited Atos, Dell Technologies, Fujitsu, Inspur, Lenovo and Supermicro as companies who’ve committed to integrating the new interconnect line into enterprise solutions and HPC offerings.
Shainer highlighted the increased scalability associated with the new switches: "We're not just doubling the bandwidth per port. [We're] also more than tripling the number of ports in a single device, which means you can build very large infrastructures with much better total cost of ownership. You can connect 2,000 boards at 400 gigabits per second or 4,000 boards at 200 gigabits per second. It's one switch platform that can connect the entire data center. [Also] with more ports on every switch, we can connect more than one million GPUs in three hops of a network topology."
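The relationship between switch radix and reachable endpoints that Shainer alludes to can be sketched with the standard non-blocking fat-tree (folded-Clos) capacity formulas. The radix values below are assumptions for illustration, not Nvidia's published switch specs; note how doubling effective port count (for example, by running a 400G ASIC's ports at 200G) grows three-tier capacity by 8x.

```python
# Illustrative fat-tree capacity sketch (assumed radices, not Nvidia specs):
# a non-blocking fat tree of radix-r switches supports r^2/2 endpoints at
# two tiers and r^3/4 endpoints at three tiers ("three hops" in the quote).

def fat_tree_endpoints(radix: int, tiers: int) -> int:
    """Max endpoints of a non-blocking fat tree built from radix-r switches."""
    if tiers == 2:
        return radix ** 2 // 2
    if tiers == 3:
        return radix ** 3 // 4
    raise ValueError("sketch covers 2- and 3-tier topologies only")

for radix in (64, 128):  # e.g. a 64-port 400G ASIC vs. the same ASIC at 200G
    print(f"radix {radix}: {fat_tree_endpoints(radix, 3):,} endpoints in 3 tiers")
```

Reaching "more than one million GPUs in three hops" implies either a larger effective radix or a topology denser than a plain fat tree (e.g., Dragonfly+), so treat the formula as a lower-bound intuition rather than a reconstruction of Nvidia's numbers.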
For adopters of NDR 400 Gbps InfiniBand, Nvidia projects network cost savings of 1.4x and power savings of 1.6x for datacenters with ~4,000 GPUs and 1.2x (cost and power) savings for datacenters with ~1,500 GPUs.
The purposeful migration of select compute functions into the network is a long-time theme for Mellanox, and Nvidia has been actively promoting the idea of a data processing unit, essentially a smart NIC with onboard CPU and GPU capabilities (see slide below from GTC fall 2020).
"In-network computing engines actually include two kinds of engines. There are engines that are predefined, meaning engines that are set to run a very specific algorithm because that algorithm is critical to run on the network. Because it's much more scalable. It's immune to system size, it's immune to jitter and many other things. An example of that is the SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) technology that enables data reductions and data aggregations on the network itself and reduces latencies dramatically. Another example is MPI agents that run on the network. We have MPI tag-matching [as] an example. In NDR we're adding another MPI agent, an MPI engine that runs the MPI all-to-all operation, [which] is critical for deep learning. So those are dedicated engines.
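To make the SHARP idea concrete, here is a toy model of in-network reduction; it is our own conceptual illustration, not the SHARP protocol or its API. The point is the scaling behavior: each switch sums the vectors arriving from its children, so every host injects its data exactly once and receives one aggregated result, regardless of how many nodes participate.

```python
# Toy model (illustration only, not the SHARP API) of tree-based
# in-network reduction: switches perform elementwise sums as data
# flows up the tree, instead of hosts exchanging data among themselves.

from typing import List

def switch_reduce(child_vectors: List[List[float]]) -> List[float]:
    """What a reduction-capable switch conceptually does: elementwise sum."""
    return [sum(vals) for vals in zip(*child_vectors)]

# Eight hosts, fan-in of 4: two leaf switches feed one root switch.
hosts = [[float(i), float(i) * 2] for i in range(8)]
leaf_a = switch_reduce(hosts[:4])        # aggregates hosts 0-3
leaf_b = switch_reduce(hosts[4:])        # aggregates hosts 4-7
root = switch_reduce([leaf_a, leaf_b])   # final reduced result
print(root)                              # [28.0, 56.0]
```

A host-based allreduce would instead move data through the endpoints in multiple steps; pushing the arithmetic into the switch fabric is what makes the operation largely insensitive to system size and jitter, as Shainer describes.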
“Besides [dedicated engines] there are programmable engines. There is a programmable data path and there are programmable cores, like Arm cores, that sit on the network that enable migration of other elements from the [main system] to the network; [this] could include the infrastructure management, enabling security and isolations, as well as pre-processing of data or batch control of applications. [The network becomes] a distributed compute platform that works in synergy with the GPUs on either side,” said Shainer.