November 17, 2010

InfiniBand Continues Upward Climb in Top Supers

Michael Feldman

Lost in the hoopla about the ascendancy of China and GPGPUs in the TOP500 is the continuing saga of the InfiniBand-Ethernet interconnect rivalry. In the latest TOP500 list, the number of InfiniBand- and Ethernet-connected supercomputers is now nearly the same — 215 for InfiniBand and 227 for Ethernet. But that's an 18 percent increase for the former and a 14 percent decrease for the latter compared to last year. Only seven 10 Gigabit Ethernet-based supercomputers made the current list, although that's up from just one such system last year.
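Those growth figures imply rough counts on last year's list. A quick back-of-envelope check — the rounded estimates below are inferred from the quoted percentages, not taken from the prior-year list itself:

```python
# Work backward from this year's counts and the quoted year-over-year changes.
ib_now, eth_now = 215, 227

ib_last = round(ib_now / 1.18)    # +18 percent year over year
eth_last = round(eth_now / 0.86)  # -14 percent year over year

print(ib_last, eth_last)  # roughly 182 InfiniBand and 264 Ethernet systems a year ago
```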

For the top 100 systems, which encompass the majority of the FLOPS on the list, the numbers skew heavily in favor of InfiniBand. It connects 61 percent of these elite machines, while Ethernet manages just a one-percent share. Among the petaflop machines, InfiniBand is employed in 57 percent, with custom interconnects used in the remainder.

But proprietary interconnects have a substantial presence at the top. Vendor-specific system networks from Cray and IBM BlueGene connect 23 percent and 9 percent of the top 100, respectively. Even the number one Tianhe-1A system is using its own home-grown interconnect.

InfiniBand’s growth in the big systems follows a trend that’s been building for years. But the trend is not about InfiniBand per se; it’s a more general movement toward system networks with the lowest possible latencies and the highest possible bandwidths.

In fact, at a press briefing here at SC10 on Tuesday, IDC’s Earl Joseph predicted that at the high end of the supercomputing market, use of high-performance custom interconnects, such as EXTOLL, will actually expand. According to him, there are six vendors working on new supercomputing interconnects, with EXTOLL representing the only one that is publicly known.

The driving force is the escalating processor and core counts on these big machines. Connecting them together so they behave as one requires greater and greater performance from the network fabric. So while most of these petascale supercomputers are likely to be based on standard CPU and GPU architectures, it may end up that there is no dominant interconnect. But with InfiniBand speeds bumping up to 56 Gbps (4X FDR) in 2011 and 104 Gbps (4X EDR) in 2012, the technology will certainly be a big player in the petascale space for the foreseeable future.
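The quoted link speeds follow from simple lane arithmetic: a 4X InfiniBand link aggregates four lanes, and the per-lane rates on the roadmap work out to 14 Gbps for FDR and 26 Gbps for EDR. A quick sanity check — these are nominal signaling rates, not effective data rates after encoding overhead:

```python
# 4X InfiniBand links bond four lanes; multiply by the per-lane rate.
LANES_4X = 4
fdr_lane_gbps = 14   # nominal FDR rate per lane
edr_lane_gbps = 26   # nominal EDR rate per the 2010 roadmap

fdr_4x = LANES_4X * fdr_lane_gbps
edr_4x = LANES_4X * edr_lane_gbps

print(fdr_4x, edr_4x)  # 56 104, matching the figures quoted above
```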

Speaking of GPUs, even though the machines accelerated by graphics chips made a great showing this year on the TOP500, their Linpack efficiency seems stuck at around 50 percent. Help is on the way, though. NVIDIA’s GPUDirect technology (with the support of network adapters) should push those efficiency numbers up significantly. Of course, the idea is not just to get a better Linpack score. GPUDirect can bypass system memory copies, thus eliminating a lot of CPU overhead, which should speed up nearly all GPU computing applications. NVIDIA says the technology could crank up data transfer performance by as much as 30 times. That may be reason enough to check the InfiniBand box when building these big GPGPU supercomputers.
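The copy-elimination idea can be sketched with a toy model of the send path. The step names below are illustrative, not actual CUDA or InfiniBand verbs APIs; the point is simply that sharing one pinned host buffer between the GPU driver and the network driver removes a host-to-host copy:

```python
# Toy model of the host-side copy chain for a GPU-to-InfiniBand send.
# Step names are illustrative, not real driver calls.

def send_path(gpudirect_enabled: bool) -> list:
    """Return the sequence of copies needed to move a buffer
    from GPU memory onto the network adapter (HCA)."""
    if gpudirect_enabled:
        # The GPU and IB drivers share one pinned host buffer, so the
        # GPU's DMA target is directly visible to the network adapter.
        return ["gpu -> shared_pinned_host",
                "shared_pinned_host -> hca"]
    # Without GPUDirect, the CPU must copy data from the GPU-pinned
    # buffer into a separate buffer registered with the network adapter.
    return ["gpu -> gpu_pinned_host",
            "gpu_pinned_host -> ib_pinned_host",
            "ib_pinned_host -> hca"]

# One fewer copy per transfer, and the CPU stays out of the data path.
print(len(send_path(False)) - len(send_path(True)))  # prints 1
```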

The battle for market share between InfiniBand and custom interconnects at the high end is shaping up to be a rather interesting rivalry — in some ways a more interesting one than the InfiniBand-10GbE battle for less well-endowed machines. In the meantime, we can look forward to a richer and more diverse interconnect landscape than the one we have today.