Smart switches, major deployment wins, HPC Centers of Excellence, NICs with on-board FPGAs, and UCX progress were all part of Mellanox's hectic agenda at SC15. In mid-December the DoD approved Mellanox 10/40 Gigabit Ethernet switches for use in defense networks. Then last week, time ran out for acquisition target EZchip to find a higher offer, seemingly clearing the way for Mellanox's roughly $800M purchase of EZchip.
These are busy times for the HPC interconnect powerhouse. Indeed, all things interconnect – fabrics, switches, new architectures, competing acceleration approaches – are under the spotlight in the race to exascale and the effort to broaden HPC penetration of the enterprise. Mellanox (NASDAQ: MLNX) is in the thick of all of it.
During SC15 HPCwire managing editor John Russell sat down with Gilad Shainer, VP of marketing for Mellanox, for a discussion of company directions and products and broad interconnect technology trends. Perhaps not surprisingly, the need to make networks smarter was top of mind for Shainer. If the transition from SMP to clusters ushered in terascale and the move from single core to multi-core enabled petascale computing – both requiring interconnect advances – the path to exascale will require yet more progress.
A Mellanox step in that direction is the introduction of its Switch-IB 2 100Gb/s smart EDR switch. “For the first time you can execute and manage MPI operations on the switch silicon instead of on the server side,” said Shainer. “We’re seeing a 10x performance improvement by moving the management and execution of collective operations to the switch. The switch is essentially a coprocessor within the network.”
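To make the idea concrete, the sketch below shows a standard MPI collective of the kind Shainer describes. It is illustrative only and not Mellanox-specific: with conventional host-based collectives the reduction is staged through the servers, while a switch-based offload executes the same operation in the fabric, transparently to code like this.

```c
/* Illustrative only: a plain MPI_Allreduce, the kind of collective
 * operation that in-network (switch-based) offload accelerates.
 * Application code is unchanged either way. Build with: mpicc allreduce.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank;   /* each rank contributes one value */
    double sum   = 0.0;

    /* Host-based collectives stage this reduction through the servers;
     * an offload-capable switch can execute it inside the network. */
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", sum);

    MPI_Finalize();
    return 0;
}
```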
At SC15, Mellanox announced the Texas Advanced Computing Center (TACC) at The University of Texas in Austin had selected Mellanox’s 100Gb/s EDR interconnect solutions to develop North America’s first end-to-end 100Gb/s EDR high-performance computing (HPC) cluster. “TACC has had a number of InfiniBand systems over the years and we were looking to take our research capabilities to the next level,” said Bill Barth, director of high performance computing, TACC.
Of course, Mellanox products encompass both InfiniBand and Ethernet, and Shainer said the company expects to release “200Gb/s Ethernet and maybe 400Gb/s in the 2017 timeframe.”
In an interesting exascale gambit, Mellanox is seeking to develop a consortium of “HPC Centers of Excellence.” The idea is to drive collaboration to accelerate development and use of ‘intelligent interconnect,’ including efforts in co-designed architecture and solutions. Mellanox announced an open invitation to “distinguished high-performance computing (HPC) centers” at SC15. The main proposed benefits are:
- Early access to the Mellanox roadmap
- Open collaboration with Mellanox and other leaders within the HPC community
- The opportunity to contribute to the development of next-generation solutions and drive toward Exascale computing
Roughly 25 organizations, mostly academic and government labs, have joined so far. A few of the members include: Australian National University, Federal University of Rio de Janeiro, German Climate Computing Centre (DKRZ), Harvard University, High Performance Computing Center of Shanghai Jiaotong University, The High Performance Computing Center Stuttgart (HLRS), Los Alamos National Laboratory, NASA, Max Planck, and Shanghai Supercomputing Center. It will be interesting to watch what projects are undertaken.
Like virtually all HPC technology suppliers, Mellanox also has its eye on the enterprise market. Shainer noted, “Hyperscale[rs] are essentially using the same technologies being used in HPC, whether those are GPUs, intelligent networks, or RDMA (remote direct memory access). RDMA started in HPC but today it’s a key element in any storage infrastructure.” He contends no one would stand up a major storage system without RDMA today, for both cost and efficiency reasons.
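For readers less familiar with RDMA, the standard way applications reach it is through the verbs API (libibverbs). The hedged sketch below is not from Mellanox; it simply enumerates RDMA-capable adapters and queries their limits, the first step before setting up queue pairs, registering memory, and posting work requests, which are omitted here for brevity.

```c
/* A minimal sketch using the standard libibverbs API: list RDMA-capable
 * devices and query their capabilities. The full RDMA data path (queue
 * pairs, memory registration, work requests) is not shown.
 * Build with: gcc rdma_query.c -libverbs */
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    for (int i = 0; i < num_devices; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr) == 0)
            printf("%s: max_qp=%d max_mr_size=%llu\n",
                   ibv_get_device_name(devs[i]),
                   attr.max_qp,
                   (unsigned long long)attr.max_mr_size);

        ibv_close_device(ctx);
    }

    ibv_free_device_list(devs);
    return 0;
}
```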
Mellanox also presented an update on the nascent Unified Communication X (UCX) communication framework. “The UCX mission is to create a unified communication framework that’s very lightweight and connects directly to the hardware-level interface, so it enable[s] much better power [and] performance [and] supports any kind of infrastructure and any kind of communication library [such as] MPI or SHMEM,” said Shainer. (For a fuller account, read the HPCwire article, Mellanox, ORNL to Deliver UCX Progress Report at SC15.)
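As a rough illustration of what that framework looks like to a communication-library developer, here is a minimal sketch of bringing up the UCP layer of the open-source UCX project: read the default configuration, initialize a context requesting tag-matching and RMA features, and create a worker. The API shown comes from the public UCX headers, not from the article, and endpoint creation and actual send/receive calls are omitted.

```c
/* A minimal sketch of initializing UCX's UCP layer. Endpoints and
 * message transfers are not shown. Build against the open-source UCX
 * library, e.g.: gcc ucx_init.c -lucp -lucs */
#include <ucp/api/ucp.h>
#include <stdio.h>

int main(void)
{
    ucp_config_t *config;
    if (ucp_config_read(NULL, NULL, &config) != UCS_OK) {
        fprintf(stderr, "failed to read UCX config\n");
        return 1;
    }

    /* Request the features the application will use (tag matching, RMA). */
    ucp_params_t params = {0};
    params.field_mask = UCP_PARAM_FIELD_FEATURES;
    params.features   = UCP_FEATURE_TAG | UCP_FEATURE_RMA;

    ucp_context_h context;
    if (ucp_init(&params, config, &context) != UCS_OK) {
        fprintf(stderr, "ucp_init failed\n");
        ucp_config_release(config);
        return 1;
    }
    ucp_config_release(config);

    /* A worker represents a progress/communication context, typically
     * one per thread. */
    ucp_worker_params_t wparams = {0};
    wparams.field_mask  = UCP_WORKER_PARAM_FIELD_THREAD_MODE;
    wparams.thread_mode = UCS_THREAD_MODE_SINGLE;

    ucp_worker_h worker;
    if (ucp_worker_create(context, &wparams, &worker) != UCS_OK) {
        fprintf(stderr, "ucp_worker_create failed\n");
        ucp_cleanup(context);
        return 1;
    }

    printf("UCX context and worker initialized\n");

    ucp_worker_destroy(worker);
    ucp_cleanup(context);
    return 0;
}
```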