Two distinct solutions, nearly identical results, but a significant difference in cost and management.
These are the key findings of a recent study by Chelsio Communications comparing the performance of Lustre RDMA (Remote Direct Memory Access) over Ethernet with Lustre RDMA over FDR InfiniBand.
Lustre is a popular, scalable, secure, high-availability HPC file system that addresses extreme I/O needs by providing low latency and high throughput in large computing clusters. Like other storage protocols, Lustre benefits from RDMA.
RDMA achieves high efficiency through direct memory-to-memory transfers between systems or applications, without CPU involvement or intermediate data copies. With RDMA-enabled adapters, all packet and protocol processing required for communication is handled in hardware by the network adapter, which is what delivers the high performance.
Chelsio’s Terminator 5 (T5) ASIC implements RDMA over Ethernet (iWARP) on a hardware TCP/IP stack that runs in the adapter, completely bypassing the host software stack and eliminating the overhead of software processing. iWARP RDMA provides all the benefits of RDMA, including CPU bypass and zero copy, while operating over standard Ethernet.
The Chelsio T5 ASIC is a fifth-generation, high-performance 2x40Gbps/4x10Gbps server adapter engine with Unified Wire capability, allowing offloaded storage, compute and networking traffic to run simultaneously. T5 also provides a full suite of high-performance stateless offload features for both IPv4 and IPv6.
In addition, T5 is a fully virtualized NIC engine with separate configuration and traffic management for 128 virtual interfaces, and includes an on-board switch that offloads the hypervisor v-switch.
Results of the Study
The Chelsio study compared the performance of Lustre RDMA over 40Gbps Ethernet and FDR InfiniBand. The results showed nearly identical performance.
But here is the big difference. iWARP delivers a high-performance RDMA transport while preserving investments in Ethernet network infrastructure and in Ethernet-based functions such as security, load balancing and monitoring appliances. Unlike InfiniBand, it requires no expensive gateway, special configuration or additional management costs. Because it is built on hardware TCP/IP, it provides low latency and all the benefits of RDMA, along with the routability needed to scale to large clusters and long distances.
In addition, the iWARP-capable T5 adapter concurrently supports a full suite of networking and storage protocols, including user-space I/O with WireDirect, full TCP/IP and UDP/IP offload, iSCSI and FCoE, with all traffic managed and firewalled.
The figures below tell the full story. The graphs compare Lustre READ and WRITE throughput over iWARP and IB-FDR at different I/O sizes, measured with the fio tool.
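For readers who want to run a similar sweep on their own cluster, here is a minimal Python sketch that builds fio command lines for sequential READ and WRITE jobs across several I/O sizes. The study's actual fio parameters are not given here, so the mount point `/mnt/lustre`, the block sizes, the file size, the job count, and the choice of the `libaio` engine with direct I/O are all assumptions. The helper names `fio_command` and `sweep` are hypothetical and are not part of fio.

```python
import shlex

# Assumed Lustre client mount point (hypothetical; adjust to your environment).
LUSTRE_DIR = "/mnt/lustre"

# Assumed sweep of I/O (block) sizes; the study's exact sizes are not stated.
BLOCK_SIZES = ["4k", "64k", "1m", "4m"]


def fio_command(rw: str, bs: str, directory: str = LUSTRE_DIR,
                size: str = "8g", numjobs: int = 4) -> list[str]:
    """Build one fio invocation for a sequential read or write test."""
    if rw not in ("read", "write"):
        raise ValueError("rw must be 'read' or 'write'")
    return [
        "fio",
        f"--name=lustre-{rw}-{bs}",   # job name, used only for reporting
        f"--directory={directory}",   # files are created on the Lustre mount
        f"--rw={rw}",                 # sequential read or sequential write
        f"--bs={bs}",                 # I/O size for this point in the sweep
        f"--size={size}",             # data volume per job
        f"--numjobs={numjobs}",       # parallel jobs (assumed value)
        "--ioengine=libaio",          # assumed async engine on Linux
        "--direct=1",                 # bypass the client page cache
        "--group_reporting",          # aggregate throughput across jobs
    ]


def sweep() -> list[list[str]]:
    """One READ and one WRITE job per block size."""
    return [fio_command(rw, bs) for rw in ("read", "write") for bs in BLOCK_SIZES]


if __name__ == "__main__":
    for cmd in sweep():
        print(shlex.join(cmd))
```

Running the same generated command list once over each transport keeps the workload identical, so any throughput difference comes from the network rather than the test setup.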
The READ throughput numbers show 40Gbps iWARP delivering nearly identical performance to IB-FDR (56Gbps) over the range of interest.
The WRITE results confirm that the two transports are on par, delivering nearly the same performance despite IB's higher theoretical bandwidth (56Gbps vs. 40Gbps for one port).
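To put the bandwidth gap in perspective, the short sketch below converts the nominal per-port rates quoted here into byte-per-second ceilings, the unit in which throughput graphs are often plotted. It assumes decimal units and ignores line encoding and protocol overhead, so real payload ceilings are lower on both transports. The helper names are illustrative and are not part of any Chelsio or fio tooling.

```python
# Nominal per-port link rates from the article, in Gbit/s.
# Encoding and protocol overhead are ignored, so these are upper bounds.
NOMINAL_GBPS = {"iWARP (40GbE)": 40.0, "IB-FDR": 56.0}


def gbit_to_gbyte(gbps: float) -> float:
    """Convert a rate in Gbit/s to GB/s (8 bits per byte, decimal units)."""
    return gbps / 8.0


def utilization(measured_gbyte_s: float, link_gbps: float) -> float:
    """Fraction of the nominal link rate achieved by a measured throughput."""
    return measured_gbyte_s / gbit_to_gbyte(link_gbps)


if __name__ == "__main__":
    for name, gbps in NOMINAL_GBPS.items():
        print(f"{name}: {gbps:.0f} Gbit/s ~ {gbit_to_gbyte(gbps):.1f} GB/s ceiling")
    ratio = NOMINAL_GBPS["IB-FDR"] / NOMINAL_GBPS["iWARP (40GbE)"]
    print(f"IB-FDR nominal advantage: {ratio:.2f}x")
```

This prints nominal ceilings of 5.0 GB/s for 40GbE and 7.0 GB/s for FDR, a 1.40x gap on paper. When both transports reach the same measured throughput, `utilization` shows the 40GbE link running at a larger fraction of its ceiling.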
Conclusion
This comparison of Lustre RDMA performance over Chelsio’s T5 iWARP RDMA adapters and the latest IB-FDR adapters proved highly instructive.
The performance results show that iWARP at 40GbE is on par with IB-FDR while using standard Ethernet infrastructure, with no special configuration or management needed. Thanks to the resulting cost and management savings, iWARP is the most cost-effective high-performance RDMA transport available today.