Multi-Path Transport for RDMA in Datacenters

NSDI |

Published by USENIX

RDMA is becoming prevalent because of its low latency, high throughput and low CPU overhead. However, current RDMA remains a single path transport which is prone to failures and falls short to utilize the rich parallel paths in datacenters. Unlike previous multipath approaches, which mainly focus on TCP, this paper presents a multi-path transport for RDMA, i.e. MPRDMA, which efficiently utilizes the rich network paths in datacenters. MP-RDMA employs three novel techniques to address the challenge of limited RDMA NICs on-chip memory size: 1) a multi-path ACK-clocking mechanism to distribute traffic in a congestion-aware manner without incurring per-path states; 2) an out-of-order aware path selection mechanism to control the level of out-of-order delivered packets, thus minimizes the meta data required to them; 3) a synchronise mechanism to ensure in-order memory update whenever needed. With all these techniques, MP-RDMA only adds 66B to each connection state compared to single-path RDMA. Our evaluation with an FPGA-based prototype demonstrates that compared with single-path RDMA, MPRDMA can significantly improve the robustness under failures (2x∼4x higher throughput under 0.5%∼10% link loss ratio) and improve the overall network utilization by up to 47%.