{"id":372896,"date":"2017-03-21T16:39:33","date_gmt":"2017-03-21T23:39:33","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=372896"},"modified":"2017-04-19T16:44:47","modified_gmt":"2017-04-19T23:44:47","slug":"rdma-commodity-ethernet-scale-2","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/rdma-commodity-ethernet-scale-2\/","title":{"rendered":"RDMA over Commodity Ethernet at Scale"},"content":{"rendered":"

Over the past one and half years, we have been using RDMA over commodity Ethernet (RoCEv2) to support some of\u00a0 Microsoft’s highly-reliable, latency-sensitive services.\u00a0 This paper describes the challenges we encountered during the process and the solutions we devised to address them. In order to scale RoCEv2 beyond VLAN, we have designed a DSCP-based priority flow-control (PFC) mechanism to ensure large-scale deployment. We have addressed the safety challenges brought by PFC-induced deadlock (yes, it happened!), RDMA transport livelock, and the NIC PFC pause frame storm problem.\u00a0\u00a0We have also built the monitoring and management systems to make sure RDMA works as expected. Our experiences show that the safety and scalability issues of running RoCEv2 at scale can all be addressed, and RDMA can replace TCP for intra data center communications and achieve low latency, low CPU overhead, and high throughput.<\/p>\n","protected":false},"excerpt":{"rendered":"

Over the past one and half years, we have been using RDMA over commodity Ethernet (RoCEv2) to support some of\u00a0 Microsoft’s highly-reliable, latency-sensitive services.\u00a0 This paper describes the challenges we encountered during the process and the solutions we devised to address them. In order to scale RoCEv2 beyond VLAN, we have designed a DSCP-based priority […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13547],"msr-video-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-372896","msr-video","type-msr-video","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/www.youtube.com\/watch?v=QTHSGmPhVEs","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/372896"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":0,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/372896\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=372896"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=372896"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=372896"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=372896"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=372896"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=372896"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=372896"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}