{"id":929844,"date":"2023-03-22T14:27:49","date_gmt":"2023-03-22T21:27:49","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/"},"modified":"2023-03-29T21:26:11","modified_gmt":"2023-03-30T04:26:11","slug":"empowering-azure-storage-with-rdma-technical-report","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/empowering-azure-storage-with-rdma-technical-report\/","title":{"rendered":"Empowering Azure Storage with RDMA"},"content":{"rendered":"

Given the wide adoption of disaggregated storage in public clouds, networking is the key to enabling high performance and high reliability in a cloud storage service. In Azure, we choose Remote Direct Memory Access (RDMA) as our transport and aim to enable it for both storage frontend traffic (between compute virtual machines and storage clusters) and backend traffic (within a storage cluster) to fully realize its benefits. As compute and storage clusters may be located in different datacenters within an Azure region, we need to support RDMA at regional scale.<\/p>\n

This work presents our experience in deploying intra-region RDMA to support storage workloads in Azure. The high complexity and heterogeneity of our infrastructure bring a series of new challenges, such as interoperability between different types of RDMA network interface cards. We have made several changes to our network infrastructure to address these challenges. Today, around 70% of traffic in Azure is RDMA, and intra-region RDMA is supported in all Azure public regions. RDMA helps us achieve significant disk I\/O performance improvements and CPU core savings.<\/p>\n","protected":false},"excerpt":{"rendered":"

Given the wide adoption of disaggregated storage in public clouds, networking is the key to enabling high performance and high reliability in a cloud storage service. In Azure, we choose Remote Direct Memory Access (RDMA) as our transport and aim to enable it for both storage frontend traffic (between compute virtual machines and storage clusters) […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193718],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[251707,254068,248227],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-929844","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us","msr-field-of-study-cloud-storage","msr-field-of-study-cloud-systems","msr-field-of-study-computer-network"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2023-3-29","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2023-13","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"Microsoft","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.
microsoft.com\/en-us\/research\/uploads\/prod\/2023\/03\/RDMA_Experience_Paper_TR-1.pdf","id":"931917","title":"rdma_experience_paper_tr-1","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":931917,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/03\/RDMA_Experience_Paper_TR-1.pdf"},{"id":931914,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/03\/RDMA_Experience_Paper_TR-64250bc48be3d.pdf"},{"id":931908,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/03\/RDMA_Experience_Paper_TR.pdf"},{"id":929856,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/03\/Azure_Storage_RDMA_TR-641b73408a5aa.pdf"},{"id":929853,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/03\/Azure_Storage_RDMA_TR.pdf"}],"msr-author-ordering":[],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[860115,317750],"publication":[],"video":[],"download":[],"msr_publication_type":"techreport","related_content":{"projects":[{"ID":860115,"post_title":"Network Stack for Modern Cloud","post_name":"network-stack-for-modern-cloud","post_type":"msr-project","post_date":"2022-08-02 15:21:06","post_modified":"2023-03-05 01:11:10","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/network-stack-for-modern-cloud\/","post_excerpt":"As part of the Network Stack for 2030 initiative, we are rethinking the network stack, which was designed about 30 years ago, when the networks, and the applications they supported, looked very different.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/860115"}]}},{"ID":317750,"post_title":"RDMA for Cloud Computing","post_name":"rdma-for-cloud-computing","post_type":"msr-project","post_date":"2016-11-07 15:54:05","post_modified":"2017-06-14 
09:32:57","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/rdma-for-cloud-computing\/","post_excerpt":"In this project, we have introduced a series of technologies, including DCQCN congestion control and DSCP-based PFC, and addressed a set of challenges including PFC deadlock, RDMA transport livelock, PFC pause frame storm, slow-receiver symptom, to make RDMA scalable and safe, and to enable RDMA deployable in production at large scale. We currently are working on RDMA deadlock understanding and prevention, and RDMA support for future AI infrastructure. RDMA Congestion Control Modern datacenter applications demand…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/317750"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/929844"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/929844\/revisions"}],"predecessor-version":[{"id":930021,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/929844\/revisions\/930021"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=929844"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=929844"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=929844"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=929844"}
,{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=929844"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=929844"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=929844"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=929844"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=929844"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=929844"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=929844"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=929844"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=929844"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=929844"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=929844"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=929844"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}