{"id":284585,"date":"2014-08-19T07:00:22","date_gmt":"2014-08-19T14:00:22","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=284585"},"modified":"2016-09-08T16:56:43","modified_gmt":"2016-09-08T23:56:43","slug":"mobility-networking-researchers-making-big-impact-cloud","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/mobility-networking-researchers-making-big-impact-cloud\/","title":{"rendered":"Mobility and Networking Researchers Making a Big Impact in the Cloud"},"content":{"rendered":"

The annual conference of the Association for Computing Machinery\u2019s Special Interest Group on Data Communication<\/a> (SIGCOMM) is always a highlight for those who follow the latest developments in applications, technologies, architectures, and protocols for computer communication. SIGCOMM 2014<\/a>, taking place in Chicago from August 17 to 22, is the high point of the year for Victor Bahl<\/a> (@SuperBahl<\/a>), director of Microsoft Research\u2019s Mobility and Networking Research Group<\/a> (MNR).<\/p>\n

Members of the Mobility and Networking Research Group at Microsoft Research: (front) Victor Bahl; (second row) Aakanksha Chowdhery, Ganesh Ananthanarayanan, and Matthai Philipose; (third row) Ratul Mahajan and Meg Walraed-Sullivan; (fourth row) Srikanth Kandula and Peter Bodik; (fifth row) Ming Zhang, Alec Wolman, and Ranveer Chandra; (sixth row) Hongqiang Liu and Stefan Saroiu; and (rear) Sharad Agarwal.<\/p><\/div>\n

George Varghese<\/p><\/div>\n

Two of Bahl\u2019s MNR colleagues are being recognized for significant achievements during SIGCOMM 2014. George Varghese<\/a> is receiving SIGCOMM\u2019s highest honor, the SIGCOMM Award<\/a> for lifetime achievement, for sustained and diverse contributions to network algorithmics, with far-reaching impact in both research and industry. He will also deliver the conference\u2019s keynote address. Meanwhile, Ratul\u00a0Mahajan<\/a> is receiving the prestigious ACM SIGCOMM Test of Time Paper Award<\/a>\u00a0for his 2002 paper Measuring ISP topologies with Rocketfuel<\/em><\/a>, written with University of Washington colleagues Neil Spring and David Wetherall.<\/p>\n

It\u2019s also been a bumper year for papers from Microsoft researchers, who wrote or co-wrote nine papers accepted to SIGCOMM, including a Best Paper Award<\/a>-winning paper, CONGA: Distributed Congestion-Aware Load Balancing for Datacenters<\/em><\/a>, written by Varghese and a team of collaborators from industry. Microsoft researchers Ming Zhang<\/a>, Srikanth Kandula<\/a>, and Mahajan each contributed to multiple papers; Zhang and Kandula co-authored four papers apiece, a feat only three other researchers have achieved in the past 20 years of SIGCOMM.<\/p>\n

For Bahl, such recognition is a testament to the high caliber of the scientists within MNR and their academic partners.<\/p>\n

\u201cThis is a group with amazing depth,\u201d he says. \u201cThey\u2019re not just world-class scientists, who routinely come up with great ideas and theory, but they are also very pragmatic. They love nothing better than to solve real-world problems with broad impact. As a research group, we have a real advantage, because we can collaborate in-house with fantastic engineers in Microsoft Azure<\/a> networking and data-platform teams. This close working relationship is absolutely essential to all parties and key to our continuous success.\u201d<\/p>\n

Those interested in how Microsoft\u2019s engineering teams have gained from MNR research over the years need look no further than the Mobility and Networking Research page<\/a>, in particular the section titled Tech. Transfers. Scroll down the page, and the range of research work by MNR team members<\/a> becomes evident, from data-center networking to protocols for Xbox One<\/a> controllers.<\/p>\n

Bahl notes that Microsoft\u2019s Wide Area Software Defined Network, <\/em>the first item on the Tech. Transfers list, was based on the pre-production version of a traffic-engineering system. That system was described in a paper presented last year<\/a> at SIGCOMM 2013 and is now in full production at Microsoft, saving millions of dollars annually by optimizing bandwidth utilization.<\/p>\n

Bahl singles out two of this year\u2019s Microsoft papers at SIGCOMM that grew out of close collaboration with Microsoft Azure teams.<\/p>\n

A Breakthrough in Cluster Scheduling<\/h2>\n

The first is Multi-Resource Packing for Cluster Schedulers<\/em><\/a>, by Robert Grandl and Aditya Akella of the University of Wisconsin-Madison, along with Ganesh Ananthanarayanan<\/a>, Kandula, and Sriram Rao<\/a> of Microsoft.<\/p>\n

Scheduling tasks on server clusters is challenging. Ideally, a scheduler should both maximize the number of tasks that run at the same time, which improves average job-completion time, and pack as many tasks as possible onto each server, which improves server utilization. Historically, though, schedulers were designed to allocate only processors and memory. Extending them to handle storage, which can reside on remote machines, creates inefficiencies because tasks reading remote data contend for the network. Under such contention, the effective throughput of jobs can drop, sometimes by more than 40 percent.<\/p>\n

The charts show total cluster utilization over time. Scenario (a) uses fairness-based allocation; scenario (b), which uses Tetris, speeds up both jobs.<\/p><\/div>\n

Researchers and members of the Windows Big Data Platform Team working on a new scheduler achieved a breakthrough when they recognized that their problem resembled a well-known computer-science problem called multidimensional bin packing, in which items with demands along several dimensions must be fit into bins without exceeding the capacity of any bin in any dimension. Applied to big-data systems, where data might not be stored in a single location, the problem becomes even harder. For example, a task\u2019s resource requirements change depending on where it is placed, on the same machine as its data or on a different one, and tasks can use less than their peak resources and still finish.<\/p>\n
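To make the bin-packing connection concrete, the following Python sketch treats each machine as a bin with capacity in several dimensions and places tasks greedily, scoring each feasible placement by the dot product of the task's demands and the machine's free resources. This loosely follows the alignment heuristic described in the Tetris paper; the four resource dimensions, the normalized units, and the names `Task`, `Machine`, `fits`, `alignment`, and `pack` are hypothetical simplifications, not the team's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical resource dimensions for this sketch.
RESOURCES = ("cpu", "mem", "disk", "net")


@dataclass
class Task:
    name: str
    demand: Dict[str, float]  # peak demand per resource, as a fraction of one machine


@dataclass
class Machine:
    name: str
    free: Dict[str, float]  # unallocated capacity per resource, same units


def fits(task: Task, machine: Machine) -> bool:
    """A placement is valid only if the task fits in every dimension."""
    return all(task.demand[r] <= machine.free[r] for r in RESOURCES)


def alignment(task: Task, machine: Machine) -> float:
    """Dot product of demand and free capacity: large when the task needs
    most of what the machine has most of."""
    return sum(task.demand[r] * machine.free[r] for r in RESOURCES)


def pack(tasks: List[Task], machines: List[Machine]) -> Tuple[Dict[str, str], List[str]]:
    """Greedily place the best-aligned feasible (task, machine) pair until
    nothing else fits. Returns placements and the names of waiting tasks."""
    placements: Dict[str, str] = {}
    pending = list(tasks)
    while pending:
        options = [(alignment(t, m), i, m) for i, t in enumerate(pending)
                   for m in machines if fits(t, m)]
        if not options:
            break  # remaining tasks must wait for resources to be freed
        _, i, machine = max(options, key=lambda o: o[0])
        task = pending.pop(i)
        for r in RESOURCES:
            machine.free[r] -= task.demand[r]
        placements[task.name] = machine.name
    return placements, [t.name for t in pending]
```

With one machine that has spare CPU and another with spare memory, a CPU-heavy task lands on the first, a memory-heavy task on the second, and a task that needs a lot of both waits.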

Existing packing techniques improve cluster throughput but can delay individual jobs. Tetris, the team\u2019s new scheduler, balances the two goals: high cluster throughput and fast completion of individual jobs.<\/p>\n
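One way to picture this trade-off is a single score that blends packing quality with a preference for jobs that are close to finishing. The weighted-sum form and the knob `epsilon` below are assumptions for illustration, loosely inspired by the idea of combining packing with a shortest-remaining-work preference; they are not Tetris's actual formula.

```python
from typing import List, Tuple

# (candidate name, packing score, remaining work in the candidate's job)
Candidate = Tuple[str, float, float]


def blended_score(packing: float, remaining_work: float, epsilon: float) -> float:
    """Higher is better. epsilon = 0 ignores job progress (pure packing);
    larger epsilon increasingly favors jobs with little work left."""
    return packing - epsilon * remaining_work


def choose(candidates: List[Candidate], epsilon: float) -> str:
    """Return the name of the candidate with the best blended score."""
    best = max(candidates, key=lambda c: blended_score(c[1], c[2], epsilon))
    return best[0]
```

With `epsilon` at 0 the scheduler simply packs as tightly as it can; raising it lets a task from a nearly finished job win even when it packs less tightly, shortening that job's completion time at some cost to utilization.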

Researchers are hardening the code so that it can be used in Microsoft\u2019s big-data systems and, possibly, released through an open-source storage framework.<\/p>\n

Network-State Services Adopted by Azure<\/h2>\n

The second paper Bahl highlights is A Network-State Management Service<\/em><\/a>, by Peng Sun and Jennifer Rexford of Princeton University, and Mahajan, Ahsan Arefin<\/a>, Lihua Yuan, and Zhang of Microsoft.<\/p>\n

Network-state service: a foundation for data-center-network management.<\/p><\/div>\n

Statesman, the service described in the paper, is a network-state service (NSS) that has progressed well beyond the prototype stage. Deployed worldwide in all Microsoft Azure data centers since December 2013, it manages more than a million links and 20,000 network devices.<\/p>\n

Cloud services, including those operated by Microsoft, support hundreds of millions of Internet users. Beneath these online services, some of the largest data-center networks in the world, often including thousands of network devices and spanning several continents, operate within highly dynamic environments. The sheer number of physical devices means that multiple devices might go offline at any moment for maintenance, firmware upgrades, reconfiguration, or component failures.<\/p>\n

Against this complex backdrop, much of the management work is done manually by human operators. It can take hours or even days to troubleshoot networks, steer traffic away from hotspots, or upgrade firmware on a large number of devices. Meanwhile, users experience degraded service, and network operators lose money.<\/p>\n

Automated network-management systems are difficult to build because they must work correctly even when components fail or communication with distributed devices suffers variable delays. Moreover, separate management systems can conflict with one another, for example, a firmware-upgrade system and a traffic-engineering system acting on the same devices. Such conflicts can degrade the network, sometimes to the point of disrupting an entire data center.<\/p>\n

Statesman divides the network state into observed, proposed, and target states.<\/p><\/div>\n

The Statesman NSS addresses these issues by maintaining the state of all network devices and offering it as a service. Network-management systems built atop Statesman can make decisions without worrying about low-level interactions with physical devices. To prevent conflicts between management systems and violations of network-wide safety invariants, Statesman divides the network state into observed, proposed, and target states. Each management system reads the observed state and produces a proposed state. Statesman then merges the multiple proposed states into one target state.<\/p>\n
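The three-state split can be illustrated with a toy merge step in Python. Statesman's actual merge logic is described in the paper; the first-come, first-served conflict policy, the string-valued state variables, the function name `merge_proposals`, and the example invariant (at least two links up) used here are hypothetical simplifications, not Statesman's design.

```python
from typing import Callable, Dict, List, Tuple

# Network state as a flat map from hypothetical variable names
# (such as "link1" or "sw1.firmware") to values.
State = Dict[str, str]


def merge_proposals(
    observed: State,
    proposals: List[Tuple[str, State]],
    invariant: Callable[[State], bool],
) -> Tuple[State, List[str]]:
    """Merge (system name, proposed state) pairs into one target state.

    Proposals are processed in order. A proposal is rejected if it changes a
    variable that another accepted proposal already set to a different value
    (a conflict), or if accepting it would make the invariant false
    (a violation). Returns the target state and the rejected systems.
    """
    target = dict(observed)  # start from the network as it is now
    changed_by: Dict[str, str] = {}  # variable -> system that changed it
    rejected: List[str] = []
    for system, proposed in proposals:
        changes = {k: v for k, v in proposed.items() if target.get(k) != v}
        conflict = any(changed_by.get(k, system) != system for k in changes)
        candidate = {**target, **changes}
        if conflict or not invariant(candidate):
            rejected.append(system)
            continue
        target = candidate
        for k in changes:
            changed_by[k] = system
    return target, rejected
```

Because each management system only ever writes a proposal, a rejected proposal leaves the target state untouched, and the system can re-read the observed state and try again.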

This approach was inspired by the way multiple developers collaborate on the same project through a revision-control system, with each developer working from a copy of the code and the system merging their changes. The Azure Networking group quickly adopted Statesman and worked with Zhang and Mahajan to implement it in Microsoft\u2019s data centers. A switch-upgrade system and a link-failure-mitigation system have been deployed on top of Statesman, and a traffic-engineering system is expected to be operational soon.<\/p>\n
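The revision-control analogy can be made concrete with a toy version-stamped store in Python: each system checks out a snapshot along with its revision number, and a later commit is refused if another commit has changed the same variables since that revision, much like a merge conflict. The class `VersionedState` and its methods are hypothetical and only illustrate the analogy; they do not describe Statesman's actual protocol.

```python
from typing import Dict, List, Tuple


class VersionedState:
    """Toy store that records which revision last changed each variable."""

    def __init__(self, initial: Dict[str, str]) -> None:
        self.values: Dict[str, str] = dict(initial)
        self.changed_at: Dict[str, int] = {k: 0 for k in initial}
        self.revision = 0

    def checkout(self) -> Tuple[Dict[str, str], int]:
        """Return a private copy of the state and the revision it reflects."""
        return dict(self.values), self.revision

    def commit(self, base: int, changes: Dict[str, str]) -> Tuple[bool, List[str]]:
        """Apply changes computed from revision `base`, unless another commit
        has modified any of the same variables since then (a conflict)."""
        stale = [k for k in changes if self.changed_at.get(k, 0) > base]
        if stale:
            return False, stale  # caller should re-checkout and recompute
        self.revision += 1
        for k, v in changes.items():
            self.values[k] = v
            self.changed_at[k] = self.revision
        return True, []
```

In this sketch, two systems that check out the same revision can both commit as long as they touch different variables; only overlapping edits are refused and must be recomputed from a fresh checkout.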

\u201cNSS is critical to our data-center networks,\u201d says Albert Greenberg, director of development for Azure Networking. \u201cIt is now fundamental to how we will write the software-defined-networking stack for the core network to bring higher reliability to the backbone. NSS is now a fundamental building block for Microsoft networking.\u201d<\/p>\n

Ambitious Objective<\/h2>\n

Real-world relevance is central to the MNR group\u2019s objective of producing work with significance and a lasting legacy.<\/p>\n

\u201cI believe we achieve a lasting legacy in two ways,\u201d Bahl explains. \u201cThe first is through research that stands the test of time. The second is through solving real-world problems.\u201d<\/p>\n

Given the team\u2019s prominence during SIGCOMM 2014 and its recent contributions to Microsoft Azure, it\u2019s not a stretch to suggest that Bahl and team are achieving their group objective: Significance. Legacy. Impact.<\/p>\n

Microsoft Research papers in SIGCOMM 2014<\/h2>\n