Researchers and engineers from Microsoft Research and Microsoft Azure have contributed nine scientific papers to the technical program of the 16th Annual USENIX Symposium on Networked Systems Design and Implementation – NSDI ‘19 – to be held in Boston, Massachusetts between February 26 and February 28, 2019. Our papers cover some of the latest technologies Microsoft has developed in networked systems.
While I would love to discuss all of our papers in detail, that would make this post far too long. Instead, as I’d previously written a couple of posts about cloud reliability and availability, today I’ll focus on another topic near and dear to me: network performance.
The world of containers
Recently, a lightweight and portable application-sandboxing mechanism called containers has become popular among developers who build applications for a wide variety of targets, ranging from IoT Edge devices to planet-scale distributed web applications for multi-national enterprises. A container is an isolated execution environment on a Linux host with its own file system, processes, and network stack. A single machine (the host) can support a significantly larger number of containers than standard virtual machines, providing attractive cost savings. Running an application inside a container isolates it from the host and from applications running in other containers. Even when those applications run with superuser privileges, they cannot access or modify the files, processes, or memory of the host or of other containers. There is more to say, but this is not intended to be a container tutorial. Let's instead talk about networking between containers.
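If you'd like to see that isolation first-hand, here is a quick sketch (assuming you have Docker and the public alpine image available) that launches two throwaway containers and prints the network interfaces each one sees; each container gets its own network stack, independent of the host and of the other container.

```python
import subprocess

# Minimal illustration, assuming Docker and the public "alpine" image are available.
# Each container runs in its own network namespace, so "ip addr" inside it shows
# interfaces and addresses that belong to that container alone, not to the host.

def container_interfaces(name: str) -> str:
    """Run a throwaway container and return the output of `ip addr` inside it."""
    result = subprocess.run(
        ["docker", "run", "--rm", "--name", name, "alpine", "ip", "addr"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    for name in ("demo-a", "demo-b"):
        print(f"--- interfaces seen inside container {name} ---")
        print(container_interfaces(name))
```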
As it turns out, many container-based applications are developed, deployed, and managed as groups of containers that communicate with one another to deliver the desired service. Unfortunately, until recently, container networking solutions offered either poor performance or poor portability, which undermined some of the advantages of containerization.
Enter Microsoft FreeFlow. Jointly developed by researchers at Microsoft Research and Carnegie Mellon University, FreeFlow is an inter-container networking technology that achieves high performance and good portability by using a new software element we call the Orchestrator. The Orchestrator knows the location of each container, and by leveraging the fact that containers belonging to the same application do not require strict isolation from one another, it is able to speed things up. FreeFlow uses a variety of techniques, such as shared memory and Remote Direct Memory Access (RDMA), to improve network performance: higher throughput, lower latency, and less CPU overhead. It accomplishes this while maintaining full portability and in a manner that is transparent to application developers. It's said a picture is worth a thousand words, and the following figure does a nice job of illustrating FreeFlow's capabilities.
FreeFlow is a high-performance container overlay networking solution that takes advantage of RDMA and accelerates TCP sessions between containers used by the same application.
Yibo Zhu, a former colleague from Microsoft Research and a co-inventor of this technology, neatly summed up some unique advantages of FreeFlow. “One of the nice features of FreeFlow is that it works on top of popular technologies including Flannel and Weave,” he said. “Containers have their individual virtual network interfaces and IP addresses. They do not need direct access to the hardware network interface. A lightweight FreeFlow library inside containers intercepts RDMA and TCP socket calls, and a FreeFlow router outside the containers helps accelerate the flow of data.”
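To give a rough intuition for why shared memory helps when two containers of the same application land on the same host, here is a small illustrative sketch in plain Python. It is not FreeFlow code; it simply shows two local processes exchanging a payload through a shared-memory segment, so the data never has to enter the network stack at all.

```python
from multiprocessing import Process, shared_memory

# Illustration only (not FreeFlow code): two processes on the same host exchange
# data through a shared-memory segment, so the payload never traverses the
# network stack. FreeFlow applies the same idea to containers of one application
# that happen to be placed on the same host.

PAYLOAD = b"hello from the 'sender' container"

def reader(segment_name: str, length: int) -> None:
    """The 'receiver' attaches to the same segment by name and reads the bytes."""
    segment = shared_memory.SharedMemory(name=segment_name)
    print(bytes(segment.buf[:length]).decode())
    segment.close()

if __name__ == "__main__":
    segment = shared_memory.SharedMemory(create=True, size=4096)
    try:
        segment.buf[:len(PAYLOAD)] = PAYLOAD      # the 'sender' writes in place
        receiver = Process(target=reader, args=(segment.name, len(PAYLOAD)))
        receiver.start()
        receiver.join()
    finally:
        segment.close()
        segment.unlink()
```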
You can read all the important details in our NSDI paper, “FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds.”
Contribution to open source
We believe this technology is important for everyone working in this space, so on June 12, 2018, Microsoft Research released FreeFlow on GitHub. Our release is based on the Linux RDMA project and is available under the MIT license. The technology supports three modes of operation: fully isolated RDMA, semi-isolated RDMA, and TCP. Fully isolated RDMA works very well in multi-tenant environments such as the Azure cloud. Most RDMA applications should run with no or very little modification and outperform traditional TCP socket-based implementations. We have tested it with RDMA-enabled Spark, HERD, TensorFlow, and rsocket.
While the paper includes a thorough evaluation, the following table shows some throughput results for a single TCP connection over a container overlay network across different VMs (28 Gbps models), measured by iperf3.
These results were measured just before we released FreeFlow to open source in June 2018. We will continue to find ways to improve performance (bet on it), but here's the takeaway: before FreeFlow, it was not possible to run RDMA over a container overlay network. FreeFlow enables it with minimal overhead.
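If you'd like to run this kind of single-connection measurement yourself, here is a minimal sketch that drives iperf3 from Python and reports the receiver-side throughput. It assumes iperf3 is installed and that a server (iperf3 -s) is already listening at whatever address you pass in, such as the peer container's overlay IP; the JSON field names reflect iperf3's -J output as I understand it.

```python
import json
import subprocess
import sys

# Hedged sketch: assumes `iperf3` is installed and a server (`iperf3 -s`) is
# already running at the address given on the command line, e.g. inside the
# peer container on the overlay network. Field names follow iperf3's JSON (-J)
# report for a TCP test.

def measure_throughput_gbps(server: str, seconds: int = 10) -> float:
    out = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    ).stdout
    report = json.loads(out)
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1e9

if __name__ == "__main__":
    # 10.0.0.2 is a hypothetical overlay address used only for illustration.
    server_addr = sys.argv[1] if len(sys.argv) > 1 else "10.0.0.2"
    print(f"single-connection TCP throughput: {measure_throughput_gbps(server_addr):.2f} Gbps")
```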
Speeding up container overlay networks
In addition to FreeFlow, I’d like to call attention to another related paper, “Slim: OS Kernel Support for a Low-Overhead Container Overlay Network.” Slim was jointly developed by researchers at the University of Washington and Microsoft Research.
A container overlay network is a technology that enables a set of containers, potentially distributed over several machines, to communicate with one another using their own independently assigned IP addresses and port numbers. For an application running in these containers, the overlay network coordinates the connections between the various ports and IP addresses. Popular container orchestrators, such as Docker Swarm, require an overlay network for hosting containerized applications.
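To picture what that looks like from the application's point of view, here is a tiny echo pair written as a sketch; the overlay IP 10.0.0.2 and port 9000 are hypothetical placeholders. Each side simply uses the container's own overlay-assigned address, just as it would on a dedicated machine, and the overlay network takes care of delivering the packets across hosts.

```python
import socket
import sys

# Sketch only: 10.0.0.2 and port 9000 are hypothetical overlay-network values.
# Each container uses its own overlay-assigned IP and port as if it were a
# dedicated machine; the overlay network delivers the traffic across hosts.

OVERLAY_IP, PORT = "10.0.0.2", 9000

def run_server() -> None:
    """Runs inside the container that owns the overlay address."""
    with socket.create_server(("0.0.0.0", PORT)) as srv:
        conn, _peer = srv.accept()
        with conn:
            conn.sendall(b"echo: " + conn.recv(1024))

def run_client() -> None:
    """Runs in another container; it only needs the peer's overlay IP."""
    with socket.create_connection((OVERLAY_IP, PORT)) as sock:
        sock.sendall(b"hello over the overlay")
        print(sock.recv(1024).decode())

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "server":
        run_server()
    else:
        run_client()
```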
Before we introduced Slim, container overlay networks imposed significant overhead. This was because of the multiple packet transformations within the operating system needed to implement network virtualization. Specifically, every packet had to traverse the network stack twice, once in the sender's and once in the receiver's host OS kernel. To avoid this, researchers developed Slim, which implements network virtualization by manipulating connection-level metadata while maintaining compatibility with existing containerized applications. Packets go through the OS kernel's network stack only once. The performance improvements are substantial. For example, the throughput of an in-memory key-value store improved by 66%, while latency was reduced by 42% and CPU utilization by 54%. Check out the paper for a thorough description and evaluation with additional results. If you are wondering about the difference between the two systems: FreeFlow creates a fast container RDMA network, while Slim provides an end-to-end low-overhead container overlay network for TCP traffic.
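To give a flavor of what manipulating connection-level metadata can mean at the OS level, here is a sketch of the standard Unix file-descriptor-passing mechanism (SCM_RIGHTS, exposed in Python 3.9+ as socket.send_fds and socket.recv_fds), which lets one process hand an already-established TCP connection to another. This illustrates the general kernel facility rather than Slim's actual implementation, and for brevity both ends run in a single process.

```python
import socket

# Illustration of the general OS mechanism (SCM_RIGHTS file-descriptor passing),
# not Slim's code: an established TCP connection created by a helper can be
# handed to an application, which then uses it directly, so the payload is not
# reprocessed through a second network stack. Requires Python 3.9+ for
# socket.send_fds / socket.recv_fds. For brevity, the "helper" and the
# "application" are simulated within one process here.

def hand_off_connection() -> None:
    # A Unix-domain socket pair stands in for the channel between a helper
    # process on the host and the application process.
    helper, app = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # The "helper" establishes a TCP connection (here, to a local listener).
    listener = socket.create_server(("127.0.0.1", 0))
    tcp_conn = socket.create_connection(listener.getsockname())
    accepted, _ = listener.accept()

    # Pass the connected socket's file descriptor to the "application".
    socket.send_fds(helper, [b"fd"], [tcp_conn.fileno()])
    _msg, fds, _flags, _addr = socket.recv_fds(app, 1024, 1)

    # The application wraps the received descriptor and uses the connection.
    handed_over = socket.socket(fileno=fds[0])
    handed_over.sendall(b"sent on a handed-over connection")
    print(accepted.recv(1024).decode())

    for s in (handed_over, tcp_conn, accepted, listener, helper, app):
        s.close()

if __name__ == "__main__":
    hand_off_connection()
```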
I will end by saying how proud I am that this is the fifteenth time since its inception that Microsoft Research has sponsored NSDI. Over the years, our researchers and engineers have contributed well over a hundred papers to the technical program and helped organize this symposium, its sessions, and co-located workshops. We are deeply committed to sharing knowledge and supporting our academic colleagues and the networking research community at large.
I am also delighted to announce that our paper “Sora: High Performance Software Radio Using General Purpose Multi-core Processors” has been awarded the NSDI ‘19 Test of Time Award for research results we presented over ten years ago.
If you plan to attend NSDI ’19, I urge you to attend the presentations of Microsoft papers. I also encourage you to meet our researchers; they will be happy to discuss their latest research and new ideas. And who knows, maybe that leads to some future collaborations and more great NSDI papers!