{"id":602421,"date":"2019-08-15T09:10:32","date_gmt":"2019-08-15T16:10:32","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=602421"},"modified":"2019-08-15T09:11:53","modified_gmt":"2019-08-15T16:11:53","slug":"whos-to-blame-debugging-internet-performance-for-azure-users-with-blameit","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/whos-to-blame-debugging-internet-performance-for-azure-users-with-blameit\/","title":{"rendered":"Who\u2019s to blame? Debugging Internet performance for Azure users with BlameIt"},"content":{"rendered":"
The Microsoft Azure cloud hosts a wide variety of services across hundreds of network edge locations on six continents. These locations host many interactive (latency-sensitive) services for consumer and enterprise clients, spanning products for productivity, search, communications, and storage. The edge locations are the first stop in the Microsoft network that customers hit, and because they are spread out worldwide, customers everywhere can reach Microsoft with low latency. Hundreds of millions of clients use the services on Azure every single day.<\/p>\n
Plenty of prior studies have shown that user engagement falls precipitously as latency increases. But we don\u2019t have to read studies to understand how important low latency is in our daily lives; we only need to start a video call with someone on our phones or computers. The importance of a low round-trip time (RTT) becomes especially evident when latency is high: glitches and lag in the audio or video make it impossible to have a natural conversation. In fact, our own past work with Skype<\/a> has studied the importance of the network for good user experience.<\/p>\n The example above illustrates the importance of low latency in the network. It also shows that when the inevitable slowdowns occur, the system must be able to identify the problem and recover as quickly as possible. This is where BlameIt comes in. In real time, BlameIt aims to pinpoint which individual autonomous systems (ASes) along the path from client to cloud and back are causing problems. In our SIGCOMM 2019<\/a> paper, \u201cZooming in on Wide-area Latencies to a Global Cloud Provider<\/a>,\u201d we show how BlameIt works to identify these faulty ASes. The work is a result of multiple years of collaboration between Microsoft Research and Azure Networking.<\/p>\n