Redundancy in Network Traffic: Findings and Implications

Ashok Anand; Chitra Muthukrishnan; Aditya Akella; Ramachandran Ramjee

Proceedings of ACM SIGMETRICS/Performance | June 2009

Published by Association for Computing Machinery, Inc.

A large amount of popular content is transferred repeatedly across network links in the Internet. In recent years, packet-level, protocol-independent redundancy elimination, which removes duplicate strings from within network packets, has emerged as a powerful technique for improving the efficiency of network links in the face of repeated data. Many vendors offer such redundancy elimination middleboxes to improve the effective bandwidth of enterprise, data center, and ISP links alike.

In this paper, we conduct a large-scale trace-driven study of protocol-independent redundancy elimination mechanisms, using several terabytes of packet payload traces collected at 12 distinct network locations: the access link of a large US-based university and those of 11 enterprise networks of different sizes. Based on extensive analysis, we present a number of findings on the benefits and fundamental design issues of redundancy elimination systems. Two of our key findings are: (1) a new redundancy elimination algorithm based on Winnowing outperforms the widely used Rabin fingerprint-based algorithm by 5-10% on most traces and by as much as 35% on some traces; and (2) surprisingly, 75-90% of the middlebox's bandwidth savings in our enterprise traces is due to redundant byte-strings within each client's traffic, implying that pushing redundancy elimination capability to the end hosts, i.e., an end-to-end redundancy elimination solution, could obtain most of the middlebox's bandwidth savings.
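
To make the Winnowing idea mentioned above more concrete, the following is a minimal sketch of Winnowing-style fingerprint selection over a packet payload. It is an illustration of the general technique only, not the paper's implementation: the function names `kgram_hashes` and `winnow`, the parameters `k` and `window`, and the use of CRC32 in place of a rolling hash are all assumptions made for this example.

```python
import zlib


def kgram_hashes(payload: bytes, k: int) -> list[int]:
    """Hash every k-byte substring of the payload.

    Assumption: CRC32 stands in for the rolling hash a real system would use.
    """
    return [zlib.crc32(payload[i:i + k]) for i in range(len(payload) - k + 1)]


def winnow(hashes: list[int], window: int) -> set[tuple[int, int]]:
    """Return (position, hash) pairs chosen by Winnowing-style selection.

    For each sliding window of `window` consecutive hashes, the rightmost
    minimum is selected, so every window contributes a fingerprint.
    """
    selected = set()
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        # Index of the rightmost occurrence of the minimum within the window.
        offset = window - 1 - win[::-1].index(smallest)
        selected.add((start + offset, smallest))
    return selected


def fingerprints(payload: bytes, k: int = 8, window: int = 4) -> set[int]:
    """Hash values selected for a payload (positions dropped)."""
    return {h for _, h in winnow(kgram_hashes(payload, k), window)}
```

Because the selection depends only on the hashes inside each window, two payloads that share a region of at least `k + window - 1` bytes select at least one common fingerprint from that region, which is what lets a redundancy elimination system detect the repeated content. Selecting one fingerprint per window also bounds the gap between consecutive selected positions by `window`.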

Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library --http://www.acm.org/dl/.