Virtual Switch Packet Classification
- Daniel Firestone ,
- Hadi Katebi ,
- George Varghese
MSR-TR-2016-66 |
Packet classification is the problem of categorizing packets into flows, where all packets belonging to the same flow are processed by a predefined set of rules. Packet classification can be N-dimensional, i.e., can be on N different fields of packet header. For example, a two-dimensional packet classification may define a flow as “all packets destined to address A and port P”, and choose to “block” all such flows.
Network devices solve packet classification by maintaining a database of flow-rule pairs. For every incoming packet, they search the database for a flow that matches the packet header, and then apply the matching rule. This procedure is referred to as rule matching. In practice, a packet might match multiple rules. To resolve ambiguity, rules are given priorities, and the “best” matching rule, i.e., the rule with the highest priority, is returned. This matching scheme is known as priority-based matching (as opposed to longest-prefix matching which is not our focus here).
The most intuitive approach to solve packet classification is to create a list of all flow-rule pairs, sort them by their priorities, and linearly search the list to find a match. This approach, already employed in the Virtual Filtering Platform (VFP), our virtual switch, does not scale well as the number of rules grows (beyond a few hundred rules). In fact, today allows tenants to define tens of thousands of rules (user-defined routes, ACLs, etc.), which well justifies the need for scalable and efficient packet classification.
Here, we introduce a novel approach to packet classification that replaces linear rule matching in our Virtual Switch, the Virtual Filtering Platform (VFP). Our approach builds a data structures for every condition type (e.g., CIDR IP, IP range, single ports, etc.). It then independently searches each data structure for a matching rule, and returns the best matching rule found from any of the data structures. We call each of these data structures a classifier. Each classifier is constructed for a specific condition type (e.g., trie for CIDR IP). Conversely, we say that a condition is optimized by a classifier. Our classification approach is efficient in both space and time; it does not replicate any rules (previous approaches did) and performs insertion/lookup in average constant time. Hence, it will significantly improve VFP’s SYN rate.