P4: Sketches

At a conceptual level, traffic monitoring is a simple, almost trivial task. For example, suppose that we wish to monitor traffic at the granularity of individual flows. Then it seems like we can simply maintain a finite map data structure that associates flows with integers, and increment the corresponding value when each packet is received.
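In software, where memory can grow on demand, this idealized approach takes only a few lines. The following Python sketch is purely illustrative: the function name count_flows and the use of short strings as flow identifiers are assumptions made for the example.

```python
from collections import Counter

def count_flows(packets):
    """Return exact per-flow packet counts.

    `packets` is any iterable of hashable flow identifiers. The map
    gains one entry for every distinct flow that is observed, so its
    size is not bounded in advance.
    """
    counts = Counter()
    for flow in packets:
        counts[flow] += 1
    return counts

# Five packets belonging to three flows (identifiers are made up).
counts = count_flows(["A", "B", "A", "C", "A"])
print(counts["A"], len(counts))
```

Note that the map needs one entry per distinct flow, which is exactly the property that is hard to provide on a switch.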

Unfortunately, there is a serious fly in the ointment: it is not clear how to implement data structures such as finite maps on hardware devices like network switches. In particular, languages such as P4 provide no way to dynamically resize the amount of memory used to store headers, metadata, and other state. Hence, we would need to somehow pre-allocate storage at compile-time for every possible flow that could be seen at run-time – a difficult task given the huge number of possible flows, and the limited resources available in hardware devices.

This lecture will introduce a family of data structures and algorithms that trade off accuracy for tight bounds on storage requirements.

Identifying Flows

Before diving into the details of these data structures, let us briefly discuss how to identify network flows. Intuitively, we would like to associate each packet with the transport protocol-level connection that it belongs to. However, protocols such as TCP execute non-trivial state machines to set up and tear down connections, and while it would be possible to simulate those state machines on a switch, it would be quite costly. In practice, many devices simply identify flows with a “5-tuple” consisting of a subset of the IP and TCP/UDP headers that are strongly correlated with actual flows:

IP source address
IP destination address
IP protocol type
TCP/UDP source port
TCP/UDP destination port
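As a rough illustration, the 5-tuple can be modeled as a small record that is projected out of a parsed packet. In the Python sketch below, the names FiveTuple and flow_key, the dictionary field names, and the restriction to IPv4 addresses are all assumptions made for the example.

```python
import ipaddress
import struct
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int   # IP protocol number, e.g. 6 for TCP or 17 for UDP
    src_port: int
    dst_port: int

    def pack(self) -> bytes:
        """Serialize to 4 + 4 + 1 + 2 + 2 = 13 bytes in network byte order."""
        return (
            ipaddress.IPv4Address(self.src_ip).packed
            + ipaddress.IPv4Address(self.dst_ip).packed
            + struct.pack("!BHH", self.protocol, self.src_port, self.dst_port)
        )

def flow_key(pkt: dict) -> FiveTuple:
    """Project a parsed packet (here a plain dict) onto its 5-tuple."""
    return FiveTuple(pkt["src_ip"], pkt["dst_ip"], pkt["protocol"],
                     pkt["src_port"], pkt["dst_port"])

# Two packets that differ only in payload map to the same flow key.
p1 = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "protocol": 6,
      "src_port": 12345, "dst_port": 80, "payload": b"GET /"}
p2 = dict(p1, payload=b"Host: example")
key1, key2 = flow_key(p1), flow_key(p2)
```

The packed form occupies 13 bytes, i.e., 104 bits, and is a convenient input for the hash functions used later in this lecture.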

Note that two packets with the same 5-tuple may in fact belong to different flows. However, many data plane algorithms are not sensitive to such collisions. If additional precision is required, there are two main options:

Actually track the transport protocol-level state machine, as discussed above, to accurately track TCP connections. While this is prohibitively expensive on current hardware switches, some software devices do provide this capability.
Keep track of timestamps and, when a significant amount of time has elapsed between consecutive packets with the same 5-tuple, treat those packets as belonging to distinct flowlets.
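The timestamp-based option can be sketched in software as follows. The class name FlowletTracker, the gap parameter, and the use of Python dictionaries are assumptions for illustration; a hardware implementation would need fixed-size state instead.

```python
class FlowletTracker:
    """Split packets with the same key into flowlets separated by idle gaps."""

    def __init__(self, gap):
        self.gap = gap          # idle time that starts a new flowlet
        self.last_seen = {}     # key -> timestamp of the most recent packet
        self.flowlet_id = {}    # key -> number of the current flowlet

    def observe(self, key, timestamp):
        """Record a packet and return its (key, flowlet number) pair."""
        last = self.last_seen.get(key)
        if last is None:
            self.flowlet_id[key] = 0            # first packet for this key
        elif timestamp - last > self.gap:
            self.flowlet_id[key] += 1           # idle gap exceeded
        self.last_seen[key] = timestamp
        return (key, self.flowlet_id[key])

tracker = FlowletTracker(gap=1.0)
a = tracker.observe("f", 0.0)
b = tracker.observe("f", 0.5)   # within the gap: same flowlet as `a`
c = tracker.observe("f", 2.0)   # 1.5 time units later: new flowlet
```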

In either case, note that the number of possible flows is enormous: there are 2^32 * 2^32 * 2^8 * 2^16 * 2^16 = 2^104 possible 5-tuples.

Flow size distribution

Empirically, it has been observed that flow sizes in many networks follow a skewed distribution, with a small number of large flows (“elephants”) and a large number of small flows (“mice”). Hence, many queries of interest can be answered with reasonable accuracy by using approximate algorithms. For example, if we only need to keep track of the top-k “heavy hitters,” one can imagine using strategies such as the following:

Sampling: select 1 out of every N packets (e.g., for N = 10,000), evaluate the monitoring query, and extrapolate the results to the entire set of input packets.
Smart filtering: somehow cheaply filter away mice flows, and then evaluate the query on the remaining elephant flows.
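To make the sampling strategy concrete, here is a minimal Python sketch that keeps every N-th packet and scales the sampled counts back up by N. Selecting packets deterministically by position (rather than at random) and the function name sampled_counts are assumptions made for the example.

```python
from collections import Counter

def sampled_counts(packets, n):
    """Estimate per-flow packet counts from a 1-in-n sample."""
    sample = Counter()
    for i, flow in enumerate(packets):
        if i % n == 0:              # keep packets 0, n, 2n, ...
            sample[flow] += 1
    # Extrapolate: each sampled packet stands for n packets.
    return {flow: count * n for flow, count in sample.items()}

# One elephant flow (995 packets) followed by one mouse flow (5 packets).
packets = ["elephant"] * 995 + ["mouse"] * 5
estimates = sampled_counts(packets, n=10)
```

In this run the elephant is estimated at 1000 packets (the true count is 995), while the mouse is missed entirely, which is acceptable when only heavy hitters matter.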

There is a vast literature on sampling, which we will not focus on in this lecture. Instead, we will see how to build data structures that support “smart” filtering in a space-efficient manner.

Bloom Filters

Let us start with a classical approximate data structure that is simple to understand. Suppose that we wish to implement a set interface with the following operations:

empty(): initializes the set,
insert(x): adds x to the set, and
member(x): returns 1 if x is in the set and 0 otherwise.

Note that our interface does not include a remove operation.

Now let a be an array of bits of size M, and let h_1 to h_k be distinct (ideally independent, perfect) hash functions that map elements to indices in the range 0 to M-1.

We can implement the operations in the set interface as follows:

empty(): for i in 0 to M-1, set a[i] := 0
insert(x):
- use the hash functions to compute k indices:
  - i_1 := h_1(x)
  - …
  - i_k := h_k(x)
- then at each of these indices i set the corresponding entry to 1:
  - a[i_1] := 1
  - …
  - a[i_k] := 1
member(x):
- use the hash functions to compute k indices:
  - i_1 := h_1(x)
  - …
  - i_k := h_k(x)
- return the bit-wise AND of the entries at each index: a[i_1] & ... & a[i_k]

This implementation has a few attractive properties. First, it uses a constant amount of space no matter how large the universe of elements is, or how many elements have been inserted into the set. Second, assuming the hash functions execute in constant time, all of the set operations also execute in constant time. However, to achieve these properties, Bloom filters give up on accuracy. In particular, if member(x) returns 1 then it is possible that x was never inserted into the set, due to collisions of the hash functions. That is, false positives are possible but false negatives are not.
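The set operations described above can be sketched in Python as follows. This is a software illustration under stated assumptions, not a switch implementation: the class name BloomFilter is our own, and the hash functions h_1 to h_k are approximated by SHA-256 salted with the hash index, which is only one of many possible choices.

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m = m              # number of bits in the array a
        self.k = k              # number of hash functions
        self.bits = [0] * m     # empty(): every bit starts at 0

    def _indices(self, x):
        """Yield h_1(x), ..., h_k(x), each in the range 0 to m-1."""
        data = str(x).encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x):
        for i in self._indices(x):
            self.bits[i] = 1

    def member(self, x):
        """Return the bit-wise AND of the k selected entries."""
        result = 1
        for i in self._indices(x):
            result &= self.bits[i]
        return result

bf = BloomFilter(m=1024, k=3)
bf.insert("10.0.0.1")
found = bf.member("10.0.0.1")   # inserted items are always reported as members
```

The storage is fixed at m bits when the filter is created, regardless of how many items are inserted afterwards.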

Example Application

Suppose that we wish to perform a resource-intensive data plane operation, such as diverting a packet to an SDN controller, only on flows with at least two packets. For example, this strategy might be used to help protect the controller during a simple SYN flood denial-of-service attack. By inserting the headers of each packet into a Bloom filter, we can efficiently test whether a packet with the same headers has been seen previously, and only divert packets to the controller if the member test is positive.
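A minimal Python sketch of this policy is shown below. The function name should_divert, the parameter values M and K, the string encoding of the flow, and the use of salted SHA-256 as the hash functions are all assumptions made for illustration.

```python
import hashlib

M = 4096            # number of bits in the filter (assumed)
K = 3               # number of hash functions (assumed)
bits = [0] * M

def indices(flow):
    """Compute K array indices for a flow identifier given as a string."""
    data = flow.encode("utf-8")
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + data).digest()[:8], "big") % M
        for i in range(K)
    ]

def should_divert(flow):
    """Return True if a packet with this flow identifier was seen before.

    The flow is inserted as a side effect, so the first packet of a flow
    is not diverted (barring a false positive) but later packets are.
    """
    idx = indices(flow)
    seen = all(bits[i] == 1 for i in idx)   # member test
    for i in idx:                           # insert
        bits[i] = 1
    return seen

flow = "10.0.0.1,10.0.0.2,6,12345,80"
first = should_divert(flow)     # first packet: filter is empty, not diverted
second = should_divert(flow)    # second packet: member test is positive
```

Because false positives are possible, a small fraction of single-packet flows may still reach the controller, but no flow with two or more packets is missed.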

Analysis

It is possible to derive bounds on the accuracy of a Bloom filter based on the size of the array M, the number of hash functions k, and the number of items that have been inserted n. See the excellent survey by Broder and Mitzenmacher for details.
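As a rough guide, a commonly cited approximation (which assumes idealized, independent, uniform hash functions and is not derived in these notes) puts the false-positive probability at about (1 - e^(-k*n/M))^k, minimized when k is close to (M/n) * ln 2. The Python sketch below simply evaluates these expressions; the function names are our own.

```python
import math

def false_positive_rate(m, k, n):
    """Approximate false-positive probability (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k

def optimal_k(m, n):
    """Real-valued k that minimizes the approximation above."""
    return (m / n) * math.log(2)

# Example: a 1000-bit array with 3 hash functions.
rate_100 = false_positive_rate(1000, 3, 100)
rate_200 = false_positive_rate(1000, 3, 200)
best_k = optimal_k(1000, 100)
```

In practice k must be an integer, so one would round best_k and re-evaluate the rate.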

Count-Min Sketch

Next we will see how to develop a more complicated data structure called a count-min sketch that is inspired by Bloom filters and can be used to compute queries over a stream. To start, suppose we wish to answer frequency queries, i.e., how many times each item has been received so far.

More formally, we can think of a count-min sketch as a data structure with the following operations:

create(): initialize an empty count-min sketch
increment(x): process item x in the data stream by incrementing its tally
frequency(x): return the frequency of x in the stream

As with Bloom filters, suppose that we have k hash functions h_1 to h_k. We will associate with each hash function an array of integers, organized into a matrix:

+-----+--+--+--+--+--+--+--+--+--+
| a_1 |  |  |  |  |  |  |  |  |  |
+-----+--+--+--+--+--+--+--+--+--+
|                ...             | 
+-----+--+--+--+--+--+--+--+--+--+
| a_k |  |  |  |  |  |  |  |  |  |
+-----+--+--+--+--+--+--+--+--+--+

The operations of a count-min sketch can be implemented as follows:

create(): initialize a matrix as shown above
increment(x):
- use the hash functions to compute k indices:
  - i_1 := h_1(x)
  - …
  - i_k := h_k(x)
- then increment the corresponding entry in each array by 1:
  - a1[i_1] := a1[i_1] + 1
  - …
  - ak[i_k] := ak[i_k] + 1
frequency(x):
- use the hash functions to compute k indices:
  - i_1 := h_1(x)
  - …
  - i_k := h_k(x)
- then return the smallest of the k selected entries: min(a1[i_1],..., ak[i_k])

Why does frequency(x) return the smallest value? Consider what happens when there is a collision: an entry in some array ai then holds the combined counts of two distinct items, say x and x'. Since entries are only ever incremented, each of the k entries selected for x is an upper bound on the true frequency of x. By taking the minimum of these entries, we obtain the tightest available bound.
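The count-min sketch operations can be sketched in Python as follows. As before, this is an illustration under assumptions: the class name CountMinSketch and the width parameter (the length of each array) are our own, and salted SHA-256 stands in for the hash functions h_1 to h_k.

```python
import hashlib

class CountMinSketch:
    def __init__(self, k, width):
        self.k = k              # number of hash functions / arrays (at most 256 here)
        self.width = width      # number of counters in each array
        self.rows = [[0] * width for _ in range(k)]   # create()

    def _index(self, row, x):
        """Stand-in for the row's hash function: salted SHA-256 modulo width."""
        data = bytes([row]) + str(x).encode("utf-8")
        digest = hashlib.sha256(data).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def increment(self, x):
        """Add 1 to the selected counter in every array."""
        for r in range(self.k):
            self.rows[r][self._index(r, x)] += 1

    def frequency(self, x):
        """Return the minimum of the k selected counters."""
        return min(self.rows[r][self._index(r, x)] for r in range(self.k))

cms = CountMinSketch(k=4, width=256)
for item in ["a"] * 5 + ["b"] * 2:
    cms.increment(item)
estimate_a = cms.frequency("a")   # at least 5, and at most 7 (the stream length)
```

The estimate can exceed the true count when collisions occur in every array, but it can never fall below it.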

Other operations

Frequency queries can be used to implement many data plane operations, but sometimes richer queries are needed. For example, we may wish to answer range queries: what is the frequency of all items whose IP address matches 10.0.*.*? Or we may wish to extract the set of items being tracked by the count-min sketch more efficiently than by brute force.

One way to achieve this is to organize the space of possible keys into a binary tree, with a count-min sketch to track membership at each level of the tree. For example, the root would represent the set of all items, the children would represent a partition of the items, and so on. By traversing the tree, it is possible to implement range queries as well as extract the set of items in the count-min sketch.
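One possible reading of this idea, for integer keys, is sketched below: level l of the tree groups keys by their top l bits, each level has its own count-min sketch keyed by that prefix, and a range is answered by summing the estimates for a small set of tree nodes that exactly cover it. The class name DyadicSketch, the parameters, and the salted SHA-256 hashing are assumptions made for illustration.

```python
import hashlib

class CountMinSketch:
    def __init__(self, k, width):
        self.k, self.width = k, width
        self.rows = [[0] * width for _ in range(k)]

    def _index(self, row, x):
        data = bytes([row]) + str(x).encode("utf-8")
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % self.width

    def increment(self, x):
        for r in range(self.k):
            self.rows[r][self._index(r, x)] += 1

    def frequency(self, x):
        return min(self.rows[r][self._index(r, x)] for r in range(self.k))

class DyadicSketch:
    """One count-min sketch per tree level over keys in [0, 2**bits)."""

    def __init__(self, bits, k=4, width=256):
        self.bits = bits
        # Level 0 is the root (all keys); level `bits` holds individual keys.
        self.levels = [CountMinSketch(k, width) for _ in range(bits + 1)]

    def increment(self, key):
        for level in range(self.bits + 1):
            self.levels[level].increment(key >> (self.bits - level))

    def range_query(self, lo, hi):
        """Estimate how many items fall in the inclusive key range [lo, hi]."""
        total, level = 0, self.bits
        hi += 1                     # work with the half-open range [lo, hi)
        while lo < hi:
            if lo & 1:              # lo is a right child: count it, step past it
                total += self.levels[level].frequency(lo)
                lo += 1
            if hi & 1:              # hi - 1 is a left child: count it
                hi -= 1
                total += self.levels[level].frequency(hi)
            lo, hi, level = lo >> 1, hi >> 1, level - 1   # move up one level
        return total

ds = DyadicSketch(bits=8)
for key in [3, 5, 5, 9, 200]:
    ds.increment(key)
estimate = ds.range_query(3, 9)     # true answer is 4; never underestimated
```

With 32-bit IPv4 addresses as keys, a pattern such as 10.0.*.* corresponds to a single node at level 16 of the tree, so it can be answered with one frequency query.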

Reading

For more details on uses of Bloom filters and count-min sketches, see the survey by Broder and Mitzenmacher and the paper on OpenSketch by Yu et al.