P4: Sketches
At a conceptual level, traffic monitoring is a simple, almost trivial task. For example, suppose that we wish to monitor traffic at the granularity of individual flows. Then it seems like we can simply maintain a finite map data structure that associates flows with integers, and increment the corresponding value when each packet is received.
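In software, this idea takes only a few lines. The following is a minimal Python sketch, not switch code; the names `flow_counts` and `on_packet` and the string flow identifiers are made up for illustration. Note that the map gains one entry for every new flow it sees.

```python
from collections import defaultdict

# Finite map from flow identifier to packet count.
# Any hashable value can serve as a (hypothetical) flow identifier here.
flow_counts = defaultdict(int)


def on_packet(flow_id):
    """Increment the counter of the flow that the received packet belongs to."""
    flow_counts[flow_id] += 1


# Five packets from three distinct flows.
for flow_id in ["a", "b", "a", "c", "a"]:
    on_packet(flow_id)
```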
Unfortunately, there is a serious fly in the ointment: it is not clear how to implement data structures such as finite maps on hardware devices like network switches. In particular, languages such as P4 provide no way to dynamically resize the amount of memory used to store headers, metadata, and other state. Hence, we would need to somehow pre-allocate storage at compile-time for every possible flow that could be seen at run-time – a difficult task given the huge number of possible flows, and the limited resources available in hardware devices.
This lecture will introduce a family of data structures and algorithms that trade off accuracy for tight bounds on storage requirements.
Identifying Flows
Before diving into the details of these data structures, let us briefly discuss how to identify network flows. Intuitively, we would like to associate each packet with the transport protocol-level connection that it belongs to. However, protocols such as TCP execute non-trivial state machines to set up and tear down connections, and while it would be possible to simulate those state machines on a switch, it would be quite costly. In practice, many devices simply identify flows with a “5-tuple” consisting of a subset of the IP and TCP/UDP headers that are strongly correlated with actual flows:
- IP source address
- IP destination address
- IP protocol type
- TCP/UDP source port
- TCP/UDP destination port
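As an illustration, the 5-tuple can be modeled as a small record extracted from each packet. This is a sketch only: the `FlowKey` type, the `flow_key` helper, and the dictionary representation of a packet are assumptions made for the example, not part of any real packet-processing API.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: int    # 32-bit IPv4 source address
    dst_ip: int    # 32-bit IPv4 destination address
    protocol: int  # 8-bit IP protocol number (6 = TCP, 17 = UDP)
    src_port: int  # 16-bit TCP/UDP source port
    dst_port: int  # 16-bit TCP/UDP destination port


def flow_key(packet: dict) -> FlowKey:
    """Project a parsed packet (hypothetical dict layout) onto its 5-tuple."""
    return FlowKey(packet["src_ip"], packet["dst_ip"], packet["protocol"],
                   packet["src_port"], packet["dst_port"])


# Two packets that differ only in payload map to the same key.
p1 = {"src_ip": 0x0A000001, "dst_ip": 0x0A000002, "protocol": 6,
      "src_port": 12345, "dst_port": 80, "payload": b"GET /"}
p2 = dict(p1, payload=b"Host: example")
# A packet from a different source port maps to a different key.
p3 = dict(p1, src_port=12346)
```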
Note that two packets with the same 5-tuple may in fact belong to different flows. However, many data plane algorithms are not sensitive to such collisions. If additional precision is required, there are two main options:
- Actually track the transport protocol-level state machine, as discussed above, to accurately track TCP connections. While this is prohibitively expensive on current hardware switches, some software devices do provide this capability.
- Keep track of timestamps and when a significant amount of time has elapsed between subsequent packets in the same 5-tuple, treat those packets as belonging to distinct flowlets.
In either case, note that the number of possible flows is enormous: the 5-tuple alone can take 2^32 * 2^32 * 2^8 * 2^16 * 2^16 = 2^104 distinct values.
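The timestamp-based option above can be sketched as follows. This is a software illustration under stated assumptions: the `FlowletTracker` class, the 0.5-second gap, and the use of a dictionary keyed by 5-tuple are all choices made for the example, and a switch implementation would have to bound that state.

```python
class FlowletTracker:
    """Split packets with the same 5-tuple into flowlets by inter-packet gap."""

    def __init__(self, gap=0.5):
        self.gap = gap          # assumed idle threshold, in seconds
        self.last_seen = {}     # 5-tuple -> timestamp of the most recent packet
        self.flowlet_id = {}    # 5-tuple -> index of the current flowlet

    def observe(self, key, timestamp):
        """Record one packet and return the index of the flowlet it belongs to."""
        last = self.last_seen.get(key)
        if last is None:
            self.flowlet_id[key] = 0          # first packet of this 5-tuple
        elif timestamp - last > self.gap:
            self.flowlet_id[key] += 1         # long silence: start a new flowlet
        self.last_seen[key] = timestamp
        return self.flowlet_id[key]


tracker = FlowletTracker(gap=0.5)
ids = [tracker.observe("flow-a", t) for t in [0.0, 0.1, 0.2, 1.5, 1.6]]
# ids == [0, 0, 0, 1, 1]: the 1.3-second pause starts a second flowlet
```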
Flow size distribution
Empirically, it has been observed that flow sizes in many networks follow a skewed distribution, with a small number of large flows (“elephants”) and a large number of small flows (“mice”). Hence, many queries of interest can be answered with reasonable accuracy by using approximate algorithms. For example, if we only need to keep track of the top-k “heavy hitters,” one can imagine using strategies such as the following:
- Sampling: select 1 out of every N packets (e.g., for N = 10,000), evaluate the monitoring query, and extrapolate the results to the entire set of input packets.
- Smart filtering: somehow cheaply filter away mice flows, and then evaluate the query on the remaining elephant flows.
There is a vast literature on sampling, which we will not focus on in this lecture. Instead, we will see how to build data structures that support “smart” filtering in a space-efficient manner.
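For comparison, here is a minimal sketch of the sampling strategy listed above. It assumes systematic sampling (every N-th packet) rather than random sampling, and the function name `sampled_counts` and the synthetic traffic are made up for the example.

```python
from collections import Counter


def sampled_counts(packets, n):
    """Estimate per-flow packet counts by inspecting only every n-th packet."""
    sample = Counter()
    for index, flow_id in enumerate(packets):
        if index % n == 0:
            sample[flow_id] += 1
    # Extrapolate: each sampled packet stands in for n packets.
    return {flow_id: count * n for flow_id, count in sample.items()}


# Synthetic traffic: 9,900 packets from one elephant, 100 from one mouse.
packets = (["elephant"] * 99 + ["mouse"]) * 100
estimates = sampled_counts(packets, n=10)
# estimates == {"elephant": 10000}
```

In this run the elephant's count is estimated within about 1% of the true 9,900, while the mouse flow is never sampled at all, which is acceptable when only heavy hitters matter.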
Bloom Filters
Let us start with a classical approximate data structure that is simple to understand. Suppose that we wish to implement a set interface with the following operations:
- empty(): initializes the set,
- insert(x): adds x to the set, and
- member(x): returns 1 if x is in the set and 0 otherwise.

Note that our interface does not include a remove operation.
Now let a be an array of bits of size M, and let h_1 to h_k be
distinct (ideally independent, perfect) hash functions, each mapping
elements to array indices in the range 0 to M-1.
We can implement the operations in the set interface as follows:
- empty(): for i in 0 to M-1, set a[i] := 0
- insert(x):
  - use the hash functions to compute k indices: i_1 := h_1(x), …, i_k := h_k(x)
  - then at each of these indices i, set the corresponding entry to 1: a[i_1] := 1, …, a[i_k] := 1
- member(x):
  - use the hash functions to compute k indices: i_1 := h_1(x), …, i_k := h_k(x)
  - return the bit-wise AND of the entries at each index: a[i_1] & ... & a[i_k]
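The operations above can be sketched in Python as follows. This is an illustration under assumptions, not a switch implementation: the `BloomFilter` class name is made up, and the k hash functions are simulated by salting SHA-256 with the function number, which is one convenient choice among many.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m            # number of bits in the array a
        self.k = k            # number of hash functions
        self.bits = [0] * m   # empty(): every entry starts at 0

    def _indices(self, x):
        # Simulate h_1..h_k: hash the function number j together with x,
        # then reduce the result to an index in 0..m-1.
        for j in range(self.k):
            digest = hashlib.sha256(f"{j}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x):
        for i in self._indices(x):
            self.bits[i] = 1

    def member(self, x):
        # Bit-wise AND of the k entries: 1 only if all of them are set.
        result = 1
        for i in self._indices(x):
            result &= self.bits[i]
        return result


bf = BloomFilter(m=1024, k=3)
bf.insert("10.0.0.1")
```

After the insert, `bf.member("10.0.0.1")` returns 1. For an element that was never inserted, `member` usually returns 0, but it may return 1 if all k of its indices happen to collide with bits that are already set.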
This implementation has a few attractive properties. First, it uses a
constant amount of space no matter how large the universe of
elements is, or how many elements have been inserted into the set.
Second, assuming the hash functions execute in constant time, all of
the set operations also execute in constant time. However, to achieve
these properties, bloom filters give up on accuracy. In particular, if
member(x) returns 1 then it is possible that x was never
inserted into the set, due to collisions of the hash functions. That
is, false positives are possible but false negatives are not.
Example Application
Suppose that we wish to perform a resource-intensive data plane
operation – e.g., diverting a packet to an SDN controller – only on
flows with at least two packets. For example, this strategy might be
used to help protect the controller during a simple SYN flood
denial-of-service attack. By using a bloom filter, we can efficiently
determine whether a packet with the same headers has been seen
previously, and only divert packets to the controller if the member
test is positive.
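A minimal sketch of this policy is shown below. The array size, the number of hash functions, the `handle_packet` function, and the string flow keys are all assumptions made for the example, and the hash functions are again simulated with salted SHA-256.

```python
import hashlib

M, K = 4096, 3      # assumed filter size and number of hash functions
seen = [0] * M      # bloom filter bit array, initially empty


def indices(key):
    """Return the K array indices for a flow key."""
    return [int.from_bytes(hashlib.sha256(f"{j}:{key}".encode()).digest()[:8],
                           "big") % M
            for j in range(K)]


def handle_packet(key):
    """Divert only if a packet with the same headers was (probably) seen before."""
    idx = indices(key)
    already_seen = all(seen[i] for i in idx)   # member test
    for i in idx:                              # insert
        seen[i] = 1
    return "divert" if already_seen else "forward"


actions = [handle_packet("10.0.0.7:5555->10.0.0.1:80") for _ in range(2)]
# actions == ["forward", "divert"]
```

Because false negatives are impossible, the second packet of a flow is always diverted. A false positive only means that the first packet of some flow is occasionally diverted too.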
Analysis
It is possible to derive bounds on the accuracy of a bloom filter
based on the size of the array M, the number of hash functions k,
and the number of items that have been inserted n. See the excellent
survey by Broder and Mitzenmacher for details.
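As a rough guide to what such bounds look like, a commonly used approximation puts the false-positive probability at about (1 - e^(-kn/M))^k, which is minimized near k = (M/n) ln 2. This assumes that the hash functions behave like independent, uniformly random functions. The helper names below are made up for illustration.

```python
import math


def false_positive_rate(m, k, n):
    """Approximate false-positive probability after n inserts.

    Assumes k independent, uniformly random hash functions into m bits.
    """
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    """Number of hash functions that minimizes the approximation above."""
    return (m / n) * math.log(2)


rate = false_positive_rate(m=1000, k=7, n=100)   # roughly 0.008
best_k = optimal_k(m=1000, n=100)                # roughly 6.9
```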
Count-Min Sketch
Next we will see how to develop a more complicated data structure called a count-min sketch that is inspired by bloom filters and can be used to compute queries over a stream. To start, suppose we wish to answer frequency queries – i.e., how many of each item have been received so far.
More formally, we can think of a count-min sketch as a data structure with the following operations:
- create(): initialize an empty count-min sketch
- increment(x): process item x in the data stream by incrementing its tally
- frequency(x): return the frequency of x in the stream
As with bloom filters, suppose that we have k hash functions h_1
to h_k. We will associate with each hash function an array of
integers, organized into a matrix:
+-----+--+--+--+--+--+--+--+--+--+
| a_1 | | | | | | | | | |
+-----+--+--+--+--+--+--+--+--+--+
| ... |
+-----+--+--+--+--+--+--+--+--+--+
| a_k | | | | | | | | | |
+-----+--+--+--+--+--+--+--+--+--+
The operations of a count-min sketch can be implemented as follows:
- create(): initialize a matrix as shown above, with every entry set to 0
- increment(x):
  - use the hash functions to compute k indices: i_1 := h_1(x), …, i_k := h_k(x)
  - then increment the corresponding entry in each array by 1: a_1[i_1] := a_1[i_1] + 1, …, a_k[i_k] := a_k[i_k] + 1
- frequency(x):
  - use the hash functions to compute k indices: i_1 := h_1(x), …, i_k := h_k(x)
  - then return the smallest of the corresponding entries: min(a_1[i_1], ..., a_k[i_k])
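The operations above can be sketched in Python as follows. As before, this is an illustration under assumptions: the `CountMinSketch` class and its `width` parameter (the length of each array) are names chosen for the example, and the hash functions are simulated with salted SHA-256.

```python
import hashlib


class CountMinSketch:
    def __init__(self, k, width):
        # create(): k arrays of `width` counters, all starting at 0.
        self.k = k
        self.width = width
        self.rows = [[0] * width for _ in range(k)]

    def _index(self, j, x):
        # Simulated hash function h_j, reduced to an index in 0..width-1.
        digest = hashlib.sha256(f"{j}:{x}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def increment(self, x):
        # Add 1 to one counter in each of the k arrays.
        for j in range(self.k):
            self.rows[j][self._index(j, x)] += 1

    def frequency(self, x):
        # The smallest of the k counters is the tightest upper bound.
        return min(self.rows[j][self._index(j, x)] for j in range(self.k))


cms = CountMinSketch(k=4, width=256)
stream = ["a", "b", "a", "c", "a", "b"]
for item in stream:
    cms.increment(item)
```

Here `cms.frequency("a")` is at least 3, and exactly 3 unless "a" collides with another item in every one of the four arrays.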
Why does frequency(x) return the smallest value? Consider what
happens when there is a collision: then an entry in some array a_i
contains the combined counts for two distinct items, say x and x'.
Since counters are only ever incremented, each array entry is an
upper bound on the true frequency count of every item that maps to
it. By taking the minimum value, we obtain the tightest bound.
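To see the effect of a collision concretely, the following toy example uses two hand-picked hash functions over small integers. The functions and the array width of 2 are chosen purely so that items 0 and 2 collide in the first array but not in the second.

```python
WIDTH = 2
# Hand-picked (not realistic) hash functions: 0 and 2 collide under the first.
hashes = [lambda x: x % WIDTH, lambda x: (x // 2) % WIDTH]
rows = [[0] * WIDTH for _ in hashes]


def increment(x):
    for row, h in zip(rows, hashes):
        row[h(x)] += 1


def frequency(x):
    return min(row[h(x)] for row, h in zip(rows, hashes))


for _ in range(3):
    increment(0)
for _ in range(5):
    increment(2)

# rows == [[8, 0], [3, 5]]
# The first array reports 8 for both items; the minimum recovers 3 and 5.
```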
Other operations
Frequency queries can be used to implement many data plane operations,
but sometimes richer queries are needed. For example, we may wish to
answer range queries – what is the frequency of all items whose IP
address matches 10.0.*.*? Or we may wish to extract the set of
items being tracked by the count-min sketch more efficiently than by
brute force.
One way to achieve this is to organize the space of possible keys into a binary tree, with a count-min sketch tracking frequencies at each level of the tree. For example, the root would represent the set of all items, the children would represent a partition of the items, and so on. By traversing the tree, it is possible to implement range queries as well as extract the set of items in the count-min sketch.
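One possible realization of this idea is sketched below, under several assumptions: keys are 8-bit integers (an IPv4 address would use 32 bits), the tree is split on key prefixes, and range queries are restricted to prefixes such as 10.0.*.*. The `SketchTree` class and its method names are made up for the example.

```python
import hashlib

KEY_BITS = 8  # illustrative key width


class CountMinSketch:
    def __init__(self, k=4, width=64):
        self.k, self.width = k, width
        self.rows = [[0] * width for _ in range(k)]

    def _index(self, j, x):
        digest = hashlib.sha256(f"{j}:{x}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def increment(self, x):
        for j in range(self.k):
            self.rows[j][self._index(j, x)] += 1

    def frequency(self, x):
        return min(self.rows[j][self._index(j, x)] for j in range(self.k))


class SketchTree:
    def __init__(self):
        # Level `length` tracks keys by their top `length` bits; level 0 is the root.
        self.levels = [CountMinSketch() for _ in range(KEY_BITS + 1)]

    def increment(self, key):
        for length, sketch in enumerate(self.levels):
            sketch.increment(key >> (KEY_BITS - length))

    def prefix_frequency(self, prefix, length):
        """Estimated number of items whose top `length` bits equal `prefix`."""
        return self.levels[length].frequency(prefix)

    def heavy_keys(self, threshold):
        """Descend the tree, expanding only prefixes that reach the threshold."""
        candidates = [0]  # the root prefix (zero bits long)
        for length in range(1, KEY_BITS + 1):
            children = [c for p in candidates for c in (2 * p, 2 * p + 1)]
            candidates = [c for c in children
                          if self.prefix_frequency(c, length) >= threshold]
        return candidates


tree = SketchTree()
for key, count in [(0b10100001, 50), (0b10100010, 30), (0b00000001, 1)]:
    for _ in range(count):
        tree.increment(key)
```

Because every estimate is an upper bound, `tree.heavy_keys(40)` is guaranteed to include 0b10100001, and `tree.prefix_frequency(0b1010, 4)` is at least 80. Collisions can only add extra candidates, never drop a true heavy key.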
Reading
For more details on uses of bloom filters and count-min sketches, see the survey by Broder and Mitzenmacher and the paper on OpenSketch by Yu et al.