Homework 4

Changelog

v1: November 5, 2019
v2: November 11, 2019: clarified probe header that should be returned in exercise 3.

Overview

In this homework assignment, you will implement several space-efficient network monitoring schemes in P4.

Due Date

11:59pm, November 14, 2018

Academic Integrity

This assignment must be completed individually. All work you submit must be your own and sharing or receiving code is forbidden, with the exception of your partner. Do not look for or submit code you find on the Internet, and do not post solutions or partial solutions on the discussion site. If you make use of any outside materials, you must give attribution. You may ask general questions about the development environment, p4c, bmv2, Mininet, etc., and you may discuss high-level details of the exercises with your classmates. If you have any questions about what is allowed and what is not allowed, please ask the instructor first!

Starter Code

You can download starter code for this assignment as a zipfile from CMS.

Exercise 1: Warmup

To familiarize yourself with the starter code, please take a look at the files in the homework04 directory including:

The topology.json JSON file, which describes the topology of the network. For this assignment we will use a single switch connected to eight hosts with addresses 10.0.1.1, 10.0.1.2, and so on through 10.0.1.8.
The code in switch.p4, which implements a basic source routing scheme, similar to what we’ve seen in previous assignments.
The send.py receive.py, and probe.py scripts, which can be used to test your solutions.

Note that unlike previous assignments, we do not need a controller, as switch.p4 is based on source routing and has no match-action tables.

Next, let’s test that the starter code works:

% make
...
mininet> xterm h1 h2

Then execute ./receive.py in the terminal for h1 and ./send.py 10.0.1.1 "Hello World" in the terminal forh2. You should see that the packet is successfully sent fromh2 to h1`.

Finally, let’s add a simple counter to tabulate the total number of bytes sent to each switch. The v1model.p4 architecture includes the following declarations:

enum CounterType {
    packets,
    bytes,
    packets_and_bytes
}

extern counter {
    counter(bit<32> size, CounterType type);
    void count(in bit<32> index);
}

To complete this exercise, simply instantiate a counter in the MyIngress control, and execute its count method on every packet.

To read the value of the counter, you can use the simple_switch_CLI command-line utility on switch s1. For example, if your program has a counter c, then the command counter_read MyIngress.c will print its value to standard output.

To submit: Rename your solution counter.p4 and submit on CMS.

Exercise 2: Efficient Flow Monitoring

In this exercise, you will develop a more sophisticated monitoring program in P4 that tracks traffic statistics at per-flow granularity. For the purposes of this exercise, you may assume that flows are identified by an IP source-destination pair. Note that even with this simplification, the number of flows grows still quadratically with the number of hosts in the network – e.g., even if we assumed that all hosts have addresses of the form 10.0.1.* there would be 65,025 possible flows! Hence, it would be quite expensive to allocate a counter object for every possible flow, and infeasible in a larger network.

Instead, we will develop an approximate solution based on the count-min sketch data structure we saw in lecture. You can implement a sketch using the hash functions and register objects provided the V1Model architecture.

To answer frequency queries about a particular flow, we will use special probe packets, which have an Ethernet type of 0x9999 followed by an instance of the following header type:

header probe_t {
    bit<32> srcAddr;
    bit<32> dstAddr;
    bit<16> result;   
}

Upon receipt of a packet containing probe header probe, your program should estimate the cumulative size of all packets in the flow from probe.srcAddr to probe.dstAddr and then return the answer by storing the answer in probe.result and forwarding the packet back to the host that issued the query. We have provided a script probe.py that will send and print probe packets.

Implementation Steps

To complete this exericse, you should extend the starter code in switch.p4 as follows:

Add the probe_t header and extend the headers and parser appropriately. Extend the source routing forwarding logic to send probe packets back to the host that sent the query. (Hint: use standard_metadata.ingress_port.)
Develop data structures and operations that implement the functionality of a count-min sketch. For the purposes of this assignment, your sketch should use 3 hash functions (e.g., crc16, csum16 and identity) and registers of size 6.
To estimate packet sizes, you may rely on the totalLen field in the IPv4 header. You do not need to count the traffic generated by the probe packets in the count-min sketch.
Complete your implementation by adding packet sizes to the sketch for non-probe packets and querying the sketch for probe packets.

Testing

To test that your solution is working correctly, you can use the supplied send.py, receive.py, and probe.py scripts.

To submit: Rename your solution sketch.p4 and submit on CMS.

Karma: Accuracy Experiment

Note: Karma exercises are suggested problems that are completely optional. Feel free to try them if you are so inclined, or skip them if you are busy!

Because the count-min sketch is an approximate data structure, it does not always return the most accurate answer. Find an input (i.e., a list of specific invocations of send.py followed by probe.py) that cause your sketch to overestimate the traffic in a given flow. You may increase the number of hosts if it helps.

To submit: Assemble your solution into a file collision.zip and submit on CMS.

Exercise 3: Heavy Hitters

In this exercise, we will develop a way to answer “heavy hitter” queries. For the purpose of this exercise, to keep things simple, we will identify the heaviest senders of traffic rather than the traffic generated between each source-destation pair.

The main challenge is that a count-min sketch does not actually keep track of the values that were used to populate the counts in the sketch. Unless we were to somehow hash the entire space of possible values, there is no easy way to recover those values.

Your task in this exercise is to add extra state to your sketch.p4 program so it can compute the heaviest hitters. For simplicity, you only have to compute the single heaviest hitter (in practice, operators are typically interested in knowing the k heaviest hitters.)

More specifically, upon receipt of a packet containing the probe your program should estimate the cumulative size of all packets sent by the heaviest hitter as well as the IP address of the sender. You should store the cumulative size in the probe.result field and the IP address of the sender in the probe.srcAddr field. The probe.dstAddr field should not be modified. To finish the job, forward the packet back to the host that issued the query. You can use the same probe.py script as before to send and print probe packets to the switch, filling in the command-line arguments for the source and destination addresses with dummy values (e.g., 127.0.0.1).

Following are two possible implementation strategies you could consider:

Easy: Add a register to keep track of the heavy hitter, and update it on every packet. Because P4 lacks loops, you will have to iterate through the sketches and this register manually. Then modify the probe header as described above.
Sophisticated: Decompose the space of possible flows into a tree of disjoint blocks of IP addresses, and allocate a count-min sketch at each level whose keys correspond to sets of addresses rather than individual addresses.

The root of the tree should track all IP addresses and the leaves of the tree should track individual IP addresses. Then, for each node, use a count-min sketch to keep track of the traffic generated by hosts in each set. For example, for our node topology, the tree might look like this:

sketch1:               {h1-h8}
                     /         \
                    /           \
sketch2:      {h1-h4}            {h5-h8}
             /       \          /       \
            /         \        /         \
sketch3:  {h1-h2}   {h3-h4}   {h5-h6}   {h7-h8}
           /   \     /   \     /   \     /   \
          /     \   /     \   /     \   /     \
sketch3: {h1} {h2} {h3} {h4} {h5} {h6} {h7} {h8}

To process traffic from a given host, say h2, we would update all of the sketches from root to leaf–e.g., we would update sketch1 using the key {h1-h8}, sketch2 using the key {h1-h4}, sketch3 using the key {h1-h2}, and sketch4 using the key {h2}.

Then, to answer a heavy hitter query, we would traverse the tree from root to leaf, following the branch associated the highest estimate at each step.

For simplicity, because our network only has eight hosts, you can write P4 code to do this analysis. In reality, one would likely transfer the values of the sketches to a general-purpose machine that could perform the required analysis.

To submit: Rename your solution heavy_hitters.p4 and submit it on CMS.

Debriefing

How many hours did you spend on this assignment?
Would you rate it as easy, moderate, or difficult?
If you worked with a partner, please briefly describe both of your contributions to the solution you submitted.
How deeply do you feel you understand the material it covers (0%-100%)?
If you have any other comments, I would like to hear them! Please write them down or send email to jnfoster@cs.cornell.edu

To submit: debriefing.txt