Dummynet: The Better Way to Build FreeBSD Networks
Dummynet is the FreeBSD traffic shaper, packet scheduler, and network emulator. Dummynet allows you to emulate a whole range of network environments in a straightforward way. It can model delay and packet loss, and can act as a traffic shaper and policer. Dummynet is roughly equivalent to netem in Linux, but we have found that dummynet is easier to integrate and provides much more consistent results.
Network emulation is a technique for creating virtual networks that simulate the conditions of real networks in a more controlled, lab-like environment. Using network emulation helps speed up testing and development by allowing the user to duplicate and replicate their findings. Emulation combined with traffic shaping allows you to build interesting networks that place limits on how much of the available capacity is used by one type of application or flow. With network emulation we can create strict environments for different types of traffic, limiting bandwidth or artificially increasing round-trip time so they don't overwhelm the network.
Network emulation gives you the power to test your applications and services in network conditions vastly different from the ones you currently have. This means that developers can get the full hellish experience of using a weak edge signal or the terrible interactivity of geostationary satellite delay. With emulation you can test a field deployment from the comfort of a warm lab. Network conditions vary greatly, and many types of network (especially those that use radio somewhere along the path) can vary in capacity and delay based on the time of day or even the weather.
Using dummynet on FreeBSD
Dummynet is a feature of the ipfw firewall. It exposes three rule types that let you control how traffic is managed once it is fed into the emulator: pipes, schedulers, and queues. A pipe emulates a network link: you can set characteristics on a pipe and they will be enforced by the emulator, allowing you to limit the available bandwidth or impose additional delay on packets that traverse the pipe. Queue and scheduler rules allow us to use dummynet as a traffic shaper; packets fed into a queue are classified by their 5-tuple (protocol, source and destination IP and port) and scheduled differently based on the algorithm and parameters provided.
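As a quick sketch of the shaping side (the rule numbers, port, and weights here are illustrative assumptions, not values from this article), two queues can share a single emulated link, with the queue weights setting each traffic class's relative share of the capacity:

```sh
# Create a pipe emulating a 10 Mbit/s link (illustrative value)
ipfw pipe 1 config bw 10Mbit/s
# Two queues share the pipe; weight (1-100) sets the relative share
ipfw queue 1 config pipe 1 weight 90   # most of the capacity for HTTPS
ipfw queue 2 config pipe 1 weight 10   # everything else
# Classify traffic into the queues
ipfw add 100 queue 1 tcp from any to any 443
ipfw add 200 queue 2 ip from any to any
```

When both classes are busy, HTTPS traffic gets roughly nine times the share of the other traffic, but either class can use the full link when it is alone.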
To start using dummynet you need to load the ipfw and dummynet kernel modules. You should always experiment with firewall rules on a test machine, ideally one where you can easily get access to the console if you lock yourself out.

When testing rules, it can be helpful to change the ipfw default rule to accept rather than reject all traffic. This helps avoid lockouts and removes some of the more perplexing forwarding issues. You can do this before loading the kernel module with the kenv utility, or by adding a very late rule that allows all traffic through.
# kenv net.inet.ip.fw.default_to_accept=1
# kldload ipfw dummynet
# kldload ipfw dummynet
# ipfw add 65534 allow ip from any to any
ipfw is configured with a ruleset: a list of numbered rules from 1 to 65535. Packets are evaluated against the ruleset until a rule either drops the packet or passes it through. The first rule that matches wins and further processing stops. In ipfw rules you can use the actions deny or drop to stop packets, and allow, accept, pass, or permit to let packets through.
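For example, first-match semantics mean rule order matters more than rule intent (the rule numbers and the 192.0.2.0/24 documentation prefix here are illustrative):

```sh
ipfw add 100 allow icmp from any to any
ipfw add 200 deny ip from 192.0.2.0/24 to any
ipfw add 300 allow ip from any to any
```

A ping from 192.0.2.7 matches rule 100 and is allowed before rule 200 is ever consulted, while TCP traffic from the same host falls through to rule 200 and is dropped.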
When configuring network emulation (and firewalls generally) it is very important to have a good idea of how you are going to test that the rules you have written perform the actions you intend (and, hopefully, no other actions).
When performing network emulation you are primarily concerned with two parameters: the available bandwidth and the path delay. For testing delay you can use the trusty ping(8) command. ping will tell you the round-trip time it experiences, helping you see how much delay there is in the network. Sometimes more usefully, it will report any ICMP error messages it receives, which can be really helpful when an errant rule is rejecting traffic.
To measure the available bandwidth with the emulator in place (and normally beforehand) you can use the iperf3(1) tool, available from ports or packages. iperf3 is the third version of the iperf tool and offers a whole host of features; there are also some public servers available that allow you to perform measurements against them. iperf3 is more automatable than the older iperf2 and offers JSON output, so it can easily be integrated into automated test workflows.
The table below shows typical bandwidth and delay values for different types of network you may wish to emulate:
Name           Downlink bandwidth  Downlink delay (ms)  Uplink bandwidth  Uplink delay (ms)
Edge           240 Kbps            400                  200 Kbps          440
3G             780 Kbps            100                  330 Kbps          100
LTE            50 Mbps             50                   10 Mbps           65
DSL            2 Mbps              10                   256 Kbps          10
VDSL           50 Mbps             10                   10 Mbps           10
Geo-Satellite  10 Mbps             325                  2 Mbps            325

Bandwidth and Delay for Different Types of Network
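As a sketch, the Edge row of the table could be turned into a pair of pipes like this (the pipe and rule numbers are arbitrary, and the 10.0.4.2 address is the guest VM used later in this article):

```sh
# Uplink: 200 Kbit/s with 440 ms one-way delay
ipfw pipe 1 config bw 200Kbit/s delay 440ms
# Downlink: 240 Kbit/s with 400 ms one-way delay
ipfw pipe 2 config bw 240Kbit/s delay 400ms
# Feed each direction of the guest's traffic into its pipe
ipfw add 100 pipe 1 ip from 10.0.4.2 to any
ipfw add 200 pipe 2 ip from any to 10.0.4.2
```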
When configuring network emulation with ipfw using pipes, the pipe emulates a link with the given bandwidth, delay, loss, burst size, and buffering characteristics. To use pipes with ipfw you first have to configure them, then add rules that feed traffic into the pipe. Pipes are typically used to enforce network conditions on an entire subnetwork, but you can use any of the filters in ipfw to control which traffic goes into the emulated link. Pipes are referred to by a number, which should not be confused with the ipfw rule numbers. To avoid confusion, you may wish to reserve a specific range of numbers for pipes and not create firewall rules with the same numbers. The ipfw pipe config command will use the default values for any parameter you don't specify when creating the pipe.
For these tests you can use a FreeBSD host machine with bhyve running a FreeBSD guest VM. Add the guest’s tap interface as a member to a bridge on the host.
# ifconfig tap create
tap0
# ifconfig bridge create
bridge0
# ifconfig bridge0 inet 10.0.4.1/24 up
# ifconfig bridge0 addm tap0
The bridge has the address 10.0.4.1 while the VM will have the address 10.0.4.2. Configuring bhyve is left as an exercise to the reader.
Setting up Network Emulation
Before you do anything, use ping and iperf3 to measure the network characteristics from the VM guest to the host in your test network (you could also do this against a public host you know, or against one of the publicly listed iperf3 test servers).
$ ping -c 3 10.0.4.1
PING 10.0.4.1 (10.0.4.1): 56 data bytes
64 bytes from 10.0.4.1: icmp_seq=0 ttl=64 time=0.683 ms
64 bytes from 10.0.4.1: icmp_seq=1 ttl=64 time=0.615 ms
64 bytes from 10.0.4.1: icmp_seq=2 ttl=64 time=0.626 ms

--- 10.0.4.1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.615/0.641/0.683/0.030 ms
To run a bandwidth test you run iperf3 in server mode on your VM host:
$ iperf3 -s
and run a default 10 second test from your guest VM to the host:
$ iperf3 -c 10.0.4.1
Connecting to host 10.0.4.1, port 5201
[  5] local 10.0.4.2 port 54881 connected to 10.0.4.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   281 MBytes  2.35 Gbits/sec    0   1.19 MBytes
[  5]   1.00-2.00   sec   365 MBytes  3.06 Gbits/sec    0   1.21 MBytes
[  5]   2.00-3.00   sec   364 MBytes  3.05 Gbits/sec    0   1.21 MBytes
[  5]   3.00-4.00   sec   373 MBytes  3.13 Gbits/sec    0   1.21 MBytes
[  5]   4.00-5.00   sec   347 MBytes  2.91 Gbits/sec    0   1.21 MBytes
[  5]   5.00-6.00   sec   366 MBytes  3.06 Gbits/sec    0   1.21 MBytes
[  5]   6.00-7.00   sec   375 MBytes  3.15 Gbits/sec    0   1.21 MBytes
[  5]   7.00-8.00   sec   371 MBytes  3.11 Gbits/sec    0   1.21 MBytes
[  5]   8.00-9.00   sec   375 MBytes  3.15 Gbits/sec    0   1.21 MBytes
[  5]   9.00-10.00  sec   373 MBytes  3.13 Gbits/sec    0   1.21 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.51 GBytes  3.01 Gbits/sec    0             sender
[  5]   0.00-10.03  sec  3.51 GBytes  3.00 Gbits/sec                  receiver

iperf Done.
That looks like a great network: there is a ton of capacity (it is just a guest speaking to the VM host, after all), and the delay is below 1 ms and hardly varies at all. This setup might exist between machines in a data center or in your house, but it isn't anything like what you will see from the Internet edge to the core.
Around 20 milliseconds of total delay (the round-trip time that ping reports) is typical from many locations to services on the Internet. Let's set up pipes to delay all of the traffic from the guest VM by 10 ms in each direction:
# ipfw add 00097 pipe 1 ip from 10.0.4.2 to 10.0.4.1
# ipfw add 00099 pipe 2 ip from 10.0.4.1 to 10.0.4.2
# ipfw pipe 1 config delay 10ms
# ipfw pipe 2 config delay 10ms
With these commands all traffic from the guest to the host's bridge is fed into pipe 1 and all traffic from the host back to the guest is fed into pipe 2. Each pipe only models one direction of the link; this can be very helpful if you are modelling a cellular network with delays that differ in each direction. Let's test this config to see if it has the desired effect:
$ ping -c 3 10.0.4.1
PING 10.0.4.1 (10.0.4.1): 56 data bytes
64 bytes from 10.0.4.1: icmp_seq=0 ttl=64 time=19.661 ms
64 bytes from 10.0.4.1: icmp_seq=1 ttl=64 time=20.110 ms
64 bytes from 10.0.4.1: icmp_seq=2 ttl=64 time=20.225 ms

--- 10.0.4.1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 19.661/19.999/20.225/0.243 ms
The delay is what we expect. Now let's configure a bandwidth limit in our network:
# ipfw pipe 1 config delay 10ms bw 50Mbit/s
# ipfw pipe 2 config delay 10ms bw 10Mbit/s
# iperf3 -c 10.0.4.1
Connecting to host 10.0.4.1, port 5201
[  5] local 10.0.4.2 port 27352 connected to 10.0.4.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.94 MBytes  24.7 Mbits/sec    0   98.1 KBytes
[  5]   1.00-2.00   sec  4.90 MBytes  41.0 Mbits/sec   10   66.5 KBytes
[  5]   2.00-3.00   sec  4.34 MBytes  36.4 Mbits/sec    0    135 KBytes
[  5]   3.00-4.00   sec  4.16 MBytes  34.9 Mbits/sec    8    115 KBytes
[  5]   4.00-5.00   sec  4.32 MBytes  36.2 Mbits/sec    8   94.9 KBytes
[  5]   5.00-6.00   sec  4.65 MBytes  39.0 Mbits/sec    8   49.5 KBytes
[  5]   6.00-7.00   sec  4.33 MBytes  36.3 Mbits/sec    0    135 KBytes
[  5]   7.00-8.00   sec  4.17 MBytes  35.0 Mbits/sec    8    115 KBytes
[  5]   8.00-9.00   sec  4.41 MBytes  37.0 Mbits/sec   10   89.2 KBytes
[  5]   9.00-10.00  sec  4.64 MBytes  38.9 Mbits/sec    0    146 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  42.8 MBytes  35.9 Mbits/sec   52             sender
[  5]   0.00-10.02  sec  42.6 MBytes  35.7 Mbits/sec                  receiver

iperf Done.
This has set a limit, but the TCP transfer that iperf3 is doing is not able to utilize the full link capacity. This is because there is not enough buffering configured on the link. Sizing buffers is a complicated field of study, but as a general rule you want one full Bandwidth Delay Product (BDP, the bandwidth times the total delay) of buffering. If we configure this for the forward link it does much better:
# ipfw pipe 2 config delay 10ms bw 10Mbit/s queue 200kb
# ipfw pipe 1 config delay 10ms bw 50Mbit/s queue 1000kb
# ipfw pipe 1 show
00001:  50.000 Mbit/s   10 ms burst 0
q131073 1000 KB 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
 sched 65537 type FIFO flags 0x0 0 buckets 1 active
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 ip           0.0.0.0/0             0.0.0.0/0     4679  4775549 12 16552   0
# iperf3 -c 10.0.4.1
Connecting to host 10.0.4.1, port 5201
[  5] local 10.0.4.2 port 16845 connected to 10.0.4.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.60 MBytes  30.2 Mbits/sec    0    115 KBytes
[  5]   1.00-2.00   sec  4.98 MBytes  41.8 Mbits/sec    0    165 KBytes
[  5]   2.00-3.00   sec  5.12 MBytes  42.9 Mbits/sec    0    205 KBytes
[  5]   3.00-4.00   sec  5.50 MBytes  46.1 Mbits/sec    0    241 KBytes
[  5]   4.00-5.00   sec  5.56 MBytes  46.6 Mbits/sec    0    272 KBytes
[  5]   5.00-6.00   sec  5.67 MBytes  47.6 Mbits/sec    0    300 KBytes
[  5]   6.00-7.00   sec  5.60 MBytes  47.0 Mbits/sec    0    326 KBytes
[  5]   7.00-8.00   sec  5.71 MBytes  47.9 Mbits/sec    0    350 KBytes
[  5]   8.00-9.00   sec  5.69 MBytes  47.7 Mbits/sec    0    373 KBytes
[  5]   9.00-10.00  sec  5.69 MBytes  47.7 Mbits/sec    0    394 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  53.1 MBytes  44.5 Mbits/sec    0             sender
[  5]   0.00-10.04  sec  52.7 MBytes  44.0 Mbits/sec                  receiver

iperf Done.
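As a back-of-the-envelope check of the sizing rule, one BDP for the 50 Mbit/s forward pipe at a 20 ms round trip works out to about 125 KB; the queue sizes configured above are larger than this, which leaves TCP some extra headroom:

```shell
# BDP (bytes) = bandwidth (bits/s) * RTT (s) / 8
bw=50000000     # 50 Mbit/s forward link
rtt_ms=20       # 10 ms each way = 20 ms round trip
echo $((bw * rtt_ms / 1000 / 8))    # prints 125000, i.e. ~125 KB
```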
With that change you have a guest VM with network conditions similar to a VDSL line. This has only scratched the surface of what can be done with dummynet as a network emulator. Dummynet can also be used as a packet scheduler and can build very interesting and useful networks.