Klara

The FreeBSD Operating System introduces new features in CURRENT, its main development branch. Snapshots of CURRENT are made available as installer images weekly, which makes it easy to get started. You can also follow the CURRENT branch directly by building a newer FreeBSD system from source as changes land.


Why FreeBSD CURRENT?

Following CURRENT can take a lot of work. FreeBSD releases are stabilized over a long development process, but CURRENT is where new features first arrive and get tested. CURRENT doesn’t get official security or errata notices, so users must follow developments via the commit logs and mailing list, which is an obvious downside to running it in production. This doesn’t stop large organizations such as Netflix from using systems based on CURRENT in production, but doing so responsibly means having an experienced team capable of catching and mitigating issues in CURRENT before they reach the production stack.

Why might you want to run CURRENT? If you have a large modified code base, or are building a product based on FreeBSD, CURRENT gives you a look into the future of FreeBSD. Running CURRENT will help you understand changes that are happening in the FreeBSD Operating System and it gives you an opportunity to see how your stack performs with new features. 

Running CURRENT can help you address incompatibilities in the software you run long before they appear in a release. As new FreeBSD releases get closer, you might want to run CURRENT to validate that your software stack runs and performs well, and report problems upstream in time for them to be fixed.

CURRENT includes a number of debugging features that help FreeBSD developers make the system more reliable. These features detect errors in the system and alert users when the errors occur. In addition to outright bugs, they can even catch problems that merely result in “undesired behaviors.” These features help give FreeBSD releases the high level of reliability we love, but they come at the cost of system performance.

This is why the UPDATING file in the root of the FreeBSD source tree has offered this warning since 2002:

NOTE TO PEOPLE WHO THINK THAT FreeBSD 14.x IS SLOW:
    FreeBSD 14.x has many debugging features turned on, in both the kernel
    and userland. These features attempt to detect incorrect use of
    system primitives, and encourage loud failure through extra sanity
    checking and fail stop semantics. They also substantially impact
    system performance. If you want to do performance measurement,
    benchmarking, and optimization, you'll want to turn them off. This
    includes various WITNESS-related kernel options, INVARIANTS, malloc
    debugging flags in userland, and various verbose features in the
    kernel. Many developers choose to disable these features on build
    machines to maximize performance. (To completely disable malloc
    debugging, define WITH_MALLOC_PRODUCTION in /etc/src.conf and rebuild
    world, or to merely disable the most expensive debugging functionality
    at runtime, run "ln -s 'abort:false,junk:false' /etc/malloc.conf".)

This warning was added during the development of FreeBSD 5. FreeBSD 5 saw the introduction of many SMP features, including debugging options with high overhead. Benchmarks of FreeBSD 5 looking to see SMP-related performance improvements saw an apparent reduction of performance instead.  If early testers don’t understand the impact of debugging features, they can make FreeBSD pre-releases look bad.

If you want to evaluate new features and performance impacts in CURRENT, you will need to disable its debugging features in order to get accurate measurements.

Understanding FreeBSD Development Features

The warning in UPDATING mentions three main features that help with debugging: INVARIANTS, WITNESS and malloc debugging.

INVARIANTS enables many debugging code paths in the kernel. These include KASSERT assertions, which panic if the expected condition is not true, along with any extra checks that developers choose to add alongside them.

INVARIANTS features include extra walks of lists in loops, more assertions, and checks for pointers that should never become null. These extra tests can be costly, but they also expose errors in the system.

WITNESS tracks when locks are acquired and released in the kernel, as well as keeping track of the order in which locks are taken. If you have ever seen the ‘lock order reversal’ message in dmesg, that is WITNESS in action. 

WITNESS is a powerful feature for debugging concurrency and locking issues in a development system. The extra tracking it adds is too much for a production system, but isn’t so onerous as to make a development system unusable.

Finally, malloc (memory allocation) debugging features enable tracking of use-after-free conditions, the collection of memory usage statistics, and assertions that help protect against common errors.

A/B Testing FreeBSD CURRENT

In this article we will show how to build a CURRENT system with the debugging features disabled, and perform some benchmarks to test the impact debugging features have on performance.

FreeBSD offers a built-in kernel configuration called GENERIC-NODEBUG that disables a lot of the debugging features so that CURRENT can be more accurately benchmarked and evaluated for performance.

sys/amd64/conf/GENERIC-NODEBUG:
#
# GENERIC-NODEBUG -- WITNESS and INVARIANTS free kernel configuration file
#                    for FreeBSD/amd64
#
# This configuration file removes several debugging options, including
# WITNESS and INVARIANTS checking, which are known to have significant
# performance impact on running systems.  When benchmarking new features
# this kernel should be used instead of the standard GENERIC.
# This kernel configuration should never appear outside of the HEAD
# of the FreeBSD tree.
#
# For more information on this file, please read the config(5) manual page,
# and/or the handbook section on Kernel Configuration Files:
#
#    https://docs.freebsd.org/en/books/handbook/kernelconfig/#kernelconfig-config
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (https://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines are also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD$

include GENERIC
include "../../conf/std.nodebug"

ident   GENERIC-NODEBUG

From the configuration we can see that this config includes the normal GENERIC config file and a second file, sys/conf/std.nodebug, where debugging features are removed:

sys/conf/std.nodebug:
#
# std.nodebug -- Disable the debug options found in the GENERIC kernel config.
#

nooptions       INVARIANTS
nooptions       INVARIANT_SUPPORT
nooptions       WITNESS
nooptions       WITNESS_SKIPSPIN
nooptions       DEBUG_VFS_LOCKS
nooptions       BUF_TRACKING
nooptions       FULL_BUF_TRACKING
nooptions       DEADLKRES
nooptions       COVERAGE
nooptions       KCOV
nooptions       MALLOC_DEBUG_MAXZONES
nooptions       QUEUE_MACRO_DEBUG_TRASH

# Net80211 debugging
nooptions       IEEE80211_DEBUG

# USB debugging
nooptions       USB_DEBUG
nooptions       HID_DEBUG

# CAM debugging
nooptions       CAMDEBUG
nooptions       CAM_DEBUG_FLAGS
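
To confirm which of these options a given kernel was actually built with, one approach is to filter the kernel's embedded configuration. The following is a minimal sketch: the helper name debug_options is made up for this example, the option list is a subset of std.nodebug, and it assumes the kernel was built with its config embedded so that the kern.conftxt sysctl is available and lists one "options NAME" entry per line.

```sh
#!/bin/sh
# Print the debugging-related "options" entries found in a kernel
# configuration read from standard input.
# debug_options is a hypothetical helper name; the option names are a
# subset of those listed in std.nodebug.
debug_options() {
    awk '$1 == "options" &&
         $2 ~ /^(INVARIANTS|INVARIANT_SUPPORT|WITNESS|WITNESS_SKIPSPIN|DEADLKRES)(=|$)/ {
             print $2
         }'
}

# On a FreeBSD host (assumes the kernel embeds its configuration):
#   sysctl -n kern.conftxt | debug_options
```

On a debug kernel this should print names such as INVARIANTS and WITNESS; on a GENERIC-NODEBUG kernel it should print nothing.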

We used a development machine with a Ryzen 3800X, 32GB of RAM, and NVMe storage to test the performance impact of CURRENT’s debugging features. The system was installed with a recent 14-CURRENT image with ZFS on root and then updated to the latest tip-of-tree from git. We also used a boot environment (here is a great introduction to using boot environments with FreeBSD) to more easily perform differential (A/B) tests in very similar environments.

We’ll call the CURRENT system with no changes to /etc/src.conf or /etc/make.conf debug, and we’ll call the CURRENT system with debugging disabled nodebug.

As well as kernel changes, we need to disable some userland features. We can put the full set of build configuration for nodebug in /etc/make.conf and /etc/src.conf, which look like this on nodebug:

/etc/make.conf:
KERNCONF="GENERIC-NODEBUG"

/etc/src.conf:
WITH_MALLOC_PRODUCTION="YES"
WITHOUT_LLVM_ASSERTIONS="YES"
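
With those files in place, one plausible way to create the boot environment and build the system is sketched below. This is an illustration rather than the exact procedure used for the article: the helper names run, prepare_be, and build_nodebug are made up, the boot environment name nodebug and the -j 16 value mirror the names and commands used in this article, and the source tree is assumed to live in /usr/src. DRY_RUN defaults to 1 so the commands are only printed.

```sh
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone the running boot environment and make it the default for next boot.
prepare_be() {
    run bectl create nodebug || return 1
    run bectl activate nodebug || return 1
}

# Build world and kernel, then install the kernel. buildkernel picks up
# KERNCONF from /etc/make.conf. The remaining steps (reboot, installworld,
# merging /etc) follow the usual source-upgrade procedure and are not
# automated here.
build_nodebug() {
    run make -C /usr/src -j 16 buildworld buildkernel || return 1
    run make -C /usr/src installkernel || return 1
}
```

Calling prepare_be and then build_nodebug with the default settings only lists the commands, which makes it easy to review them before running anything as root with DRY_RUN=0.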

Benchmarking FreeBSD

The FreeBSD development features we disabled in the “nodebug” system have different impacts in different subsystems, and their impacts are exercised by different workloads.

When we benchmark, it is important to test workloads that accurately model your system’s real-world workload. In the best case, we would be able to steer real traffic to a test machine and evaluate changes in its ability to serve traffic. 

For this article we are going to build FreeBSD itself as a test workload, and use two tests from the large number of benchmarks available in the FreeBSD ports system.

There are 124 tools in the benchmarks category of the ports tree. Most likely, some of those tools can exercise your own real-world workloads well.

In this article we will look at the following benchmarks:

  • building FreeBSD kernel and world
  • iperf3 TCP loopback test
  • lzbench

Userspace-only Benchmarking

As we have seen earlier, many of the development features we need to disable in CURRENT affect many if not most workloads. The malloc debugging features can impact memory-based workloads even in userspace, but INVARIANTS and WITNESS debug paths mostly impact kernel-space workloads.

We can see this by comparing a userspace-only benchmark on our debug and nodebug systems, which we do with lzbench later in this article.

Build benchmarks

Building FreeBSD is a great benchmark for development style workloads. Building FreeBSD involves reading thousands of small files, compiling them, writing the results to temporary files, and then linking the results together. Compilation is a heavy userspace processing task, but getting the files stresses the kernel too.

World and kernel builds on this machine take about half an hour when we don’t need to rebuild llvm. The long build time makes testing over several runs impractical, but this isn’t a typical task for most users.

Building the FreeBSD kernel exercises a similar workload to doing a full system build, but with just a subset of the total files. To look at the penalty to building FreeBSD on the debug host, we built the kernel once to prime our filesystem cache, then ran 10 more build iterations to test performance repeatably. The compiler toolchain is in sync with the system here, so we see just the kernel compilation time in both boot environments.

debug $ for x in $(jot 10); do /usr/bin/time -o ~/buildkernel-debug -a make -j 16 buildkernel KERNCONF=GENERIC; done
nodebug $ for x in $(jot 10); do /usr/bin/time -o ~/buildkernel-nodebug -a make -j 16 buildkernel KERNCONF=GENERIC; done

In each case, the buildkernel stage is timed using time and appended to a log file. The average time to run buildkernel on both systems is plotted below:

The impact on build performance for a FreeBSD CURRENT system with debugging features is clear. Kernel builds complete about 30% faster with debugging features disabled.

Similarly, a single ‘buildworld buildkernel’ on the debug machine took 36 minutes, but only 24 minutes on the nodebug host. This is a huge reduction in build times.
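
The averages can be computed directly from the log files that /usr/bin/time appends to. The sketch below assumes the default FreeBSD time output format, one line per run with the wall-clock seconds followed by the word "real"; the helper names avg_real and pct_reduction are made up for this example.

```sh
#!/bin/sh
# Average the "real" (wall-clock) seconds in a log written by
# /usr/bin/time -a -o FILE, where each run adds a line such as:
#        612.31 real      8123.40 user       701.22 sys
# avg_real is a hypothetical helper name.
avg_real() {
    awk '$2 == "real" { sum += $1; n++ }
         END { if (n > 0) printf "%.2f\n", sum / n }' "$1"
}

# Percentage reduction in run time when going from $1 to $2
# (both in the same unit).
pct_reduction() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}
```

For example, `avg_real ~/buildkernel-debug` prints the mean build time in seconds, and `pct_reduction 36 24` prints 33.3 for the full buildworld and buildkernel times quoted above.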

Benchmarking Network Throughput

For a quick estimate of network stack performance, we can use iperf3. We run the test with the client and server on the same machine. This removes the network from the equation, and exercises the locking and invariants debug paths in the kernel in both directions.

For this quick test, we set up iperf3 to transfer 100GiB from the client side to the server side. There is some potential noise that filters through from the TCP congestion controller, so we run ten iterations to minimize its impact.

We need to start the iperf3 server side in one terminal:

$ iperf3 -s

In another terminal, we first disable TCP host caching, then do 10 iperf3 client iterations in a loop, appending the time each transfer took to a log file.

# sysctl net.inet.tcp.hostcache.enable=0  

The test transfer is started like so:

for x in `jot 10`; do /usr/bin/time -o iperf3-nodebug -a iperf3 -c localhost -n 100G; done

The average transfer time is plotted below for both debug and nodebug:

CURRENT’s debug features have a large impact in this test, with runs on the debug host taking nearly twice as long as runs on the nodebug host. Converted into bitrates, debug gets ~18Gbit/s compared to ~31Gbit/s on the nodebug environment.
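
The conversion from transfer time to bitrate is simple arithmetic: bytes transferred, times eight, divided by the elapsed seconds. A minimal sketch follows; the helper name gbit_per_sec is made up, and it reports decimal gigabits (10^9 bits) per second.

```sh
#!/bin/sh
# Convert a transfer of BYTES completed in SECONDS into Gbit/s.
# gbit_per_sec is a hypothetical helper; 1 Gbit is taken as 10^9 bits.
gbit_per_sec() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", bytes * 8 / secs / 1e9 }'
}

# 100 GiB is 107374182400 bytes. The 27.7 seconds here is an
# illustrative figure, not a measurement taken from this article:
#   gbit_per_sec 107374182400 27.7
```

With those illustrative inputs the result is about 31 Gbit/s, in line with the nodebug figure quoted above.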

Benchmarking with lzbench

Finally, let’s look at lzbench—a benchmark that should be unaffected by most of the debug features. 

lzbench is an in-memory benchmark of open source compression tools and algorithms. It exercises a compression algorithm in memory, attempting to avoid interaction with other parts of the system. This means that when it is doing a good job, the results we get should be far more representative of the algorithm than the host operating system.

For these tests, we created a random file of ASCII data like so:

dd if=/dev/random bs=1m count=1024| b64encode - > testfile.out 

This creates a nicely compressible test file: it is just ASCII, but it won’t compress down to nothing in the way a file of all zeros would.

We then ran lzbench with the following command:

debug $ lzbench -ezstd testfile.out
lzbench 1.8 (64-bit FreeBSD)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   5718 MB/s  5755 MB/s  1450493368 100.00 testfile.out
zstd 1.4.5 -1            1020 MB/s  1319 MB/s  1093210136  75.37 testfile.out
zstd 1.4.5 -2            1007 MB/s  1318 MB/s  1093210154  75.37 testfile.out
zstd 1.4.5 -3             960 MB/s  1317 MB/s  1093211523  75.37 testfile.out
zstd 1.4.5 -4             987 MB/s  1318 MB/s  1093211809  75.37 testfile.out
zstd 1.4.5 -5             345 MB/s  1320 MB/s  1093254405  75.37 testfile.out
zstd 1.4.5 -6             317 MB/s  1322 MB/s  1093272851  75.37 testfile.out
zstd 1.4.5 -7             309 MB/s  1322 MB/s  1093271832  75.37 testfile.out
zstd 1.4.5 -8             313 MB/s  1321 MB/s  1093271832  75.37 testfile.out
zstd 1.4.5 -9             311 MB/s  1324 MB/s  1093299343  75.37 testfile.out
^C

lzbench displays the compression speed, decompression speed and the ratio between the uncompressed and compressed files. It starts with a memory benchmark which shows how fast the null compressor (which does nothing at all) can go.
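
To compare runs or feed them to a plotting tool, the table can be reduced to a few columns. The sketch below assumes the column layout of the lzbench 1.8 output shown above and that a copy of that output has been saved or piped in; the helper name lzbench_csv is made up for this example.

```sh
#!/bin/sh
# Reduce lzbench's zstd rows to CSV: level,compress MB/s,decompress MB/s.
# Field positions are assumed from the lzbench 1.8 table shown above.
# lzbench_csv is a hypothetical helper name.
lzbench_csv() {
    awk '$1 == "zstd" && $5 == "MB/s" {
             sub(/^-/, "", $3)          # "-1" becomes "1"
             print $3 "," $4 "," $6
         }'
}

# Example, assuming the table was saved to a file named lzbench-debug.log:
#   lzbench_csv < lzbench-debug.log > lzbench-debug.csv
```

The memcpy row is skipped on purpose, since it measures memory bandwidth rather than a compression level.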

The first 10 levels of decompression for debug and nodebug from lzbench were plotted below:

We can see that lzbench doesn’t benefit our nodebug test system, but it also doesn’t suffer a penalty—lzbench is doing a good job of testing the algorithm rather than the operating system. 

In CPU-heavy userspace benchmarks, the debugging features in CURRENT don’t have a high overhead, but more realistic server and networking workloads can experience huge impacts.

Conclusion

When it comes to performance benchmarking, you should always test with workloads that accurately model your real workloads. Most workloads will benefit significantly from disabling CURRENT’s debugging features, but some CPU-only tasks won’t.

It is important to know how upgrades to FreeBSD are going to impact your servers, and testing CURRENT is the best way to know what is going to happen in future FreeBSD releases.

If you integrate CURRENT hosts into your test or parts of your production environment, it is important to disable its development-only debugging features. Failing to do so is likely to give you a false impression of how FreeBSD is progressing, since a promising commit offering a huge performance increase can easily be overshadowed by the performance impact of WITNESS or INVARIANTS debugging.
