Klara

When troubleshooting a Linux or FreeBSD system, you need to be able to probe the system to find answers as to why it is behaving in a particular way. Often, it isn't possible to replicate problems in a sandbox—so you need to gain an insight into what is happening on a production system in a non-intrusive manner. In this article, we'll provide an overview of some of the basic tools and introduce the FreeBSD equivalents of common Linux tracing and troubleshooting tools.


In most cases, logs and traces allow you to review what is happening without disturbing the system. Both Linux and FreeBSD come with a variety of tools that can expose extensive data and statistics for a running system. Knowing which tools exist and how to apply them can be more of an art than a science, but we’ll try to demystify them a bit below.

Understanding Resource Usage

When provisioning a system, a variety of finite resources (including but not limited to RAM, data bandwidth, processor time and persistent storage capacity) need to be allocated. Aside from physical limits, there can also be imposed limits on things such as shared memory, locks and semaphores, message queues, the number of processes, stack size and much more.

In a simple case, exhausting some resource will cause processes to fail with error messages pointing to the culprit. But things aren’t always so simple, and it can be much more difficult to find the cause of more complex issues such as gradual or intermittent loss of performance.

We can use the well-known tool top to monitor resource usage by process on either FreeBSD or Linux systems. FreeBSD’s version of top differs from its Linux counterpart in many specific details, but what it does is essentially the same: top shows a list of processes ordered by their current CPU usage, refreshes the list at a regular interval (adjustable with -s on FreeBSD or -d on Linux), and keeps updating until you press q to quit.

The following is an example of top’s output on one FreeBSD system:

    last pid: 23871;  load averages:  0.33,  0.32,  0.26   up 29+01:38:31  21:33:22
    96 processes:  1 running, 95 sleeping
    CPU:  0.4% user,  0.0% nice,  0.8% system,  0.4% interrupt, 98.4% idle
    Mem: 290M Active, 1026M Inact, 331M Laundry, 5749M Wired, 403M Free
    ARC: 3762M Total, 2391M MFU, 774M MRU, 5336K Anon, 79M Header, 512M Other
         2541M Compressed, 7528M Uncompressed, 2.96:1 Ratio
    Swap: 4096M Total, 572M Used, 3523M Free, 13% Inuse

      PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    23843 opk           1  20    0    19M  7552K ttyin    1   0:00   1.06% zsh
    18619 jellyfinse  113  44    0  2891M   369M uwait    1 439:00   0.20% jellyfin
      720 opk           1  20    0    21M  8168K select   0   0:00   0.13% sshd
    28749    535        4  20    0    55M    30M kqread   1  52:58   0.10% redis-se
    23833 root          1  20    0    13M  3248K CPU0     0   0:00   0.10% top
      787 root          1  20    0    24M    10M select   0   0:51   0.08% openconn
    48338 www          12  20    0    84M    21M uwait    0   3:47   0.05% vaultwar
    79108 www           1  20    0    19M  6228K kqread   0   0:34   0.05% nginx
     4691    770        1  20    0   172M  5200K kqread   1   6:07   0.03% postgres
     1283 root          1  20    0    11M   916K select   1   7:39   0.02% powerd
     2779 root          1  20    0    32M  1732K select   0   8:33   0.02% python2
     1233 root          1  20    0    15M  1452K select   0   6:50   0.01% mountd
    28759    236        1  20    0   421M    30M kqread   1   7:03   0.01% rspamd-3
     1279 ntpd          1  20    0    18M  1744K select   0   3:21   0.01% ntpd
     4730 www           1  20    0    54M  7168K nanslp   0   2:34   0.00% php
     1247 root          1  20    0    14M   868K rpcsvc   0   0:08   0.00% rpc.lock
     4737 www           1  22    0    87M    14M accept   0  32:20   0.00% php-fpm

The header shows totals for CPU, memory and swap usage across the whole system. On FreeBSD systems using ZFS, you also get details of the ZFS ARC memory cache. This particular system has plenty of CPU resources available, but has little free memory and swap is in use.

The list of processes shown here is very similar to what the ps command would show you, but is more focused on resource usage. Sometimes, just observing that a particular program is consuming excessive CPU is all that's needed to narrow down the source of a problem, but in many real-world problems the CPU isn’t the bottleneck!

If you’re more interested in RAM usage than CPU usage, you can change the sort order in top: on FreeBSD, press o and enter "res" (resident memory) or "size"; on Linux, press M to sort by memory, or f to choose the sort field from the field-management screen. A certain amount of caution is needed when interpreting the memory figures, though, because memory is often shared between processes, for example when several processes run the same binary or load the same shared libraries.

Looking at RAM usage in top in more detail, you can see that both SIZE and RES fields are listed. The SIZE field (VIRT in Linux's top) covers all the virtual address space allocated to the process, much of which may not be consuming any real physical memory. The RES field shows the resident set size: the pages that are actually held in physical memory. It tends to be much closer to the real usage of that specific process, but it still counts shared pages (such as those of shared libraries) in full for every process that maps them, so adding up RES across processes overstates the total. The ps command has options for breaking down the types of memory a process uses further.
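As a rough illustration of looking at resident memory outside of top, the following sketch lists the processes with the largest resident set using ps, relying on the POSIX-style -A and -o options that both FreeBSD and Linux ps accept. The function name largest_rss is a hypothetical choice, and the caveat about shared memory applies to these numbers just as it does to top's RES column.

```shell
# Hypothetical helper: print the N processes (default 10) with the largest
# resident set size, largest first. Columns: PID, RSS, VSZ, command.
# RSS and VSZ are reported in kilobytes by both FreeBSD and Linux ps.
largest_rss() {
  # A trailing "=" on each keyword blanks its header; with every header
  # blank, ps prints no header line, so the output sorts cleanly.
  # sort: numeric, descending, on the second (RSS) column only.
  ps -A -o pid= -o rss= -o vsz= -o comm= |
    sort -k2,2nr |
    head -n "${1:-10}"
}
```

For example, largest_rss 5 shows the five biggest resident processes.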

If CPU usage is low when a system or process is performing poorly, access to storage devices may be the bottleneck. FreeBSD's top switches to I/O mode if you press m, showing per-process reads, writes and page faults alongside context switches. On Linux, pressing m instead toggles the memory summary display, so for even simple per-process I/O monitoring it’s generally better to use a separate tool such as iotop.

Given the popularity of top, many other commands mimic its manner of displaying a sorted list that is kept frequently updated. If you search your system for commands with names that end in top you'll likely find a few matches. Many people prefer htop which, like top, shows processes but uses a friendlier, more colorful display and offers some convenient key shortcuts.
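That search is easy to script. The following is a minimal sketch in portable sh; the function name cmds_ending_in is hypothetical, and it simply scans each directory on $PATH for executable files whose names end with a given suffix, such as top or stat.

```shell
# Hypothetical helper: list commands on $PATH whose names end in SUFFIX,
# e.g. "cmds_ending_in top" might print top, htop, iotop (if installed).
cmds_ending_in() {
  suffix=$1
  saved_ifs=$IFS
  IFS=:                         # split $PATH on colons
  for dir in $PATH; do
    for f in "$dir"/*"$suffix"; do
      # Skip an unmatched glob pattern and anything not executable.
      if [ -f "$f" ] && [ -x "$f" ]; then
        printf '%s\n' "${f##*/}"
      fi
    done
  done | sort -u                # a name may appear in several directories
  IFS=$saved_ifs
}
```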

Getting a Thorough Understanding of I/O

The I/O statistics shown by top are listed by process. FreeBSD's GEOM subsystem includes the gstat command, which shows statistics organized by GEOM provider, that is, by disk, partition or other storage layer. gstat is much like top in that it presents a continuously refreshed display, showing operations per second, throughput, latency and a busy percentage for each provider.

The more commonly known (and portable) command for listing I/O statistics is iostat. By default, it shows averages since the system booted. For current statistics, pass it an interval as a parameter: for example, iostat 1 prints fresh statistics every second (after a first report that still shows the since-boot averages).

FreeBSD and Linux implementations of iostat differ somewhat in their specific options, but the command serves the same basic purpose on both. On either platform, it’s well worth considering the -x argument for extended disk statistics, which provides a much greater level of detail.
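Because the two implementations spell the interval and count options differently, a small wrapper can hide the difference. This is a sketch that assumes FreeBSD's base-system iostat and the Linux iostat from the sysstat package; the function name disk_stats is hypothetical.

```shell
# Hypothetical wrapper: COUNT extended (-x) per-device reports, INTERVAL
# seconds apart. The first report averages activity since boot; the
# reports after it reflect live activity.
disk_stats() {
  interval=${1:-1}
  count=${2:-5}
  case $(uname -s) in
    FreeBSD) iostat -x -w "$interval" -c "$count" ;;
    Linux)   iostat -x "$interval" "$count" ;;   # iostat from sysstat
    *)       echo "disk_stats: unsupported OS $(uname -s)" >&2; return 1 ;;
  esac
}
```

For example, disk_stats 2 10 prints ten reports two seconds apart.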

It is also possible to obtain statistics for ZFS pools with the command zpool iostat. In addition to real-time statistics on direct disk usage, zpool iostat -v can be used to check on the usage of cache, log and special vdevs associated with a pool.
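For example, the following sketch samples one pool with the per-vdev breakdown at a fixed interval. The pool name tank and the helper name pool_stats are placeholders; zpool list shows the pools that actually exist on your system.

```shell
# Hypothetical helper: per-vdev statistics (including cache, log and
# special vdevs) for POOL, COUNT samples INTERVAL seconds apart.
# "tank" is only a placeholder default pool name.
pool_stats() {
  zpool iostat -v "${1:-tank}" "${2:-1}" "${3:-5}"
}
```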

Much like ZFS, the NFS file sharing protocol has its own counterpart to iostat, namely nfsstat, which reports NFS statistics on both the client and the server side of the connection.

In our earlier article on “Easily Migrating from Linux to FreeBSD”, we also touched on some of the tools for showing statistics for network-based I/O. In particular, netstat -i shows statistics for dropped and transmitted packets.

Probing Processes

When problems are isolated to a particular process, it can be useful to probe the process to see what it is doing. You may be familiar with the core files (also known as “crash dumps” or “core dumps”) that are produced when a program crashes. These contain an image of the full process state, and are useful for programmers when debugging new code. But you don't need a process to crash to obtain a core dump! 

Using the gcore command, you can generate a core dump for a running process without terminating it (the process is only paused briefly while its memory is copied). You can then use a debugger to examine the state of the process. gcore is included in the FreeBSD base system. On Linux, it is usually available as a shell script shipped with the gdb debugger.
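A minimal sketch of taking such a snapshot, assuming you have permission to debug the target (usually root or the same user) and, on Linux, that gdb is installed. The /tmp output path and the function name snapshot_core are illustrative choices, not conventions of either tool.

```shell
# Hypothetical helper: write a core image of running process PID to
# /tmp/core.PID; the process keeps running afterwards.
snapshot_core() {
  pid=$1
  case $(uname -s) in
    FreeBSD) gcore -c "/tmp/core.$pid" "$pid" ;;  # base-system gcore(1)
    Linux)   gcore -o /tmp/core "$pid" ;;         # gdb's gcore appends ".PID"
    *)       echo "snapshot_core: unsupported OS $(uname -s)" >&2; return 1 ;;
  esac
}
```

The resulting file can then be loaded together with the program's executable, for example gdb /path/to/program /tmp/core.1234 (placeholder path and PID).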

In many cases, it may be simpler to use the pstack command instead. This prints the name of the currently running function and the chain of function calls that led to it. This is known as a stack trace, because it is built by following the return addresses stored in the stack memory. On FreeBSD, pstack is available as a port or package. On Linux, you may find a pstack or gstack shell script included with the debugger.
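Where pstack isn't available, gdb can produce similar information by attaching briefly in batch mode. The sketch below prefers pstack and falls back to gdb; it assumes one of them is installed and that you are allowed to attach to the process. The function name stack_of is hypothetical.

```shell
# Hypothetical helper: print a stack trace for every thread of PID.
stack_of() {
  pid=$1
  if command -v pstack >/dev/null 2>&1; then
    pstack "$pid"
  elif command -v gdb >/dev/null 2>&1; then
    # Attach, print every thread's backtrace, then detach and exit.
    gdb -p "$pid" -batch -ex 'thread apply all bt'
  else
    echo "stack_of: neither pstack nor gdb found" >&2
    return 1
  fi
}
```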

If a process holds resources such as files and network connections open, there can be conflicts with other processes which want access to the same resources. 

It can therefore be useful to be able to identify such files—or, given a particular file, identify associated processes. One common case is seeing an error message such as "Device busy" when attempting to unmount a filesystem. In such a case the fuser command can be used to get a list of processes that are keeping the device "busy". With a mount point, you'll need the -c option. 
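For example, to see which processes are keeping a mounted filesystem busy, and who owns them. The -c option works on both systems (the psmisc fuser common on Linux accepts it as a POSIX-compatible synonym for -m), and -u adds the owning user name. The helper name and the /mnt/data path are placeholders.

```shell
# Hypothetical helper: list PIDs (and their owners) using the filesystem
# mounted at MOUNTPOINT. -c treats the argument as a mount point.
# PIDs go to stdout; user names and other annotations go to stderr.
who_is_busy() {
  fuser -c -u "$1"
}
# Example (placeholder path): who_is_busy /mnt/data
```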

An alternative, very feature-rich tool for this job is lsof (short for “list open files”). It is widely available on Linux distributions, whereas on FreeBSD lsof needs to be installed from ports or packages. It supports many different operating systems and has long been popular among Unix administrators.
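A few common lsof queries, wrapped as small helpers for illustration. The function names are hypothetical; the options used (+D, -p, -i, -n, -P) are standard lsof options.

```shell
# Files open anywhere beneath a directory (+D recurses, so it can be slow).
open_under() { lsof +D "$1"; }

# Everything one process holds open: files, pipes, sockets and so on.
open_by_pid() { lsof -p "$1"; }

# Processes with a TCP socket on PORT; -n and -P skip host/port name lookups.
on_tcp_port() { lsof -n -P -i "TCP:$1"; }

# Examples (placeholders): open_under /mnt/data; on_tcp_port 22
```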

Linux and FreeBSD also come with their own platform-specific tools. On Linux, it is common to examine the contents of /proc for details of open files. To see an example, try running ls -l /dev/fd/ on Linux: the file descriptors (which is what "fd" stands for) appear as symbolic links, where the target of each link identifies the underlying file, device or socket. On FreeBSD, the procstat tool covers much of the functionality that you would find in /proc on Linux; for files, there is procstat -f. FreeBSD also has a dedicated tool for identifying open files, named fstat.
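A small sketch that uses whichever mechanism the platform offers. The function name open_files is hypothetical, and the PID defaults to the current shell ($$) so that it can be tried safely.

```shell
# Hypothetical helper: list the open file descriptors of PID
# (defaults to the current shell).
open_files() {
  pid=${1:-$$}
  case $(uname -s) in
    Linux)   ls -l "/proc/$pid/fd" ;;   # each fd is a symlink to its target
    FreeBSD) procstat -f "$pid" ;;      # fstat -p "$pid" gives similar output
    *)       echo "open_files: unsupported OS $(uname -s)" >&2; return 1 ;;
  esac
}
```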

Most of the tools we've just mentioned will also include network sockets in their output. But in the case of network connections, it is more convenient to use dedicated networking utilities such as sockstat on FreeBSD or ss on Linux.
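For example, listing listening sockets together with the processes that own them. This is a sketch with a hypothetical function name; note that without root privileges, ss -p can only show process details for your own processes.

```shell
# Hypothetical helper: listening TCP/UDP sockets and their owning processes.
listening_sockets() {
  case $(uname -s) in
    FreeBSD) sockstat -4 -6 -l ;;   # IPv4 and IPv6, listening sockets only
    Linux)   ss -t -u -l -n -p ;;   # TCP and UDP, listening, numeric, processes
    *)       echo "listening_sockets: unsupported OS $(uname -s)" >&2; return 1 ;;
  esac
}
```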

Tracing Processes

There are limits to what you can discern from a snapshot of a process at a single point in time. To cover more dynamic situations, you need to be able to follow the steps a process takes as it takes them. One approach is to intercept and display every system call a program makes. Linux has the strace command for this purpose. The more traditional Unix equivalent is named truss, and this is what you'll find on FreeBSD. With both tools, you can either name a command to run directly or attach to an existing process with -p. It is common to direct the trace output to a file with the -o option, and -f makes the tool follow child processes as well.
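Since truss and strace share the -f, -o and -p options described above, one wrapper can cover both. This is a sketch; the function name trace_syscalls and the example paths and PID are placeholders.

```shell
# Hypothetical wrapper: trace system calls into OUTFILE, following child
# processes (-f). The remaining arguments are either a command to run or
# "-p PID" to attach to a process that is already running.
trace_syscalls() {
  out=$1
  shift
  case $(uname -s) in
    FreeBSD) truss -f -o "$out" "$@" ;;
    Linux)   strace -f -o "$out" "$@" ;;
    *)       echo "trace_syscalls: unsupported OS $(uname -s)" >&2; return 1 ;;
  esac
}
# Examples: trace_syscalls /tmp/ls.trace ls /tmp
#           trace_syscalls /tmp/daemon.trace -p 1234   (placeholder PID)
```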

If you're comparing the traced system calls against a program's source code, you may notice a slight disconnect between the names, especially on Linux. This is because the C library wraps the system calls and sometimes substitutes a different one, such as a 64-bit variant, or openat in place of open.

strace has considerably more options than truss for filtering which system calls are traced. While these are very useful, it can be more flexible to defer filtering until you open the output file. A useful tip if you view text files with the common less tool is to press & during the less session. This lets you enter a search term as a regular expression, and the display is then filtered to show only matching lines, much the same result as you’d get from grep keyword filename | less. This is very often useful with trace output.
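Failed system calls are a frequent thing to look for in a saved trace. The sketch below pulls them out with grep. It assumes strace's usual "= -1 ENOENT (...)" style and truss's usual "ERR#2 '...'" style for reporting errors, so adjust the patterns if your version formats them differently. The function name is hypothetical.

```shell
# Hypothetical helper: print only the failed system calls in a trace file.
#   strace marks failures like:  openat(...) = -1 ENOENT (No such file ...)
#   truss marks failures like:   open("/x",O_RDONLY,00) ERR#2 'No such file ...'
failed_calls() {
  grep -E -e '= -1 E[A-Z0-9]+' -e 'ERR#[0-9]+' "$1"
}
```

A simpler pattern, such as ENOENT, works the same way when typed after & in less.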

There is also a similar ltrace command, which follows calls to functions contained in shared libraries. It was originally a Linux tool, but has been ported to FreeBSD as well.

Next, the dtrace tool takes tracing to a whole new level. Like ZFS, dtrace originated on Solaris. When Sun opened the source of Solaris, it became possible to port some of its more compelling features, and dtrace found its way to the likes of FreeBSD, macOS and even Windows. If you use Oracle Linux, you'll also find that Oracle has made dtrace available.

On modern Linux systems, we now have bpftrace. This builds on the eBPF framework, but is very closely inspired by dtrace. Much of what you might learn about how to use dtrace is applicable to bpftrace and vice-versa.

The usage of dtrace resembles and was inspired by awk. It can be used to write full scripts for observing particular aspects of a system but, like awk, also lends itself to single-line scripts. The basic components for dtrace usage look as follows:

    dtrace -n 'probe /predicate/ { actions }'

While awk triggers for each line of input, dtrace needs the probe to identify where it should trigger. An example might be syscall::open:entry. The exact format of a probe description is provider:module:function:name, where an empty field (such as module here) matches anything.

On FreeBSD, if you first run kldload dtraceall (as root) to ensure that dtrace is available, you can run dtrace -l to list every probe. That list can be somewhat overwhelming, so it may be more helpful to limit it by provider, for example by running dtrace -l -P proc.
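The two steps can be combined into one small helper. This is a FreeBSD-only sketch, run as root; it relies on kldload's -n option to skip loading when the module is already present, and the function name list_probes is hypothetical.

```shell
# Hypothetical helper (FreeBSD, as root): make sure DTrace is loaded, then
# list probes, optionally restricted to one provider such as "proc".
list_probes() {
  kldload -n dtraceall || return 1   # -n: do nothing if already loaded
  if [ -n "$1" ]; then
    dtrace -l -P "$1"
  else
    dtrace -l
  fi
}
# Examples: list_probes proc
#           list_probes | wc -l     (roughly how many probes exist)
```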

In practical terms, you can achieve a lot while knowing only a few basic probes. The optional predicate is a condition that filters when the action is triggered. For example, /execname == "zsh"/ restricts tracing to processes whose command name is zsh. Finally, there is the action, which might simply print some text. For example, the following prints the number of bytes of data requested with each read system call:

    dtrace -n 'syscall::read:entry { printf("%d bytes", arg2) }'
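Combining the zsh predicate described above with that action restricts the output to a single program. Both commands need root and run until Ctrl-C. The bpftrace version is an assumed Linux equivalent that uses the sys_enter_read tracepoint, whose byte-count argument is named count; the function names are hypothetical.

```shell
# Read request sizes for zsh only (run as root, stop with Ctrl-C).
# FreeBSD (DTrace): arg2 is read()'s third argument, the byte count.
read_sizes_dtrace() {
  dtrace -n 'syscall::read:entry /execname == "zsh"/ { printf("%d bytes\n", arg2); }'
}

# Linux (bpftrace): the same idea via the sys_enter_read tracepoint.
read_sizes_bpftrace() {
  bpftrace -e 'tracepoint:syscalls:sys_enter_read /comm == "zsh"/ { printf("%d bytes\n", args->count); }'
}
```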

It is also very common to use assignments to aggregations. These look like an assignment to a variable with a name beginning with an “@”. At the point of the action, these collect data. Then, when dtrace finishes, the data is collated and presented in a more useful form such as a histogram. For example, the following is similar to the previous invocation but shows the distribution of values at the end when you press Ctrl-C to terminate dtrace:

    dtrace -n 'syscall::read:entry { @dist = quantize(arg2) }'
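Aggregations can also be indexed by a key, much like an awk associative array, which gives a separate distribution for each program. The sketch below assumes root privileges; the bpftrace line is an assumed Linux counterpart in which hist() plays the role of quantize(). Other aggregating functions such as count() and sum() are used in the same way. The function names are hypothetical.

```shell
# A separate histogram of read() request sizes for each program name.
# Run as root; the histograms are printed when you press Ctrl-C.
# FreeBSD (DTrace): the aggregation is keyed by execname.
read_dist_dtrace() {
  dtrace -n 'syscall::read:entry { @dist[execname] = quantize(arg2); }'
}

# Linux (bpftrace): hist() is the power-of-two counterpart of quantize().
read_dist_bpftrace() {
  bpftrace -e 'tracepoint:syscalls:sys_enter_read { @dist[comm] = hist(args->count); }'
}
```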

There are a great many dtrace scripts that have already been written and can be used without particular knowledge of dtrace. In particular, consider installing dtrace-toolkit from ports or packages. It even includes man pages for its many scripts, but note that they aren't installed in the default MANPATH.

On Linux, you may find a similar toolkit in a package named bcc-tools or bpfcc-tools. These tools don't use bpftrace; they are built directly on BCC (the BPF Compiler Collection), a lower-level toolkit for compiling eBPF programs.
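Package layouts differ between distributions. As an assumption to check on your own system: Debian and Ubuntu's bpfcc-tools typically install the tools with a -bpfcc suffix (for example opensnoop-bpfcc), while Fedora and RHEL's bcc-tools place them under /usr/share/bcc/tools. A small sketch that tries both; the function name run_bcc is hypothetical.

```shell
# Hypothetical helper: run a BCC tool such as opensnoop or execsnoop,
# whichever naming convention this distribution uses (needs root).
run_bcc() {
  tool=$1
  shift
  for candidate in "$tool-bpfcc" "/usr/share/bcc/tools/$tool"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      "$candidate" "$@"
      return
    fi
  done
  echo "run_bcc: $tool not found" >&2
  return 1
}
# Example: run_bcc opensnoop
```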

Conclusion

Traditional approaches to debugging software aren't always possible, especially on a production system where services can't be interrupted. Both Linux and FreeBSD include a wealth of tools that enable you to answer countless questions about why the system and programs might be behaving in a particular way. 

There are many more tools you can discover by following the links in the "See Also" section of man pages, or simply by looking for tools that follow common naming patterns. For example, checking for commands that end in "stat" would find vmstat and systat.

Additional Resources

FreeBSD man page – top(1)

FreeBSD man page – iostat(8)

FreeBSD man page – gcore(1)

FreeBSD man page – fstat(1)

FreeBSD man page – truss(1)

FreeBSD man page – systat(1)

FreeBSD man page – vmstat(8)

FreeBSD Handbook, Chapter 26. DTrace

 

 
