history of freebsd dtrace

FreeBSD History – Understanding the Origins of DTrace

FreeBSD History Series

Understanding the Origins of DTrace

DTrace is a powerful tool for system administrators to diagnosis system issues without unduly impacting performance. It was created because of a desperate need by a team of three engineers from Sun Microsystems: Bryan CantrillAdam Leventhal and Mike Shapiro.

A Star is Born

According to DTrace: The Reverse Engineer’s Unexpected Swiss Army Knife by Tiller Beauchamp and David Weston, development of DTrace started in 2001. At the time, Cantrill was the sole developer. Leventhal and Shapiro joined the effort later. The changelog, shows that DTrace was first released in Sun’s Solaris 10 in 2004.

In 2006, Cantrill talked to TechRepublic about how DTrace came about. The initial idea was sparked in 1997 when Sun ran into a problem while benchmarking a new system.

“In 1997 we were working on breaking some Solaris benchmarking and ran into problems for two weeks with some of the best minds in the community. The answer was simple: one of the computers was thinking it was a router when it was supposed to be benchmarking. However, after that experience I was a bit shaken. We had some of the best Solaris expertise in the world on this problem. What would happen if this was a customer? A lot of customers either live with the problem or escalate it with Sun and see if we could answer it or not. We realised we had to come up with a way to dynamically instrument production systems and answer questions about the systems. The problems we had before can now be solved by one person in one afternoon, instead of two weeks with 16 or so people.”

Cantrill talked further about DTrace’s origins in an interview with Lightstep’s Ben Sigelman. Cantrill said that DTrace came out of a simple problem they had—“if a complicated system is doing something unexpected, how do you determine what it is doing and why, without taking the system down? Why is the system not operating as well as we think it should be? It’s not crashing. It’s operating. But we want it to be operating much more efficiently.” Cantrill also remarked that “The debugging literature prior to that was really focused on debugging systems in development, as opposed to actually examine systems in production.”

Cantrill also said that they had to make sure DTrace was implemented in such a way that it would not panic or crash the system because “if you run a tool and the system dies as a result, you will never be allowed to run that tool again.”

DTrace: The Reverse Engineer’s Unexpected Swiss Army Knife goes on to state that, “DTrace was Sun’s first software component to be released under their own open source Common Development and Distribution License (CDDL).” However, some groups were slow to port DTrace because they didn’t trust the CDDL—for example, Adam Leventhal claimed in 2011 that Oracle believed the CDDL license would “make DTrace too toxic for other Linux vendors.” These license concerns may have contributed to Red Hat’s decision to release a similar utility named SystemTap.

Other Systems Add Support

MacOS was one of the first non-Solaris operating system to add support for DTrace in 2007’s Mac OS X 10.5 Leopard. Apple also included a graphical tool named Instrument to make DTrace more user-friendly.

DTrace became part of FreeBSD with the release of FreeBSD 7.1 in 2009—two years before Oracle began porting DTrace, and nine years before Oracle eventually solved the inherent CDDL vs GPL license conflict by adding a GPLv2 dual license option for the kernel-space portion that they now owned, and a UPL license for the formerly-proprietary userspace tools.

Shortly after Oracle purchased Sun Microsystems in 2010, several key OpenSolaris developers announced the creation of Illumos, an open source fork of OpenSolaris which included DTrace. This fork was based on the OpenSolaris OS/Net consolidation.

In 2018, Oracle’s Linux port of Dtrace hit the 1.0 milestone. Oracle is still working to improve this version, with released in 2020.

Up to this point, each port was maintained separately, with the Illumos, Apple, and FreeBSD versions of DTrace drifting apart, and the Oracle-maintained and developed project being entirely separate. That changed in 2016 when the OpenDTrace project appeared on GitHub.

According to an article from TechGig, the purpose of the OpenDTrace project is to “produce a portable implementation of DTrace for multiple operating systems”. The project’s GitHub account states, “new scripts being licensed under the BSD 2-Clause license and modifications to the original scripts remaining under the CDDL”

Also in 2018, Microsoft announced that they were porting DTrace to Windows from FreeBSD. DTrace for Windows became available to the general public in Windows 10 Insider builds, starting in March of 2019. Earlier versions of Windows were not supported.

What can DTrace Do For Me?

The goal of dtrace is to be able to dynamically instrument a system. Traditionally, operating systems had debugging features, a flag you could enable at build or boot time, to output verbose messages about what the system was doing, to be able to investigate issues. 

Unfortunately, traditional debugging systems were generally not very fine grained—and, worse, produced significant (and sometimes enormous) performance impacts once enabled. This made troubleshooting performance problems very difficult, since the debugging system could constitute a performance issue of its own!

Traditional debuggers also generally needed to be enabled when the software was being compiled or started—so if a problem cropped up while a system was running, the only way to try to investigate it was to restart with debugging enabled. This would cripple performance, and possibly make the cause of the issue disappear. 

Between the destruction of state involved in restarting an app with debugging enabled, and the large change in performance profile with debugging running, many performance investigations effectively began already stuck at a dead end.

By contrast, Dtrace allows the tracepoints to exist as disabled code which do not impact performance. They can then dynamically be enabled at or during runtime, when an issue arises. 

There are 3 main varieties of tracing: Statically defined tracing (SDT), system call tracing, and function boundary tracing (FBT). The former allows specific probes to be created while the system is being developed, allowing details of specific events to be output easily. The latter instruments nearly every function in the code, allowing the observer to see who called a function, and with what arguments, as well as the return value.

With this set of tools, most problems can be investigated. Reads from disk seem slow? Start by asking (and answering) a few basic questions:

  • What applications are doing the most reads?
  • How big are the reads?
  • How long do the reads take?

With these answers, you can start to dig deeper and find the root cause of the issue.

Using DTrace

Now that we’ve looked at the history of DTrace, let’s look at how to use it with some quick examples. These examples were tested on FreeBSD.

kldload dtrace_all
List all of the probes:
dtrace -l
Search for a particular probe:
dtrace -l | grep read
Filter the list of probes:
dtrace -l -f syscall:freebsd: | grep read

Coming back to our previous example, if we want to determine what programs are performing read system calls, we can do:

dtrace -n ‘syscall::read:entry { print(execname); }’

This will print out the executable name for each program when it does a read system call:

dtrace: description 'syscall::read:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  5  90124                       read:entry string "firefox"
  5  90124                       read:entry string "firefox"
  5  90124                       read:entry string "sshd"
  3  90124                       read:entry string "firefox"
  5  90124                       read:entry string "sshd"
  3  90124                       read:entry string "firefox"
  5  90124                       read:entry string "sshd"
  3  90124                       read:entry string "firefox"
  5  90124                       read:entry string "sshd"
  5  90124                       read:entry string "sshd"

Although this output turns out not to be terribly useful. Dtrace has a series of “aggregation” functions which produce more useful output. Let’s perform the same trace again—but this time, we’ll aggregate the results based on the executable name. This will run until you hit control+c, then display the results:

dtrace -n 'syscall::read:entry { @[execname] = count(); }'

The results:

dtrace: description 'syscall::read:entry ' matched 2 probes
  ssh                                                               1
  qterminal                                                         2
  sshd                                                              2
  lumina-desktop                                                   30
  firefox                                                         492

Much more usable output.

Another aggregation function allows us to create a histogram, to break down these read calls by their size, and see how many of these calls fall into buckets of different sizes. For the read syscall, the arguments are: arg0=file descriptor to read from, arg1 = buffer to read into, arg2 = number of bytes to read, see man 2 read:

dtrace -n ‘syscall::read:entry { @[execname] = quantize(arg2); }’

The output:

dtrace: description 'syscall::read:entry ' matched 2 probes

           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 447
               2 |                                         0

           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |@@@@@@@@@                                27
               2 |                                         0
               4 |@@@@@@@                                  21
               8 |@                                        2
              16 |@                                        2
              32 |@                                        4
              64 |@@                                       6
             128 |                                         1
             256 |@@                                       6
             512 |@                                        2
            1024 |                                         0
            2048 |@                                        3
            4096 |@@@@@@@@                                 26
            8192 |@@@@@                                    16
           16384 |@                                        4
           32768 |@                                        4
           65536 |                                         0

           value  ------------- Distribution ------------- count
              16 |                                         0
              32 |                                         2
              64 |                                         0
             128 |                                         1
             256 |                                         1
             512 |                                         0
            1024 |                                         0
            2048 |                                         0
            4096 |                                         0
            8192 |                                         0
           16384 |                                         0
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3438337
           65536 |                                         0

From this we can see that the reads from firefox are 1 byte, and not likely the cause of any issues. We see that a dd process is reading a large number of blocks of 32 KB, while sshd’s reads are distributed across an array of sizes.

You might also be interested in

Get more out of your FreeBSD development

Kernel development is crucial to many companies. If you have a FreeBSD implementation or you’re looking at scoping out work for the future, our team can help you further enable your efforts.

To answer our last question, we can look at the timing of the calls, by measuring the elapsed time between the entry to the syscall, and its eventual return:

dtrace -n 'syscall::read:entry { self->ts = timestamp; } syscall::read:return /self->ts/ { @[execname, "ns"] = quantize(timestamp - self->ts); self->ts = 0; }'

This script will set a variable (self->ts) to the current timestamp when a read syscall is initiated. When a read syscall returns, if it has that variable set, we will subtract it from the current timestamp, and add that value to our aggregation:

dtrace: description 'syscall::read:entry ' matched 4 probes

  firefox                                             ns
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@                                131
            1024 |@@@@@@                                   97
            2048 |@@@@@@@@@@@@@@@@                         251
            4096 |@@@@@@@@                                 116
            8192 |@                                        14
           16384 |                                         0

  sshd                                                ns
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@                             104
            1024 |@@@@@@                                   54
            2048 |@@@@@@@@@@                               88
            4096 |@@@@@                                    40
            8192 |@@@@                                     36
           16384 |@@                                       15
           32768 |                                         2
           65536 |                                         0
          131072 |                                         0
          262144 |                                         0
          524288 |@                                        6
         1048576 |                                         0

  dd                                                  ns
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           54393
           32768 |@@@@@@@@@@                               19190
           65536 |                                         0

From this we see that about 26% of the reads done by dd took more than 32,768 nanoseconds, while the rest took between 16,384 and 32,767 ns. We also see a small number (6) of outliers in the ssh read calls, taking more than half a million nanoseconds.

This is only a small taste of what dtrace can do, but you can see how quickly and easily you can determine what is happening on a system while it is running, in a non-disruptive way.

Some other examples you might find useful:

1.) Count system calls by program name:

dtrace -n 'syscall:::entry { @[execname] = count(); }'

2.) Find the parent of a process calling a syscall:

dtrace -n 'syscall::read:entry { printf("execname=%s parent_execname=%s", execname, curthread->td_proc->p_pptr->p_comm); }'

3.) Trace file opens by process and filename:

dtrace -n 'syscall::open*:entry { printf("%s %s", execname, copyinstr(arg0)); }'

4.) Trace file opens with process and filename:

dwatch -X open

5.) Find the parent of a process calling a syscall:

dwatch -R syscall::read:entry

6.) Watch signals being passed to kill syscall:

dwatch -X kill

7.) Watch signals being passed between bash(1) and vi(1):

dwatch -k bash -k vi -X kill

8.) Display the first process to call kill syscall and then exit:

dwatch -N 1 kill

9.) Count IP-level events:

dtrace -n 'ip::: { @[probename] = count(); }'

10.) Trace new processes showing parent program and command executed:

dwatch execve

11.) Trace new processes showing mini process-tree for command executed:

dwatch -R execve

12.) Trace new processes belonging to root super-user:

dwatch -u root execve

13.) Trace new processes belonging to wheel super-group:

dwatch -g wheel execve

You can find more information on using DTrace here.

<strong>Meet the Author</strong>: Allan Jude” class=”wp-block-coblocks-author__avatar-img” src=”<a href=https://klarasystems.com/wp-content/uploads/2020/05/allan.jpg&#8221;>
Meet the Author: Allan Jude

ZFS Engineering Manager at Klara Inc., Allan has been a part of the FreeBSD community since 1999 (that’s more than 20 years ago!), and an active participant in the ZFS community since 2013. In his free time he enjoys baking, and obviously a good episode of Star Trek: DS9.

One Comment on “FreeBSD History – Understanding the Origins of DTrace

  1. Thanks for the article, it was good reading.

    I believe there’s a typo/mistake on this line:
    kldload dtrace_all

    It should be:
    kldload dtraceall

Tell us what you think!