
It’s not exactly difficult to figure out how much space you’ve got left when you’re using OpenZFS–but it is different from doing so on traditional filesystems. Features like snapshots, inline compression, and deduplication bring considerably more complexity to the table, so space accounting in OpenZFS requires a different approach.

By the time we’re done today, we’ll understand:  

  • how to use filesystem-agnostic tools like du and df
  • how to use OpenZFS-native tools like zfs list and zpool list

First, we need to start with some basic concepts. 

Understanding the Basics of Filesystems 

In traditional, non-copy-on-write filesystems, space management is simple. You’ve got a certain number of physical sectors available on your single storage device–some of them are used, and some are not. 

In such a filesystem, when we want to know the number of free sectors, we subtract the used sectors from the total available, and, presto–free space! Similarly, we can subtract the free sectors from the total to get the amount of used space. None of these values are ambiguous, and all are part of a fixed-sum system. 

Logical Sectors vs Physical Sectors 

OpenZFS brings new concepts to filesystem management that muddy this simple picture a bit: snapshots, inline compression, and block-level deduplication. To effectively manage our OpenZFS filesystem, we’ll need to begin by understanding three properties: USED, REFER, and AVAIL. 

All three properties revolve around the status of logical sectors, not physical sectors. Physical sectors are the literal hardware units on each individual raw device in the pool. Logical sectors are the “sectors” made directly available to your applications. Proper space accounting in OpenZFS depends on understanding this distinction.

To see how this plays out, let’s assume we have a pair of storage devices, each of which offers precisely 1MiB of storage. 1MiB / 4KiB == 256, so each device has 256 physical 4KiB sectors. 

In a simple mirror vdev with both drives, the vdev provides 512 physical sectors—the combined sector count of all devices—but only 256 logical sectors, which are the “sectors” available to your applications for reading and writing. 

Similarly, if we had three of these mythical 1MiB storage devices and put them in a three-wide RAIDz1 vdev, that vdev would offer 768 physical sectors, and 512 logical sectors. In short, logical sectors represent the storage space after redundancy or parity. 
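If you’d like to double-check that arithmetic at the command line, a quick bit of shell math reproduces the numbers above (this is just a sketch using our imaginary 1MiB devices, not anything OpenZFS-specific):

echo $(( 2 * 1024 / 4 ))   # two-wide mirror, physical sectors: 512
echo $(( 1 * 1024 / 4 ))   # two-wide mirror, logical sectors:  256
echo $(( 3 * 1024 / 4 ))   # three-wide RAIDz1, physical sectors: 768
echo $(( 2 * 1024 / 4 ))   # three-wide RAIDz1, logical sectors:  512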

Understanding USED, REFER, and AVAIL 

Now that we understand the difference between physical and logical sectors, it’s time to re-examine our understanding of what it means for space to be used, free, or available. 

  • USED refers to the number of logical sectors allocated to a dataset and its children. Blocks allocated to sub-datasets and snapshots count towards USED. 
  • REFER refers to the number of logical sectors actively referred to by the dataset, excluding its children and any snapshots. 
  • AVAIL refers to the number of logical sectors available to be written to with new data. In the absence of imposed quotas, AVAIL refers directly to the number of unused logical sectors in the pool as a whole–otherwise, it refers to the number of logical sectors by which the dataset may grow without exceeding its quota. 
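The next example walks through a test pool with nested child datasets. The exact setup commands aren’t shown in this article, but a minimal file-backed sketch along these lines would produce the same layout (the backing-file paths and sizes here are assumptions, chosen to match the pool we examine later):

root@elden:/tmp# truncate -s 1T /tmp/0.bin /tmp/1.bin
root@elden:/tmp# zpool create testpool /tmp/0.bin /tmp/1.bin
root@elden:/tmp# zfs create -p testpool/0/1/2
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0/0.bin
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0/1/1.bin
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0/1/2/2.bin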

Let’s see how this plays out, using a test pool creatively named testpool with three child datasets, just as creatively named 0, 1, and 2:  

root@elden:/tmp# find /testpool -type f | xargs ls -lh
-rw-r--r-- 1 root root 1.0G Nov 30 14:10 /testpool/0/0.bin 
-rw-r--r-- 1 root root 1.0G Nov 30 14:11 /testpool/0/1/1.bin 
-rw-r--r-- 1 root root 1.0G Nov 30 14:11 /testpool/0/1/2/2.bin 

As we can see, each of the child datasets has a single 1GiB file in it. How does that play out in practice? We’ll use the ZFS-native tool zfs list to examine:

root@elden:/tmp# zfs list -r testpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
testpool        3.00G  1.92T  30.6K  /testpool
testpool/0      3.00G  1.92T  1024M  /testpool/0
testpool/0/1    2.00G  1.92T  1024M  /testpool/0/1
testpool/0/1/2  1024M  1.92T  1024M  /testpool/0/1/2

The pool’s root dataset, testpool, doesn’t contain any data in it directly. Therefore, its REFER is a measly 30.6KiB, which amounts to just a few sectors of metadata and nothing else. But it shows 3.00GiB USED–because although it contains no data of its own, its child datasets do!  

The dataset testpool/0 directly contains the 1GiB file /testpool/0/0.bin, so it REFERs to 1024MiB–in other words, 1GiB–of data. However, its USED is the same as testpool’s–3.00GiB. 

Remember, USED is the sum of the REFER of the current dataset and all child datasets–so we add testpool/0’s 1.00GiB to the 1.00GiB each in testpool/0/1 and testpool/0/1/2, and come up with the same 3.00GiB total USED we saw in the root dataset of testpool itself. 

Meanwhile, testpool/0/1 has a USED of 2.00GiB–its own 1GiB of REFER plus testpool/0/1/2’s 1GiB of REFER. 

This leaves us, finally, with testpool/0/1/2 itself, which has no child datasets, and therefore only has 1GiB showing for both USED and REFER. 
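Incidentally, if you’d rather not do that addition in your head, zfs list can break USED down for you. The special column set -o space expands USED into USEDSNAP, USEDDS, USEDREFRESERV, and USEDCHILD, showing how much of each dataset’s USED comes from snapshots, the dataset itself, refreservations, and children respectively:

root@elden:/tmp# zfs list -ro space testpool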

What About Snapshots? 

Understanding Space Accounting in OpenZFS Snapshots

Now that we understand what USED and REFER mean, let’s briefly talk about how snapshots affect each.  

Remember, OpenZFS is a copy-on-write filesystem. This means that every block is immutable once written. You can’t ever alter the value of a block once it’s created–you can only read it or destroy it.  

When you or an application tells OpenZFS “I want to change the value of this block,” OpenZFS only seems to do what you asked it to do–in fact, it really creates an entirely new block with the changed value you requested. Then, as a single atomic operation, OpenZFS unlinks the original block from the live filesystem, and replaces it with a link to the new block with the new value you “edited” in. 

How Snapshots Impact Space Availability

In the absence of snapshots, this freshly-unlinked block is effectively destroyed, and its sectors return to the dataset’s AVAIL space. But if you took a snapshot of the dataset prior to “editing” the block, the original block sticks around even after it’s unlinked from the live filesystem–it is still referenced by every snapshot taken after the block was born (and before it was unlinked).  

Let’s see how this plays out in action: 

root@elden:/tmp# find /testpool -type f | xargs ls -lh
-rw-r--r-- 1 root root 1.0G Nov 30 14:10 /testpool/0/0.bin
-rw-r--r-- 1 root root 1.0G Nov 30 14:11 /testpool/0/1/1.bin
-rw-r--r-- 1 root root 1.0G Nov 30 14:11 /testpool/0/1/2/2.bin

root@elden:/tmp# zfs snapshot testpool/0/1/2@demoshot

root@elden:/tmp# zfs list -rt all testpool/0/1
NAME                      USED  AVAIL  REFER  MOUNTPOINT
testpool/0/1             2.00G  1.92T  1024M  /testpool/0/1
testpool/0/1/2           1024M  1.92T  1024M  /testpool/0/1/2
testpool/0/1/2@demoshot     0B      -  1024M  -

After taking the snapshot demoshot, we see that neither the USED nor the REFER of testpool/0/1/2 has changed–both still report only 1GiB.  

This is because the REFER of the dataset and the REFER of its snapshot both still point to exactly the same blocks. Even though we have a “new” snapshot dataset which REFERs to 1GiB of data, the actual number of logical sectors required to store the total amount of data has not changed. 

Deleting a File with Snapshots Present

Now, what happens if we delete /testpool/0/1/2/2.bin itself? A couple of things, actually… and they happen quickly: 

root@elden:/tmp# rm /testpool/0/1/2/2.bin

root@elden:/tmp# zfs list -rt all testpool/0/1
NAME                      USED  AVAIL  REFER  MOUNTPOINT
testpool/0/1             2.00G  1.92T  1024M  /testpool/0/1
testpool/0/1/2           1024M  1.92T  1024M  /testpool/0/1/2
testpool/0/1/2@demoshot     0B      -  1024M  -

The first thing we see is that, apparently, nothing changed! This is because modern OpenZFS uses asynchronous destruction.  

When you delete files or destroy snapshots, the system acknowledges the command immediately, but it can take seconds or even minutes for the entire operation to complete in the background. 
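Rather than sleeping for an arbitrary number of seconds, you can also ask the pool how much space is still waiting to be released: the read-only freeing pool property reports the amount of space that asynchronous destroys haven’t finished reclaiming yet. (This is a side note; the rest of the walkthrough doesn’t depend on it.)

root@elden:/tmp# zpool get freeing testpool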

How Space Accounting Adjusts Over Time

Let’s wait a few seconds for OpenZFS to finish unlinking those blocks, then try again:  

root@elden:/tmp# sleep 5 ; zfs list -rt all testpool/0/1
NAME                      USED  AVAIL  REFER  MOUNTPOINT
testpool/0/1             2.00G  1.92T  1024M  /testpool/0/1
testpool/0/1/2           1024M  1.92T  30.6K  /testpool/0/1/2
testpool/0/1/2@demoshot  1024M      -  1024M  -

Now we can see the actual effects of deleting a 1GiB file which also exists in a snapshot; since the snapshot is a child of the dataset, testpool/0/1/2 still shows the same 1GiB USED that it always did.  

But testpool/0/1/2 now REFERs to only 30.6KiB. Similar to the root dataset of testpool itself, this amounts to just a handful of metadata sectors. 

Snapshots and Space Availability

When we look at the USED and REFER of @demoshot itself, we see 1GiB in both columns. The snapshot directly REFERs to all the blocks contained in the deleted file, and it has no children, so that’s straightforward enough. But what about AVAIL?  

Since snapshots are entirely immutable, there is never any “available” space inside one. We get a simple dash in that column, indicating that it’s not applicable here. 

One Block, Many Snapshots 

So far, snapshot space accounting doesn’t seem all that complicated. When you delete a file whose blocks are held in a snapshot, those blocks disappear from the dataset’s REFER, show up in the snapshot’s USED, and remain in the parent’s USED. Simple enough! 

It gets a little more complicated when you realize that a block may be referenced in multiple snapshots, not just one. Let’s blow away our old testpool, create a new one, and examine this in action! 

root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool /tmp/0.bin /tmp/1.bin

Now that we’ve gotten rid of the old testpool and recreated it, let’s seed it with three new 1GiB files, taking a new snapshot after creating each file: 

root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.94372 s, 552 MB/s

root@elden:/tmp# zfs snapshot testpool@0

root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/1.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.87999 s, 571 MB/s

root@elden:/tmp# zfs snapshot testpool@1

root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/2.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.89892 s, 565 MB/s

root@elden:/tmp# zfs snapshot testpool@2

There we go! We’ve got three files, 0.bin, 1.bin, and 2.bin. After creating each file, we take a snapshot–so snapshot @2 contains all three 1GiB files, @1 contains only the first two, and @0 only contains the first. 

What does this look like so far? 

root@elden:/tmp# zfs list testpool
NAME       USED  AVAIL  REFER  MOUNTPOINT
testpool  3.00G  1.92T  3.00G  /testpool

Simple enough–three 1GiB files, 3.00GiB USED and REFER. What about the snapshots themselves? 

root@elden:/tmp# zfs list -rt all testpool
NAME         USED  AVAIL  REFER  MOUNTPOINT
testpool    3.00G  1.92T  3.00G  /testpool
testpool@0    13K      -  1.00G  -
testpool@1    13K      -  2.00G  -
testpool@2     0B      -  3.00G  -

Just as we’d expect, our snapshots show 1.00G, 2.00G, and 3.00G REFER in ascending order. But why do they have little or no USED? 

Like we said, this is where things get complicated. The USED column of a snapshot is a bit different from the USED column of a live dataset–it counts only the blocks which exist in that particular snapshot and nowhere else. 

In this case, all of the blocks in each snapshot also exist in the live filesystem. We see 0B USED for snapshot @2, which hasn’t diverged from the live filesystem at all yet. For snapshots @0 and @1, we see 13KiB apiece–a tiny handful of metadata blocks which were “overwritten with new values” as new files were added to the filesystem. 
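If you want that unique-block accounting rolled up per dataset rather than per snapshot, the usedbysnapshots property reports how much space would be reclaimed if all of a dataset’s snapshots were destroyed. We won’t rely on it below, but it’s a handy cross-check:

root@elden:/tmp# zfs get usedbysnapshots testpool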

Now, what happens if we delete all three files from the live filesystem? 

root@elden:/tmp# rm /testpool/*.bin

root@elden:/tmp# sleep 5 ; zfs list -rt all testpool
NAME         USED  AVAIL  REFER  MOUNTPOINT
testpool    3.00G  1.92T    24K  /testpool
testpool@0    13K      -  1.00G  -
testpool@1    13K      -  2.00G  -
testpool@2  1.00G      -  3.00G  -

Aha! Nothing changed for testpool@0 or testpool@1, but suddenly the USED column of testpool@2 went from 0B to 1.00GiB.  

This is because testpool@2 now contains blocks which exist in no other dataset. These are the blocks belonging to the now-deleted file /testpool/2.bin, which are now unlinked from testpool itself, and never existed in testpool@0 or testpool@1.  

We can also see that testpool’s own REFER dropped from 3.00GiB to 24KiB, as a result of deleting those three 1GiB files. However, its USED stays unchanged at the same 3.00GiB, because those blocks still belong to its snapshots, which count as child datasets.  

The AVAIL for testpool didn’t change either for the same reason. We couldn’t actually mark any logical sectors as available for new writes, since all of the ones we unlinked from the live filesystem are still marked as belonging to one or more snapshots. 

What happens if we start destroying those snapshots? Before we get started, let’s use the hidden snapshot directory to remind ourselves of the content of each snapshot:  

root@elden:/tmp# find /testpool/.zfs/snapshot/* -type d | xargs ls -lh

/testpool/.zfs/snapshot/0:
total 1.1G
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 0.bin

/testpool/.zfs/snapshot/1:
total 2.1G
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 0.bin
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 1.bin

/testpool/.zfs/snapshot/2:
total 3.1G
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 0.bin
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 1.bin
-rw-r--r-- 1 root root 1.0G Nov 30 15:23 2.bin

root@elden:/tmp# zfs list -rt all testpool
NAME         USED  AVAIL  REFER  MOUNTPOINT
testpool    3.00G  1.92T    24K  /testpool
testpool@0    13K      -  1.00G  -
testpool@1    13K      -  2.00G  -
testpool@2  1.00G      -  3.00G  -

Okay: our oldest snapshot has a single 1GiB file in it, the next one has two, and the final one has all three. We see this reflected in zfs list -rt all, which shows us that only testpool@2 has any significant USED value, since 2.bin is now only REFERred to by testpool@2. 

What happens if we destroy testpool@2? 

root@elden:/tmp# zfs destroy testpool@2 ; sleep 5

root@elden:/tmp# zfs list -rt all testpool
NAME         USED  AVAIL  REFER  MOUNTPOINT
testpool    2.00G  1.92T    24K  /testpool
testpool@0    13K      -  1.00G  -
testpool@1  1.00G      -  2.00G  -

With testpool@2 destroyed and gone, there is no further reference to the blocks contained by the deleted file /testpool/2.bin, so testpool itself goes from a USED of 3.00GiB to 2.00GiB.  

The root dataset’s AVAIL increased by the same amount–we just can’t actually see that directly, since testpool’s AVAIL is in TiB, 1.00GiB is a mere 0.001TiB, and that’s not enough to change the displayed value. 

Since the USED column of a snapshot measures the blocks only referenced by that specific snapshot, we also see the USED of testpool@1 change from 13KiB to 1.00GiB.  

As long as testpool@2 was around, the blocks referenced by /testpool/1.bin were present in both @1 and @2. Yet now that we destroyed @2, those blocks exist only in @1, and therefore show up in its USED for the first time. 
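A related property worth knowing here is written, which reports the amount of referenced space written since the previous snapshot–another way of seeing how much data each snapshot pins relative to its predecessor. We’ll stick with USED and REFER for the rest of this article, but it’s there if you need it:

root@elden:/tmp# zfs get written testpool@1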

Reclaiming Space From Snapshots 

This is usually the point where heads begin to spin. In a normal filesystem, you reclaim storage space simply by deleting files. But on OpenZFS, we need to not only delete the files, but also destroy any snapshots which reference them! 

The thing is, as we saw in the last section, the same file may be referenced by multiple snapshots. That makes finding reclaimable space significantly more challenging on a system which takes snapshots regularly! 

Using zfs destroy to Identify Reclaimable Space

We could mess around with the find command, trying to work out which snapshots contain which files–but since the goal is finding space rather than finding files, there’s an easier way: we’ll use zfs destroy with a couple of special arguments. 

Assuming we’d like to reclaim another 1GiB of space, which snapshot of testpool’s should we destroy? 

root@elden:/tmp# zfs list -rt all testpool
NAME         USED  AVAIL  REFER  MOUNTPOINT
testpool    2.00G  1.92T    24K  /testpool
testpool@0    13K      -  1.00G  -
testpool@1  1.00G      -  2.00G  -

root@elden:/tmp# zfs destroy -nv testpool@0
would destroy testpool@0
would reclaim 13K

root@elden:/tmp# zfs destroy -nv testpool@1
would destroy testpool@1
would reclaim 1.00G

There we have it! We can use zfs destroy with the special arguments -n and -v, for “dry run” and “verbose” respectively. Running the command this way shows what a real zfs destroy would do, without actually losing any data! 

Space Accounting Challenges with Multiple Snapshots

This is a very simple system, so we already knew that destroying testpool@1 would reclaim 1GiB of space, while destroying testpool@0 would only reclaim a handful of metadata blocks. 

But what happens if we have multiple snapshots referencing the same blocks? Let’s blow testpool away and start over again, this time adding a little more complexity. 

root@elden:/tmp# zpool destroy testpool

root@elden:/tmp# zpool create testpool /tmp/0.bin /tmp/1.bin

root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.94212 s, 553 MB/s

root@elden:/tmp# zfs snapshot testpool@0

root@elden:/tmp# zfs snapshot testpool@1

root@elden:/tmp# zfs list -rt all testpool
NAME         USED  AVAIL  REFER  MOUNTPOINT
testpool    1.00G  1.92T  1.00G  /testpool
testpool@0     0B      -  1.00G  -
testpool@1     0B      -  1.00G  -

Now we’ve got a single 1GiB file, /testpool/0.bin, which is still referenced in the live filesystem and in two separate snapshots.

Deleting Files Doesn't Always Free Space

What happens if we delete the file? 

root@elden:/tmp# rm /testpool/0.bin

root@elden:/tmp# zfs list -rt all testpool
NAME         USED  AVAIL  REFER  MOUNTPOINT
testpool    1.00G  1.92T    24K  /testpool
testpool@0     0B      -  1.00G  -
testpool@1     0B      -  1.00G  -

As expected, testpool’s REFER goes down from 1.00GiB to the usual few KiB of metadata blocks, while the REFER of testpool@0 and testpool@1 remain unchanged. But what about the snapshots’ USED column? 

Since /testpool/0.bin is still referenced in both snapshots, its blocks are not unique to either. They don’t show up in either snapshot’s USED column. Similarly, destroying one snapshot won’t reclaim any space, since all the blocks linked in testpool@0 are also linked in testpool@1 and vice versa!  

Using Snapshot Ranges for Bulk Space Reclamation

So far, zfs destroy -nv isn’t any help either, and for the same reason: destroying either snapshot on its own won’t reclaim any space: 

root@elden:/tmp# zfs destroy -nv testpool@0
would destroy testpool@0
would reclaim 0B

root@elden:/tmp# zfs destroy -nv testpool@1
would destroy testpool@1
would reclaim 0B

Again, this is an artificially simple system, so we already know the answer is to destroy both snapshots in order to reclaim the space occupied by the deleted file /testpool/0.bin. 

But what if we didn’t already know that? 

root@elden:/tmp# zfs destroy -nv testpool@0%1
would destroy testpool@0
would destroy testpool@1
would reclaim 1.00G

By using the special operator %, we can pass OpenZFS a range of snapshots (inclusive) and thereby see the effects of destroying multiple snapshots. As we can see above, if we destroy both testpool@0 and testpool@1, we reclaim 1GiB of space. Fabulous! 
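The range syntax is more flexible than this single example shows: either end of the range may be left blank, in which case the dataset’s oldest or newest snapshot is implied. For instance, a dry run covering everything from the oldest snapshot up through @1 can be written like this:

root@elden:/tmp# zfs destroy -nv testpool@%1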

OpenZFS-Native Tools 

Now that we’ve got the basics under our belt, let’s take another look at zfs list and its unfortunately sneaky, badly misunderstood cousin tool zpool list.  

zfs list 

First, let’s blow away testpool and recreate it from scratch, yet again, so that we know what we’re working with: 

root@elden:/tmp# ls -lh *.bin
-rw-rw-r-- 1 root root 1.0T Nov 30 16:49 0.bin
-rw-rw-r-- 1 root root 1.0T Nov 30 16:49 1.bin
-rw-rw-r-- 1 root root 1.0T Nov 30 15:21 2.bin

root@elden:/tmp# zpool create testpool /tmp/0.bin

root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.95716 s, 549 MB/s

root@elden:/tmp# zfs list testpool
NAME       USED  AVAIL  REFER  MOUNTPOINT
testpool  1.00G   983G  1.00G  /testpool

As we can see above, we created testpool from a single sparse file of 1TiB in size. Then, we dumped 1GiB of pseudo-random data onto it. As a result, we see 1.00GiB USED, 1.00GiB REFER, and 983GiB AVAIL. 

We know that the 1.00GiB of /testpool/0.bin shows in both the USED and REFER columns of testpool itself, because that file is present directly in that dataset.  

We can also easily surmise that only 983GiB of the theoretically 1TiB testpool shows as AVAIL because we lose a few GiB to OpenZFS’ internal “slop space” reservation, as well as the 1GiB used by /testpool/0.bin itself. 

zpool list 

What happens when we use zpool list instead of using zfs list? 

root@elden:/tmp# zpool list testpool
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool  1016G  1.00G  1015G        -         -     0%     0%  1.00x    ONLINE  -

The only column that really matches here is ALLOC, which shows the same 1.00GiB as USED did in zfs list. We might expect FREE to match AVAIL, but it doesn’t. Similarly, SIZE does not match AVAIL + USED. What gives? 

First, zfs list shows you filesystem-level details and zpool list shows you block-level details. But more importantly… zpool list simply isn’t fit for this purpose. 

With the extremely simple pool above–consisting of a single device in a single vdev–zpool list’s ALLOC matches zfs list’s USED exactly, and the remaining columns only disagree by relatively small amounts. 
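If you’d like to see exactly how far apart the two tools are, rather than eyeballing rounded human-readable figures, both accept -p for parsable output, which prints exact byte counts:

root@elden:/tmp# zpool list -p testpool
root@elden:/tmp# zfs list -p testpool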

Let’s see what happens if we use zpool list on a pool with a three-wide RAIDz1 vdev instead: 

root@elden:/tmp# zpool destroy testpool

root@elden:/tmp# zpool create testpool raidz1 /tmp/0.bin /tmp/1.bin /tmp/2.bin

root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.97829 s, 543 MB/s

root@elden:/tmp# zfs list testpool
NAME       USED  AVAIL  REFER  MOUNTPOINT
testpool  1024M  1.92T  1024M  /testpool

Now, we’re seeing exactly what we would expect: our pool consists of a single three-wide RAIDz1 vdev composed of 1TiB devices, which gives us 2TiB of available space. But when we look at the pool using zpool list, we see something different:  

root@elden:/tmp# zpool status testpool
  pool: testpool
 state: ONLINE
config:

NAME            STATE     READ WRITE CKSUM
testpool        ONLINE       0     0     0
  raidz1-0      ONLINE       0     0     0
    /tmp/0.bin  ONLINE       0     0     0
    /tmp/1.bin  ONLINE       0     0     0
    /tmp/2.bin  ONLINE       0     0     0

errors: No known data errors

root@elden:/tmp# zpool list testpool -o name,size,alloc,free
NAME       SIZE  ALLOC   FREE
testpool  2.98T  1.50G  2.98T

With a RAIDz1 vdev, we’re seeing 2.98TiB SIZE and FREE, and 1.50GiB ALLOC. This means that zpool list is showing us values before parity–we’re seeing the effect of our filesystem on the raw pool itself. 

Since our RAIDz1 vdev must store one parity sector for each two data sectors, that means our 1.00GiB file occupies 1.50GiB on-disk. Similarly, since we’re not accounting for parity, that means our pool of three 1TiB devices shows 3TiB SIZE and FREE, not the 2TiB of logical space available! 
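A quick back-of-the-envelope check of those figures, using nothing but shell arithmetic:

echo $(( 1024 * 3 / 2 ))   # MiB on disk for a 1024MiB file at 2 data : 1 parity == 1536, i.e. 1.50GiB
echo $(( 3 * 1016 ))       # GiB of raw SIZE across three ~1016GiB devices == 3048, i.e. roughly 2.98TiB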

So far, this might actually be useful–but what happens if we blow testpool away again, and this time rebuild it with a mirror vdev? 

root@elden:/tmp# zpool destroy testpool

root@elden:/tmp# zpool create testpool mirror /tmp/0.bin /tmp/1.bin

root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.94077 s, 553 MB/s

root@elden:/tmp# zpool status testpool
  pool: testpool
 state: ONLINE
config:

NAME            STATE     READ WRITE CKSUM
testpool        ONLINE       0     0     0
  mirror-0      ONLINE       0     0     0
    /tmp/0.bin  ONLINE       0     0     0
    /tmp/1.bin  ONLINE       0     0     0

errors: No known data errors

Okay, we know that testpool is built of a simple two-wide mirror vdev. That means that we’ve got 1TiB of usable space with each of our blocks stored identically on each drive in the vdev. 

Last time, we saw that zpool list showed us raw values for SIZE, ALLOC, and FREE before parity calculations, using the raw number of physical sectors allocated rather than the number of logical sectors available.   

So, we expect to see 2TiB SIZE  and FREE, with 2GiB ALLOCated this time around, right? 

root@elden:/tmp# zpool list testpool -o name,size,alloc,free
NAME       SIZE  ALLOC   FREE
testpool  1016G  1.00G  1015G

Unfortunately, no. zpool list displays values derived from physical sectors (meaning raw values before parity) on RAIDz vdevs, but values derived from logical sectors (meaning values after redundancy) on mirror vdevs.  

We didn’t bother going into the exact calculations that led zpool list and zfs list to slightly disagree on the simple one-device pool earlier, because zpool list’s inconsistent behavior on more complex pool topologies renders it useless for this job anyway. 

As bad as this was already, remember that some pools will have both RAIDz and mirror vdevs in the same pool–with different meanings of SIZE, ALLOC, and FREE applied to different vdevs in the same pool!  

We do not recommend attempting to use zpool list for space availability calculations and management, period.  
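If you need a machine-readable free-space figure for monitoring or scripting, querying the root dataset’s properties directly sidesteps zpool list’s ambiguity entirely. This is a suggestion rather than something demonstrated above; -H suppresses headers and -p prints exact byte values:

root@elden:/tmp# zfs get -Hp -o value available,used testpool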

Filesystem-Agnostic Tools 

For the most part, filesystem-agnostic tools get along with OpenZFS just fine. They always report on logical space, not physical space, so there is less to get confused about. 

The major room for confusion with filesystem-agnostic tools like ls, du, and df comes from inline compression and deduplication, when enabled.  

Those tools have no concept of filesystem features which might break the traditional, simple relationship of Size, Used, and Avail–they expect Size to be the sum of Used and Avail, period, with the ability to derive any one of the three from the values of the other two. 

OpenZFS, obviously, complicates things. Does the df tool’s Used column refer to OpenZFS USED, or OpenZFS REFER? And what happens when some files are compressed or deduplicated? 

To find the answer, let’s yet again blow testpool away and start from scratch. We don’t need to worry about pool topology this time as the filesystem-agnostic tools always report logical space–but we will need to look at the impact of snapshots, compression, and dedup. 

root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool /tmp/0.bin -O compress=off -O dedup=off
root@elden:/tmp# zfs create testpool/plain
root@elden:/tmp# zfs create testpool/compress -o compress=lz4
root@elden:/tmp# zfs create testpool/dedup -o dedup=on

There we go! We’ve got a pool with compression and dedup turned off at the root, plus one child dataset with neither feature (plain), one with LZ4 compression (compress), and one with deduplication (dedup). Now, let’s seed it with some data–for this purpose, raw text works best since it’s highly compressible, so we’ll grab a little help from Project Gutenberg. 

root@elden:/testpool# wget -qO huckfinn.txt https://www.gutenberg.org/cache/epub/76/pg76.txt 

root@elden:/testpool# for i in {0..999}; do cat huckfinn.txt >> kilohuck.txt ; done

root@elden:/testpool# rm huckfinn.txt

Now we’ve got a nice fat 594MiB text file that will compress very nicely! Let’s go ahead and put a copy of it in each of our child datasets plain, compress, and dedup: 

root@elden:/testpool# cp kilohuck.txt compress ; cp kilohuck.txt dedup ; cp kilohuck.txt plain

root@elden:/testpool# zfs list -rt all testpool
NAME                USED  AVAIL  REFER  MOUNTPOINT
testpool           2.10G   982G   594M  /testpool
testpool/compress   369M   982G   369M  /testpool/compress
testpool/dedup      594M   982G   594M  /testpool/dedup
testpool/plain      594M   982G   594M  /testpool/plain

This should look quite familiar: zfs list shows us the REFER of all three child datasets in the USED of the root dataset testpool, and the same value for AVAIL in the parent and all three children. 

The only wrinkle so far is the USED column of testpool/compress, which shows the compressed size of kilohuck.txt, not the raw value. But let’s give ourselves another wrinkle, this time by placing a second copy of kilohuck.txt in testpool/dedup: 

root@elden:/testpool# cp kilohuck.txt dedup/kilohuck2.txt

root@elden:/testpool# zfs list -rt all testpool
NAME                USED  AVAIL  REFER  MOUNTPOINT
testpool           2.68G   982G   594M  /testpool
testpool/compress   369M   982G   369M  /testpool/compress
testpool/dedup     1.16G   982G  1.16G  /testpool/dedup
testpool/plain      594M   982G   594M  /testpool/plain

The USED column in testpool/dedup doubled–showing that we now have two copies of kilohuck.txt in there–but the AVAIL didn’t change, because inline deduplication merely marked every block of kilohuck.txt as belonging to both kilohuck.txt and kilohuck2.txt. Fun! 
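If you’re curious how much space deduplication is actually saving at this point, that’s tracked at the pool level rather than per dataset: the dedupratio pool property (also shown in the DEDUP column of zpool list) reports it.

root@elden:/testpool# zpool get dedupratio testpool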

Now that we’ve got a sufficiently complex pool, let’s start examining it with some of the most common filesystem-agnostic space management tools to see what they make of it. 

du, the Disk Usage Tool 

The du tool works by finding all of the files in one or more directories, adding them up individually, and reporting the totals. Let’s check it out: 

root@elden:/testpool# du -h /testpool
369M /testpool/compress
595M /testpool/plain
1.2G /testpool/dedup
2.7G /testpool

Reading from bottom to top, we can see that du found 2.7GiB of files beneath /testpool in total. 1.2GiB of that belongs to the two uncompressed copies of kilohuck.txt in testpool/dedup, 595MiB belongs to the uncompressed copy in testpool/plain, and another 369MiB belongs to the final, LZ4-compressed copy of kilohuck.txt in testpool/compress. 

Let’s check the math on that: first up, (595MiB * 2 / 1024 GiB/MiB) == 1.162GiB. This tells us that, as expected, du thinks that testpool/dedup and its two copies of kilohuck.txt occupy twice as much space as testpool/plain and its one copy. 

What about testpool/compress? 

root@elden:/testpool# zfs list testpool/compress
NAME                USED  AVAIL  REFER  MOUNTPOINT
testpool/compress   369M   982G   369M  /testpool/compress

root@elden:/testpool# zfs get compressratio testpool/compress
NAME               PROPERTY       VALUE  SOURCE
testpool/compress  compressratio  1.61x  -

root@elden:/testpool# du -h /testpool/compress
369M /testpool/compress

It’s apparent that du and zfs list agree that the compressed copy of kilohuck.txt in  testpool/compress occupies 369MiB on disk.  

We can also verify that 369MiB multiplied by OpenZFS’ reported compressratio of 1.61 is 594.09MiB–not quite the 595MiB we saw du report for testpool/plain, but close enough for government work!  

Sufficiently modern versions of du can also be configured to report the uncompressed size of files, using the --apparent-size flag: 

root@elden:/testpool# du -h /testpool/compress
369M	/testpool/compress 

root@elden:/testpool# du -h --apparent-size /testpool/compress
594M	/testpool/compress 

There we have it: du reports compressed size, unless you use the --apparent-size argument, in which case it reports on the uncompressed size. 

But what about deduplication? 

root@elden:/testpool# du -h /testpool/dedup 
1.2G	/testpool/dedup 

root@elden:/testpool# du -h --apparent-size /testpool/dedup 
1.2G	/testpool/dedup 

root@elden:/testpool# zfs list testpool/dedup 
NAME             USED  AVAIL  REFER  MOUNTPOINT 
testpool/dedup  1.16G   982G  1.16G  /testpool/dedup 

Not only does du have no idea about deduplication, zfs list doesn’t either–not in terms of USED or REFER, at any rate. Both USED and REFER show the full 1.16GiB occupied by two copies of kilohuck.txt, despite them only occupying half that space on-disk. 

However, zfs list does still get the AVAIL correct–that 982GiB AVAIL won’t change even if we make several more copies of kilohuck.txt in the deduplicated dataset. 

root@elden:/testpool# cp -a kilohuck.txt dedup/kilohuck3.txt 
root@elden:/testpool# cp -a kilohuck.txt dedup/kilohuck4.txt 
root@elden:/testpool# cp -a kilohuck.txt dedup/kilohuck5.txt 

root@elden:/testpool# zfs list testpool/dedup 
NAME             USED  AVAIL  REFER  MOUNTPOINT 
testpool/dedup  2.90G   982G  2.90G  /testpool/dedup 

As we can see, USED and REFER keep going up, yet AVAIL stays the same. 

There’s one more thing we should test before we move on from du, and that’s the impact of snapshots. To cleanly see this, let’s place another copy of kilohuck.txt in testpool/plain, then take a snapshot, then delete the extra copy from the live dataset: 

root@elden:/testpool# cp -a kilohuck.txt plain/kilohuck2.txt 
root@elden:/testpool# zfs snapshot testpool/plain@0 
root@elden:/testpool# rm plain/kilohuck2.txt 

root@elden:/testpool# zfs list testpool/plain 
NAME             USED  AVAIL  REFER  MOUNTPOINT 
testpool/plain  1.16G   982G   594M  /testpool/plain 

By now, this part shouldn’t be a surprise: we added an extra copy, took a snapshot, then deleted the extra copy. Since the extra copy still lives on in the snapshot, our USED doubled. 

How will du handle this? 

root@elden:/testpool# du -h /testpool/plain 
595M	/testpool/plain 
root@elden:/testpool# du -h --apparent-size /testpool/plain 
594M	/testpool/plain 

Since du has no concept of snapshots and works by adding up the sizes of individual files, it sees no difference between /testpool/plain before and after our snapshot-related shenanigans.  

The 1MiB of difference we see between du -h and du -h --apparent-size here has nothing to do with OpenZFS–in this case, we’re seeing the difference between kilohuck.txt in terms of its actual data contents, and in terms of how many total sectors it requires on-disk. 

There’s just enough slack space–the empty space at the end of partially-filled sectors–to bump kilohuck.txt up from its native 594MiB to the full 595MiB it occupies in terms of total sector count. 

That just about covers it for du. Now, let’s take a look at another venerable reporting tool which works very differently! 

df, the Disk Free Tool 

The great thing about du, the tool we covered in the last section, is that it doesn’t concern itself with filesystem boundaries.  

But that’s also the unfortunate thing about du–because in order to give you space reporting on a directory with 10 million files in it, it must first stat all ten million files, then add up their size values! 

When you have neither the time nor inclination to wait for du to grovel through thousands or millions of files, there’s a much leaner, faster, more efficient tool available: df, the Disk Free (space) tool. 

Where du adds up the sizes of individual files, df ignores individual files entirely, and instead queries the actual filesystem for its total Size, Used, and Avail metrics. Let’s see how that plays out with testpool: 

root@elden:/testpool# df -h /testpool 
Filesystem      Size  Used Avail Use% Mounted on 
testpool        983G  595M  982G   1% /testpool 

The first noticeable difference here is that, unlike du, df only provides output for testpool itself—not its child datasets. This happens because df operates on filesystems, not directories, and doesn’t recognize the concept of "children." It treats each filesystem as entirely separate and unrelated to other mounted filesystems on the machine. 

Now, let’s look at Size, Used, and Avail as reported. It looks like Used + Avail == Size, but how are they calculated? To make that easier to spot, let’s compare the testpool/plain and testpool/dedup datasets: 

root@elden:/testpool# df -h /testpool/plain 
Filesystem      Size  Used Avail Use% Mounted on 
testpool/plain  983G  595M  982G   1% /testpool/plain 

root@elden:/testpool# df -h /testpool/dedup 
Filesystem      Size  Used Avail Use% Mounted on 
testpool/dedup  985G  3.0G  982G   1% /testpool/dedup 

Since Avail is the same for both child datasets but Size is different, we can see that df is querying the filesystem for real values of Used and Avail, then adding them together to come up with Size. 

The values of Used are accurate for both /testpool/plain and /testpool/dedup, corresponding to the amount of data stored on each. We can also see that df is just as ignorant of OpenZFS dedup as du was, and reports the size of all files added together, despite deduplication.  

Finally, we see that df is completely ignorant of the extra copies of kilohuck.txt that only exist in snapshots of testpool/plain–as far as it’s concerned, those snapshots are entirely separate filesystems. 

The values of Avail are accurate–and, necessarily, the same–for each dataset, since no quotas have been imposed and they all share the same underlying collection of physical sectors. 

By contrast, the Size value as reported by df is both bogus and irrelevant. While you might recreate that value by adding the REFER of all datasets on the pool to the shared AVAIL value of the pool, there’s no practical point in doing so. 

What about compression? 

root@elden:/testpool# df -h /testpool/compress 
Filesystem         Size  Used Avail Use% Mounted on 
testpool/compress  983G  369M  982G   1% /testpool/compress 

root@elden:/testpool# du -h /testpool/compress 
369M	/testpool/compress 

Just like du, the df tool reports the on-disk size of compressed files, after compression.  

Unlike du, there is no optional argument to force the tool to report the raw size of the individual files. This is because df doesn’t know or care about files in the first place! 

ls, ncdu, and find, Oh My! 

I’ve got good news for you, exhausted reader–there’s not really anything else you need to know about additional filesystem-agnostic tools, because they all operate in the same manner as either df or du. 

The majority of tools–including ls, ncdu, and find–operate just as du does, by stat()ing (pulling the metadata for) each individual file the tool can find, then aggregating the results.  

Some of these tools offer a similar choice between on-disk and apparent sizes–for example, ls -l reports each file’s apparent size, while ls -s reports the blocks it actually occupies on disk.  

You can generally tell the difference between du-like tools and df-like tools because the former take forever to operate on directories with thousands of files. In contrast, df-like tools return answers instantly regardless of the number of files involved. 

Conclusion 

Due to the vagaries of inline compression and synchronous deduplication (when either or both are enabled), it’s not actually possible to accurately predict “how much space is left” on an OpenZFS filesystem at all! 

With that caveat, we can at least see how many blocks are available–preferably, using the zfs-native tool zfs list. We should not use the tool zpool list for the same purpose, because its output is ambiguous, unpredictable, and not usefully organized for that purpose. 

Meanwhile, filesystem-agnostic applications get a good enough picture of what’s going on for most purposes. In particular, they should at least get the value of AVAIL correct in terms of how many blocks are left to write to, even if they have no idea how compression or deduplication might impact that value. 

As for storage administrators–presumably like yourselves, dear readers–you should now understand enough about the OpenZFS-native tools and space accounting to, finally, have a handle on the deceptively complicated answers to how much space you’ve used, how much you’ve got left, and how to reclaim it! 
