It’s not exactly difficult to figure out how much space you’ve got left when you’re using OpenZFS–but it is different from doing so on traditional filesystems, as OpenZFS brings considerably more complexity to the table. Space accounting in OpenZFS requires a different approach due to factors like snapshots, compression, and deduplication.
By the time we’re done today, we’ll understand how to use both:
- filesystem-agnostic tools like du and df
- OpenZFS-native tools like zfs list and zpool list
First, we need to start with some basic concepts.
Understanding the Basics of Filesystems
In traditional, non-copy-on-write filesystems, space management is simple. You’ve got a certain number of physical sectors available on your single storage device–some of them are used, and some of them are not.
In such a filesystem, when we want to know the number of free sectors, we subtract the used sectors from the total available, and, presto–free space! Similarly, we can subtract the free sectors from the total to get the amount of used space. None of these values are ambiguous, and all are part of a fixed-sum system.
Logical Sectors vs Physical Sectors
OpenZFS brings new concepts to filesystem management that muddy this simple picture a bit: snapshots, inline compression, and block-level deduplication. To effectively manage our OpenZFS filesystem, we’ll need to begin by understanding three properties: USED, REFER, and AVAIL.
All three properties revolve around the status of logical sectors, not physical sectors. Physical sectors are the literal storage units on each raw device in the pool. Logical sectors are the “sectors” made directly available to your applications. Proper space accounting in OpenZFS depends on understanding this distinction.
To see how this plays out, let’s assume we have a pair of storage devices, each of which offers precisely 1MiB of storage. 1MiB / 4KiB == 256, so each device has 256 physical 4KiB sectors.
In a simple mirror vdev with both drives, the vdev provides 512 physical sectors—the combined sector count of all devices—but only 256 logical sectors, which are the “sectors” available to your applications for reading and writing.
Similarly, if we had three of these mythical 1MiB storage devices and put them in a three-wide RAIDz1 vdev, that vdev would offer 768 physical sectors, and 512 logical sectors. In short, logical sectors represent the storage space after redundancy or parity.
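If you’d like to see this distinction on a live system, a quick experiment with sparse backing files works nicely. Here’s a minimal sketch–the file and pool names are just placeholders, and the exact figures you get back will vary a little with internal reservations and metadata:
# create three sparse 1GiB backing files and build a three-wide RAIDz1 pool from them
truncate -s 1G /tmp/d0.img /tmp/d1.img /tmp/d2.img
zpool create demo raidz1 /tmp/d0.img /tmp/d1.img /tmp/d2.img
# SIZE from zpool reflects physical sectors (all three devices, before parity);
# AVAIL from zfs reflects logical sectors (roughly two devices' worth, after parity)
zpool get size demo
zfs get available demo
zpool destroy demo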
Understanding USED, REFER, and AVAIL
Now that we understand the difference between physical and logical sectors, it’s time to re-examine our understanding of what it means for space to be used, free, or available.
- USED refers to the number of logical sectors allocated to a dataset and its children. Blocks allocated to sub-datasets and snapshots count towards USED.
- REFER refers to the number of logical sectors actively referred to by the dataset, excluding its children and any snapshots.
- AVAIL refers to the number of logical sectors available to be written to with new data. In the absence of imposed quotas, AVAIL refers directly to the number of unused logical sectors in the pool as a whole–otherwise, it refers to the number of logical sectors by which the dataset may grow without exceeding its quota.
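If you’d rather have OpenZFS do the parent/child/snapshot bookkeeping for you, zfs list -o space breaks each dataset’s USED down into its components. A quick sketch against the testpool we’re about to build:
# USEDDS    = space referenced by the dataset itself
# USEDSNAP  = space that would be freed by destroying all of the dataset's snapshots
# USEDCHILD = space consumed by the dataset's children
zfs list -o space -r testpool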
Let’s see how this plays out, using a test pool creatively named testpool with three child datasets, just as creatively named 0, 1, and 2:
root@elden:/tmp# find /testpool -type f | xargs ls -lh
-rw-r--r-- 1 root root 1.0G Nov 30 14:10 /testpool/0/0.bin
-rw-r--r-- 1 root root 1.0G Nov 30 14:11 /testpool/0/1/1.bin
-rw-r--r-- 1 root root 1.0G Nov 30 14:11 /testpool/0/1/2/2.bin
As we can see, each of the child datasets has a single 1GiB file in it. How does that play out in practice? We’ll use the ZFS-native tool zfs list to examine:
root@elden:/tmp# zfs list -r testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 3.00G 1.92T 30.6K /testpool
testpool/0 3.00G 1.92T 1024M /testpool/0
testpool/0/1 2.00G 1.92T 1024M /testpool/0/1
testpool/0/1/2 1024M 1.92T 1024M /testpool/0/1/2
The pool’s root dataset, testpool, doesn’t directly contain any data. Therefore, its REFER is a measly 30.6KiB, which amounts to just a few sectors of metadata and nothing else. But it shows 3.00GiB USED–because although it contains no data of its own, its child datasets do!
The dataset testpool/0 directly contains the 1GiB file /testpool/0/0.bin, so it REFERs to 1024MiB–in other words, 1GiB–of data. However, its USED is the same as testpool’s–3.00GiB.
Remember, USED is the sum of the REFER of the current dataset and all child datasets–so we add testpool/0’s 1.00GiB to the 1.00GiB each in testpool/0/1 and testpool/0/1/2, and come up with the same 3.00GiB total USED we saw in the root dataset of testpool itself.
Meanwhile, testpool/0/1 has a USED of 2.00GiB–its own 1GiB of REFER plus testpool/0/1/2’s 1GiB of REFER.
This leaves us, finally, with testpool/0/1/2 itself, which has no child datasets, and therefore only has 1GiB showing for both USED and REFER.
What About Snapshots?
Understanding Space Accounting in OpenZFS Snapshots
Now that we understand what USED and REFER mean, let’s briefly talk about how snapshots affect each.
Remember, OpenZFS is a copy-on-write filesystem. This means that every block is immutable once written. You can’t ever alter the value of a block once it’s created–you can only read it or destroy it.
When you or an application tells OpenZFS “I want to change the value of this block,” OpenZFS only seems to do what you asked it to do–in fact, it really creates an entirely new block with the changed value you requested. Then, as a single atomic operation, OpenZFS unlinks the original block from the live filesystem, and replaces it with a link to the new block with the new value you “edited” in.
How Snapshots Impact Space Availability
In the absence of snapshots, this freshly-unlinked block is effectively destroyed, and its space returns to the dataset’s AVAIL. But if you took a snapshot of the dataset prior to “editing” the block, the original block survives being unlinked from the live filesystem–it remains immutable, and is still referenced by every snapshot taken after the block was written.
Let’s see how this plays out in action:
root@elden:/tmp# find /testpool -type f | xargs ls -lh
-rw-r--r-- 1 root root 1.0G Nov 30 14:10 /testpool/0/0.bin
-rw-r--r-- 1 root root 1.0G Nov 30 14:11 /testpool/0/1/1.bin
-rw-r--r-- 1 root root 1.0G Nov 30 14:11 /testpool/0/1/2/2.bin
root@elden:/tmp# zfs snapshot testpool/0/1/2@demoshot
root@elden:/tmp# zfs list -rt all testpool/0/1
NAME USED AVAIL REFER MOUNTPOINT
testpool/0/1 2.00G 1.92T 1024M /testpool/0/1
testpool/0/1/2 1024M 1.92T 1024M /testpool/0/1/2
testpool/0/1/2@demoshot 0B - 1024M -
After taking the snapshot demoshot, we see that neither the USED nor the REFER of testpool/0/1/2 has changed–both still report only 1GiB.
This is because the REFER of the dataset and the REFER of its snapshot both still point to exactly the same blocks. Even though we have a “new” snapshot dataset which REFERs to 1GiB of data, the actual number of logical sectors required to store the total amount of data has not changed.
Deleting a File with Snapshots Present
Now, what happens if we delete /testpool/0/1/2/2.bin itself? A couple of things, actually… though not all of them happen right away:
root@elden:/tmp# rm /testpool/0/1/2/2.bin
root@elden:/tmp# zfs list -rt all testpool/0/1
NAME USED AVAIL REFER MOUNTPOINT
testpool/0/1 2.00G 1.92T 1024M /testpool/0/1
testpool/0/1/2 1024M 1.92T 1024M /testpool/0/1/2
testpool/0/1/2@demoshot 0B - 1024M -
The first thing we see is that nothing apparently changed! This is because modern OpenZFS uses asynchronous destruction.
When you delete files or destroy snapshots, the system acknowledges the command immediately, but it can take seconds or even minutes for the entire operation to complete in the background.
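If you’re curious how far along one of these background destroys has gotten, the pool-level freeing property tracks the space that is still waiting to be reclaimed. A quick sketch:
# a non-zero FREEING value means an asynchronous destroy is still running;
# it counts down toward 0 as blocks are actually released
zpool get freeing testpool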
How Space Accounting Adjusts Over Time
Let’s wait a few seconds for OpenZFS to finish unlinking those blocks, then try again:
root@elden:/tmp# sleep 5 ; zfs list -rt all testpool/0/1
NAME USED AVAIL REFER MOUNTPOINT
testpool/0/1 2.00G 1.92T 1024M /testpool/0/1
testpool/0/1/2 1024M 1.92T 30.6K /testpool/0/1/2
testpool/0/1/2@demoshot 1024M - 1024M -
Now we can see the actual effects of deleting a 1GiB file which also exists in a snapshot; since the snapshot is a child of the dataset, testpool/0/1/2 still shows the same 1GiB USED that it always did.
But testpool/0/1/2 now REFERs to only 30.6KiB. Similar to the root dataset of testpool itself, this amounts to just a handful of metadata sectors.
Snapshots and Space Availability
When we look at the USED and REFER of @demoshot itself, we see 1GiB in both columns. The snapshot directly REFERs to all the blocks contained in the deleted file, and it has no children, so that’s straightforward enough. But what about AVAIL?
Since snapshots are entirely immutable, there is never any “available” space inside one. We get a simple dash in that column, indicating that it’s not applicable here.
One Block, Many Snapshots
So far, snapshot space accounting doesn’t seem all that complicated. When you delete a file whose blocks are contained in a snapshot, those blocks disappear from the dataset’s REFER, show up in the snapshot’s USED, and remain in the parent dataset’s USED. Simple enough!
It gets a little more complicated when you realize that a block may be referenced in multiple snapshots, not just one. Let’s blow away our old testpool, create a new one, and examine this in action!
root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool /tmp/0.bin /tmp/1.bin
Now that we’ve gotten rid of the old testpool and recreated it, let’s seed it with three new 1GiB files, taking a new snapshot after creating each file:
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.94372 s, 552 MB/s
root@elden:/tmp# zfs snapshot testpool@0
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/1.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.87999 s, 571 MB/s
root@elden:/tmp# zfs snapshot testpool@1
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/2.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.89892 s, 565 MB/s
root@elden:/tmp# zfs snapshot testpool@2
There we go! We’ve got three files, 0.bin, 1.bin, and 2.bin. After creating each file, we take a snapshot–so snapshot @2 contains all three 1GiB files, @1 contains only the first two, and @0 only contains the first.
What does this look like so far?
root@elden:/tmp# zfs list testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 3.00G 1.92T 3.00G /testpool
Simple enough–three 1GiB files, 3.00GiB USED and REFER. What about the snapshots themselves?
root@elden:/tmp# zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 3.00G 1.92T 3.00G /testpool
testpool@0 13K - 1.00G -
testpool@1 13K - 2.00G -
testpool@2 0B - 3.00G -
Just as we’d expect, our snapshots show 1.00G, 2.00G, and 3.00G REFER in ascending order. But why do they have little or no USED?
Like we said, this is where things get complicated. The USED column of a snapshot is a bit different from the USED column of a live dataset–it counts only those blocks which exist in that particular snapshot and nowhere else.
In this case, nearly all of the blocks in each snapshot also exist in the live filesystem. We see 0B USED for snapshot @2, which hasn’t diverged from the live filesystem at all yet. For snapshots @0 and @1, we see 13KiB apiece–a tiny handful of metadata blocks which were “overwritten with new values” as new files were added to the filesystem.
Now, what happens if we delete all three files from the live filesystem?
root@elden:/tmp# rm /testpool/*.bin
root@elden:/tmp# sleep 5 ; zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 3.00G 1.92T 24K /testpool
testpool@0 13K - 1.00G -
testpool@1 13K - 2.00G -
testpool@2 1.00G - 3.00G -
Aha! Nothing changed for testpool@0 or testpool@1, but suddenly the USED column of testpool@2 went from 0B to 1.00GiB.
This is because testpool@2 now contains blocks which exist in no other dataset. These are the blocks belonging to the now-deleted file /testpool/2.bin, which are now unlinked from testpool itself, and never existed in testpool@0 or testpool@1.
We can also see that testpool’s own REFER dropped from 3.00GiB to 24KiB, as a result of deleting those three 1GiB files. However, its USED stays unchanged at the same 3.00GiB, because those blocks still belong to its snapshots, which count as child datasets.
The AVAIL for testpool didn’t change either for the same reason. We couldn’t actually mark any logical sectors as available for new writes, since all of the ones we unlinked from the live filesystem are still marked as belonging to one or more snapshots.
What happens if we start destroying those snapshots? Before we get started, let’s use the hidden snapshot directory to remind ourselves of the content of each snapshot:
root@elden:/tmp# find /testpool/.zfs/snapshot/* -type d | xargs ls -lh
/testpool/.zfs/snapshot/0:
total 1.1G
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 0.bin
/testpool/.zfs/snapshot/1:
total 2.1G
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 0.bin
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 1.bin
/testpool/.zfs/snapshot/2:
total 3.1G
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 0.bin
-rw-r--r-- 1 root root 1.0G Nov 30 15:22 1.bin
-rw-r--r-- 1 root root 1.0G Nov 30 15:23 2.bin
root@elden:/tmp# zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 3.00G 1.92T 24K /testpool
testpool@0 13K - 1.00G -
testpool@1 13K - 2.00G -
testpool@2 1.00G - 3.00G -
Okay: our oldest snapshot has a single 1GiB file in it, the next one has two, and the final one has all three. We see this reflected in zfs list -rt all, which shows us that only testpool@2 has any significant USED value, since 2.bin is now only REFERred to by testpool@2.
What happens if we destroy testpool@2?
root@elden:/tmp# zfs destroy testpool@2 ; sleep 5
root@elden:/tmp# zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 2.00G 1.92T 24K /testpool
testpool@0 13K - 1.00G -
testpool@1 1.00G - 2.00G -
With testpool@2 destroyed and gone, there is no further reference to the blocks contained by the deleted file /testpool/2.bin, so testpool itself goes from a USED of 3.00GiB to 2.00GiB.
The root dataset’s AVAIL increased by the same amount–we just can’t actually see that directly, since testpool’s AVAIL is in TiB, 1.00GiB is a mere 0.001TiB, and that’s not enough to change the displayed value.
Since the USED column of a snapshot measures the blocks only referenced by that specific snapshot, we also see the USED of testpool@1 change from 13KiB to 1.00GiB.
As long as testpool@2 was around, the blocks referenced by /testpool/1.bin were present in both @1 and @2. Yet now that we destroyed @2, those blocks exist only in @1, and therefore show up in its USED for the first time.
Reclaiming Space From Snapshots
This is usually the place where heads have begun to firmly spin. In a normal filesystem, you reclaim storage space simply by deleting files. But on OpenZFS, we need to not only delete the files, but destroy any snapshots which referenced them!
The thing is, as we saw in the last section, the same blocks may be referenced by multiple snapshots. That makes finding reclaimable space significantly more challenging on a system which regularly takes snapshots!
Using zfs destroy to Identify Reclaimable Space
We could mess around trying to use the find command looking for which snapshots contain which files, of course–but since the goal is finding space rather than finding files, there’s an easier way: we’ll use zfs destroy with a couple of special arguments.
Assuming we’d like to reclaim another 1GiB of space, which snapshot of testpool’s should we destroy?
root@elden:/tmp# zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 2.00G 1.92T 24K /testpool
testpool@0 13K - 1.00G -
testpool@1 1.00G - 2.00G -
root@elden:/tmp# zfs destroy -nv testpool@0
would destroy testpool@0
would reclaim 13K
root@elden:/tmp# zfs destroy -nv testpool@1
would destroy testpool@1
would reclaim 1.00G
There we have it! We can use zfs destroy with the special arguments -n and -v, for “dry run” and “verbose” respectively. Running the command this way shows us what a real zfs destroy would do, without actually losing any data!
Space Accounting Challenges with Multiple Snapshots
This is a very simple system, so we already knew that destroying testpool@1 would reclaim 1GiB of space, while destroying testpool@0 would only reclaim a handful of metadata blocks.
But what happens if we have multiple snapshots referencing the same blocks? Let’s blow testpool away and start over again, this time adding a little more complexity.
root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool /tmp/0.bin /tmp/1.bin
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.94212 s, 553 MB/s
root@elden:/tmp# zfs snapshot testpool@0
root@elden:/tmp# zfs snapshot testpool@1
root@elden:/tmp# zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 1.00G 1.92T 1.00G /testpool
testpool@0 0B - 1.00G -
testpool@1 0B - 1.00G -
Now we’ve got a single 1GiB file, /testpool/0.bin, which is still referenced in the live filesystem and in two separate snapshots.
Deleting Files Doesn't Always Free Space
What happens if we delete the file?
root@elden:/tmp# rm /testpool/0.bin
root@elden:/tmp# zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 1.00G 1.92T 24K /testpool
testpool@0 0B - 1.00G -
testpool@1 0B - 1.00G -
As expected, testpool’s REFER goes down from 1.00GiB to the usual few KiB of metadata blocks, while the REFER of testpool@0 and testpool@1 remain unchanged. But what about the snapshots’ USED column?
Since /testpool/0.bin is still referenced in both snapshots, its blocks are not unique to either. They don’t show up in either snapshot’s USED column. Similarly, destroying one snapshot won’t reclaim any space, since all the blocks linked in testpool@0 are also linked in testpool@1 and vice versa!
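This is also why the usedbysnapshots property is worth knowing about: it reports the space that would be freed by destroying all of a dataset’s snapshots together, shared blocks included, so it can be larger than the sum of the individual snapshots’ USED values. A quick sketch:
# space that destroying *all* snapshots of testpool would reclaim,
# including blocks shared between multiple snapshots
zfs get usedbysnapshots testpool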
Using Snapshot Ranges for Bulk Space Reclamation
So far, zfs destroy -nv isn’t any help either, and for the same reasons: deleting one snapshot won’t reclaim any space:
root@elden:/tmp# zfs destroy -nv testpool@0
would destroy testpool@0
would reclaim 0B
root@elden:/tmp# zfs destroy -nv testpool@1
would destroy testpool@1
would reclaim 0B
Again, this is an artificially simple system, so we already know the answer is to destroy both snapshots in order to reclaim the space occupied by the deleted file /testpool/0.bin.
But what if we didn’t already know that?
root@elden:/tmp# zfs destroy -nv testpool@0%1
would destroy testpool@0
would destroy testpool@1
would reclaim 1.00G
By using the special operator %, we can pass OpenZFS a range of snapshots (inclusive) and thereby see the effects of destroying multiple snapshots. As we can see above, if we destroy both testpool@0 and testpool@1, we reclaim 1GiB of space. Fabulous!
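On a real system with dozens or hundreds of snapshots, a couple of variations on these commands come in handy: sorting snapshots by their unique space, and dry-running the destruction of whole ranges before committing to anything. A sketch–the snapshot names here are placeholders:
# list testpool's snapshots, smallest unique USED first
zfs list -t snapshot -o name,used,refer -s used -r testpool
# preview how much space destroying an inclusive range of snapshots would reclaim
zfs destroy -nv testpool@oldest%newest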
OpenZFS-Native Tools
Now that we’ve got the basics under our belt, let’s take another look at zfs list and its unfortunately sneaky, badly misunderstood cousin tool zpool list.
zfs list
First, let’s blow away testpool and recreate it from scratch, yet again, so that we know what we’re working with:
root@elden:/tmp# ls -lh *.bin
-rw-rw-r-- 1 root root 1.0T Nov 30 16:49 0.bin
-rw-rw-r-- 1 root root 1.0T Nov 30 16:49 1.bin
-rw-rw-r-- 1 root root 1.0T Nov 30 15:21 2.bin
root@elden:/tmp# zpool create testpool /tmp/0.bin
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.95716 s, 549 MB/s
root@elden:/tmp# zfs list testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 1.00G 983G 1.00G /testpool
As we can see above, we created testpool from a single sparse file of 1TiB in size. Then, we dumped 1GiB of pseudo-random data onto it. As a result, we see 1.00GiB USED, 1.00GiB REFER, and 983GiB AVAIL.
We know that the 1.00GiB of /testpool/0.bin shows in both the USED and REFER columns of testpool itself, because that file is present directly in that dataset.
We can also easily surmise why only 983GiB of the theoretically 1TiB testpool shows as AVAIL: a chunk of space is held back by the pool’s built-in reservation (the “slop space,” roughly 1/32 of the pool), and another 1GiB is used by /testpool/0.bin itself.
zpool list
What happens when we use zpool list instead of using zfs list?
root@elden:/tmp# zpool list testpool
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
testpool 1016G 1.00G 1015G - - 0% 0% 1.00x ONLINE -
The only column that really matches here is ALLOC, which shows the same 1.00GiB as USED did in zfs list. We might expect FREE to match AVAIL, but it doesn’t. Similarly, SIZE does not match AVAIL + USED. What gives?
First, zfs list shows you filesystem-level details and zpool list shows you block-level details. But more importantly… zpool list simply isn’t fit for this purpose.
With the extremely simple pool above–consisting of a single device in a single vdev–we only saw modest disagreements between zpool list and zfs list: SIZE was somewhat larger than USED plus AVAIL, and FREE somewhat larger than AVAIL.
Let’s see what happens if we use zpool list on a pool with a three-wide RAIDz1 vdev instead:
root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool raidz1 /tmp/0.bin /tmp/1.bin /tmp/2.bin
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.97829 s, 543 MB/s
root@elden:/tmp# zfs list testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 1024M 1.92T 1024M /testpool
Now, we’re seeing exactly what we would expect: our pool consists of a single three-wide RAIDz1 vdev composed of 1TiB devices, which gives us roughly 2TiB of available space. But when we look at the pool using zpool list, we see something different:
root@elden:/tmp# zpool status testpool
pool: testpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
/tmp/0.bin ONLINE 0 0 0
/tmp/1.bin ONLINE 0 0 0
/tmp/2.bin ONLINE 0 0 0
errors: No known data errors
root@elden:/tmp# zpool list testpool -o name,size,alloc,free
NAME SIZE ALLOC FREE
testpool 2.98T 1.50G 2.98T
With a RAIDz1 vdev, we’re seeing 2.98TiB SIZE and FREE, and 1.50GiB ALLOC. This means that zpool list is showing us values before parity–we’re seeing the effect of our filesystem on the raw pool itself.
Since our RAIDz1 vdev must store one parity sector for each two data sectors, our 1.00GiB file occupies 1.50GiB on-disk. Similarly, since we’re not accounting for parity, our pool of three 1TiB devices shows roughly 3TiB SIZE and FREE, rather than the 2TiB of logical space actually available!
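The arithmetic works out neatly if we ignore padding and metadata overhead–a back-of-the-envelope sketch, assuming 4KiB sectors:
1.00GiB of data at 4KiB per sector = 262,144 data sectors
RAIDz1, three wide: one parity sector per two data sectors = 131,072 parity sectors
262,144 + 131,072 = 393,216 sectors, or 1.50GiB allocated on the raw pool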
So far, this might actually be useful–but what happens if we blow testpool away again, and this time rebuild it with a mirror vdev?
root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool mirror /tmp/0.bin /tmp/1.bin
root@elden:/tmp# dd if=/dev/urandom bs=1G count=1 of=/testpool/0.bin
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.94077 s, 553 MB/s
root@elden:/tmp# zpool status testpool
pool: testpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/tmp/0.bin ONLINE 0 0 0
/tmp/1.bin ONLINE 0 0 0
errors: No known data errors
Okay, we know that testpool is built of a simple two-wide mirror vdev. That means that we’ve got 1TiB of usable space with each of our blocks stored identically on each drive in the vdev.
Last time, we saw that zpool list showed us raw values for SIZE, ALLOC, and FREE before parity calculations, using the raw number of physical sectors allocated rather than the number of logical sectors available.
So, we expect to see 2TiB SIZE and FREE, with 2GiB ALLOCated this time around, right?
root@elden:/tmp# zpool list testpool -o name,size,alloc,free
NAME SIZE ALLOC FREE
testpool 1016G 1.00G 1015G
Unfortunately, no. zpool list displays values derived from physical sectors (meaning raw values before parity) on RAIDz vdevs, but values derived from logical sectors (meaning values after redundancy) on mirror vdevs.
We didn’t bother going into the exact calculation considerations that led zpool list and zfs list to slightly disagree on the simple one-device pool earlier, because its inconsistent behavior on more complex pool topologies renders it useless.
As bad as this was already, remember that some pools will have both RAIDz and mirror vdevs in the same pool–with different meanings of SIZE, ALLOC, and FREE applied to different vdevs in the same pool!
We do not recommend attempting to use zpool list for space availability calculations and management, period.
Filesystem-Agnostic Tools
For the most part, filesystem-agnostic tools get along with OpenZFS just fine. They always report on logical space, not physical space, so there is less to get confused about.
The major room for confusion with filesystem-agnostic tools like ls, du, and df comes from inline compression and deduplication, when enabled.
Those tools have no concept of filesystem features which might break the traditional, simple relationship of Size, Used, and Avail–they expect Size to be the sum of Used and Avail, period, with the ability to derive any one of the three from the values of the other two.
OpenZFS, obviously, complicates things. Does the df tool’s Used column refer to OpenZFS USED, or OpenZFS REFER? And what happens when some files are compressed or deduplicated?
To find the answer, let’s yet again blow testpool away and start from scratch. We don’t need to worry about pool topology this time as the filesystem-agnostic tools always report logical space–but we will need to look at the impact of snapshots, compression, and dedup.
root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool /tmp/0.bin -O compress=off -O dedup=off
root@elden:/tmp# zfs create testpool/plain
root@elden:/tmp# zfs create testpool/compress -o compress=lz4
root@elden:/tmp# zfs create testpool/dedup -o dedup=on
There we go! We’ve got a single pool with compression and dedup off, but with child datasets where they’re enabled. Now, let’s seed it with some data–for this purpose, raw text works best since it’s highly compressible, so we’ll grab a little help from Project Gutenberg.
root@elden:/testpool# wget -qO huckfinn.txt https://www.gutenberg.org/cache/epub/76/pg76.txt
root@elden:/testpool# for i in {0..999}; do cat huckfinn.txt >> kilohuck.txt ; done
root@elden:/testpool# rm huckfinn.txt
Now we’ve got a nice fat 594MiB text file that will compress very nicely! Let’s go ahead and put a copy of it in each of our child datasets plain, compress, and dedup:
root@elden:/testpool# cp kilohuck.txt compress ; cp kilohuck.txt dedup ; cp kilohuck.txt plain
root@elden:/testpool# zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 2.10G 982G 594M /testpool
testpool/compress 369M 982G 369M /testpool/compress
testpool/dedup 594M 982G 594M /testpool/dedup
testpool/plain 594M 982G 594M /testpool/plain
This should look quite familiar: zfs list rolls the REFER of the root dataset and all three children up into testpool’s USED, and shows the same AVAIL for the parent and all three children.
The only wrinkle so far is the USED column of testpool/compress, which shows the compressed size of kilohuck.txt, not the raw value. But let’s give ourselves another wrinkle, this time by placing a second copy of kilohuck.txt in testpool/dedup:
root@elden:/testpool# cp kilohuck.txt dedup/kilohuck2.txt
root@elden:/testpool# zfs list -rt all testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 2.68G 982G 594M /testpool
testpool/compress 369M 982G 369M /testpool/compress
testpool/dedup 1.16G 982G 1.16G /testpool/dedup
testpool/plain 594M 982G 594M /testpool/plain
The USED column in testpool/dedup doubled–showing that we now have two copies of kilohuck.txt in there–but the AVAIL didn’t change, because inline deduplication merely marked every block of kilohuck.txt as belonging to both kilohuck.txt and kilohuck2.txt. Fun!
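The savings do show up at the pool level, though: the dedupratio pool property–the same figure zpool list prints in its DEDUP column–reports how much logical data the pool stores per unit of physical allocation. A quick sketch:
# with two identical copies of kilohuck.txt on a dedup-enabled dataset,
# expect a ratio noticeably above 1.00x
zpool get dedupratio testpool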
Now that we’ve got a sufficiently complex pool, let’s start examining it with some of the most common filesystem-agnostic space management tools to see what they make of it.
du, the Disk Usage Tool
The du tool works by finding all of the files in one or more directories, adding them up individually, and reporting the totals. Let’s check it out:
root@elden:/testpool# du -h /testpool
369M /testpool/compress
595M /testpool/plain
1.2G /testpool/dedup
2.7G /testpool
Reading from bottom to top, we can see that du found 2.7GiB of files beneath /testpool in total. 1.2GiB of that belongs to the two uncompressed copies of kilohuck.txt in testpool/dedup, 595MiB belongs to the uncompressed copy in testpool/plain, and another 369MiB belongs to the final, LZ4-compressed copy of kilohuck.txt in testpool/compress.
Let’s check the math on that: 595MiB * 2 == 1190MiB, and 1190MiB / 1024 == 1.16GiB. This tells us that, as expected, du thinks that testpool/dedup and its two copies of kilohuck.txt occupy twice as much space as testpool/plain and its one copy.
What about testpool/compress?
root@elden:/testpool# zfs list testpool/compress
NAME USED AVAIL REFER MOUNTPOINT
testpool/compress 369M 982G 369M /testpool/compress
root@elden:/testpool# zfs get compressratio testpool/compress
NAME PROPERTY VALUE SOURCE
testpool/compress compressratio 1.61x -
root@elden:/testpool# du -h /testpool/compress
369M /testpool/compress
It’s apparent that du and zfs list agree that the compressed copy of kilohuck.txt in testpool/compress occupies 369MiB on disk.
We can also verify that 369MiB multiplied by OpenZFS’ reported compressratio of 1.61 is 594.09MiB–not quite the 595MiB we saw du report for testpool/plain, but close enough for government work!
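OpenZFS can also report the pre-compression figures directly, without any help from du: the logicalused and logicalreferenced properties show space consumed as though compression weren’t in play. A quick sketch:
# referenced reports the compressed, on-disk figure;
# logicalreferenced should land close to the uncompressed 594MiB
zfs get referenced,logicalreferenced,compressratio testpool/compress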
Sufficiently modern versions of du can also be configured to report the uncompressed size of files, using the --apparent-size flag:
root@elden:/testpool# du -h /testpool/compress
369M /testpool/compress
root@elden:/testpool# du -h --apparent-size /testpool/compress
594M /testpool/compress
There we have it: du reports compressed size, unless you use the --apparent-size argument, in which case it reports on the uncompressed size.
But what about deduplication?
root@elden:/testpool# du -h /testpool/dedup
1.2G /testpool/dedup
root@elden:/testpool# du -h --apparent-size /testpool/dedup
1.2G /testpool/dedup
root@elden:/testpool# zfs list testpool/dedup
NAME USED AVAIL REFER MOUNTPOINT
testpool/dedup 1.16G 982G 1.16G /testpool/dedup
Not only does du have no idea about deduplication, zfs list doesn’t either–not in terms of USED or REFER, at any rate. Both USED and REFER show the full 1.16GiB occupied by two copies of kilohuck.txt, despite them only occupying half that space on-disk.
However, zfs list does still get the AVAIL correct–that 982GiB AVAIL won’t change even if we make several more copies of kilohuck.txt in the deduplicated dataset.
root@elden:/testpool# cp -a kilohuck.txt dedup/kilohuck3.txt
root@elden:/testpool# cp -a kilohuck.txt dedup/kilohuck4.txt
root@elden:/testpool# cp -a kilohuck.txt dedup/kilohuck5.txt
root@elden:/testpool# zfs list testpool/dedup
NAME USED AVAIL REFER MOUNTPOINT
testpool/dedup 2.90G 982G 2.90G /testpool/dedup
As we can see, USED and REFER keep going up–yet AVAIL stays exactly the same.
There’s one more thing we should test before we move on from du, and that’s the impact of snapshots. To cleanly see this, let’s place another copy of kilohuck.txt in testpool/plain, then take a snapshot, then delete the extra copy from the live dataset:
root@elden:/testpool# cp -a kilohuck.txt plain/kilohuck2.txt
root@elden:/testpool# zfs snapshot testpool/plain@0
root@elden:/testpool# rm plain/kilohuck2.txt
root@elden:/testpool# zfs list testpool/plain
NAME USED AVAIL REFER MOUNTPOINT
testpool/plain 1.16G 982G 594M /testpool/plain
By now, this part shouldn’t be a surprise: we added an extra copy, took a snapshot, then deleted the extra copy. Since the extra copy still lives on in the snapshot, our USED doubled.
How will du handle this?
root@elden:/testpool# du -h /testpool/plain
595M /testpool/plain
root@elden:/testpool# du -h --apparent-size /testpool/plain
594M /testpool/plain
Since du has no concept of snapshots and works by adding up the sizes of individual files, it sees no difference between /testpool/plain before and after our snapshot-related shenanigans.
The 1MiB of difference we see between du -h and du -h --apparent-size here has nothing to do with OpenZFS–in this case, we’re seeing the difference between kilohuck.txt in terms of its actual data contents, and in terms of how many total sectors it requires on-disk.
There’s just enough slack space–the empty space at the end of partially-filled sectors–to bump kilohuck.txt up from its native 594MiB to the full 595MiB it occupies in terms of total sector count.
That just about covers it for du. Now, let’s take a look at another venerable reporting tool which works very differently!
df, the Disk Free Tool
The great thing about du, the tool we covered in the last section, is that it doesn’t concern itself with filesystem boundaries.
But that’s also the unfortunate thing about du–because in order to give you space reporting on a directory with ten million files in it, it must first stat all ten million files, then add up their size values!
When you have neither the time nor inclination to wait for du to grovel through thousands or millions of files, there’s a much leaner, faster, more efficient tool available: df, the Disk Free (space) tool.
Where du adds up the sizes of individual files, df ignores individual files entirely, and instead queries the actual filesystem for its total Size, Used, and Avail metrics. Let’s see how that plays out with testpool:
root@elden:/testpool# df -h /testpool
Filesystem Size Used Avail Use% Mounted on
testpool 983G 595M 982G 1% /testpool
The first noticeable difference here is that, unlike du, df only provides output for testpool itself—not its child datasets. This happens because df operates on filesystems, not directories, and doesn’t recognize the concept of "children." It treats each filesystem as entirely separate and unrelated to other mounted filesystems on the machine.
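If you want df to report on every mounted ZFS dataset at once, rather than just the filesystem containing the path you hand it, you can filter by filesystem type–a small sketch using GNU df’s -t option:
# show all currently-mounted ZFS datasets in one report
df -h -t zfs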
Now, let’s look at Size, Used, and Avail as reported. It looks like Used + Avail == Size, but how are they calculated? To make that easier to spot, let’s compare the testpool/plain and testpool/dedup datasets:
root@elden:/testpool# df -h /testpool/plain
Filesystem Size Used Avail Use% Mounted on
testpool/plain 983G 595M 982G 1% /testpool/plain
root@elden:/testpool# df -h /testpool/dedup
Filesystem Size Used Avail Use% Mounted on
testpool/dedup 985G 3.0G 982G 1% /testpool/dedup
Since Avail is the same for both child datasets but Size is different, we can see that df is querying the filesystem for real values of Used and Avail, then adding them together to come up with Size.
The values of Used are accurate for both /testpool/plain and /testpool/dedup, corresponding to the amount of data stored on each. We can also see that df is just as ignorant of OpenZFS dedup as du was, and reports the size of all files added together, despite deduplication.
Finally, we see that df is completely ignorant of the extra copies of kilohuck.txt that only exist in snapshots of testpool/plain–as far as it’s concerned, those snapshots are entirely separate filesystems.
The values of Avail are accurate–and, necessarily, the same–for each dataset, since no quotas have been imposed and they all share the same underlying collection of physical sectors.
By contrast, the Size value as reported by df is both bogus and irrelevant. While you could recreate it by adding a single dataset’s Used to the pool-wide Avail, there’s no practical point in doing so.
What about compression?
root@elden:/testpool# df -h /testpool/compress
Filesystem Size Used Avail Use% Mounted on
testpool/compress 983G 369M 982G 1% /testpool/compress
root@elden:/testpool# du -h /testpool/compress
369M /testpool/compress
Just like du, the df tool reports the on-disk size of compressed files, after compression.
Unlike du, there is no optional argument to force the tool to report the raw size of the individual files. This is because df doesn’t know or care about files in the first place!
ls, ncdu, and find, Oh My!
I’ve got good news for you, exhausted reader–there’s not really anything else you need to know about additional filesystem-agnostic tools, because they all operate in the same manner as either df or du.
The majority of tools–including ls, ncdu, and find–operate just as du does: they stat (pull the metadata for) each individual file they can find, then aggregate the results.
Some of these tools offer similar controls: ncdu can toggle between apparent and on-disk sizes, and ls -l reports apparent sizes while ls -s reports allocated blocks–much like du with and without --apparent-size.
You can generally tell the difference between du-like tools and df-like tools because the former take forever to operate on directories with thousands of files. In contrast, df-like tools return answers instantly regardless of the number of files involved.
Conclusion
Due to the vagaries of inline compression and synchronous deduplication (when either or both are enabled), it’s not actually possible to accurately predict “how much space is left” on an OpenZFS filesystem at all!
With that caveat, we can at least see how many blocks are available–preferably, using the zfs-native tool zfs list. We should not use the tool zpool list for the same purpose, because its output is ambiguous, unpredictable, and not usefully organized for that purpose.
Meanwhile, filesystem-agnostic applications get a good enough picture of what’s going on for most purposes. In particular, they should at least get the value of AVAIL correct in terms of how many blocks are left to write to, even if they have no idea how compression or deduplication might impact that value.
As for storage administrators–presumably like yourselves, dear readers–you should now understand enough about the OpenZFS-native tools and space accounting to, finally, have a handle on the deceptively complicated answers to how much space you’ve used, how much you’ve got left, and how to reclaim it!

Jim Salter
Jim Salter (@jrssnet) is an author, public speaker, mercenary sysadmin, and father of three—not necessarily in that order. He got his first real taste of open source by running Apache on his very own dedicated FreeBSD 3.1 server back in 1999, and he's been a fierce advocate of FOSS ever since. He's the author of the Sanoid hyperconverged infrastructure project, and co-host of the 2.5 Admins podcast.