OpenZFS Native Encryption
Understand the differences between FreeBSD GELI disk encryption and OpenZFS native encryption
Beginning with version 13.0, FreeBSD supports the long-anticipated OpenZFS native encryption feature. If you’ve used FreeBSD’s GELI encryption in the past, you may have questions regarding the differences between the two encryption schemes, whether you should switch to OpenZFS native encryption, and how to implement it in your environment.
This article begins by summarizing the user-facing differences between FreeBSD GELI disk encryption and OpenZFS native encryption, covering their benefits and limitations. It then provides examples for creating and managing encrypted datasets.
GELI vs OpenZFS Native Encryption
From an end-user implementation perspective, the biggest difference between GELI and OpenZFS native encryption is what gets encrypted. To over-simplify, GELI encrypts disks while OpenZFS encrypts datasets. Let’s look at that distinction more closely:
GELI disk encryption: think of this as a filesystem-agnostic “all or nothing” encryption mechanism which protects physical block devices (disks) below the filesystem layer. At boot time, each GELI-protected disk has to be decrypted before system boot can continue and the overlying filesystem can be mounted. For a system using ZFS, that means each GELI-encrypted disk in a pool has to be decrypted before the pool can be imported, which adds to the complexity of systems with many disks. Worse, ZFS is not aware that it is operating on top of encrypted devices.
NOTE: The 13.0 installer implements GELI if you choose the “encryption” option in the guided ZFS section of the installer. It is recommended to not choose this option during installation if you are going to use the OpenZFS native encryption support.
OpenZFS Dataset Encryption: in contrast to “all or nothing” encryption, OpenZFS native encryption is applied on a per-dataset basis. This offers the flexibility to mix encrypted and non-encrypted datasets in the same pool and means that you do not have to decrypt all datasets when mounting or importing a pool.
With regard to entering the passphrase for a key: on a GELI system, the passphrase prompt happens early in the boot process; once decryption occurs, the system continues to load as usual. On a system with datasets encrypted with OpenZFS native encryption, bootup occurs normally but encrypted datasets aren’t mounted until the key is loaded.
Benefits of OpenZFS Native Encryption
From a system administrator’s perspective, there are many benefits to using native encryption rather than running ZFS on top of GELI-encrypted disks. The biggest benefit is that you don’t need to mount encrypted datasets (which requires key access) in order to run ZFS administrative tasks using the zfs or zpool commands. This means you can perform scrubs, resilvers, snapshots, replication, and other maintenance tasks on unmounted, encrypted datasets, without requiring access to the key.
All of ZFS’ data integrity checks understand native encryption, even when the encrypted datasets are unmounted. ZFS compression understands native encryption, allowing data to be compressed when saving it to an encrypted dataset.
It is possible to replicate snapshots of unencrypted datasets to encrypted datasets by including the -x encryption option in the zfs recv command. Conversely, it is possible to replicate snapshots of encrypted datasets to unencrypted datasets. And finally it is possible to securely replicate snapshots by including the -w (raw send) option in the zfs send command. Raw send allows replication to an untrusted location, since the data remains encrypted and the key is not exposed to the untrusted location.
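As a minimal sketch of raw replication, the helper below sends a snapshot with -w and receives it on another machine over SSH. The function name replicate_raw, the snapshot, the host, and the target dataset are all hypothetical placeholders; only the -w flag comes from the text above.

```shell
# Sketch: replicate an encrypted snapshot without exposing its key.
# replicate_raw and every dataset/host name used with it are
# illustrative assumptions, not part of ZFS.
replicate_raw() {
    snapshot="$1"   # e.g. zroot/encrypted@backup
    host="$2"       # e.g. backuphost (hypothetical)
    target="$3"     # e.g. backup/encrypted (hypothetical)

    # -w sends the blocks as they are stored on disk (still encrypted),
    # so the receiving side never needs the key.
    zfs send -w "$snapshot" | ssh "$host" zfs recv "$target"
}
```

Usage would look like: replicate_raw zroot/encrypted@backup backuphost backup/encrypted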
What To Be Aware Of
While there are many benefits to OpenZFS native encryption on FreeBSD, there are a few things to be aware of as they affect your filesystem design decisions:
- Native encryption is per dataset, not per pool. This may seem nonintuitive at first and does force you to think where in the pool your sensitive data should live. It does add flexibility to the layout, as you get to pick and choose which datasets to protect with encryption. It also allows different datasets to be encrypted with different keys.
- Encryption can only be enabled at dataset creation time. Fortunately, child datasets inherit the encryption property from the parent dataset by default. Since you cannot set the encryption property on an existing dataset, the requirement to create a new, encrypted dataset is a bigger deal if you already have a lot of data residing in unencrypted datasets. The solution is to create an encrypted dataset and move the sensitive data from its old location.
- Native encryption does not encrypt all metadata. This is why maintenance tasks can still be performed on an unmounted encrypted dataset. Some ZFS metadata is exposed, such as the name, size, usage, and properties of the dataset. However, the number and sizes of individual files and the contents of the files themselves are inaccessible without the decryption key.
- The FreeBSD boot loader does not yet support booting from an encrypted dataset.
Creating an Encrypted Dataset
Let’s take a closer look at native encryption in action. We have a FreeBSD 13.0 testing system from a default installation. Here are its currently mounted filesystems:
Since encryption is per-dataset and applied at dataset creation, we’ll create a new encrypted dataset in the zroot pool called encrypted. The command line options (-o) indicate that we want encryption enabled, will use a passphrase, and want to be prompted to enter the passphrase whenever we mount the dataset. Note that the passphrase should be memorable (to you), not easily guessable (by others), and at least 8 characters long:
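The three options described above match the zfs create command used later in this article. A minimal sketch, wrapped in a helper whose name (create_encrypted_dataset) is our own and not part of ZFS:

```shell
# Sketch: create a passphrase-protected dataset.
# create_encrypted_dataset is a hypothetical helper name.
create_encrypted_dataset() {
    # encryption=on         enable encryption with the default cipher
    # keyformat=passphrase  derive the key from a typed passphrase
    # keylocation=prompt    ask for the passphrase on the terminal
    zfs create -o encryption=on -o keyformat=passphrase \
        -o keylocation=prompt "$1"
}
```

Calling create_encrypted_dataset zroot/encrypted prompts twice for the passphrase and, on success, mounts the new dataset.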
We can verify that the new encrypted dataset is mounted; it was mounted during creation since we were prompted for the passphrase:
We can verify that the new dataset is encrypted by requesting a listing of its encryption properties:
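One way to request that listing is to pass a comma-separated list of properties to zfs get. This is a sketch: the helper name show_encryption_props is an assumption, and the exact set of properties queried in the original example may have differed.

```shell
# Sketch: list the encryption-related properties of a dataset.
# show_encryption_props is a hypothetical helper name.
show_encryption_props() {
    # zfs get accepts a comma-separated list of property names.
    zfs get encryption,keyformat,keylocation "$1"
}
```

For the dataset created above, the call would be show_encryption_props zroot/encrypted.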
aes-256-gcm is the encryption algorithm; the FreeBSD default is the strongest one currently available. The other properties indicate that we will be prompted for the passphrase when mounting the dataset. Since this is a test system, we’ll reboot and demonstrate that the encrypted dataset is not automatically mounted:
Note that the first zfs mount command failed because encryption keys are not automatically loaded at boot time, and a dataset cannot be mounted before its key is loaded. The zfs load-key command is used to load encryption keys. We asked to load the key for the specified dataset; load-key read the dataset’s encryption properties and prompted us to enter the passphrase. Then, we were able to successfully mount the dataset and verify it was mounted.
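The sequence just described, load the key and then mount, can be sketched as a small helper. The name unlock_and_mount is hypothetical; zfs load-key and zfs mount are the commands named in the text.

```shell
# Sketch: make an encrypted dataset available after boot.
# unlock_and_mount is a hypothetical helper name.
unlock_and_mount() {
    # With keylocation=prompt, load-key asks for the passphrase.
    # The mount is only attempted if the key loaded successfully.
    zfs load-key "$1" && zfs mount "$1"
}
```

For example: unlock_and_mount zroot/encrypted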
After the key is loaded, we can zfs umount and zfs mount all we want without additional prompting.
What happens if we try to unload the key while the encrypted dataset is still mounted?
In other words, while an encrypted dataset is mounted, its key remains loaded so that ZFS can encrypt/decrypt data as it is read from and written to the dataset. If you don’t want a dataset to be remounted after unmounting it, unload its key. Don’t forget that ZFS maintenance tasks can still occur on unmounted datasets, even when their keys are unloaded.
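Put together, taking a dataset offline means unmounting first and unloading the key second. A minimal sketch, with lock_dataset as a hypothetical helper name:

```shell
# Sketch: stop access to an encrypted dataset.
# lock_dataset is a hypothetical helper name.
lock_dataset() {
    # The key cannot be unloaded while the dataset is mounted,
    # so unmount first and only then unload the key.
    zfs umount "$1" && zfs unload-key "$1"
}
```

After lock_dataset zroot/encrypted, the dataset cannot be mounted again until its key is reloaded.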
Rerooting to an Encrypted Dataset
In this example, we have an unencrypted pool that already contains data and would like to move that data to encrypted datasets. The pool has a lot of remaining capacity, providing plenty of room to replicate the existing data locally so that we can test the data migration before destroying the original, unencrypted datasets.
Before proceeding, there is a bit of a chicken and egg problem since FreeBSD currently needs to boot into an unencrypted dataset (which also means that we don’t want to destroy the unencrypted root dataset!). Fortunately, we can just switch to an encrypted version of the root dataset after boot—that process is known as rerooting.
Let’s start from scratch on this system. First, we’ll create a recursive snapshot of the entire zroot pool:
zfs snap -r zroot@backup
Then, we’ll create the encrypted dataset:
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt zroot/encrypted
Next, we’ll replicate the snapshot of the pool to the encrypted dataset. The -R in the zfs send command recursively sends all the datasets within the snapshotted pool. The -x encryption in the zfs recv command excludes the encryption property carried in the stream, so the received datasets inherit encryption from their new encrypted parent. Note that the receive will fail if the target location already exists, so give it a dataset name that does not yet exist! In this example, we’ve chosen to keep the name zroot, but put it beneath a parent dataset named encrypted:
zfs send -v -R zroot@backup | zfs recv -x encryption zroot/encrypted/zroot
If we type mount after the replication completes, we’ll see something interesting:
The encrypted datasets are definitely populated from the snapshot. But which data is mounted? For example, is the / mountpoint populated by zroot/ROOT/default or by zroot/encrypted/zroot/ROOT/default? To find out, get the value of the encryption property:
zfs get encryption /
NAME                PROPERTY    VALUE  SOURCE
zroot/ROOT/default  encryption  off    default
That makes sense as FreeBSD booted into the unencrypted root dataset. You can repeat that command for each of the mount points to confirm that the unencrypted datasets are the ones which are currently mounted. To switch to the encrypted datasets, change vfs.root.mountfrom using the kenv command. First, let’s look at its default value:
kenv vfs.root.mountfrom
zfs:zroot/ROOT/default
Then, specify the same location but within the encrypted dataset. In this example, it is:
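A sketch of that step, assuming the dataset layout created above (the zfs get output further down confirms the name zroot/encrypted/zroot/ROOT/default). The helper name set_reroot_target is hypothetical; kenv accepts a variable=value argument to set a kernel environment variable.

```shell
# Sketch: point the kernel at a new root filesystem for the next reroot.
# set_reroot_target is a hypothetical helper name.
set_reroot_target() {
    # The value uses the same "zfs:<dataset>" form as the default shown above.
    kenv vfs.root.mountfrom="zfs:$1"
}
```

For this example: set_reroot_target zroot/encrypted/zroot/ROOT/default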
Next, use this command to shut down the system without unloading the kernel, so that it can restart with the specified encrypted dataset mounted as root:
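On FreeBSD, rerooting is done with the -r flag of reboot(8). The sketch below prints the current reroot target before switching; the helper name reroot is our own.

```shell
# Sketch: reroot into the filesystem named by vfs.root.mountfrom.
# reroot is a hypothetical helper name.
reroot() {
    # Show where the kernel will look for the new root before switching.
    echo "rerooting to: $(kenv vfs.root.mountfrom)"
    # -r stops userland and remounts root without reloading the kernel.
    reboot -r
}
```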
You can confirm that the system is now using the encrypted datasets by repeating the previous zfs get command:
zfs get encryption /
NAME                                PROPERTY    VALUE        SOURCE
zroot/encrypted/zroot/ROOT/default  encryption  aes-256-gcm  -
To remove the unencrypted datasets from the mount output, set the mountpoint property of each unencrypted dataset to legacy and add those filesystems to /etc/fstab. For example, to set the legacy mountpoint for /usr/home and /var/tmp:
zfs set mountpoint=legacy zroot/usr/home
zfs set mountpoint=legacy zroot/var/tmp
Then add entries for those filesystems to /etc/fstab:
# Device        Mountpoint  FStype  Options  Dump  Pass#
zroot/usr/home  /usr/home   zfs     rw       0     0
zroot/var/tmp   /var/tmp    zfs     rw       0     0
Suggestions for More Complex Scenarios
The example in this article demonstrates the basic workflow for using OpenZFS native encryption:
- Create an encrypted dataset and possibly copy existing sensitive data to it.
- To provide read/write access to the encrypted dataset after the system boots (or after unmounting and unloading the key for an encrypted dataset), first load the key and then mount the encrypted dataset. This allows remote headless systems to boot into a basic system that you can SSH into to enter the passphrase and then re-root into the encrypted system.
- To stop read/write access to an encrypted dataset, unmount it and unload its key.
This basic workflow can be modified for more complex scenarios containing multiple encrypted datasets and encrypted child datasets. For example, on a busy system with many encrypted datasets, you probably don’t want to enter a bunch of passphrases and manually mount multiple filesystems after every system boot! Here are some resources to help you simplify your mounting strategy:
- The load-key and unload-key commands each provide recursive (-r) and all (-a) switches for dealing with multiple datasets. See zfs-load-key(8) and zfs-unload-key(8) for usage details.
- Using prompt for the keylocation is suited to systems with few encrypted datasets or where it is not critical for all datasets to be mounted at boot time. It may also be necessary in environments where the security policy requires a human to manually input passphrases. Systems better suited to automatic mounting of all datasets at system boot should point the keylocation at a keyfile and use either hex or raw for the keyformat. Refer to the keyformat and keylocation descriptions in zfsprops(8) for more information on how to create a keyfile.
- When automating mounting with a keyfile, add -l (load keys) to any zpool import scripts.
- To change the keylocation or keyformat, use zfs-change-key(8). Despite the name, this command doesn’t actually change the master key that is used to encrypt/decrypt. Instead, it changes a dataset’s encryption properties with the added benefit of not requiring all encrypted data to be re-encrypted because of the change.
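The keyfile approach from the list above can be sketched as follows. All three helper names are hypothetical, as is any path you pass in; the sketch assumes a raw key, which must be exactly 32 bytes, and a keylocation given as a file:// URI with an absolute path.

```shell
# Sketch: keyfile-based encryption for unattended mounting.
# All function names here are illustrative assumptions.

# Write 32 random bytes to the given path, readable only by the owner.
make_raw_keyfile() {
    ( umask 077 && dd if=/dev/urandom of="$1" bs=32 count=1 2>/dev/null )
}

# $1 = absolute path to the keyfile, $2 = dataset to create.
create_keyfile_dataset() {
    zfs create -o encryption=on -o keyformat=raw \
        -o keylocation="file://$1" "$2"
}

# Load every available key, then mount every dataset.
load_all_keys_and_mount() {
    zfs load-key -a && zfs mount -a
}
```

Keep in mind that a keyfile stored on the same machine only protects the data while that file is out of an attacker's reach, so where the keyfile lives is a design decision of its own.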