Klara

One of the more commonly asked questions about OpenZFS essentially boils down to one thing: “What’s the fuss?” Historically, filesystems haven’t been a topic of much public interest. They’re quiet, sit behind the scenes, and whichever one you use is usually taken for granted.

Personally, I believe the outsized interest in OpenZFS—which also fueled interest in competitors like btrfs and bcachefs—stems from the project’s similarly outsized focus on keeping your data safe.

We frequently talk about checksums, snapshots, and replication as ways to keep your data safe. Today, we’ll go deeper: if you’re already using these basics, how can you take data safety even further?

Storage Policy Enforcement

At the most basic level, all modern filesystems enforce policies through permissions: files and folders are owned by a specific user, who can grant read, write, or modify access to other users and groups.

Policy Enforcement on Traditional Filesystems

This works as far as it goes—but simple filesystem permissions are pretty easy to bypass if you’ve got physical access to the device(s) that the filesystem resides on. Making a file inaccessible to anyone but root is a fine security measure, but if your attacker can simply reboot the system into a live USB environment, that live environment allows them to become root, and therefore access your files!

Aside from controlling access to specific files and folders, storage administrators often need to impose quotas: settings that determine how much data particular users or groups are allowed to write. Although UNIX kernels allow imposition of quotas on any filesystem, that quota management is quite limited (for example, it does not support placing a quota on a directory) and comes with a significant performance cost.

Improved Policy Enforcement with OpenZFS

OpenZFS encryption protects data at rest, closing the loophole which allows an attacker to simply boot into a different environment, become root, and gain access to data they had no access to under their real user account.

OpenZFS also provides its own built-in quota enforcement and management tools—and unlike traditional kernel-level quotas, they offer extremely granular control with no associated performance penalty.

Traditional kernel-level quotas apply only to an entire mounted filesystem—but OpenZFS quotas can be applied directly to any dataset, making it simple, for example, to place a limit on a user’s home directory without imposing the same restriction on a large group of shared network files.
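As a rough illustration of per-dataset quotas, the Python sketch below builds (but does not run) a zfs set command. The pool layout tank/home/bob and the 50G limit are hypothetical examples, not values from any real system; the sketch assumes each home directory is its own dataset.

```python
import shlex


def quota_command(dataset: str, limit: str, prop: str = "quota") -> list[str]:
    """Build (but do not run) a `zfs set` command that caps a dataset's size."""
    # `quota` counts the dataset plus its snapshots and descendants;
    # `refquota` counts only the data the dataset itself references.
    if prop not in ("quota", "refquota"):
        raise ValueError(f"unsupported quota property: {prop}")
    return ["zfs", "set", f"{prop}={limit}", dataset]


# Hypothetical layout: one dataset per home directory.
cmd = quota_command("tank/home/bob", "50G")
print(shlex.join(cmd))
```

Because the limit is attached to one dataset, a shared dataset elsewhere in the same pool (say, a hypothetical tank/shared) is unaffected unless it is given its own quota.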

OpenZFS Encryption

How It Differs

At first glance, one might think of OpenZFS encryption as no different from existing full-disk encryption technologies like LUKS or GELI, or even SED (Self-Encrypting Drives). There are some important differences to be aware of:

First, OpenZFS encryption is considerably newer than LUKS, GELI, and most similar technologies, and in this case, “newer” can mean “less time for bugs to be discovered and fixed.” The good news is that the one persistent issue we knew of (ZFS corruption related to snapshots post-2.0.x upgrade) has recently been resolved.

It’s also important to be clear on what OpenZFS encryption doesn’t protect. Full disk encryption technologies encrypt entire devices—OpenZFS encryption encrypts datasets, with no option to encrypt higher in the stack.

On the upside, this means you don’t need to dedicate entire disks or partitions to encrypted data—you can simply mark one or a few datasets for encryption while leaving the rest of the pool unencrypted. On the downside, this means some metadata is unavoidably left unencrypted, even if you set encryption as inheritable all the way at the pool’s root dataset.

Specifically, the names and sizes of datasets can be read in the clear directly from the raw devices a pool is built on, even if the contents of those datasets are encrypted. In practice, we don’t believe this is much of a security issue—while an attacker might see that the name of an encrypted dataset is “business documents,” they cannot see either the data or metadata of any files and folders inside that dataset.

An attacker can, however, determine that the encrypted dataset “business documents” exists, how large it is, and how many snapshots it has (including their names). In practice, this means you should take care not to leak sensitive information in snapshot or dataset names—if you don’t want people to know Bob is a subject within your encrypted dataset, don’t name the dataset “Bob’s Documents” or create a snapshot called “before Bob incorporated!”

Managing Encryption Keys

One of the most important things to remember about any form of encryption is that it is only as strong as your key management practices. If an attacker gains access to your key, your encryption will not stop them—so you must carefully consider the scenarios you expect encryption to safeguard against.

Encryption based on hardware profiles—such as the “automatic” device-level encryption typically offered by enterprise SSDs—can protect you from an attacker simply removing a physical device and attaching it to another system, but may not protect you from the same attacker simply booting from a thumbdrive in the same system, and will have no effect whatsoever on an attacker who already has a toehold in your operating system. Often, this encryption is merely about the ability to securely erase the drive by overwriting the encryption key with a new one, making all data previously written to the drive unrecoverable.

The simplest—and arguably most secure—method of key management is a simple passphrase, which must be typed in directly before an encrypted dataset can be mounted. Obviously, this presents its own problems—what if you’re not available when the system reboots, for example?

OpenZFS also allows keys to be loaded from files or even URLs, which opens the door to many potentially friendlier ways to safeguard data at rest without hampering the ability to reboot the system. One might save the key on a thumbdrive, for example, and configure encrypted datasets to look for the key on that thumbdrive—allowing you to create, essentially, a physical “key” which can be added or removed at will, and without any additional admin privileges necessary.

One may also configure an encrypted dataset to look for a key at an HTTPS URL. This type of management offers a sort of “remote locking” safeguard against physical theft: if an attacker steals your URL-keyed laptop from a coffee shop, you can disable the web page that supplied the key, and the attacker won’t be able to get into your data.

A creative administrator can find plenty of ways to use the key management tools given—for example, a server expected to always be located on-premises might use a private HTTPS URL, only accessible via the local network. If an attacker steals that server and boots it up remotely, the server won’t be able to resolve the private URL to get the key, and the encrypted datasets cannot be mounted.
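The three key sources discussed above (a typed passphrase, a key file, and an HTTPS URL) all come down to the keyformat and keylocation properties set when an encrypted dataset is created. The Python sketch below only builds the corresponding zfs create commands; the dataset names, the thumbdrive mount point, and the URL are hypothetical, and HTTPS key locations assume an OpenZFS build recent enough to support them.

```python
import shlex


def create_encrypted(dataset: str, keyformat: str, keylocation: str) -> list[str]:
    """Build (but do not run) a `zfs create` command for an encrypted dataset."""
    if keyformat not in ("passphrase", "hex", "raw"):
        raise ValueError(f"unknown keyformat: {keyformat}")
    allowed = ("file:///", "https://", "http://")
    if keylocation != "prompt" and not keylocation.startswith(allowed):
        raise ValueError(f"unknown keylocation: {keylocation}")
    return [
        "zfs", "create",
        "-o", "encryption=on",
        "-o", f"keyformat={keyformat}",
        "-o", f"keylocation={keylocation}",
        dataset,
    ]


# Hypothetical examples of the three approaches.
examples = [
    # Passphrase typed at the console before the dataset can be mounted.
    create_encrypted("tank/secret", "passphrase", "prompt"),
    # Raw key stored on a removable thumbdrive mounted at /mnt/usbkey.
    create_encrypted("tank/portable", "raw", "file:///mnt/usbkey/tank.key"),
    # Key fetched from a private web server on the local network.
    create_encrypted("tank/onprem", "raw", "https://keys.internal.example/tank.key"),
]
for cmd in examples:
    print(shlex.join(cmd))
```

Which of the three is appropriate depends on the threat model: the file and URL variants let the system reboot unattended, but only for as long as the thumbdrive or the key server remains reachable.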

It is difficult to recommend a single “best practice” about key management, because the best practices are dependent on the security goals of any particular use case. We encourage administrators to carefully think through all likely scenarios in their particular security environments, and design solutions which minimize friction for legitimate uses, while most reliably denying access to attackers.

OpenZFS Replication

It’s important to understand how OpenZFS encryption relates to OpenZFS replication. Without additional arguments, the zfs send command will decrypt encrypted datasets before sending them along to the target running zfs receive.

On the other hand, performing a zfs send from an unencrypted dataset and zfs receive into an encrypted dataset works exactly like you would expect—the source remains in the clear, but the target is encrypted.

What if your source is encrypted, and you don’t ever want that data to be decrypted? In these cases, one invokes the zfs send command with the -w argument, for raw send. Raw send does not decrypt or decompress blocks before sending them—the blocks are sent exactly as they were stored on-disk at the source.
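A raw send might be assembled along these lines. This Python sketch only builds the pipeline as a shell string and does not execute it; the snapshot name, SSH host, and target dataset are hypothetical.

```python
import shlex


def raw_send_pipeline(snapshot: str, target_host: str, target_dataset: str) -> str:
    """Build (but do not run) a raw send piped to a remote receive over SSH."""
    # -w requests a raw stream: blocks leave the source exactly as stored
    # on disk, so encrypted data is never decrypted in transit.
    send = ["zfs", "send", "-w", snapshot]
    # The remote command is quoted as a single argument to ssh.
    receive = ["ssh", target_host, shlex.join(["zfs", "receive", target_dataset])]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


print(raw_send_pipeline(
    "tank/secure/docs@2024-01-01",
    "backup@offsite.example.com",
    "backup/secure/docs",
))
```

Dropping -w from the send side would produce an ordinary stream, which requires the key to be loaded on the source and delivers decrypted data to the receiver.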

The best thing about raw send is that the target doesn’t need to possess the key, or have the ability to decrypt the encrypted data at all—it simply writes the still-encrypted blocks directly to its own pool, where they remain inaccessible until somebody comes along with a key. Even restorations don’t necessarily require a key on the less-trusted remote target—one might instead choose to raw-send the encrypted data right back from the less-trusted target to the more-trusted source network, and only decrypt and mount the datasets in the more-trusted environment.

Suppose you want your origin to remain fully unencrypted and available, but your remote backups to be encrypted with zero trust. In that case, you’ll need an intermediate step—a backup server in a trusted environment. In this common scenario, an unencrypted production dataset is replicated locally to an encrypted-but-trusted target, then the encrypted-but-trusted copy is replicated via zfs send -w, aka “raw send,” to the final encrypted-and-untrusted remote target.
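The two-hop scenario can be sketched as a pair of pipelines. The Python below only builds the command strings; the host names and dataset names are hypothetical. It also assumes that vault/encrypted on the trusted backup server is an encrypted dataset with its key loaded, so that an ordinary (non-raw) stream received beneath it is encrypted as it is written.

```python
import shlex
from typing import Optional


def replication_pipeline(snapshot: str, target: str, raw: bool,
                         host: Optional[str] = None) -> str:
    """Build (but do not run) a send/receive pipeline as a shell string."""
    send = ["zfs", "send"] + (["-w"] if raw else []) + [snapshot]
    receive = ["zfs", "receive", target]
    if host is not None:
        # Run the receiving side on another machine over SSH.
        receive = ["ssh", host, shlex.join(receive)]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


# Stage 1, run on the production host: an ordinary send of the unencrypted
# dataset to the trusted backup server, landing under an encrypted parent.
stage1 = replication_pipeline(
    "tank/prod@nightly", "vault/encrypted/prod", raw=False, host="backup.lan")

# Stage 2, run on the trusted backup server: a raw send of the encrypted
# copy to the untrusted remote, which never sees the key.
stage2 = replication_pipeline(
    "vault/encrypted/prod@nightly", "offsite/prod", raw=True,
    host="offsite.example.com")

print(stage1)
print(stage2)
```

Only the second hop uses -w; the first hop has nothing encrypted to preserve, since the production dataset is stored in the clear.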

OpenZFS Delegation

Delegation is one of OpenZFS’s lesser-known policy management tools. In the Unix-like world, we typically see only two permission levels—either you’re root and you get to do all the fun stuff, or you’re not root and you’re limited to unprivileged-user stuff.

By default, this simple binary separation of privilege is how OpenZFS works as well. However, administrators with more complex needs can fine-tune things by delegating classifications of storage operations to non-root users and groups. This delegation is managed with the zfs allow command.

OpenZFS delegation can be applied to a dizzying number of operations. For example, one might allow user Bob to manage quotas on the dataset pool/Bob and all dependent datasets, or allow user Alice to load or unload the keys for pool/Alice without granting her access to Bob’s keys (or his quotas).
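The Bob and Alice examples might translate into zfs allow invocations like the ones this Python sketch builds (without running them). The account names bob and alice are hypothetical, and the permission names chosen here (quota, refquota, load-key) are one plausible reading of “manage quotas” and “load or unload the keys.”

```python
import shlex


def allow_command(user: str, permissions: list[str], dataset: str) -> list[str]:
    """Build (but do not run) a `zfs allow` command delegating to one user."""
    if not permissions:
        raise ValueError("at least one permission is required")
    # Without -l or -d, the delegation covers the dataset and its descendants.
    return ["zfs", "allow", "-u", user, ",".join(permissions), dataset]


# Bob may manage quotas on pool/Bob and everything beneath it.
bob = allow_command("bob", ["quota", "refquota"], "pool/Bob")
# Alice may load and unload keys for pool/Alice, and nothing of Bob's.
alice = allow_command("alice", ["load-key"], "pool/Alice")

print(shlex.join(bob))
print(shlex.join(alice))
```

Running zfs allow with only a dataset name lists the delegations currently in effect, which is a convenient way to audit what has been granted.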

This kind of fine-grained management is obviously a huge benefit for large teams—but it can also be quite helpful in even single-user, security-conscious environments. In particular, we advise security-focused admins to review Improving Replication Security With OpenZFS Delegation to learn how to enable OpenZFS replication from unprivileged accounts.

Conclusion

OpenZFS is already well known for a focus on keeping your data safe from accidents, but the tools we discussed today also demonstrate its value in protecting data from attackers.

This article can’t tell you when you should or shouldn’t encrypt your data, or which parts to leave in the clear. For that, you should reach out for an assessment of your specific environment. But it should give you plenty of ideas on how to audit your own environment, make intelligent decisions, and use OpenZFS’s built-in encryption, replication, and delegation tools to make it as secure as possible.

In particular, a security-conscious admin should think about the most likely attack vectors for any data under their control, and how best to address those threats.

If you are considering how to get more out of OpenZFS encryption in your own environment, Klara’s engineering team can help with developing advanced encryption features tailored to your needs.
