OpenZFS – Data Security vs. Integrity
A secure system does not necessarily provide integrity – and vice versa. Let’s examine the differences between these two concepts. Data security is about preventing data from being disclosed, ensuring that only the correct people can access it. Data integrity ensures the data is correct, that it has not become corrupt due to hardware failure or other issues. With ZFS, you can get both. This may be the reason these two concepts can be so difficult to distinguish from each other. This article explains the differences and focuses on the ZFS features that help keep your data correct and secure.
Understanding ZFS and Data Integrity
ZFS ensures data integrity. From the moment data reaches the filesystem, ZFS ensures that files and directories are not altered by any means other than those the filesystem itself provides, whether by faulty hardware or by malicious modification. ZFS and the underlying pool that manages the storage media use several mechanisms to avoid and detect corruption of the stored information.
Data corruption can happen in more than one way: broken cabling, faulty memory chips, storage media nearing its mean time between failures (MTBF), and rising temperatures are among the most common sources. Corruption causes the data to be written either incompletely (a shorn write), to the wrong location (a misdirected write), or with incorrect contents (a corrupt write). When that happens, non-ZFS filesystems may not detect such errors at all.
That causes the incorrect data to be read and handed further up the I/O chain (from main memory or the storage media) to the application requesting it. That leads to applications crashing or, even worse, processing the corrupted data with unintended consequences. With enough redundancy in the pool, ZFS will not only detect that data integrity has been compromised – it can also repair the data automatically.
A process called self-healing compares the checksum stored alongside the data with a freshly calculated one. When the stored checksum does not match the calculated one, a redundantly stored copy with the correct checksum is used instead. The correct data is passed to the application and simultaneously overwrites the bad blocks, healing the corruption. Even better: this happens in the background without requiring intervention by an administrator. Of course, if this happens too often, human eyes should look for the cause. Any of the above-listed sources may very well be responsible.
Basic Checksum Theory
A key point of data integrity is having a way to detect errors. A checksum function takes an input (typically a file or a string) and always produces the same output value, called a hash, for that input. When two inputs produce the same hash, it is safe to assume the inputs are identical, as long as the hash function emits a wide enough range of hashes and does not collide (produce the same hash for different inputs).
The checksum algorithms used by ZFS are sufficiently strong that users can trust them not to generate collisions. Each time ZFS accesses stored data in a pool or dataset, a checksum is calculated and compared to the checksum stored with the metadata.
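This behavior is easy to observe with an ordinary hash tool. The following sketch assumes a system with the common sha256sum utility (ZFS itself calculates its checksums internally, using algorithms such as fletcher4 or SHA-256): the same input always yields the same hash, while changing even a single character produces a completely different one.

```shell
# Hash the same input twice: the output is identical both times.
printf 'hello' | sha256sum
printf 'hello' | sha256sum

# Flip a single character: the hash changes completely.
printf 'hellp' | sha256sum
```

This is exactly how ZFS detects corruption: if a stored block no longer hashes to the checksum recorded in its metadata, the block has changed.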
But there is a catch: if there is no prior record of what is considered “healthy” data, there is no way of knowing whether the data is correct. This is why traditional filesystems are unable to verify data integrity. ZFS goes further than just verifying the integrity of the data blocks: it validates the entire filesystem tree, all the entries above and (if present) below. That way, the entire filesystem structure is protected by integrity checks.
Using the self-validating Merkle tree, a change in a file causes the checksums of all the metadata pointing to that data to become invalid. That means that when ZFS finds an error, it can give information about which other blocks are affected as well. This re-validation and self-correction may go all the way up to the root of the filesystem tree, as the metadata changes with it. That is the integrity that ZFS provides.
Many older Unix filesystems either lack end-to-end data integrity entirely, or they require extensive filesystem checks (fsck) which may take considerable time to complete, time during which the storage may not be accessible. These checks mostly look for inconsistencies in the layout of the filesystem and cannot detect, let alone repair, actual data corruption.
ZFS does not need fsck, since its transactional nature means it moves atomically from consistent state to consistent state. What ZFS has instead is a process called scrub. When a scrub is run, each block of data is read, its checksum is recalculated and compared to the one stored with the metadata. When the checksums differ, ZFS heals the error by using any redundancy or parity data that is found to still be intact (matching checksums). This process typically runs on a monthly basis as a background maintenance task. During this time, the storage remains active for reads and writes.
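Starting and monitoring a scrub takes one command each. A minimal sketch, assuming a pool named tank (regular runs are typically scheduled via cron or a periodic task):

```shell
# Start a scrub of the pool; it runs in the background
# while the pool stays online for reads and writes.
zpool scrub tank

# Check progress and see whether any checksum errors
# were found and repaired.
zpool status tank
```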
Never turn the dataset checksum property off. There is no noticeable performance gain in doing so, but the risk of data loss increases drastically. That is because when a dataset does not store checksums, there is no way to verify data integrity. Each time a file is accessed on such a dataset, ZFS is unable to verify it, potentially passing corrupt data on to your applications.
ZFS uses some additional means to ensure data integrity. The most important metadata about the pool and the datasets within it is stored multiple times, in what are called ditto blocks. Even when damaged data cannot be repaired from parity, these additional copies ensure that large amounts of data are not lost due to damage to a single piece of metadata.
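ZFS writes ditto blocks for metadata automatically, but the same idea can be extended to user data with the copies property. A sketch, assuming a dataset named tank/important:

```shell
# Store two copies of every user data block
# (metadata already gets extra ditto-block copies).
zfs set copies=2 tank/important

# Verify the setting.
zfs get copies tank/important
```

Note that extra copies consume proportionally more space and are no substitute for pool-level redundancy such as mirrors or RAID-Z.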
What Is Failmode?
Even when disaster strikes, ZFS gives you control over how to respond in order to preserve data integrity. The failmode pool property controls what the system should do when too many storage devices have failed and the pool is unable to continue normal operations. By default (failmode=wait), ZFS blocks any further I/O, waiting until sufficient devices are restored and any errors are cleared.
That way, new data cannot be written to a pool which may have lost its capacity to provide enough redundancy to ensure continued data integrity. Reads are also blocked because the remaining devices may not contain sufficient parity to reconstruct the missing data. In both cases, the results are unpredictable. Only when the missing devices are re-attached, will the pool continue to operate as normal.
In failmode=continue, ZFS will do its best to continue to operate, allowing reads of data that is in cache or can be successfully reconstructed, and returning errors for the rest. If your applications can gracefully handle these errors, this mode allows some limited operations to continue. However, since most applications assume such errors are fatal, waiting for the filesystem to be restored to working order is often the preferred choice.
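The property can be inspected and changed at runtime. A sketch, assuming a pool named tank:

```shell
# Show the current failure-mode policy (wait, continue, or panic).
zpool get failmode tank

# Return errors for unreadable data instead of blocking all I/O.
zpool set failmode=continue tank
```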
Adding ZFS-Integrity to other Filesystems
ZFS volumes are contiguous virtual storage devices of a certain pre-defined size. Volumes are typically used for exporting storage via iSCSI to other systems for use as a local filesystem, or as storage for virtual machines. The system using the volume does not know anything about ZFS; it treats the volume the same as it would any other block storage device.
Formatting such a volume with a different filesystem and storing data on it gives that filesystem many of the data integrity features ZFS provides. A FAT32-formatted ZFS volume gains end-to-end checksums and becomes snapshotable, features FAT32 will never have on its own. Because this storage lives on ZFS, it automatically benefits from ZFS’s capabilities. Adding this extra integrity may come with a catch, though: not all of these features are well understood by the non-ZFS filesystem on top.
A ZFS snapshot taken while a virtual machine is writing a lot of data may not be consistent when rolled back, although there are tools that let the VM guest quiesce before the ZFS snapshot, ensuring the VM filesystem is consistent at the time of the snapshot. This is not a flaw of ZFS: the filesystem sitting on top of the ZFS volume simply does not know that a snapshot is being taken. ZFS sitting below it does not mean the two will communicate with each other and coordinate their actions.
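Creating and snapshotting a volume takes one command each. A minimal sketch, assuming a pool named tank (exporting the device via iSCSI depends on your target software and is omitted here):

```shell
# Create a 20 GB volume; it appears as a block device
# under /dev/zvol/tank/vmdisk.
zfs create -V 20G tank/vmdisk

# Whatever filesystem the consumer formats it with,
# every block written is checksummed by ZFS underneath.

# Snapshot the volume; quiesce the consumer first
# so its on-disk state is consistent.
zfs snapshot tank/vmdisk@before-upgrade
```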
Why does this matter? In non-ZFS storage managers, you have one tool to create a RAID out of several devices. Then, a different tool creates the filesystem on top. Historically, these tools were developed separately and did not necessarily know about each other. With ZFS being a filesystem with an integrated volume manager, both were developed together by the same engineers.
They were conceived to know about each other and to communicate about the I/O happening in both directions. This integration breaks with the traditional view of storage management and filesystems as separate entities, but the combination enables data integrity features that other filesystems can never have.
What About ZFS And Data Security?
Unix systems pioneered the concept of protecting the data on the filesystem from prying eyes. Long before DOS with its single-user concept, Unix systems allowed multiple users to work on the system simultaneously. Access permission bits (rwx) and user/group memberships protected users’ data in their home directories from each other.
The operating system protects itself by ensuring that only the administrative user or restricted system daemons are allowed to access and run certain binaries. If everyone could change any file as they pleased, the system would become insecure, unstable, or outright unusable after some time; the data on it simply could not be trusted anymore, as anyone could have manipulated it in any way. This is why compromised computer systems must be immediately disconnected from the network and power supply to prevent further damage.
The Unix inventors realized this important aspect of information security early on and integrated it into the filesystem layer. Since everything is a file in Unix, protecting that file is one part of the filesystem. Note that other protection mechanisms exist in operating systems like FreeBSD. We will focus on the protections that ZFS adds on top of those.
Take ZFS snapshots, for example. A snapshot preserves the state of a dataset from a certain point in time and is immutable: as long as the snapshot exists, ZFS will not allow any conventional means of changing the data it preserves. Only when a writable clone is created from the snapshot are changes possible, but those changes go to the clone, not the snapshot.
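Snapshots and clones are created with one command each. A minimal sketch, assuming a dataset named tank/home:

```shell
# Freeze the current state; the snapshot itself is immutable.
zfs snapshot tank/home@monday

# Changes are only possible through a writable clone
# derived from that snapshot.
zfs clone tank/home@monday tank/home-test
```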
Basic Unix Filesystem Protections
Both individual datasets and the whole pool can be configured to deny any write operations. With the readonly property set to on, not even the root user is allowed to make changes to the stored data. On ZFS, the administrator can fine-tune at the dataset level which datasets may be written to and which may not. This allows for more flexibility while keeping the configuration stored with the dataset itself rather than in a configuration file like /etc/fstab. When the pool is exported and imported on another system, the settings are applied there as well, instead of having to be re-configured in the other system’s /etc/fstab file.
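Setting and checking the property is a one-liner each. A sketch, assuming an archival dataset named tank/archive:

```shell
# Deny all writes to the dataset, even for root.
zfs set readonly=on tank/archive

# The setting travels with the dataset across
# pool export and import.
zfs get readonly tank/archive
```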
Other familiar settings from traditional filesystems are exec and setuid. Turning off the exec property prevents any binary contained on that dataset from being executed, while the setuid setting similarly controls whether binaries are allowed to escalate privileges to the owning user’s uid. Both protect data to a certain extent from being changed by the wrong process or user, respectively. There is typically no need to allow execution on, say, a dataset holding image files. Consider an innocent-looking file called invoice arriving as an email attachment: saving it to a dataset with exec turned off will prevent malicious code from executing. While not 100% safe, as malware is ever evolving, it adds another layer of protection for the system and ultimately the data stored in the filesystem.
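Both properties can be set at creation time. A sketch, assuming a hypothetical download area named tank/downloads:

```shell
# A download area should never execute binaries
# or allow them to escalate privileges.
zfs create -o exec=off -o setuid=off tank/downloads
```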
One of the big features ZFS users had been waiting a long time for is encryption. With ZFS 2.0, the feature was finally made available. It only allows datasets to be encrypted at creation time, not retroactively, but it protects the data better than existing full-disk encryption mechanisms. When the dataset is created, the sysadmin provides a passphrase or key file that is used to encrypt the randomly generated master key.
All user data written to that dataset is encrypted with that master key. Unlike with full disk encryption, the individual datasets can be unmounted, and the encryption keys unloaded from memory, putting the data fully at rest, ensuring it cannot be accessed. This provides better security, since it protects the data even while the system is online, where full-disk encryption only protects data when the machine is off. Child datasets inherit the encryption key from their parents by default, but can be provided their own keys if desired.
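The full cycle of creating an encrypted dataset, putting it at rest, and bringing it back online looks like this. A sketch, assuming a dataset named tank/secrets:

```shell
# Create an encrypted dataset; the passphrase wraps the
# randomly generated master key.
zfs create -o encryption=on -o keyformat=passphrase tank/secrets

# Put the data fully at rest: unmount the dataset and
# remove the key from memory.
zfs unmount tank/secrets
zfs unload-key tank/secrets

# Bring it back online later.
zfs load-key tank/secrets
zfs mount tank/secrets
```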
Multiple such datasets can sit next to each other with independent keys, allowing only the data that is needed to be online, and keeping the rest of the data securely offline. This also ensures that separate users, or departments placed in the same pool next to each other will not be able to compromise each other’s secrets. Data protection as it should be. But the ZFS developers also thought to build in data integrity as well.
When data is encrypted, the checksum field is split in half. The first half stores the same style of checksum as unencrypted data, allowing ZFS to detect and repair corruption even when the encryption key is not loaded. The second half is a MAC (message authentication code), a special type of checksum secured by the encryption key. The MAC allows ZFS to ensure that the data has not been maliciously tampered with, as only someone with the encryption key can generate the correct MAC value.
Protecting by Redaction
ZFS provides another layer of data security: redacted snapshots. Consider the following scenario: a dataset contains sensitive information next to data that is not sensitive. When a backup of that dataset is created, sensitive data like credit card information or personal photos should not be included. ZFS can redact this information from a clone of the dataset that has the information either removed or replaced with dummy values.
Using zfs send --redact, ZFS backs up the other data while omitting the blocks identified as sensitive. Follow-up changes can then be included in an incremental backup while still keeping the sensitive information out.
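The workflow takes a few steps: clone the snapshot, strip the sensitive data from the clone, snapshot the clone, and record the difference as a redaction bookmark. A sketch with hypothetical dataset, snapshot, and host names:

```shell
# Original snapshot containing both sensitive and harmless data.
zfs snapshot tank/data@full

# Clone it and remove the sensitive files from the clone.
zfs clone tank/data@full tank/data-redacted
rm /tank/data-redacted/creditcards.db
zfs snapshot tank/data-redacted@clean

# Record which blocks differ into a redaction bookmark...
zfs redact tank/data@full book1 tank/data-redacted@clean

# ...and send the original snapshot minus those blocks.
zfs send --redact book1 tank/data@full | \
    ssh backuphost zfs receive backup/data
```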
ZFS has many built-in features that add extra layers of protection against data integrity issues and help protect files from prying eyes. Other filesystems provide only a standard set of these protections, some of which cost extra money or effort to implement. With the open-source and freely available ZFS, users have a powerful platform for their most important information. The integrity checking alone makes ZFS worth checking out.
Don’t worry about this being a complex task: ZFS performs many of these validations during normal operation, and the default settings are good for most people. The ZFS encryption feature is both flexible and secure, letting administrators set different keys for each dataset to isolate the data of different users from one another.