History of ZFS – Part 3: Heading Into the Future
This is part of our article series published as “History of OpenZFS”.
From its birth at Sun, ZFS grew exponentially in popularity. Many were impressed by its revolutionary features, and ported it to run on their systems. They were able to do this thanks to Sun open sourcing the code. As it was expanding onto other operating systems, its mother company was swallowed up by Oracle, who shut the door on access to the code. From the ashes grew the OpenZFS project, which continues to make ZFS available to those outside of Oracle’s walled garden.
Up until now, we have been focusing on the history of the extraordinary filesystem that is ZFS. Today, we will take a look at what the future holds for ZFS. There are a number of recent changes, and more coming down the pipe to make the future of ZFS better for both developers and users.
Uniting the Streams
As we have stated in the previous articles of this series, ZFS is a revolutionary piece of technology. However, very few pieces of technology (no matter how revolutionary) are without issue. ZFS has been plagued by division. As we noted in the previous part of the series, there are separate projects working on ZFS support for different operating systems. Each of these projects is working on adding features and compatibility for their system. They are limited by the resources available to them.
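OpenZFS 2.0 addressed this by bringing the Linux and FreeBSD ports together in a single code repository. To mark this joining of the streams, Brian Behlendorf, the lead developer of ZFS on Linux (ZoL), announced on November 30, 2020, that “The ZFS on Linux project has been renamed OpenZFS!” This announcement also pointed to a series of new features in OpenZFS 2.0.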
These new features include:
Sequential Resilvering
Over time, arrays suffer failed disks and need to be rebuilt. ZFS handles this differently than other systems. For example, conventional RAID performs a “more pedestrian block-by-block whole-disk rebuild”. ZFS, on the other hand, “only needs to touch the used portion of the disk”. However, if an array is almost full, the ZFS method loses most of its advantage.
To regain the advantage, sequential resilvering was introduced to allow ZFS to rebuild (or resilver) certain types of arrays much more quickly.
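To see why a nearly full array erodes the benefit of rebuilding only used space, here is a rough back-of-the-envelope sketch. It is a toy model, not how ZFS measures anything: it only compares how much data each approach has to touch, and it ignores I/O patterns (sequential versus random access), which is the part sequential resilvering actually improves. The function name and the 10 TB disk size are illustrative assumptions.

```python
def rebuild_comparison(disk_size_tb: float, fill_fraction: float) -> dict:
    """Compare data touched by a whole-disk rebuild and a used-space-only rebuild.

    Toy model: a whole-disk rebuild always touches the full disk, while a
    used-space-only rebuild touches just the allocated portion.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be between 0 and 1")
    whole_disk_tb = disk_size_tb
    used_only_tb = disk_size_tb * fill_fraction
    return {
        "whole_disk_tb": whole_disk_tb,
        "used_only_tb": used_only_tb,
        "saved_tb": whole_disk_tb - used_only_tb,
    }


# Hypothetical 10 TB disk at three fill levels.
for fill in (0.25, 0.50, 0.95):
    result = rebuild_comparison(10.0, fill)
    print(f"{fill:.0%} full: touches {result['used_only_tb']:.2f} TB "
          f"instead of {result['whole_disk_tb']:.2f} TB")
```

The fuller the pool, the closer the two numbers get, which is the situation the paragraph above describes.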
Persistent L2ARC
ZFS uses the Adaptive Replacement Cache (or ARC) to “cache read requests”. The ARC lives in the system’s RAM. When the ARC is full, ZFS can use the Second Level Adaptive Replacement Cache (or L2ARC). Unlike the ARC, the L2ARC runs on SSDs. An L2ARC is a “SSD-based read cache” and is filled with “blocks in the ARC nearing eviction”. This means that an L2ARC is not as fast as the ARC, because SSDs are slower than memory. On the plus side, SSDs are cheaper than buying large amounts of RAM.
The problem is that the L2ARC is emptied after every reboot and takes time to repopulate. OpenZFS 2.0 makes the L2ARC persistent between reboots. This eliminates “the usual cache warmup time normally needed after importing your pool.”
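The relationship between the two cache tiers, and what persistence changes, can be illustrated with a small sketch. This is a deliberately simplified model and not the real ARC: it uses plain least-recently-used eviction rather than the adaptive algorithm, and it copies blocks to the second tier at eviction time rather than shortly before. The class name, method names, and slot counts are all hypothetical.

```python
from collections import OrderedDict


class TwoTierCache:
    """Simplified stand-in for ARC (RAM tier) plus L2ARC (SSD tier)."""

    def __init__(self, ram_slots: int, ssd_slots: int):
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.ram = OrderedDict()  # fast tier, least recently used first
        self.ssd = OrderedDict()  # slower, larger tier

    def _insert_ram(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        # Blocks pushed out of RAM are kept in the SSD tier.
        while len(self.ram) > self.ram_slots:
            old_key, old_value = self.ram.popitem(last=False)
            self.ssd[old_key] = old_value
            self.ssd.move_to_end(old_key)
            while len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)

    def read(self, key, load_from_disk):
        """Return (value, tier) where tier is 'ram', 'ssd', or 'disk'."""
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key], "ram"
        if key in self.ssd:
            value = self.ssd[key]
            self._insert_ram(key, value)
            return value, "ssd"
        value = load_from_disk(key)
        self._insert_ram(key, value)
        return value, "disk"

    def reboot(self, persistent_l2: bool):
        """RAM is always lost; the SSD tier survives only if persistent."""
        self.ram.clear()
        if not persistent_l2:
            self.ssd.clear()


cache = TwoTierCache(ram_slots=2, ssd_slots=2)
for block in ("a", "b", "c"):
    cache.read(block, str.upper)      # "a" is evicted from RAM to the SSD tier
cache.reboot(persistent_l2=True)
print(cache.read("a", str.upper))     # served from the SSD tier, no warmup
```

With `persistent_l2=False`, the same read after a reboot would have to go back to disk, which is the warmup cost that OpenZFS 2.0 removes.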
ZStandard Compression
Traditionally, ZFS has used the LZ4 compression algorithm. LZ4 has “relatively poor compress ratio but very light CPU loading”. OpenZFS 2.0 introduces a new compression algorithm named ZStandard (or zstd). Zstd was created by Yann Collet, who also wrote LZ4. The goal is to give ZFS an algorithm with compression ratios similar to GZIP, “but with much better performance”. According to tests, compared with LZ4, zstd-2 “achieves 50 percent higher compression in return for a 30 percent throughput penalty”. When data is being decompressed and read, “the throughput penalty is slightly higher, at around 36 percent.”
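To get a feel for what those percentages mean in practice, the arithmetic can be sketched as follows. The baseline figures (a 2.0x compression ratio and 1000 MB/s of throughput) are made-up inputs for illustration, not measurements, and the sketch assumes that “50 percent higher compression” means the compression ratio is multiplied by 1.5.

```python
def apply_tradeoff(baseline_ratio: float, baseline_mbps: float,
                   ratio_gain: float = 0.50,
                   throughput_penalty: float = 0.30) -> tuple:
    """Scale a baseline compression ratio and throughput by the quoted percentages."""
    new_ratio = baseline_ratio * (1.0 + ratio_gain)
    new_mbps = baseline_mbps * (1.0 - throughput_penalty)
    return new_ratio, new_mbps


# Hypothetical baseline: 2.0x compression at 1000 MB/s.
logical_gb = 1000.0
base_ratio, base_mbps = 2.0, 1000.0

write_ratio, write_mbps = apply_tradeoff(base_ratio, base_mbps)
_, read_mbps = apply_tradeoff(base_ratio, base_mbps, throughput_penalty=0.36)

print(f"baseline: {logical_gb / base_ratio:.0f} GB on disk at {base_mbps:.0f} MB/s")
print(f"zstd-2:   {logical_gb / write_ratio:.0f} GB on disk, "
      f"{write_mbps:.0f} MB/s write, {read_mbps:.0f} MB/s read")
```

Whether the trade is worthwhile depends on whether a given workload is limited by disk space or by throughput.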
Redacted Streams
Redacted streams is a unique feature that allows the user to exclude certain data from a backup. You might choose not to back up this information because it is of a sensitive nature, or just to save space. This is accomplished by cloning the dataset that includes the data you don’t want backed up. Next, the data that is not to be backed up is removed from the clone. Then, a bookmark is created for the parent dataset “which marks the blocks which changed from the parent to the clone”. Finally, you pass the `--redact redaction_bookmark` argument to `zfs send` to back up everything except what you don’t want backed up.
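The steps above can be sketched as a sequence of `zfs` commands. The helper below only builds and prints the command lines; it does not run anything. The dataset, snapshot, clone, and bookmark names are hypothetical, and the argument order reflects our reading of the `zfs-redact(8)` and `zfs-send(8)` man pages, so check it against the documentation for your OpenZFS version before relying on it.

```python
import shlex


def redacted_send_plan(dataset: str, snapshot: str,
                       clone: str, bookmark: str) -> list:
    """Return the zfs commands for a redacted send as argument lists.

    All names are supplied by the caller; nothing is executed here.
    """
    parent_snap = f"{dataset}@{snapshot}"
    clone_snap = f"{clone}@{snapshot}"
    return [
        # 1. Snapshot the parent dataset and clone that snapshot.
        ["zfs", "snapshot", parent_snap],
        ["zfs", "clone", parent_snap, clone],
        # (Manual step, not shown: delete the sensitive files from the clone.)
        # 2. Snapshot the clone once the unwanted data is gone.
        ["zfs", "snapshot", clone_snap],
        # 3. Create the redaction bookmark on the parent snapshot.
        ["zfs", "redact", parent_snap, bookmark, clone_snap],
        # 4. Send the parent snapshot, leaving out the redacted blocks.
        ["zfs", "send", "--redact", bookmark, parent_snap],
    ]


# Hypothetical names for illustration only.
plan = redacted_send_plan("pool/data", "base", "pool/data_redacted", "hide_secrets")
for command in plan:
    print(shlex.join(command))
```

In practice, the output of the final `zfs send` would be redirected to a file or piped into `zfs receive` on the backup target.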
When Will OpenZFS 2.0 Be Available for Users?
For those who want to use these features today, it’s easy if you have FreeBSD. OpenZFS 2.0 is available for FreeBSD 12.0 from the ports system. Once you install OpenZFS 2.0 from ports, it will overwrite the base ZFS. If you have FreeBSD 13.0 installed, you already have OpenZFS 2.0.
If, on the other hand, you are using ZFS on Linux, it is recommended that you wait until the developers of your distro make OpenZFS 2.0 available.
New Features on the Horizon
Even more features are planned for OpenZFS, based on the roadmap. Here are just a few of them.
ZFS Compatibility Layer
Since ZFS was born at Sun Microsystems, the code is full of “Solaris-isms” which can lead to confusion on non-Solaris systems such as BSD and Linux. The goal is to replace the current Solaris Portability Layer with a platform-neutral ZFS Compatibility Layer to avoid future confusion and make the code easier to understand.
Declustered parity RAID (or dRAID)
We mentioned resilvering earlier. dRAID is another attempt to speed up the resilvering process. “During normal dRAID operations, all the data are randomly distributed utilizing all the drives.” In regular RAID-Z, a spare drive sits idle until a disk fails, and the resilver is then limited by how fast that one drive can be written. In dRAID, “spares block are distributed as logical spares rather than physical drive unit”. In other words, during the resilvering process, data is read from all the disks and written across all the drives that make up the dRAID volume.
Eager Zero
Sometimes you are forced to run ZFS on a system that you don’t control, such as a server in the cloud. This often means that storage is not preallocated. Eager Zero allows you to “forcibly write data to the raw blocks occupying empty metaslabs”. This would cause the storage to preallocate the required blocks without causing a penalty.
These are just a couple of the upcoming changes to OpenZFS. As this piece of software continues to evolve, it can only go from strength to strength.