Umair Khurshid
The Hidden Value of CPU-Intensive Compression on Modern Hardware
Compression in the storage stack is usually treated with quiet suspicion: an expensive luxury that consumes CPU cycles needed elsewhere.
This idea rests on a particular hierarchy of constraints in which the processor is scarce and I/O is comparatively expensive but secondary. The legacy assumption persists that enabling compression means trading performance for additional capacity. Under that assumption, compression is evaluated primarily as overhead rather than as a structural optimization within the storage path. The trade-off seems obvious because the CPU cost is immediate and visible, while the downstream effects on I/O scheduling, cache behavior, and write amplification are less so. With CPUs now offering dozens of threads and modern compression algorithms sustaining throughputs of gigabytes per second, compression can often increase performance rather than trade it away for capacity. Even where some workloads pay a performance penalty, the trade is often for improved cache capacity, not merely improved storage capacity.
This article revisits these assumptions by examining the trade-off from a system architecture perspective, asking whether CPU cycles remain the dominant constraint on modern storage platforms.
Understanding the Change in Hardware Economics
To understand why the perception of compression has lagged behind reality, it is useful to examine the hardware environment in which the conventional understanding was formed.
In the early 2000s, a typical server was built around processors running in the 2 to 3 GHz range. These CPUs had a single core, maybe with Hyper-Threading offering some limited parallelism. Memory bandwidth was constrained by the Front Side Bus architecture, which peaked at around 6.4 GB per second for dual-channel setups.
Storage was dominated by mechanical hard drives; a 15,000 RPM SCSI drive could sustain sequential reads of 80 to 100 MB per second, with seek times averaging 4 to 6 milliseconds. In this environment, the I/O subsystem was the clear bottleneck for most workloads.
Consider a database server running on a Xeon 5100 series processor from 2006. These chips introduced the Core microarchitecture and offered dual cores, so a server with two such processors had four physical cores in total. Storage was likely a RAID array of 10,000 RPM SAS drives delivering perhaps 300 MB per second under ideal conditions.
In this environment, compressing data before writing to disk required CPU time that could otherwise have gone to query processing. The savings in I/O were real, but the cost was paid in a resource that was already scarce. Even LZ4 (which did not yet exist), lightweight by modern standards, would have consumed a measurable fraction of one core, and heavier algorithms like gzip were prohibitively expensive for online compression.
Tape systems from this era tell a similar story. Tape drives had native compression hardware, which offloaded the work from the host CPU. Without that hardware, compression was rarely used because the CPU could not keep up with the streaming throughput of the drive.
Virtualization platforms of the era faced similar pressure. Early VMware ESX deployments multiplexed limited cores across multiple guest operating systems, producing sustained CPU contention. Introducing compression into the storage path would have compounded the bottleneck.
In that context, skepticism toward compression made sense, as it was an accurate reflection of the dominant constraint in the system architecture of the time.
Architectural Shift
Multi-core processors, wide vector units, large shared caches, and more predictable memory subsystems have materially changed the balance between compute and storage. A contemporary AMD EPYC or Intel Xeon Scalable processor may expose 64 or more physical cores in a single socket. With DDR5 and multiple memory channels, aggregate memory bandwidth can exceed 300 GB per second. CPU capacity has scaled in both parallelism and data movement, not merely in clock speed. These shifts have completely altered the economic calculus of compression.
Compression is no longer justified solely as a space-saving mechanism but functions as a throughput amplifier, a means of reducing write amplification, a control surface for device wear, and a path to higher storage density without proportionally increasing latency. The architectural effect is a relocation of work. Tasks that would otherwise burden storage devices, which remain latency-bound and capital-intensive, are moved to processors that are comparatively abundant and often underutilized.
Storage hardware has also improved over time, but not at a rate comparable to CPU growth. Rotational disks continue to offer larger capacities but show little improvement in latency. NAND-based storage offers significant bandwidth but remains constrained by program/erase cycles, wear patterns, and limited opportunities for further latency reduction.
By contrast, even mid-range processors provide enough headroom to handle compression that would have crippled earlier systems. This divergence between compute growth and storage stagnation is one of the defining characteristics of modern architecture.
It creates a natural environment in which compression, including CPU-intensive methods, yields net gains. When compression reduces the size of blocks that the filesystem reads or writes, it reduces the amount of data that must traverse the storage pipeline. This directly improves throughput and indirectly reduces latency and wear.
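The claim that compression can amplify throughput reduces to simple arithmetic. The toy model below (all numbers are illustrative assumptions, not benchmarks) shows the break-even: once the compressor outruns the media, the logical throughput of the write path scales with the compression ratio.

```python
# Toy model: effective write throughput with inline compression.
# All figures below are illustrative assumptions, not measurements.

def effective_throughput(media_gbps, compressor_gbps, ratio):
    """Logical data accepted per second when every block is compressed
    before reaching the media. The path is limited either by how fast
    the CPU can compress, or by how fast the media can absorb the
    (smaller) compressed stream."""
    return min(compressor_gbps, media_gbps * ratio)

media = 3.0        # GB/s a mid-range NVMe device might sustain
compressor = 8.0   # GB/s of zstd-class compression across a few cores
ratio = 2.0        # assumed 2:1 compressibility of the data

plain = media                                        # uncompressed: media-bound
compressed = effective_throughput(media, compressor, ratio)

print(f"uncompressed: {plain:.1f} GB/s logical")
print(f"compressed:   {compressed:.1f} GB/s logical")
```

With these assumed numbers the compressed path accepts twice the logical data per second; only when the compressor itself becomes the slower stage does the model fall back to being CPU-bound.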
OpenZFS benefits from this environment because compression is integrated directly into the I/O path. The system compresses blocks before writing them to disk and decompresses them on read. The CPU cost is contained within a predictable stage of the pipeline, so administrators can make informed decisions about which algorithms and settings match the workload.
The Role of Compression in OpenZFS
OpenZFS incorporates compression into its block allocation strategy. Blocks are compressed before being written to disk, which reduces the number of physical block allocations and the amount of data that the system must later read or repair. When a block compresses to a size smaller than its full recordsize, the data is stored in a smaller, dynamically sized record that occupies less physical space while retaining the full logical structure expected by the filesystem.
A key advantage of this design is that compression in OpenZFS operates transparently to applications. Software sees normal, coherent files, while the underlying pool stores the smallest possible representation of the data. Block pointers record the size of the compressed data, so the filesystem can read exactly the bytes required without scanning additional space.
The algorithms available in OpenZFS span a wide range of characteristics. The commonly used ones are LZ4, gzip, and zstd with selectable compression levels; they differ in speed, compression ratio, and CPU consumption. Modern usage favors LZ4 for general workloads, but zstd has become increasingly relevant because it provides strong compression ratios while maintaining throughput that exceeds the performance of most storage devices. Importantly, zstd decompresses at essentially the same speed regardless of the level used for compression, and considerably faster than gzip.
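The ratio-versus-CPU trade-off is easy to observe with any DEFLATE implementation. The sketch below uses Python's zlib as a stand-in for gzip (LZ4 and zstd are not in the standard library); higher levels spend more CPU for smaller output, while decompression recovers identical bytes either way. The sample data is synthetic and for illustration only.

```python
import zlib

# Compressible sample data: repetitive text, similar in spirit to
# log lines or database pages. (Illustrative only, not a benchmark.)
data = b"timestamp=1700000000 level=INFO msg=request handled\n" * 4000

fast = zlib.compress(data, level=1)      # cheap, modest ratio
thorough = zlib.compress(data, level=9)  # more CPU, better ratio

print(f"original:  {len(data)} bytes")
print(f"level 1:   {len(fast)} bytes ({len(data)/len(fast):.1f}x)")
print(f"level 9:   {len(thorough)} bytes ({len(data)/len(thorough):.1f}x)")

# Decompression yields the original bytes regardless of which
# level produced the stream.
assert zlib.decompress(fast) == data
assert zlib.decompress(thorough) == data
```

The same experiment against real workload data is the honest way to pick a level: the win from a higher level depends entirely on how compressible the data actually is.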
This architectural model allows administrators to tune compression policies based on dataset requirements. Some datasets benefit from high-ratio compression such as zstd at higher levels, while others need lightweight and predictable compression such as LZ4. The ability to select compression algorithms at the dataset level is a significant benefit because it aligns compression policy with workload structure.
CPU-Intensive Compression as a Storage Multiplier
Traditional views of compression focused on disk savings, but on modern hardware the more significant value comes from reducing the amount of data that leaves memory and enters the storage pipeline. Compression changes the balance of resource utilization. When compressed blocks are smaller, the system experiences:
- Reduced write amplification
- Reduced read bandwidth consumption
- Longer device lifespan
- Faster replication and network transfers
- More effective caching
- Improved resiliency behavior during recovery operations
The CPU is the only resource in this list that improves faster than storage, memory, or network bandwidth. It is therefore the ideal place to offload work that reduces pressure on the slower components.
In OpenZFS, where each block is independently compressed, high-performance compression algorithms can run concurrently across many CPU cores. This ensures that even CPU-intensive algorithms such as zstd at high levels do not stall the system. The work distributes naturally across the available compute resources. ZFS also defaults to using only 75% of available cores for compression and encryption, ensuring these loads can never starve the system of CPU resources.
Compression in OpenZFS therefore becomes a mechanism for substituting inexpensive compute cycles for scarce storage bandwidth. The result is an overall throughput increase that outweighs any CPU overhead incurred.
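The per-record independence described above is what makes this parallelism work. The sketch below mimics that structure with Python's zlib and a thread pool (CPython's zlib releases the GIL while compressing sizable buffers, so the threads genuinely overlap); it is a conceptual illustration only, not how OpenZFS's internal taskqs are implemented.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

RECORDSIZE = 128 * 1024  # mirror ZFS's default 128 KiB recordsize

def split_records(buf, size=RECORDSIZE):
    """Chop a buffer into independent fixed-size records."""
    return [buf[i:i + size] for i in range(0, len(buf), size)]

# Synthetic payload standing in for a stream of filesystem writes.
data = b"block-oriented, independently compressible payload " * 50_000
records = split_records(data)

# Compress every record concurrently; each record is a self-contained
# unit of work, so no coordination between workers is needed.
with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(lambda r: zlib.compress(r, 6), records))

# Records decompress independently too, and reassemble losslessly.
restored = b"".join(zlib.decompress(c) for c in compressed)
assert restored == data
print(f"{len(records)} records, "
      f"{len(data)} -> {sum(map(len, compressed))} bytes")
```

Because no record depends on any other, adding workers scales the compression stage until either the cores or the downstream device saturate, which is exactly the property the taskq model exploits.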
Interaction with the Memory Hierarchy
Modern compression algorithms are built around the realities of cache and memory bandwidth. LZ4 and zstd both use designs that minimize unpredictable branching and optimize linear memory access. The internal structure enables predictable prefetching, effective use of L1 and L2 caches, and minimal load on DRAM.
These characteristics matter in the context of OpenZFS. When a system compresses or decompresses blocks, the working set must fit within cache for optimal performance. Efficient algorithms make this feasible by reducing the number of memory accesses required.
Smaller compressed blocks also mean that the Adaptive Replacement Cache (ARC) can store more logical data in the same physical memory space. This increases cache hit rates and reduces the frequency with which the system must fetch blocks from storage. The benefit is not limited to read-heavy workloads. Write-heavy workloads gain from storing recently written and compressed blocks inside the ARC while the system performs background tasks such as syncing or transaction group flushing.
The cumulative effect of compression interacting with ARC behavior is significant. It produces improvements in latency, reduces jitter, and increases predictable throughput across workloads that access mixed block sizes.
In a real-world example of these benefits, one Klara customer keeps their entire 9 TB database cached in memory for optimal performance, even though the system has only 6 TB of RAM (a hardware limit). The extra effective capacity provided by compressed blocks in the ARC is the only way this database can maintain the required performance; installing more RAM is not physically possible at this time.
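The arithmetic behind that example is worth making explicit: to cache a working set larger than physical memory, the average compression ratio of blocks in the ARC must cover the gap. A quick sanity check in Python, using the figures above and ignoring ARC metadata overhead:

```python
# How much logical data fits in the ARC at a given compression ratio?
# Figures match the example above; ARC header overhead is ignored.

ram_tb = 6.0          # physical memory available to the ARC
working_set_tb = 9.0  # logical size of the hot database

# The minimum average compressratio that lets the working set fit.
required_ratio = working_set_tb / ram_tb
print(f"minimum average compressratio needed: {required_ratio:.2f}x")

# Conversely: logical capacity of the cache at a few plausible ratios.
for ratio in (1.0, 1.5, 2.0):
    print(f"ratio {ratio:.1f}x -> caches {ram_tb * ratio:.1f} TB logical")
```

Here a modest 1.5x average ratio is the difference between an entirely memory-resident database and one that constantly spills to storage.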
Network and Replication Benefits
Many OpenZFS deployments operate in distributed environments. Snapshots are replicated across sites, datasets are synchronized for disaster recovery, and backups use incremental send and receive operations. These tasks rely heavily on network throughput.
Compression reduces the size of snapshot streams and incremental transfers. The reduction scales with the compression ratio of the underlying data, and because snapshot replication operates at the block level, transferring compressed blocks reduces the total data that must traverse the network during backup windows or disaster recovery routines.
On modern hardware, the CPU cost of decompressing or recompressing blocks during replication is negligible compared to network costs. ZFS replication can even skip the decompression and recompression entirely: a compressed send (zfs send -c) transmits blocks exactly as they are stored on disk. Even high-speed networks benefit from reduced transfer volume, because congestion, queuing delays, and switch buffer behavior all become less significant when moving smaller amounts of data.
Compression therefore serves as a network multiplier in the same way it serves as a storage multiplier. It reduces load on the most constrained part of the system and moves work to the CPU where additional throughput is readily available.
Wrapping Up
The value of CPU-intensive compression on modern hardware is far greater than the sum of its individual advantages. Compression has evolved from a tool that saves disk space into a mechanism that improves the entire behavior of storage systems. When used strategically, compression in OpenZFS reduces pressure on storage devices, improves caching efficiency, and increases overall throughput.
Administrators who adopt compression with a strategic perspective can build systems that are more efficient, more resilient, and more cost-effective. Compression is no longer a tax on CPU cycles but an essential feature of modern storage design and one of the most powerful performance multipliers available to OpenZFS.
Klara provides architectural guidance and operational support for teams designing and operating OpenZFS infrastructure. This includes evaluating compression strategies in the context of specific workloads such as databases, virtual machine storage, backup targets, and multi-tenant environments.
For teams seeking to extract the maximum value from modern hardware while maintaining stability and clarity of design, expert guidance can make the difference between theoretical optimization and operational success.

Umair Khurshid
Developer, open source contributor, and relentless homelab experimenter.