
Klara

Most people know ZFS as a stable, resilient, and durable file system that excels at large-scale storage and backup workloads. What fewer people realize is that ZFS is also a highly performant file system when configured and tuned correctly.

ZFS has many features that provide performance benefits: the ARC, prefetching, dedicated metadata devices, Direct I/O, and many other capabilities that deliver optimizations most other filesystems do not offer.

Adaptive Replacement Cache (ARC) 

The Adaptive Replacement Cache (ARC) is ZFS's intelligent memory caching system, which significantly enhances storage performance by keeping frequently and recently accessed data in RAM. Unlike traditional caches that use a simple “least recently used” algorithm, the ARC dynamically balances between two caching strategies:

  • Most Recently Used (MRU): data that was accessed recently
  • Most Frequently Used (MFU): data that is accessed often

ZFS automatically adjusts the ratio between the two lists based on workload patterns. By serving data directly from memory instead of requiring disk reads, the ARC can reduce storage latency from milliseconds to microseconds, dramatically improving application responsiveness and overall system throughput.
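A quick way to see how well the ARC is working is its hit ratio. The sketch below parses arcstats-style output with `awk`; the numbers in the heredoc are made up for illustration. On a live Linux OpenZFS system you would point the same `awk` at `/proc/spl/kstat/zfs/arcstats` instead.

```shell
# Compute the ARC hit ratio from arcstats-style key/type/value lines.
# The heredoc stands in for /proc/spl/kstat/zfs/arcstats with sample numbers.
ratio=$(awk '
  $1 == "hits"   { hits = $3 }    # reads served from the ARC
  $1 == "misses" { misses = $3 }  # reads that had to go to disk
  END { printf "ARC hit ratio: %.1f%%", 100 * hits / (hits + misses) }
' <<'EOF'
hits                            4    950000
misses                          4    50000
EOF
)
echo "$ratio"
```

A consistently high hit ratio for your steady-state workload suggests the ARC is large enough; a low one may mean the working set exceeds RAM.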

Reading Data with Prefetch 

ZFS's prefetch mechanism proactively reads data from storage into memory before applications explicitly request it, based on detected access patterns and sequential read behavior.  

When ZFS identifies that an application is reading data sequentially or following predictable patterns, it automatically begins fetching subsequent blocks or related metadata into the ARC cache, effectively staying ahead of the application's needs. This anticipatory loading transforms what would otherwise be synchronous, blocking read operations into asynchronous background operations, allowing applications to continue processing without waiting for disk I/O.  

The prefetch system is particularly effective for workloads like media streaming, database scans, or large file operations, where it can eliminate the traditional seek-and-read delays by ensuring data is already resident in memory when requested. By reducing or eliminating wait times for predictable data access patterns, ZFS prefetch can dramatically improve sustained throughput and create smoother, more responsive application performance. ZFS uses a feedback mechanism to determine when a particular prefetch stream is providing value.  

The “read-ahead” window starts relatively small, matching the size of the application's first request. Each time ZFS finds that the blocks it prefetched were used within a few seconds, it doubles the read-ahead window for that stream until it reaches a threshold (default 4 MiB); after that, it grows the window by 1/8th of its current size, up to a maximum (default 64 MiB). This allows ZFS to aggressively increase the amount it prefetches without prefetching too much data that the application will ultimately never use, which would waste I/O that other applications could put to use.
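The growth schedule described above can be sketched with a little shell arithmetic. Sizes are in bytes, the starting window is an assumed 128 KiB first request, and the 4 MiB and 64 MiB thresholds mirror the defaults the text mentions:

```shell
# Sketch of the prefetch read-ahead window growth (not actual ZFS code).
win=$((128 * 1024))                 # assume a 128 KiB first request
max_double=$((4 * 1024 * 1024))     # default doubling threshold: 4 MiB
max_win=$((64 * 1024 * 1024))       # default maximum window: 64 MiB

while [ "$win" -lt "$max_win" ]; do
  if [ "$win" -lt "$max_double" ]; then
    win=$((win * 2))                # phase 1: double while under 4 MiB
  else
    win=$((win + win / 8))          # phase 2: grow by 1/8th of current size
  fi
  if [ "$win" -gt "$max_win" ]; then
    win=$max_win                    # clamp at the 64 MiB ceiling
  fi
  echo "$win"                       # window after each successful round
done
```

The doubling phase ramps up quickly for streams that prove useful, while the slower 1/8th growth keeps very large windows from overshooting.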

Metadata VDEVs 

ZFS's dedicated metadata devices (often called special vdevs or special allocation classes) allow administrators to store ZFS indirect blocks, filesystem metadata, small files, and other critical data structures on higher-performance storage devices like SSDs and NVMe drives, while keeping bulk data on more traditional media like spinning disks. By separating metadata operations from data operations, ZFS can dramatically reduce the impact of metadata-intensive workloads that would otherwise cause frequent head movements and seek penalties on mechanical drives.

The separation is particularly beneficial for operations like directory traversals, file creation and deletion, permission changes, and small file access, all of which generate significant metadata activity. Since metadata operations are typically latency-sensitive and occur frequently in most workloads, placing them on fast storage devices can reduce filesystem response times from tens of milliseconds to microseconds.  

This hybrid storage architecture can deliver near-SSD performance while maintaining the cost-effectiveness of traditional storage for bulk data, making it especially valuable for workloads with many small files, complex directory structures, or applications that perform frequent metadata operations. By offloading smaller metadata reads to a dedicated device, and preserving the HDDs' limited IOPS for large blocks, ZFS is able to maximize the streaming performance of HDDs.
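Adding a special vdev might look like the following. This is a sketch only: the pool name (`tank`), dataset (`tank/data`), and NVMe device names are hypothetical, and the special vdev should be mirrored because losing it loses the pool's metadata.

```shell
# Add a mirrored special allocation class vdev to an existing pool
# (pool and device names are placeholders -- adjust to your hardware):
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Optionally route small file blocks (64 KiB and under) to the special
# vdev as well, not just metadata:
zfs set special_small_blocks=64K tank/data

# Confirm the pool layout shows the new special vdev:
zpool status tank
```

`special_small_blocks` is per-dataset, so you can send a small-file-heavy dataset to flash while bulk datasets stay on spinning disks.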

Direct I/O – ZFS Bypass Feature

ZFS's Direct I/O feature allows applications to bypass the filesystem's buffer cache and write data directly to storage devices, eliminating the traditional copy operations between user space, kernel buffers, and the ARC cache. This feature is particularly beneficial for applications that manage their own caching strategies or handle large datasets that would otherwise overwhelm system memory, such as databases, virtualization platforms, or high-performance computing applications. By avoiding double-buffering scenarios where data exists in both application memory and filesystem cache, Direct I/O reduces memory pressure and eliminates unnecessary CPU overhead from cache management operations.  

The feature also provides more predictable I/O latency, since applications can bypass the complexities of cache eviction policies and write-back delays, making it valuable for latency-sensitive workloads that require deterministic storage performance. Direct I/O allows an application to request that the data it reads or writes not be added to the cache. This is especially valuable when the application knows the data will not be used again in the near future, since caching it would needlessly evict more useful items. While Direct I/O trades off some of ZFS's caching benefits, it enables applications to achieve higher throughput and lower resource consumption when they can better manage data locality and caching decisions than the filesystem's general-purpose algorithms.
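A hedged sketch of enabling this, assuming an OpenZFS release with the per-dataset `direct` property (introduced alongside Direct I/O support in OpenZFS 2.3); the dataset name is hypothetical:

```shell
# "standard" (the default) honors O_DIRECT when an application requests it;
# "always" attempts Direct I/O for all reads and writes on the dataset;
# "disabled" ignores O_DIRECT and caches everything as usual.
zfs set direct=always tank/scratch

# Verify the current setting:
zfs get direct tank/scratch
```

With `direct=standard`, only applications that open files with the `O_DIRECT` flag bypass the ARC, so well-behaved databases can opt in while everything else keeps normal caching.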

Recordsize Dataset Property

ZFS recordsize is a dataset property that defines the maximum block size used for storing file data, with a default of 128 KiB that can be adjusted from 4 KiB up to 16 MiB depending on workload characteristics.

Selecting the optimal recordsize is crucial for performance because it directly impacts how efficiently ZFS can read, write, and compress data based on typical file sizes and access patterns in your workload. For applications that modify files in-place, especially large files like databases or VM images, or perform random I/O operations, smaller recordsizes (like 8KiB or 16KiB) reduce write amplification by avoiding the overhead of partially editing large blocks.  

Conversely, workloads involving large sequential files like video editing, database dumps, or backup operations benefit from larger recordsizes (1 MiB to 16 MiB) that maximize throughput by reducing the number of I/O operations, improve space efficiency by requiring fewer indirect blocks, and improve compression ratios. An appropriately tuned recordsize ensures that ZFS's block-level operations align with your application's data patterns, minimizing wasted space, reducing unnecessary I/O overhead, and maximizing both storage efficiency and performance.
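In practice this tuning is a one-line property change per dataset. The dataset names below are hypothetical examples of the two ends of the spectrum:

```shell
# Small random updates: match the application's page size
# (e.g. PostgreSQL uses 8 KiB pages) to limit write amplification:
zfs set recordsize=8K tank/postgres

# Large sequential media files: bigger records mean fewer I/O
# operations and fewer indirect blocks:
zfs set recordsize=1M tank/media

# Inspect the current values:
zfs get recordsize tank/postgres tank/media
```

Note that changing `recordsize` only affects blocks written afterward; existing files keep the record size they were written with until they are rewritten.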

ZFS Parameters 

ZFS provides extensive control over I/O threading and parallelism through adjustable module parameters that allow administrators to optimize performance for specific hardware configurations and workload patterns. Key parameters like zfs_vdev_async_read_max_active and zfs_vdev_async_write_max_active control the number of concurrent I/O operations per device, while settings like zfs_top_maxinflight manage queue depths to prevent overwhelming storage controllers or network-attached storage systems.  
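On Linux, these module parameters are exposed under `/sys/module/zfs/parameters`. The values below are illustrative placeholders, not recommendations; appropriate settings depend entirely on your hardware and workload.

```shell
# Inspect the current per-vdev async I/O concurrency limits:
cat /sys/module/zfs/parameters/zfs_vdev_async_read_max_active
cat /sys/module/zfs/parameters/zfs_vdev_async_write_max_active

# Change a parameter at runtime (root required; value is a placeholder):
echo 6 > /sys/module/zfs/parameters/zfs_vdev_async_read_max_active

# Make a setting persist across reboots via modprobe.d:
echo "options zfs zfs_vdev_async_read_max_active=6" > /etc/modprobe.d/zfs.conf
```

Runtime changes take effect immediately, which makes it practical to benchmark a workload, adjust one parameter at a time, and measure again before committing anything to `modprobe.d`.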

By tuning these parameters alongside device-specific settings, administrators can maximize throughput on high-performance NVMe arrays, optimize latency for virtualization workloads, or balance resources across mixed storage tiers. This granular I/O control highlights ZFS's broader philosophy of providing administrators with fine-grained tunability across virtually every aspect of filesystem behavior.  

From memory allocation and caching policies to compression algorithms and transaction group sizing, ZFS exposes hundreds of configurable parameters that allow system administrators to precisely tailor the filesystem's behavior to their specific use cases, whether optimizing for database performance, media streaming, backup operations, VM backing storage, or other high-frequency transaction applications where every microsecond matters.

Real Performance Examples 

Through proper configuration and tuning of ZFS to match the needs of specific workloads, we have routinely seen gains of anywhere from 10% to 300% in I/O performance, depending on the exact use case. In a recent engagement with a customer who had purchased a new NVMe server and needed to optimize for reads of large media files, we were able to improve the system's performance from 31 GiB/s with the defaults to over 56 GiB/s, a gain of ~85%, through careful profiling of their system and customizing the ZFS pool, datasets, and tuning parameters.

Whether you are planning a new storage deployment, or dealing with a system that is not meeting your performance expectations, the team at Klara has the expertise to help. 

 
