Which ZFS Storage Metrics Matter for Database Performance
ZFS has a reputation for prioritizing correctness over performance, which is not entirely wrong, but it is only half the truth. In practice, ZFS can deliver excellent database performance when it is understood and tuned with intent. The difficulty is not a lack of capability but the number of metrics exposed and the temptation to watch all of them without understanding which ones actually correlate with database behavior.
A database workload is not abstract. It is a pattern of reads, writes, sync operations, and latency sensitivity. The challenge is to filter the noise and focus on the metrics that meaningfully predict performance under load.
This piece isolates those metrics and connects them directly to database outcomes. The goal is not to list every counter available in zpool iostat or arcstat, but to identify which signals matter when a database becomes slow, stuttery, or unpredictable.
Understanding the Database I/O Profile
Before discussing metrics, it helps to define what the database is asking from the storage layer.
Most databases generate a mix of:
- Small random reads, often index-driven
- Small to medium writes, sometimes sequential within logs
- Synchronous writes for durability guarantees
- Occasional large scans, especially for analytics or maintenance
This combination creates tension: random reads want low latency, synchronous writes want durability without stalling, and large scans want throughput but can evict useful cache data.
ZFS sits between these demands and the physical devices. The metrics that matter are the ones that reveal how well ZFS is reconciling these competing requirements.
Latency
Latency is the single most important metric for database performance. Throughput matters, but databases tend to degrade when latency becomes inconsistent rather than when bandwidth is saturated.
The most direct way to observe this is through:
zpool iostat -lvy 1
Focus on three latency components:
- Read latency
- Write latency
- Sync write latency
High read latency translates directly to slow query execution, especially for index lookups. High write latency affects commit times. High sync write latency is often the most visible problem because many databases rely on synchronous durability guarantees.
A common mistake is to look only at averages, but databases are sensitive to tail latency. A system that usually responds in 1 ms but occasionally spikes to 50 ms can feel slower than one that consistently responds in 5 ms.
To look at a histogram of latency, instead of just the averages:
zpool iostat -wy 1
If latency spikes appear under moderate load, the issue is rarely raw disk speed. It is usually related to insufficient caching and, in a few cases, to write amplification effects.
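The difference between averages and tail latency can be illustrated with a small sketch. The two sample sets below are hypothetical numbers chosen to mirror the 1 ms / 50 ms / 5 ms example above, not measured data, and `percentile` is an illustrative helper using the nearest-rank method.

```python
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds.
spiky = [1.0] * 980 + [50.0] * 20   # usually 1 ms, 2% of I/Os spike to 50 ms
steady = [5.0] * 1000               # always 5 ms

for name, samples in (("spiky", spiky), ("steady", steady)):
    print(name, "mean:", statistics.mean(samples), "p99:", percentile(samples, 99))
```

The spiky system has the lower mean but a p99 ten times worse than the steady one, which is the behavior an average alone would hide.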
IOPS vs Throughput
IOPS is often overemphasized. It is useful, sometimes extremely so, but only in context. Databases are typically IOPS-driven during transactional workloads. However, ZFS aggregates and transforms I/O internally, so a single database operation may not map cleanly to a single disk operation.
High IOPS numbers do not guarantee good performance. What matters is whether those operations are completed quickly.
Throughput becomes more relevant during:
- Full table scans
- Backups
- Replication streams
For these workloads, bandwidth limits may dominate. For most OLTP systems, latency remains the dominant factor. If latency is low and stable, IOPS and throughput are rarely the bottleneck worth chasing.
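The relationship between the two numbers is simple arithmetic: bandwidth is IOPS multiplied by I/O size. The sketch below uses hypothetical workload figures to show why a transactional workload can post high IOPS while moving little data, and a scan the reverse.

```python
def throughput_mib_s(iops, io_size_kib):
    """Bandwidth in MiB/s implied by an IOPS rate at a given I/O size."""
    return iops * io_size_kib / 1024

# Hypothetical workloads, for illustration only.
oltp = throughput_mib_s(iops=20_000, io_size_kib=8)     # many small I/Os
scan = throughput_mib_s(iops=2_000, io_size_kib=1024)   # few large I/Os

print("OLTP MiB/s:", oltp)
print("scan MiB/s:", scan)
```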
ARC Efficiency
The ARC is central to ZFS performance. It determines how often the system must go to disk. The key metrics come from arcstat:
- hit% vs miss%
- mru vs mfu behavior
- ARC size relative to working set
A high hit rate usually indicates that frequently accessed data is staying in memory, which reduces read latency dramatically. However, the hit rate alone can mislead. A system may show a high hit rate while still performing poorly if the working set slightly exceeds ARC size. In that case, critical data may churn in and out of cache.
More useful signals include:
- Rapid ARC evictions
- Frequent transitions between MRU and MFU
- ARC size oscillating under memory pressure
For databases, stable ARC residency is more important than peak size. If the database working set fits in ARC, performance is often excellent regardless of underlying disk speed.
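As a rough sketch of how the hit rate feeds into read latency, the snippet below parses a kstat-style text and estimates an average read time. It assumes the three-column `name type data` layout used by `/proc/spl/kstat/zfs/arcstats` on Linux. The sample text, the 0.01 ms ARC time, and the 5 ms disk time are hypothetical. Real counters are cumulative since boot, so a monitoring script would diff two samples rather than use the raw totals.

```python
SAMPLE_ARCSTATS = """\
name                            type data
hits                            4    950000
misses                          4    50000
size                            4    8589934592
c_max                           4    17179869184
"""

def parse_kstat(text):
    """Return {name: value} for every 'name type data' line with a numeric value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats

def hit_ratio(stats):
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0

def effective_read_ms(ratio, arc_ms, disk_ms):
    """Average read latency for a given hit ratio (simple two-tier model)."""
    return ratio * arc_ms + (1 - ratio) * disk_ms

stats = parse_kstat(SAMPLE_ARCSTATS)
ratio = hit_ratio(stats)
print("hit ratio:", ratio)
print("avg read ms at this ratio:", effective_read_ms(ratio, 0.01, 5.0))
print("avg read ms at 0.99:", effective_read_ms(0.99, 0.01, 5.0))
```

In this model, dropping from a 99% to a 95% hit ratio raises the average read latency by more than a factor of four, which is why a hit rate that looks high can still hide a problem.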
L2ARC
L2ARC extends ARC onto faster storage such as SSDs. It can improve performance, but again only under specific conditions.
Relevant metrics include:
- L2ARC hit rate
- Feed rate into L2ARC
- Eviction patterns
L2ARC is beneficial when the working set is larger than RAM but still exhibits locality and the secondary device is significantly faster than the main pool.
It is usually less useful when the workload is write-heavy and data access is highly random with little reuse. An important detail is that L2ARC adds CPU and memory overhead, because the headers for blocks held on the L2ARC device are kept in ARC. If ARC is already effective, adding L2ARC may provide little benefit.
While ARC and L2ARC handle data caching, databases are incredibly metadata-intensive. Every select or update requires ZFS to traverse the object tree.
ZFS allows for a special vdev, typically a pair of mirrored NVMe drives, to store metadata and small blocks. By setting the special_small_blocks property on the database dataset to match your database page size (8K or 16K), with the dataset's record size set to the same value, you can force the database files themselves onto fast flash while leaving large, sequential files on comparatively cheaper, high-capacity spinning disks.
ZIL and SLOG
Synchronous writes are critical for database durability. ZFS handles these through the ZFS Intent Log.
Metrics to watch include:
- Sync write latency
- ZIL commit behavior
- SLOG device utilization, if present
When no separate log device exists, sync writes are committed to the main pool. This can introduce latency spikes, especially on rotational media.
A dedicated SLOG device can reduce this latency significantly, but only if it is low latency and properly sized for the workload.
If sync write latency is high, database transactions will slow down regardless of read performance. Most databases explicitly call fsync(), which forces ZFS to commit the outstanding writes for that file synchronously, regardless of whether you have tuned the application.
For non-critical workloads, such as read replicas or ETL staging areas, running zfs set sync=disabled <dataset> can provide a massive performance boost by allowing ZFS to aggregate writes in RAM and commit them asynchronously, though this risks losing the last few seconds of data in a power failure.
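The fsync() path can also be timed from the application side, which gives a view of sync write latency as the database would experience it. The following is a minimal sketch and not a benchmark. The function name, the 8 KiB write size, and the write count are illustrative assumptions. To be meaningful, the directory passed in would need to sit on the dataset being examined, whereas the demo at the bottom just uses a temporary directory.

```python
import os
import tempfile
import time

def time_sync_writes(directory, count=20, size=8192):
    """Write `count` blocks of `size` bytes, calling fsync() after each one,
    and return the per-write latencies in milliseconds."""
    payload = os.urandom(size)  # random bytes so compression cannot shrink them
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # returns only once the data is on stable storage
            latencies.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies

with tempfile.TemporaryDirectory() as scratch:
    samples = time_sync_writes(scratch)

print("median ms:", sorted(samples)[len(samples) // 2], "max ms:", max(samples))
```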
Queue Depth and Disk Utilization
Disk-level metrics often reveal contention that is not obvious from ZFS statistics alone.
Key indicators include:
- Queue depth
- Device utilization percentage
- Service time per operation
High queue depth combined with rising latency indicates saturation. At this point, additional I/O requests are waiting rather than being serviced.
For databases, this often manifests as slow queries under concurrency, increased commit latency, and periodic stalls.
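The link between queue depth, IOPS, and latency follows Little's law: the number of requests in flight equals the completion rate multiplied by the time each request spends in the system. The original text does not name this law, so it is brought in here as an aid. The device limit of 20,000 IOPS is a hypothetical figure, and the second function assumes the device is already saturated.

```python
def in_flight(iops, latency_ms):
    """Average number of outstanding requests (Little's law)."""
    return iops * latency_ms / 1000

def latency_ms_at(concurrency, max_iops):
    """Latency per request once a device capped at max_iops is saturated
    and `concurrency` requests are kept outstanding."""
    return concurrency / max_iops * 1000

# Hypothetical device that tops out at 20,000 IOPS.
print("in flight at 0.5 ms:", in_flight(20_000, 0.5))
print("latency with 8 outstanding:", latency_ms_at(8, 20_000))
print("latency with 64 outstanding:", latency_ms_at(64, 20_000))
```

Once the device is at its limit, adding concurrency does not add IOPS. It only lengthens the queue, so latency grows in proportion to the number of waiting requests.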
Adding more vdevs can reduce contention by increasing parallelism, because ZFS performance scales with the number of vdevs rather than the total number of disks alone.
Record Size and Write Amplification
ZFS record size has a direct impact on database efficiency. Databases typically perform small writes, often in the range of 8 KB to 16 KB. If ZFS record size is much larger, each small write can trigger a read-modify-write cycle.
Metrics that indirectly reflect this include:
- Write amplification
- Disk write throughput exceeding application write rate
- Increased latency during write-heavy workloads
Matching record size to database page size can reduce unnecessary overhead.
For example:
- PostgreSQL often benefits from an 8 KB record size
- MySQL with InnoDB may align with 16 KB
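The cost of a mismatch can be estimated with simple arithmetic. The function below is a worst-case sketch. It assumes each page write dirties a different record, and it ignores compression, metadata, and the case where several pages in the same record are written within one transaction group. The function name is illustrative.

```python
def rmw_amplification(recordsize_kib, page_kib):
    """Worst-case bytes written to the pool per byte the database writes,
    when every page write forces a full record to be rewritten."""
    if page_kib >= recordsize_kib:
        return 1.0
    return recordsize_kib / page_kib

for recordsize in (128, 16, 8):
    print(f"recordsize={recordsize}K, 8K pages:", rmw_amplification(recordsize, 8))
```

With a 128 KB record size and 8 KB pages, this model gives up to 16 times more data written than the database asked for, and matching the record size to the page size brings the factor back to 1.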
Compression Ratio and CPU Impact
ZFS compression can improve performance by reducing I/O.
Relevant metrics include:
- Compression ratio
- CPU utilization
- Write latency under load
If compression reduces data size significantly, it can improve effective throughput and reduce disk pressure. While lz4 remains the gold standard for low-latency transactional workloads, zstd has become the usual choice for everything else. For database logs, archives, or backups, zstd (specifically levels 1 through 3) provides significantly higher compression ratios than lz4, with nearly identical performance on multi-core CPUs.
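The effect on throughput can be sketched as a ratio. The numbers below are hypothetical, and the model assumes the CPU keeps up with compression so that the disks remain the limiting factor.

```python
def compress_ratio(logical_bytes, physical_bytes):
    """Logical (uncompressed) size divided by the space used on disk."""
    return logical_bytes / physical_bytes

def effective_mib_s(physical_mib_s, ratio):
    """Logical data rate delivered when the disks move physical_mib_s."""
    return physical_mib_s * ratio

# Hypothetical dataset: 300 GiB of database files stored in 150 GiB.
ratio = compress_ratio(300 * 1024**3, 150 * 1024**3)
print("ratio:", ratio)
print("effective MiB/s on a 500 MiB/s pool:", effective_mib_s(500, ratio))
```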
Transaction Group Behavior
ZFS batches writes into transaction groups. This behavior affects both throughput and latency.
Metrics to observe:
- Transaction group quiesce time
- Transaction group commit time
- Write bursts during flush
Large transaction groups can improve throughput but may introduce periodic latency spikes when they are flushed.
For databases, consistent latency is usually more important than peak throughput. If transaction group flushes cause noticeable pauses, tuning the dirty data limits and the transaction group timeout may help.
Beyond just watching TXG commit times, the most vital “hidden” metric for write-heavy databases is Dirty Data.
You can use arcstat or the kstats to watch how much dirty data is outstanding relative to zfs_dirty_data_max and the sync threshold (zfs_dirty_data_sync, or zfs_dirty_data_sync_percent on newer OpenZFS releases). ZFS allows writes to accumulate in RAM until they reach a certain threshold. If your database writes faster than your disks can flush, you hit the “dirty data” ceiling.
When this happens, ZFS injects artificial delays (throttling) into the application. To a database user, this looks like a sudden, mysterious “hang” or a network timeout, but it is actually ZFS protecting the system from running out of writeable memory.
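How quickly a write burst reaches the throttle can be estimated with a simple fill-rate model. The sketch below assumes steady ingest and flush rates and an empty buffer at the start, which real systems rarely have. It also assumes that delays begin at 60% of zfs_dirty_data_max, the usual default of the zfs_delay_min_dirty_percent tunable. The rates and the 4 GiB limit are hypothetical.

```python
GIB = 1024 ** 3

def seconds_until_throttle(dirty_max, ingest_rate, flush_rate, delay_start_pct=60):
    """Seconds until dirty data reaches the point where write delays begin.
    Rates are in bytes per second. Returns None if the pool keeps up."""
    surplus = ingest_rate - flush_rate
    if surplus <= 0:
        return None
    return dirty_max * delay_start_pct / 100 / surplus

# Hypothetical: 4 GiB dirty data limit, database writes 1.5 GiB/s,
# pool flushes 1.0 GiB/s.
print(seconds_until_throttle(4 * GIB, 1.5 * GIB, 1.0 * GIB))
print(seconds_until_throttle(4 * GIB, 0.8 * GIB, 1.0 * GIB))
```

In this model, a sustained half-gigabyte-per-second surplus reaches the delay threshold in under five seconds, which is consistent with stalls that appear only during bursts.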
Fragmentation and Long-Term Performance
Fragmentation affects how efficiently ZFS can allocate and access data over time.
Indicators include:
- Fragmentation percentage
- Increasing read latency over time
- Decreasing sequential throughput
Highly fragmented pools require more disk seeks, which increases latency for random access workloads. Databases are particularly sensitive to this because they rely heavily on predictable access patterns.
Mitigation often involves proper initial layout, avoiding overfilling the pool, and periodic rebalancing through data migration if necessary.
Observability Tools That Matter
Several tools expose the metrics discussed:
- zpool iostat for latency and throughput
- arcstat for cache behavior
- zfs get for dataset properties
- iostat or gstat for device-level metrics
The value of these tools is not in their breadth, but in how their outputs are interpreted together.
For example, high read latency combined with a low ARC hit rate points to cache inefficiency. High write latency combined with a sync-heavy workload points to ZIL or SLOG limitations.
Diagnosing a Slow Database
When a database slows down, the metrics form a chain of reasoning.
Start with latency. If latency is high, determine whether it is read or write related.
If reads are slow:
- Check ARC hit rate
- Check disk latency
- Look for fragmentation or cache churn
If writes are slow:
- Check sync write latency
- Examine SLOG performance
- Look for write amplification or transaction group behavior
If both are slow:
- Check overall disk utilization
- Look for queue saturation
- Evaluate vdev layout and parallelism
Each metric narrows the problem space (don’t try to optimize everything) and helps identify the constraint that is actually limiting the database.
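The chain of reasoning above can be written down as a small triage function. This is a sketch of the decision order only. The function name and the 5 ms limit are hypothetical, and a real threshold depends on the hardware and the workload.

```python
READ_CHECKS = [
    "check ARC hit rate",
    "check disk latency",
    "look for fragmentation or cache churn",
]
WRITE_CHECKS = [
    "check sync write latency",
    "examine SLOG performance",
    "look for write amplification or transaction group behavior",
]
BOTH_CHECKS = [
    "check overall disk utilization",
    "look for queue saturation",
    "evaluate vdev layout and parallelism",
]

def triage(read_ms, write_ms, limit_ms=5.0):
    """Return the next checks to run, given observed read and write latency."""
    slow_reads = read_ms > limit_ms
    slow_writes = write_ms > limit_ms
    if slow_reads and slow_writes:
        return BOTH_CHECKS
    if slow_reads:
        return READ_CHECKS
    if slow_writes:
        return WRITE_CHECKS
    return []  # latency looks healthy, so storage is unlikely to be the constraint

print(triage(read_ms=12.0, write_ms=1.0))
```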
Wrapping Up
ZFS exposes a rich set of metrics, but only a subset directly correlates with database performance. Latency sits at the center of this model, with cache efficiency, sync write behavior, and disk contention acting as supporting signals.
Understanding these metrics changes how ZFS is operated. Instead of reacting to isolated numbers, it becomes possible to interpret the system as a whole. That approach is what allows ZFS to support demanding database workloads without sacrificing its core guarantees of integrity and reliability.
This is where Klara’s ZFS Performance Analysis solution can help, working directly with your team running OpenZFS in production to identify bottlenecks, validate assumptions, and tune systems based on actual workload characteristics, not just generic best practices.
Whether the issue is inconsistent latency, poor cache efficiency, or unexplained stalls under load, a structured performance analysis can quickly narrow the problem space and surface the constraints that matter.

Umair Khurshid
Developer, open source contributor, and relentless homelab experimenter.