Which ZFS Storage Metrics Matter for Database Performance

ZFS has a reputation for prioritizing correctness over performance, which is not entirely wrong, but it is only half the truth. In practice, ZFS can deliver excellent database performance when it is understood and tuned with intent. The difficulty is not a lack of capability but the number of metrics exposed and the temptation to watch all of them without understanding which ones actually correlate with database behavior.

A database workload is not abstract. It is a pattern of reads, writes, sync operations, and latency sensitivity. The challenge is to filter the noise and focus on the metrics that meaningfully predict performance under load.

This piece isolates those metrics and connects them directly to database outcomes. The goal is not to list every counter available in zpool iostat or arcstat, but to identify which signals matter when a database becomes slow, stuttery, or unpredictable.

Understanding the Database I/O Profile

Before discussing metrics, it helps to define what the database is asking from the storage layer.

Most databases generate a mix of:

Small random reads, often index-driven

Small to medium writes, sometimes sequential within logs

Synchronous writes for durability guarantees

Occasional large scans, especially for analytics or maintenance

This combination creates tension: random reads want low latency, synchronous writes want durability without stalling, and large scans want throughput but can evict useful cache data.

ZFS sits between these demands and the physical devices. The metrics that matter are the ones that reveal how well ZFS is reconciling these competing requirements.

Latency

Latency is the single most important metric for database performance. Throughput matters, but databases tend to degrade when latency becomes inconsistent rather than when bandwidth is saturated.

The most direct way to observe this is through:

zpool iostat -lvy 1

Focus on three latency components:

Read latency

Write latency

Sync write latency

High read latency translates directly to slow query execution, especially for index lookups. High write latency affects commit times. High sync write latency is often the most visible problem because many databases rely on synchronous durability guarantees.

A common mistake is to look only at averages, but databases are sensitive to tail latency. A system that usually responds in 1 ms but occasionally spikes to 50 ms can feel slower than one that consistently responds in 5 ms.
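The 1 ms versus 5 ms comparison can be made concrete with a small calculation. The sketch below is illustrative only: the sample counts (980 fast requests and 20 spikes per 1000) are assumptions chosen to match the example, not measurements, and `percentile` is a hypothetical helper using the nearest-rank method.

```python
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    # Integer ceiling of len * pct / 100 avoids floating-point rounding.
    rank = max(1, (len(ordered) * pct + 99) // 100)
    return ordered[rank - 1]

# Assumed workload: 2% of requests spike to 50 ms, the rest take 1 ms.
spiky = [1.0] * 980 + [50.0] * 20
# Assumed workload: every request takes 5 ms.
steady = [5.0] * 1000

for name, samples in (("spiky", spiky), ("steady", steady)):
    print(name,
          "mean:", statistics.mean(samples),
          "p50:", percentile(samples, 50),
          "p99:", percentile(samples, 99))
```

Under these assumptions the spiky system has the lower mean (about 2 ms) but a p99 ten times worse than the steady one, which is the gap an average hides.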

To look at a histogram of latency, instead of just the averages:

zpool iostat -wy 1
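Once latency is bucketed into a histogram, a percentile can be approximated by walking the cumulative counts. The following is a minimal sketch: the bucket bounds and counts are made-up values, the `(upper_bound, count)` layout is an assumption for illustration, and nothing here parses real zpool iostat output.

```python
def percentile_from_histogram(buckets, pct):
    """Approximate a percentile from (upper_bound, count) pairs.

    Buckets must be sorted by upper bound. The result is the upper bound
    of the bucket in which the requested percentile falls, so it is a
    conservative (rounded-up) estimate.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return None
    threshold = total * pct / 100
    running = 0
    for bound, count in buckets:
        running += count
        if running >= threshold:
            return bound
    return buckets[-1][0]

# Hypothetical histogram: upper bound in microseconds, number of I/Os.
sample = [(128, 700), (256, 200), (512, 80), (1024, 15), (65536, 5)]

print("p50 <=", percentile_from_histogram(sample, 50), "us")
print("p99 <=", percentile_from_histogram(sample, 99), "us")
```

In this made-up data the median sits in the fastest bucket while a handful of I/Os land in a bucket hundreds of times slower, which is the pattern to look for in the real histogram.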

If latency spikes appear under moderate load, the issue is rarely raw disk speed. It is usually insufficient caching and, in a few cases, write amplification.

IOPS vs Throughput

IOPS is often overemphasized. It is useful, sometimes extremely so, but only in context. Databases are typically IOPS-driven during transactional workloads. However, ZFS aggregates and transforms I/O internally, so a single database operation may not map cleanly to a single disk operation.

High IOPS numbers do not guarantee good performance. What matters is whether those operations are completed quickly.

Throughput becomes more relevant during:

Full table scans

Backups

Replication streams

For these workloads, bandwidth limits may dominate. For most OLTP systems, latency remains the dominant factor. If latency is low and stable, IOPS and throughput are rarely the bottleneck worth chasing.
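The relationship between the two numbers is simple arithmetic: throughput is IOPS multiplied by I/O size. The figures in this sketch are assumed for illustration and are not benchmarks.

```python
def throughput_mib_s(iops, io_size_bytes):
    """Bandwidth in MiB/s implied by an IOPS rate at a fixed I/O size."""
    return iops * io_size_bytes / (1024 * 1024)

# Assumed OLTP-style load: many small 8 KiB operations.
oltp = throughput_mib_s(20_000, 8 * 1024)
# Assumed scan-style load: few large 1 MiB operations.
scan = throughput_mib_s(500, 1024 * 1024)

print(f"OLTP: 20000 IOPS x 8 KiB = {oltp} MiB/s")
print(f"Scan: 500 IOPS x 1 MiB = {scan} MiB/s")
```

With these inputs the transactional load issues 40 times more operations yet moves less than a third of the data, which is why bandwidth limits tend to show up in scans and backups rather than in OLTP traffic.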

ARC Efficiency

The ARC is central to ZFS performance. It determines how often the system must go to disk. The key metrics come from arcstat:

hit% vs miss%

mru vs mfu behavior

ARC size relative to working set

A high hit rate usually indicates that frequently accessed data is staying in memory, which reduces read latency dramatically. However, the hit rate alone can mislead. A system may show a high hit rate while still performing poorly if the working set slightly exceeds ARC size. In that case, critical data may churn in and out of cache.

More useful signals include:

Rapid ARC evictions

Frequent transitions between MRU and MFU

ARC size oscillating under memory pressure

For databases, stable ARC residency is more important than peak size. If the database working set fits in ARC, performance is often excellent regardless of underlying disk speed.
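One reason the hit rate can mislead is that the underlying counters are cumulative since boot, so a recent drop is easy to miss. A minimal sketch of computing the hit ratio over an interval instead is shown below. It assumes the three-column `name type data` kstat text layout that Linux exposes in /proc/spl/kstat/zfs/arcstats; the snapshot values are invented, and `parse_arcstats` and `interval_hit_ratio` are hypothetical helper names.

```python
def parse_arcstats(text):
    """Parse 'name type data' lines into a {name: int} dictionary."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers: keep only three-field lines with a numeric value.
        if len(fields) == 3 and fields[2].isdigit():
            counters[fields[0]] = int(fields[2])
    return counters

def interval_hit_ratio(before, after):
    """Hit ratio for the I/O that happened between two snapshots."""
    hits = after["hits"] - before["hits"]
    misses = after["misses"] - before["misses"]
    total = hits + misses
    return hits / total if total else None

# Invented snapshots taken some seconds apart.
snap1 = parse_arcstats("name type data\nhits 4 9000\nmisses 4 1000\n")
snap2 = parse_arcstats("name type data\nhits 4 9900\nmisses 4 1400\n")

cumulative = snap2["hits"] / (snap2["hits"] + snap2["misses"])
recent = interval_hit_ratio(snap1, snap2)
print(f"cumulative: {cumulative:.1%}  last interval: {recent:.1%}")
```

In this invented data the cumulative ratio still looks healthy at roughly 88% while the last interval has dropped to about 69%, the kind of churn a since-boot figure smooths over.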

L2ARC

L2ARC extends ARC onto faster storage such as SSDs. It can improve performance, but again only under specific conditions.

Relevant metrics include:

L2ARC hit rate

Feed rate into L2ARC

Eviction patterns

L2ARC is beneficial when the working set is larger than RAM but still exhibits locality and the secondary device is significantly faster than the main pool.

It is usually less useful when the workload is write-heavy or when data access is highly random with little reuse. An important detail is that L2ARC adds CPU and memory overhead. If ARC is already effective, adding L2ARC may provide little benefit.

While ARC and L2ARC handle data caching, databases are incredibly metadata-intensive. Every select or update requires ZFS to traverse the object tree.

ZFS allows for a Special VDEV, typically a pair of mirrored NVMe drives, to store metadata and small blocks. By setting the special_small_blocks property to match your database page size (8K or 16K), you can force the database files themselves onto fast flash while leaving large, sequential files on comparatively cheaper high-capacity spinning disks.

ZIL and SLOG

Synchronous writes are critical for database durability. ZFS handles these through the ZFS Intent Log.

Metrics to watch include:

Sync write latency

ZIL commit behavior

SLOG device utilization, if present

When no separate log device exists, sync writes are committed to the main pool. This can introduce latency spikes, especially on rotational media.

A dedicated SLOG device can reduce this latency significantly, but only if it is low latency and properly sized for the workload.

If sync write latency is high, database transactions will slow down regardless of read performance. Most databases explicitly call fsync(), which forces ZFS to treat all outstanding writes as synchronous, regardless of whether you have tuned the application.

For non-critical workloads, such as read replicas or ETL staging areas, setting zfs set sync=disabled <dataset> can provide a massive performance boost by allowing ZFS to aggregate writes in RAM and commit them asynchronously, though this risks losing the last few seconds of data in a power failure.
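Sync write latency can also be observed from the application side by timing fsync() directly. The sketch below is a rough probe, not a benchmark: `fsync_latencies` is a hypothetical helper, the 8 KiB block size is an assumption mirroring a common database page size, and the directory passed in should live on the dataset being examined.

```python
import os
import tempfile
import time

def fsync_latencies(directory, count=100, block_size=8192):
    """Append blocks to a temp file and time each fsync() in milliseconds."""
    payload = b"\0" * block_size
    latencies_ms = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # returns once the data is reported durable
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies_ms

if __name__ == "__main__":
    # The system temp directory is used here only so the sketch runs
    # anywhere; point it at a directory on the ZFS dataset in practice.
    samples = sorted(fsync_latencies(tempfile.gettempdir(), count=50))
    print(f"median {samples[len(samples) // 2]:.3f} ms, "
          f"max {samples[-1]:.3f} ms")
```

Comparing the median with the maximum gives a quick sense of whether commits are uniformly slow or mostly fast with occasional stalls.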

Queue Depth and Disk Utilization

Disk-level metrics often reveal contention that is not obvious from ZFS statistics alone.

Key indicators include:

Queue depth

Device utilization percentage

Service time per operation

High queue depth combined with rising latency indicates saturation. At this point, additional I/O requests are waiting rather than being serviced.

For databases, this often manifests as slow queries under concurrency, increased commit latency, and periodic stalls.

Adding more vdevs can reduce contention by increasing parallelism as ZFS performance scales with the number of vdevs rather than the total number of disks alone.
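A useful way to relate queue depth to latency is Little's law, which is brought in here as an aid and is not part of the ZFS tooling: the average number of requests in flight equals the completion rate multiplied by the average time each request spends in the system. The numbers below are assumed for illustration.

```python
def outstanding_ios(iops, latency_ms):
    """Little's law: average I/Os in flight = rate x time in system."""
    return iops * latency_ms / 1000.0

# Assumed healthy device: 10,000 IOPS at 0.5 ms each.
healthy = outstanding_ios(10_000, 0.5)
# Same completion rate, but each I/O now takes 4 ms end to end.
saturated = outstanding_ios(10_000, 4.0)

print(f"healthy: {healthy} in flight, saturated: {saturated} in flight")
```

When the completion rate stays flat while the number in flight grows from 5 to 40, the extra requests are waiting in a queue rather than being serviced, which is the saturation signature described above.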

Record Size and Write Amplification

ZFS record size has a direct impact on database efficiency. Databases typically perform small writes, often in the range of 8 KB to 16 KB. If ZFS record size is much larger, each small write can trigger a read-modify-write cycle.

Metrics that indirectly reflect this include:

Write amplification

Disk write throughput exceeding application write rate

Increased latency during write-heavy workloads

Matching record size to database page size can reduce unnecessary overhead.

For example:

PostgreSQL often benefits from an 8 KB record size

MySQL with InnoDB may align with 16 KB
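The cost of a mismatch can be estimated with a rough upper bound: if every small page write forces a full record to be rewritten, the amplification factor is the record size divided by the page size. This sketch is a simplification that ignores compression, caching of the record in ARC, and pool-level overhead such as parity; `worst_case_amplification` is a hypothetical helper.

```python
def worst_case_amplification(recordsize_bytes, page_bytes):
    """Upper bound on bytes written per byte the database asked to write."""
    # A record no larger than the page needs no read-modify-write.
    return max(1.0, recordsize_bytes / page_bytes)

KIB = 1024
cases = [
    ("128 KiB record (ZFS default), 8 KiB page", 128 * KIB, 8 * KIB),
    ("16 KiB record, 16 KiB page", 16 * KIB, 16 * KIB),
    ("8 KiB record, 8 KiB page", 8 * KIB, 8 * KIB),
]
for label, recordsize, page in cases:
    print(f"{label}: up to {worst_case_amplification(recordsize, page)}x")
```

The bound is only reached when the affected record has to be read and rewritten in full, but it shows why disk write throughput can far exceed the application write rate when the two sizes differ.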

Compression Ratio and CPU Impact

ZFS compression can improve performance by reducing I/O.

Relevant metrics include:

Compression ratio

CPU utilization

Write latency under load

If compression reduces data size significantly, it can improve effective throughput and reduce disk pressure. While lz4 remains the gold standard for low-latency transactional workloads, Zstd has become the modern industry standard for everything else. For database logs, archives, or backups, zstd (specifically levels 1 through 3) provides significantly higher compression ratios than lz4 with nearly identical performance on multi-core CPUs.

Transaction Group Behavior

ZFS batches writes into transaction groups. This behavior affects both throughput and latency.

Metrics to observe:

Transaction group quiesce time

Transaction group commit time

Write bursts during flush

Large transaction groups can improve throughput but may introduce periodic latency spikes when they are flushed.

For databases, consistent latency is usually more important than peak throughput. If transaction group flushes cause noticeable pauses, tuning dirty data limits and transaction timeout may help.

Beyond just watching TXG commit times, the most vital “hidden” metric for write-heavy databases is Dirty Data.

You can use arcstat or the kstats to monitor zfs_dirty_data_max vs zfs_dirty_data_sync. ZFS allows writes to accumulate in RAM until they reach a certain threshold. If your database writes faster than your disks can flush, you hit the “dirty data” ceiling.

When this happens, ZFS injects artificial delays (throttling) into the application. To a database user, this looks like a sudden, mysterious “hang” or a network timeout, but it is actually ZFS protecting the system from running out of writeable memory.
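The shape of that throttle can be sketched from the delay formula given in the OpenZFS module parameter documentation: delay = zfs_delay_scale * (dirty - min) / (max - dirty), where min is zfs_delay_min_dirty_percent of zfs_dirty_data_max. The defaults used below (60 percent, 500,000 ns scale, 100 ms cap) are the documented defaults as best understood here and should be checked against the running system; the 4 GB ceiling is an assumed example value.

```python
def write_delay_ns(dirty, dirty_max, min_dirty_percent=60,
                   delay_scale=500_000, delay_max_ns=100_000_000):
    """Approximate per-write delay (ns) injected by the ZFS write throttle."""
    threshold = dirty_max * min_dirty_percent / 100
    if dirty <= threshold:
        return 0.0  # below the threshold no delay is applied
    if dirty >= dirty_max:
        return float(delay_max_ns)  # at the ceiling the delay is capped
    delay = delay_scale * (dirty - threshold) / (dirty_max - dirty)
    return min(delay, float(delay_max_ns))

DIRTY_MAX = 4_000_000_000  # assumed zfs_dirty_data_max for this example

for percent in (50, 80, 90, 99):
    dirty = DIRTY_MAX * percent // 100
    delay_ms = write_delay_ns(dirty, DIRTY_MAX) / 1_000_000
    print(f"{percent}% dirty -> {delay_ms:.3f} ms per write")
```

The curve is flat until the threshold and then rises steeply as dirty data approaches the ceiling, which matches the experience of a database that runs normally and then appears to hang.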

Fragmentation and Long-Term Performance

Fragmentation affects how efficiently ZFS can allocate and access data over time.

Indicators include:

Fragmentation percentage

Increasing read latency over time

Decreasing sequential throughput

Highly fragmented pools require more disk seeks, which increases latency for random access workloads. Databases are particularly sensitive to this because they rely heavily on predictable access patterns.

Mitigation often involves proper initial layout, avoiding overfilling the pool, and periodic rebalancing through data migration if necessary.

Observability Tools That Matter

Several tools expose the metrics discussed:

zpool iostat for latency and throughput

arcstat for cache behavior

zfs get for dataset properties

iostat or gstat for device-level metrics

The value of these tools is not in their breadth, but in how their outputs are interpreted together.

For example, high read latency combined with low ARC hit rate points to cache inefficiency. High write latency combined with sync-heavy workload points to ZIL or SLOG limitations.

Diagnosing a Slow Database

When a database slows down, the metrics form a chain of reasoning.

Start with latency. If latency is high, determine whether it is read or write related.

If reads are slow:

Check ARC hit rate

Check disk latency

Look for fragmentation or cache churn

If writes are slow:

Check sync write latency

Examine SLOG performance

Look for write amplification or transaction group behavior

If both are slow:

Check overall disk utilization

Look for queue saturation

Evaluate vdev layout and parallelism

Each metric narrows the problem space (don’t try to optimize everything) and helps identify the constraint that is actually limiting the database.
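The chain of reasoning above can be written down as a small decision function. This is only a sketch of the checklist in this section: the 5 ms threshold is an arbitrary assumption, and `next_checks` and `triage` are hypothetical names rather than part of any ZFS tool.

```python
def next_checks(read_slow, write_slow):
    """Return the checks this section suggests for the observed symptom."""
    if read_slow and write_slow:
        return ["overall disk utilization",
                "queue saturation",
                "vdev layout and parallelism"]
    if read_slow:
        return ["ARC hit rate",
                "disk latency",
                "fragmentation or cache churn"]
    if write_slow:
        return ["sync write latency",
                "SLOG performance",
                "write amplification or transaction group behavior"]
    return []

def triage(read_latency_ms, write_latency_ms, threshold_ms=5.0):
    """Classify latencies against an assumed threshold, then pick checks."""
    return next_checks(read_latency_ms > threshold_ms,
                       write_latency_ms > threshold_ms)

# Example: reads are fine, writes are slow.
for step in triage(read_latency_ms=0.8, write_latency_ms=22.0):
    print("check:", step)
```

Encoding the order explicitly keeps an investigation from jumping straight to tuning before the slow path has been identified.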

Wrapping Up

ZFS exposes a rich set of metrics, but only a subset directly correlates with database performance. Latency sits at the center of this model, with cache efficiency, sync write behavior, and disk contention acting as supporting signals.

Understanding these metrics changes how ZFS is operated. Instead of reacting to isolated numbers, it becomes possible to interpret the system as a whole. That approach is what allows ZFS to support demanding database workloads without sacrificing its core guarantees of integrity and reliability.

This is where Klara’s ZFS Performance Analysis solution can help, working directly with your team running OpenZFS in production to identify bottlenecks, validate assumptions, and tune systems based on actual workload characteristics, not just generic best practices.

Whether the issue is inconsistent latency, poor cache efficiency, or unexplained stalls under load, a structured performance analysis can quickly narrow the problem space and surface the constraints that matter.
