Compensating for RAM Constraints with L2ARC on ZFS
ZFS's built-in caching is one of its greatest strengths, but it depends heavily on available system RAM. When memory is plentiful, the ARC keeps frequently and recently accessed data close at hand, and read performance is excellent. But what happens when RAM is limited—when the server is already at capacity or the budget simply does not stretch to more DIMMs?
This is where L2ARC on ZFS comes in. By extending ZFS's read cache onto a fast secondary device such as an NVMe SSD, the L2ARC can compensate for a constrained ARC and recover read performance that would otherwise be lost. It is not, however, a drop-in replacement for RAM. Using L2ARC on ZFS effectively on a memory-limited system requires understanding how it works, what it costs in terms of overhead, and how to size and tune it appropriately. That is the focus of this article.
Quick Refresher: The ARC
The ARC, or Adaptive Replacement Cache, is ZFS's primary read cache. It resides entirely in system RAM and is responsible for keeping hot data readily available so that reads can be served from memory rather than from disk.
Unlike the simple Least Recently Used (LRU) caches found in most other filesystems, the ARC maintains two working lists: the Most Recently Used (MRU) list and the Most Frequently Used (MFU) list. Data that has been accessed once goes into the MRU, while the data that has been accessed more than once is promoted to the MFU. This dual-list design gives the ARC an important property: scan resistance. A backup job or other operation that sequentially reads large amounts of data will overflow the MRU, causing older items to be evicted to make room for recently read data, but it will not displace the frequently accessed blocks in the MFU. A pure LRU cache would lose those hot blocks immediately.
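To make the scan-resistance property concrete, here is a deliberately simplified toy model in Python. It is not the ARC implementation: the class name `TwoListCache` and its fixed per-list capacities are assumptions made for illustration, and it omits ghost lists and adaptive rebalancing entirely. It shows only that a one-pass scan churns the MRU while blocks that were read twice stay in the MFU.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy cache with an MRU list (seen once) and an MFU list (seen again).

    Simplified for illustration: fixed capacities, no ghost lists,
    no adaptive balancing between the two lists.
    """

    def __init__(self, mru_capacity, mfu_capacity):
        self.mru_capacity = mru_capacity
        self.mfu_capacity = mfu_capacity
        self.mru = OrderedDict()  # oldest entry first
        self.mfu = OrderedDict()  # oldest entry first

    def access(self, block):
        """Record a read of `block` and return True on a cache hit."""
        if block in self.mfu:
            self.mfu.move_to_end(block)       # refresh its position
            return True
        if block in self.mru:
            del self.mru[block]               # second access: promote to MFU
            self.mfu[block] = True
            if len(self.mfu) > self.mfu_capacity:
                self.mfu.popitem(last=False)  # evict the oldest MFU entry
            return True
        self.mru[block] = True                # first access: MRU only
        if len(self.mru) > self.mru_capacity:
            self.mru.popitem(last=False)      # evict the oldest MRU entry
        return False


cache = TwoListCache(mru_capacity=4, mfu_capacity=4)
for block in ["a", "b", "a", "b"]:            # "a" and "b" are each read twice
    cache.access(block)
for block in range(100):                      # a large one-pass scan
    cache.access(block)

hot_blocks_survived = "a" in cache.mfu and "b" in cache.mfu
print(hot_blocks_survived)  # True
```

A single LRU list of the same total size would have lost both hot blocks partway through the scan.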
The ARC goes further with a pair of "ghost lists" — one for each of the MRU and MFU. When a block is evicted from the cache, its index entry is kept for a time on the corresponding ghost list. If a subsequent read hits one of these ghost entries, the ARC knows that the evicted block was needed again and adjusts the balance between the MRU and MFU accordingly. A ghost hit on the MRU side causes the ARC to grow the MRU at the expense of the MFU; a ghost hit on the MFU side does the reverse. This continuous self-tuning is the "adaptive" quality that gives the ARC its name. No manual intervention is required — it reacts to the workload automatically.
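The direction of the ghost-list adjustment can be sketched in a few lines. This is a minimal illustration, not how OpenZFS sizes the adjustment: the function name `adapt_mru_target` and the fixed `step` are assumptions, and the real ARC computes its step differently.

```python
def adapt_mru_target(mru_target, capacity, ghost_hit, step=1):
    """Return the new MRU target size after a hit on a ghost list.

    ghost_hit is "mru" or "mfu", naming the ghost list that was hit.
    The fixed step is a simplification made for this sketch.
    """
    if ghost_hit == "mru":
        # A recently evicted MRU block was wanted again: favor the MRU.
        return min(capacity, mru_target + step)
    if ghost_hit == "mfu":
        # A recently evicted MFU block was wanted again: favor the MFU.
        return max(0, mru_target - step)
    raise ValueError(f"unknown ghost list: {ghost_hit!r}")


target = 8
for ghost_hit in ["mru", "mru", "mfu", "mru"]:
    target = adapt_mru_target(target, capacity=16, ghost_hit=ghost_hit)
print(target)  # 10
```

The MFU target is simply whatever remains of the capacity, so growing one list always shrinks the other.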
For a deeper exploration of the ARC algorithm and its recent enhancements—including the application of adaptive balancing to data versus metadata—see our companion article, “Applying the ARC Algorithm to itself.”
The critical point for this article is simple: the ARC lives in RAM, so its size is bounded by how much RAM the system has. On a machine with 16 GiB of memory, the ARC might only have 8–10 GiB to work with after the operating system and applications take their share. A smaller ARC means more evictions, more cache misses, and more reads served from slow pool disks. That is the problem the L2ARC is designed to address.
What Is the L2ARC?
The L2ARC—Level 2 ARC—extends ZFS's read cache beyond system RAM and onto one or more fast secondary storage devices, known as CACHE VDEVs. In practice, these are almost always NVMe or SATA SSDs.
Despite the name, the L2ARC is not an ARC. It does not implement the adaptive replacement algorithm at all. Instead, it is a relatively simple ring buffer: data is written in sequentially and, when the device is full, the write pointer wraps around and begins overwriting the oldest entries. This design is deliberate. A ring buffer allows for very efficient, sequential write operations to the CACHE device, which is important both for performance and for SSD write endurance. The trade-off is that the L2ARC's hit ratio will generally be lower than the ARC's: the hottest blocks remain in the ARC itself, and the L2ARC catches the marginal blocks that would otherwise not be cached at all.
A background feed thread populates the L2ARC by copying blocks that are approaching eviction from the ARC onto the CACHE device—the details of this mechanism are covered in the next section. For each block written to L2ARC, the ARC retains a small in-memory header so that future reads can be served from the fast CACHE device instead of the slower pool disks. Importantly, the L2ARC is strictly a read cache—all writes go to the pool's data VDEVs as normal, and if the CACHE device is lost or removed, no data is lost.
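The ring-buffer behavior and the role of the in-memory headers can be modeled with a short Python sketch. The class `RingBufferCache` is hypothetical and works in whole-block slots, whereas a real CACHE device is addressed in bytes. The `index` dictionary stands in for the headers that the ARC keeps in RAM.

```python
class RingBufferCache:
    """Toy ring buffer: sequential writes that wrap and overwrite the oldest slot."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.write_pos = 0
        self.index = {}  # block id -> slot number (stands in for in-RAM headers)

    def write(self, block_id):
        overwritten = self.slots[self.write_pos]
        # Drop the old header only if it still points at the slot being reused.
        if overwritten is not None and self.index.get(overwritten) == self.write_pos:
            del self.index[overwritten]
        self.slots[self.write_pos] = block_id
        self.index[block_id] = self.write_pos
        self.write_pos = (self.write_pos + 1) % len(self.slots)  # wrap around

    def lookup(self, block_id):
        """Return the slot holding block_id, or None if it is not cached."""
        return self.index.get(block_id)


ring = RingBufferCache(slots=3)
for block_id in ["a", "b", "c", "d"]:
    ring.write(block_id)

print(ring.lookup("a"))  # None: "a" was overwritten when the pointer wrapped
print(ring.lookup("d"))  # 0: "d" now occupies the first slot
```

Nothing in this model looks at how often a block is read, which is why the ring buffer is cheap to maintain and also why its hit ratio trails the ARC's.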
How the L2ARC Works in Detail
Feeding the L2ARC
The L2ARC feed thread runs periodically in the background, scanning near the tail (the eviction end) of both the MRU and MFU lists in the ARC. It selects blocks that are about to be evicted and writes them to the CACHE VDEV in controlled, throttled bursts. This throttling protects SSD write endurance and ensures the CACHE device remains available to serve reads. If the ARC is evicting blocks faster than the feed thread can write them, the excess blocks are simply discarded; rapid eviction typically signals a streaming workload where caching those blocks would be wasteful.
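As a rough model of the throttling described above, the following sketch selects blocks for a single feed pass under a fixed write budget. The function `select_feed_batch` and its parameters are hypothetical, and the logic is an assumption made for illustration rather than the actual feed-thread code, which also takes scan depth and warm-up state into account.

```python
def select_feed_batch(candidates, write_budget_bytes):
    """Pick blocks to copy to the cache device in one feed pass.

    candidates: list of (block_id, size_bytes), ordered with the block
    closest to eviction first.
    Returns (written, skipped); skipped blocks are never cached if the
    ARC evicts them before the next pass.
    """
    written, skipped = [], []
    used = 0
    budget_exhausted = False
    for block_id, size_bytes in candidates:
        if not budget_exhausted and used + size_bytes <= write_budget_bytes:
            written.append(block_id)
            used += size_bytes
        else:
            budget_exhausted = True  # stop writing for this pass
            skipped.append(block_id)
    return written, skipped


tail = [("blk1", 128 * 1024), ("blk2", 128 * 1024), ("blk3", 128 * 1024)]
written, skipped = select_feed_batch(tail, write_budget_bytes=256 * 1024)
print(written)  # ['blk1', 'blk2']
print(skipped)  # ['blk3']
```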
Several tunables govern the feed rate, warm-up behavior, and scan depth, including l2arc_write_max, l2arc_write_boost, l2arc_feed_secs, and l2arc_headroom. For a detailed walkthrough of these tunables and their default values, see OpenZFS: All About L2ARC.
L2ARC Persistence
Historically, the contents of the L2ARC were lost on every reboot or kernel module reload. The CACHE device would start cold, and the system would have to go through a warm-up period before the L2ARC became useful again.
Since OpenZFS 2.0, persistent L2ARC addresses this problem. When l2arc_rebuild_enabled is set to 1 (the default), ZFS writes log blocks to the CACHE device that allow the L2ARC's contents to be reconstructed when the pool is imported. This means that after a reboot, the L2ARC can immediately populate the ARC with headers pointing to its valid cached data, avoiding the cold-start penalty. This feature is especially valuable on RAM-constrained systems where the warm-up window directly impacts user-facing performance.
The RAM Cost of L2ARC: Metadata Overhead
The L2ARC is not free in terms of RAM. Every block stored on the L2ARC device requires an in-memory header in the ARC — approximately 70 to 80 bytes per block — so that ZFS knows where to find the cached data on the CACHE device. The total RAM consumed by these headers depends on two factors: the size of the L2ARC device and the recordsize of the datasets being cached.
The formula is straightforward:
(L2ARC size in KiB) / (recordsize in KiB) × 70 bytes = ARC header overhead
Consider a 512 GiB SSD used as a CACHE VDEV. On a file server with recordsize=1M, most records are 1024 KiB, and the math works out to roughly 35 MiB of ARC consumed by L2ARC headers. That is trivial, even on a system with limited RAM.
Now consider the same 512 GiB CACHE VDEV on a system running MySQL databases with recordsize=16K. Each record is only 16 KiB, so the device holds far more individual blocks, and the header overhead climbs to approximately 2.2 GiB. On a machine with 64 GiB of ARC, that is manageable. On a machine with only 8 GiB of ARC, it is devastating: more than a quarter of the primary cache would be consumed by pointers to secondary cache, leaving less room on the fastest caching tier for the hottest data.
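The two worked examples can be reproduced with a small Python helper. This is a sketch of the formula as stated in this article, assuming 70 bytes per header and that every cached block is a full record; the function name is ours, and compression or partially filled records would change the real block count.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def l2arc_header_overhead(l2arc_bytes, recordsize_bytes, header_bytes=70):
    """Estimate the ARC memory consumed by headers for a full L2ARC device.

    Assumes every cached block is one full record of recordsize_bytes.
    """
    blocks = l2arc_bytes // recordsize_bytes
    return blocks * header_bytes


file_server = l2arc_header_overhead(512 * GIB, 1 * MIB)
database = l2arc_header_overhead(512 * GIB, 16 * KIB)

print(f"recordsize=1M:  {file_server / MIB:.1f} MiB")  # 35.0 MiB
print(f"recordsize=16K: {database / GIB:.2f} GiB")     # 2.19 GiB
```

Passing `header_bytes=80` gives the upper end of the 70 to 80 byte range mentioned above.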
To guard against this, OpenZFS provides the l2arc_meta_percent tunable (default 33%), which caps the fraction of the ARC that can be used for L2ARC-only headers. If the headers would exceed this limit, L2ARC writes and rebuilds are throttled until the system is back within bounds. This acts as a safety valve, but it also means that an oversized CACHE device on a RAM-limited system may never be fully utilized.
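To see what this cap means in practice, the header formula can be inverted to estimate how much L2ARC capacity a given ARC can index before the limit is reached. The sketch below is an approximation: the function name is hypothetical, it assumes 70-byte headers and full-size records, and it treats the ARC size as a fixed number even though the real ARC grows and shrinks.

```python
KIB = 1024
GIB = 1024 ** 3


def max_indexable_l2arc(arc_bytes, recordsize_bytes, meta_percent=33, header_bytes=70):
    """Estimate the L2ARC capacity whose headers fit under l2arc_meta_percent."""
    header_budget = arc_bytes * meta_percent // 100
    max_blocks = header_budget // header_bytes
    return max_blocks * recordsize_bytes


for arc_gib in (4, 8):
    limit = max_indexable_l2arc(arc_gib * GIB, 16 * KIB)
    print(f"{arc_gib} GiB ARC, recordsize=16K: about {limit / GIB:.0f} GiB of L2ARC")
```

Under these assumptions, a 4 GiB ARC with recordsize=16K reaches the default cap well before a 512 GiB CACHE device is full, while an 8 GiB ARC does not.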
Using L2ARC Effectively on RAM-Constrained Systems
When L2ARC Makes Sense
The L2ARC is most beneficial when the working set—the data that is actively and repeatedly read—is larger than the system RAM but small enough to fit within the combined capacity of RAM and the L2ARC device. Classic examples include read-heavy analytics workloads, serving virtual machine images, or file servers where the active dataset exceeds what the ARC can hold.
The workload should be predominantly read-oriented. Because the L2ARC caches only reads, it provides no benefit for write-heavy operations. Critically, it should only be considered after system RAM has already been maximized. An extra 32 GiB of RAM will always outperform a 32 GiB CACHE device because the ARC is an order of magnitude faster and uses a far superior caching algorithm.
When L2ARC Does Not Help
Conversely, L2ARC provides little benefit—or may even hurt—in several scenarios: streaming or sequential workloads where blocks are read once and never revisited, write-heavy workloads, and working sets that vastly exceed the combined size of ARC and L2ARC. On severely RAM-constrained systems, the header overhead of a large CACHE device can also be counterproductive, as discussed in the previous section.
Right-Sizing the L2ARC for Limited RAM
Before adding a CACHE VDEV, calculate the expected header overhead using the formula above. The goal is to match the L2ARC device size to what the system can realistically afford in header memory. A smaller CACHE device with low overhead may deliver better overall performance than a large one that starves the ARC of usable space.
Datasets with larger recordsizes are more L2ARC-friendly because fewer individual blocks mean fewer headers for the same amount of cached data. If you have a mix of datasets with different recordsizes, keep in mind that the datasets with the smallest recordsizes will dominate the header overhead.
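The effect of mixed recordsizes is easy to quantify. In the sketch below, the dataset names and the split of cached data between them are invented for illustration, and the 70-byte header size and full-record assumption carry over from the formula above.

```python
KIB = 1024
GIB = 1024 ** 3
HEADER_BYTES = 70  # per-block header size assumed throughout this article


def header_overhead_by_dataset(datasets):
    """Map each dataset name to its estimated L2ARC header overhead in bytes.

    datasets: dict of name -> (cached_bytes, recordsize_bytes).
    """
    return {
        name: (cached_bytes // recordsize_bytes) * HEADER_BYTES
        for name, (cached_bytes, recordsize_bytes) in datasets.items()
    }


# Hypothetical mix: most cached data has 1M records, a fifth has 16K records.
mix = {
    "tank/media": (400 * GIB, 1024 * KIB),
    "tank/mysql": (100 * GIB, 16 * KIB),
}
overhead = header_overhead_by_dataset(mix)
total = sum(overhead.values())
for name, header_bytes in overhead.items():
    print(f"{name}: {header_bytes / total:.0%} of header memory")
```

In this example the 16K dataset holds only a fifth of the cached data yet accounts for roughly 94% of the header memory.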
Tuning Tips for RAM-Constrained Environments
Several tunables can help make the most of L2ARC on a system with limited RAM.
Be selective about what gets cached. The l2arc_mfuonly tunable controls whether only MFU blocks are written to L2ARC. Setting it to 1 restricts L2ARC to MFU data and metadata only, preventing one-time reads from consuming L2ARC space and the associated ARC header memory. Setting it to 2 caches all metadata (MRU and MFU) but restricts data caching to MFU only—a good compromise when metadata access patterns are less predictable but you want to avoid caching transient data. The default l2arc_noprefetch=1 keeps speculative prefetch data out of L2ARC, which is generally the right choice for memory-constrained systems.
At the dataset level, the `secondarycache` property controls whether a dataset's blocks are eligible for L2ARC at all. It accepts three values: `all` (the default, caching both data and metadata), `metadata` (cache only metadata), or `none` (exclude the dataset from L2ARC entirely). On a RAM-constrained system, it may help to set `secondarycache=none` on datasets that do not benefit from caching, such as large, infrequently accessed archives or scratch datasets. Doing so frees up both L2ARC space and the ARC header memory that would have been consumed by their blocks.
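For hosts managed by script, the property change can be wrapped in a small Python helper. This is a sketch: the dataset names are hypothetical, the helper functions are ours, and it defaults to a dry run that only returns the `zfs set` command it would execute.

```python
import subprocess

VALID_VALUES = ("all", "metadata", "none")


def secondarycache_command(dataset, value):
    """Build the `zfs set` argument list for the secondarycache property."""
    if value not in VALID_VALUES:
        raise ValueError(f"secondarycache must be one of {VALID_VALUES}")
    return ["zfs", "set", f"secondarycache={value}", dataset]


def set_secondarycache(dataset, value, dry_run=True):
    """Return the command, running it only when dry_run is False."""
    command = secondarycache_command(dataset, value)
    if not dry_run:
        subprocess.run(command, check=True)  # requires ZFS admin privileges
    return command


# Hypothetical datasets that are unlikely to benefit from L2ARC.
for dataset in ("tank/archive", "tank/scratch"):
    print(" ".join(set_secondarycache(dataset, "none")))
```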
Preserve the warm cache. As discussed above, persistent L2ARC (l2arc_rebuild_enabled=1, the default) retains cached data across reboots—especially valuable when the ARC is small and slow to warm up.
Avoid redundant caching. If the pool uses a special metadata VDEV, which already places metadata on a fast device, setting l2arc_exclude_special=1 prevents those blocks from being duplicated in L2ARC, saving both CACHE device space and ARC header memory.
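Before changing any of these tunables, it helps to record their current values. On Linux, OpenZFS module parameters are exposed as files under /sys/module/zfs/parameters. The helper below is a minimal sketch that assumes that layout and reads only the tunables named in this article; the function name is ours, and a tunable that is missing on the running OpenZFS version is reported as None.

```python
from pathlib import Path

L2ARC_TUNABLES = (
    "l2arc_mfuonly",
    "l2arc_noprefetch",
    "l2arc_rebuild_enabled",
    "l2arc_exclude_special",
    "l2arc_meta_percent",
)


def read_l2arc_tunables(param_dir="/sys/module/zfs/parameters"):
    """Return {tunable: int value, or None if the file is absent}."""
    values = {}
    for name in L2ARC_TUNABLES:
        path = Path(param_dir) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
        else:
            values[name] = None  # not available on this system
    return values
```

Calling `read_l2arc_tunables()` on a Linux host with the ZFS module loaded returns a dictionary that can be logged alongside the monitoring data, which makes it easier to correlate a tuning change with its effect.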
Monitoring and Validation
Monitoring is essential when deploying L2ARC on a RAM-constrained system. Check the ARC hit ratios before adding a CACHE VDEV to establish a baseline. On Linux, the relevant statistics are in /proc/spl/kstat/zfs/arcstats; on FreeBSD, use sysctl kstat.zfs.misc.arcstats.
After adding the CACHE device, monitor l2_hits and l2_misses to confirm the L2ARC is delivering meaningful benefit.
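Hit ratios are not reported directly, so they have to be derived from the raw counters. The following sketch parses text in the three-column `name type data` layout used by the Linux arcstats file and computes ARC and L2ARC hit ratios. The sample values are made up for illustration, and the helper names are ours.

```python
def parse_arcstats(text):
    """Parse kstat-style 'name type data' lines into {name: int}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines have exactly three fields with a numeric value;
        # the two header lines at the top of the file do not match.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def hit_ratio(hits, misses):
    """Return hits / (hits + misses), or 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


# Illustrative sample only; real files contain many more counters.
sample = """\
name                            type data
hits                            4    900
misses                          4    100
l2_hits                         4    60
l2_misses                       4    40
"""

stats = parse_arcstats(sample)
arc_ratio = hit_ratio(stats["hits"], stats["misses"])
l2_ratio = hit_ratio(stats["l2_hits"], stats["l2_misses"])
print(f"ARC hit ratio:   {arc_ratio:.0%}")  # 90%
print(f"L2ARC hit ratio: {l2_ratio:.0%}")   # 60%
```

On a live Linux system, the same parser can be fed the contents of /proc/spl/kstat/zfs/arcstats. Because the counters are cumulative, comparing two samples taken some time apart gives a more useful picture than a single reading.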
The evict_l2_eligible_mru and evict_l2_eligible_mfu statistics show how many evicted bytes were eligible for L2ARC caching from each list. If the majority of eligible evictions are from MRU, it may be worth enabling l2arc_mfuonly to avoid caching transient data.
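That comparison reduces to a single ratio. The sketch below takes the two counters as plain integers; the example values are invented, and treating "majority" as more than half is our simplification of the guidance above.

```python
def mru_share_of_eligible(evict_l2_eligible_mru, evict_l2_eligible_mfu):
    """Return the fraction of L2ARC-eligible evicted bytes that came from the MRU."""
    total = evict_l2_eligible_mru + evict_l2_eligible_mfu
    return evict_l2_eligible_mru / total if total else 0.0


def consider_mfuonly(evict_l2_eligible_mru, evict_l2_eligible_mfu):
    """Suggest trying l2arc_mfuonly when most eligible evictions are MRU."""
    return mru_share_of_eligible(evict_l2_eligible_mru, evict_l2_eligible_mfu) > 0.5


# Invented counter values, in bytes.
share = mru_share_of_eligible(750_000_000, 250_000_000)
print(f"MRU share of eligible evictions: {share:.0%}")  # 75%
print(consider_mfuonly(750_000_000, 250_000_000))       # True
```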
If overall ARC hit ratios decline after adding L2ARC, the header overhead is likely too high. The remedies are straightforward: lower `l2arc_meta_percent` to cap how much ARC space headers can consume, increase the recordsize of the affected datasets if the workload permits, or accept that the system simply does not have enough RAM to support an L2ARC of that size.
Conclusion
The ARC remains the single best investment for ZFS read performance. It is fast, adaptive, and self-tuning. For any system where read performance matters, the first step should always be to maximize available RAM.
But RAM has limits—physical, budgetary, or both. On systems where memory truly cannot be expanded further, L2ARC on ZFS offers a valuable second tier of caching that bridges the gap between fast DRAM and slow pool disks. An affordable NVMe device, properly sized and configured, can meaningfully extend the reach of ZFS's caching and recover read performance that would otherwise be lost to a small ARC.
Success with L2ARC on a RAM-constrained system comes down to three things: right-sizing the CACHE device so that header overhead does not cannibalize the ARC, choosing workloads where the L2ARC's read-cache behavior provides meaningful benefits, and monitoring the results to confirm that the addition is helping rather than hindering. With that thoughtful approach, the L2ARC becomes a practical and effective tool for getting the most out of ZFS when RAM is at a premium.
If your team needs optimization or development work around ZFS caching—such as enhancing L2ARC behavior or tailoring cache mechanisms to your specific workloads—Klara’s ZFS Caching Enhancements – L2ARC service provides specialized engineering expertise to unlock performance and efficiency for your OpenZFS deployment.

JT Pennington
JT Pennington is a ZFS Solutions Engineer at Klara Inc, an avid hardware geek, photographer, and podcast producer. JT is involved in many open source projects including the Lumina desktop environment and Fedora.