During our Halloween special webinar, we received a cheeky write-in question. We did not have a chance to cover it live, because we wanted to provide a much more in-depth answer: this is bound to be a question that others will have, if not now, then in the future.
| Can we/should we trust AI for designing best practices when it comes to deploying technical resources? Is there a model that is best fit for these types of situations where we need the best scenario without going so far off the rails?
Before we get into this, let's take a moment to discuss what an AI LLM is and how it operates.
What Is An AI LLM?
At its core, an LLM (Large Language Model) is nothing more than a statistical model, created by training an extremely high-dimensional vector map whose connection strengths between linguistic units are saved as values in a giant database. As the input to an LLM is broken down, the connections in this vector map are used to calculate the most probable options for the next string of output.
LLMs are text completion engines that operate on statistics. If you want to understand the math behind LLMs, we highly recommend Episodes 5 and 6 of 3Blue1Brown's Neural Networks playlist, which focus on GPT-style LLMs.
It's a statistical model and nothing more.
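To make that concrete, here is a minimal sketch of the core operation: turning the model's raw scores into a probability distribution over candidate next tokens and emitting the most likely one. The vocabulary and the logit values here are invented purely for illustration.

```python
import math

# Hypothetical raw scores ("logits") a model might assign to candidate next tokens
logits = {"pool": 2.0, "dataset": 1.0, "sandwich": -1.0}

# Softmax: convert the scores into a probability distribution
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# The model emits the statistically most likely token -- not a verified fact
best = max(probs, key=probs.get)
print(best)  # "pool"
```

Note that even the top token here carries only about 70% of the probability mass; the model will sometimes emit the others, and when one of those happens to be wrong, that is all a "hallucination" is.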
Now, while it's true that the statistical model may often provide results that are a very accurate representation, it cannot be correct all the time, because that's not how statistics works. Regardless of how much training data is available and how long the model is trained, there will always be a non-zero chance that it produces an answer that is mathematically valid according to its training data, yet factually wrong. With any statistical model, it's never black and white.
This is where the so-called “AI hallucinations” come from. These hallucinations are mathematically correct statistical answers according to the model's stored data; however, we recognize them as incorrect because they do not reflect what we know to be true. The term “hallucination” is in effect a clever marketing gimmick to cover up the fact that the model can produce incorrect answers. After all, it's hard to sell a predictive statistical model that's incorrect a significant percentage of the time. With clever marketing, along with anthropomorphizing an LLM by calling it "Artificial Intelligence", you can hand-wave away any mistakes.
Can We Trust AI For Designing And Tuning ZFS Pools?
So, with that understood, let's return to the issue at hand that we were asked about.
| Can we trust AI to design a ZFS pool, or recommend specific tuning? Is there a specific model that is better at these type of questions?
While the easy answer to this question would be “no”, let's look at it from another perspective. Is your data of so little value that you would be willing to let a sysadmin who routinely makes mistakes and recommends broken, destructive code set up, manage, and help maintain it?
ZFS is an incredibly powerful file system with a large number of complex adjustable parameters. The defaults in ZFS are the defaults for a reason: they will work reasonably for most workloads most of the time. We need to understand each of these parameters, what they do, how they are used by ZFS and how changing those parameters will interact with other parameters. We don’t even need to dig into the complexities of how ZFS operates and how parameters interact with each other to show that LLMs are not to be trusted.
LLMs are trained on content found on the internet, without regard for:
- how old that content might be
- whether it pertains to a different version of ZFS
- whether it applies to a different operating system
Here are six examples of responses to simple questions about ZFS parameters from one of the most accurate commercially available LLMs you can access.
Note: Accuracy based on model performance as of March 2025.
Example 1: spa_slop_shift

The LLM tells us that this parameter controls the extra space reserved for operations. This part is correct; however, the LLM then states that the default is 5 which it says is 32 sectors. This is not correct, as by default the last 3.2% (1/(2^spa_slop_shift)) of pool space is reserved. The minimum SPA slop space is limited to 128 MiB. Since ZFS 2.1.0, the maximum SPA slop space has been limited to 128 GiB, meaning it has not been necessary to tune this value manually to save space on very large pools since 2021.
Official documentation for spa_slop_shift.
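The documented behavior is easy to check arithmetically. This sketch reproduces the formula described above (1/2^spa_slop_shift of pool size, clamped between 128 MiB and, since ZFS 2.1.0, 128 GiB); the function name is ours, not part of ZFS.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

def slop_space(pool_size, spa_slop_shift=5):
    """Reserved slop space per the OpenZFS docs: pool_size / 2^shift,
    clamped to [128 MiB, 128 GiB] (the upper clamp exists since ZFS 2.1.0)."""
    raw = pool_size >> spa_slop_shift
    return max(128 * MiB, min(raw, 128 * GiB))

# A 1 TiB pool reserves 1/32 of its space: 32 GiB (~3.125%)
print(slop_space(1 * TiB) // GiB)    # 32
# A 100 TiB pool would naively reserve 3.125 TiB, but is clamped to 128 GiB
print(slop_space(100 * TiB) // GiB)  # 128
```

The clamp is exactly why manual tuning of this value to reclaim space on very large pools has been unnecessary since 2021.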
Example 2: arc_min_prescient_prefetch_ms

The LLM tells us that the default for this value is 10ms, and that this is what is recommended. It goes further, claiming there is no significant performance benefit beyond 10ms for most workloads. In reality, this tunable defaults to 6000ms (6 seconds) in ZFS. This value controls the minimum time “prescient prefetched” blocks are locked in the ARC. ZFS's prescient prefetch feature examines usage patterns and attempts to guess which block the application will want next, prestaging those blocks in the cache to improve read performance, but this only helps if the data is still in the cache when the application requests it. Setting this tunable to only 10ms, as the LLM recommends, would result in lots of prefetched data being evicted from the ARC before it can be used. That means all of the work to prefetch those blocks was wasted, and it will be repeated when the block is actually read a few hundred milliseconds later.
Official documentation for arc_min_prescient_prefetch_ms.
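A toy model (our own construction, not ZFS code) shows why 10ms defeats the purpose of prefetching: a prefetched block only pays off if it survives in the cache until the application actually reads it.

```python
def prefetch_hit(request_delay_ms, min_lock_ms):
    """Toy model: assume memory pressure evicts a prefetched block as
    soon as its lock expires. The prefetch is only a cache hit if the
    read arrives while the block is still protected."""
    return request_delay_ms <= min_lock_ms

# Application reads the block 500 ms after ZFS prefetched it
print(prefetch_hit(500, 10))    # False: the LLM's 10 ms -> already evicted
print(prefetch_hit(500, 6000))  # True: the 6000 ms default -> prefetch pays off
```

Real eviction depends on ARC pressure rather than a hard timer, but the asymmetry between a 10ms lock and a read that arrives hundreds of milliseconds later holds regardless.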
Example 3: dirty_data_max

This one isn't incorrect, but it provides a poor explanation of what is actually going on, and following it could lead to data loss. Dirty data is data that ZFS has accepted for writing but has not yet committed to disk as part of a transaction group (TXG). This data has no redundancy or protection. On a system doing a large quantity of writes, if you increase dirty_data_max significantly your application will appear to write faster, but in fact the data is simply being held as dirty data in RAM. If a power loss or other system failure occurs at that moment, all of that data may be lost, as it was never written to disk. The value of dirty_data_max needs to be set carefully, balancing improved write I/O (from the ability to aggregate more data per write) against the importance of the data actually being saved.
Another way to think about this: it gives a similar experience to the file-writing delay of USB 2.0 and older thumb drives. When you copied large files to them, the file copy UI would report that the transfer was complete, but in fact the copy continued in the background for much longer. If the system was shut down or the drive was removed unsafely, the data in flight would be lost, and whatever file was in the process of being written would be corrupted.
Official documentation for dirty_data_max.
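The trade-off can be sketched with simple arithmetic (our own illustration, not a ZFS formula): the dirty data cap bounds how much acknowledged-but-unwritten data can sit in RAM, and therefore how much could vanish on a power failure.

```python
def max_data_at_risk(write_rate_mib_s, seconds_buffered, dirty_data_max_mib):
    """Upper bound (in MiB) on unwritten data sitting in RAM: you can never
    have more dirty data than the cap, nor more than you actually wrote."""
    return min(write_rate_mib_s * seconds_buffered, dirty_data_max_mib)

# Writing 500 MiB/s with the cap raised to 16 GiB: up to 5 s of writes exposed
print(max_data_at_risk(500, 5, 16384))  # 2500 (MiB)
# With a 1 GiB cap, exposure is bounded by the cap instead
print(max_data_at_risk(500, 5, 1024))   # 1024 (MiB)
```

Raising the cap makes the application feel faster precisely because it widens this exposure window.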
Example 4: dnodesize

This example is also incorrect. The LLM says that the default is "auto" and that ZFS will choose a size between 512B and 32K. The actual default for ZFS is "legacy", and the size can be set to 512B-16K; 32K is not an option. The ZFS documentation recommends this be set to auto if you need xattr (extended attributes) properties. Specific values should only be used for performance testing or when the optimal size is already known.
Official documentation for dnodesize.
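To pin down the factual claim, this snippet encodes the legal settings for the dnodesize property as given in the zfsprops documentation (the helper function itself is our own): note that 32k is absent and the default is legacy.

```python
# Legal dnodesize values per zfsprops(7): "legacy" (the default), "auto",
# or an explicit power-of-two size from 1k to 16k. "32k" is NOT valid.
VALID_DNODESIZE = {"legacy", "auto", "1k", "2k", "4k", "8k", "16k"}
DEFAULT_DNODESIZE = "legacy"

def check_dnodesize(value):
    """Return True if the value would be accepted for the dnodesize property."""
    return value in VALID_DNODESIZE

print(check_dnodesize("auto"))  # True: recommended when using xattrs
print(check_dnodesize("32k"))   # False: the LLM's claimed maximum doesn't exist
print(DEFAULT_DNODESIZE)        # "legacy", not "auto" as the LLM claimed
```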
Example 5: metaslab_aliquot

The LLM misses a lot of important nuance in its explanation, including any consideration of the reader's familiarity with metaslabs. The value does not set a minimum allocation size so much as a threshold after which ZFS will attempt to choose another vdev to allocate the next block from. This controls the frequency at which ZFS changes top-level devices, balancing load and free space while optimizing the number of writes that can be done with the loaded metaslabs.
The LLM also claims that the default is 512KiB, which is incorrect. While the LLM does recommend this be set to 1MiB for modern ZFS systems, the default in modern ZFS already is 1MiB. The LLM claims that this reduces fragmentation and improves allocation efficiency. Both of these are incorrect: changing this value does not reduce fragmentation for large files or improve allocation efficiency. Lastly, the LLM recommends that this not be set lower than 512KiB because doing so can increase metadata overhead. The direct effect of this is mitigated by the “log spacemap” feature, introduced in 2019 and released as part of OpenZFS 2.0. While it's probably not a good idea to set this below 512KiB, doing so would not increase metadata overhead to a significant degree.
Since ZFS spreads its writes across the disks in the pool, this parameter controls how much data gets written to a disk before switching to the next disk in the pool.
Official documentation for metaslab_aliquot.
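The rotation behavior described above can be sketched as a simple rotor (a deliberate simplification of the real allocator, for illustration only): once roughly metaslab_aliquot bytes have gone to one top-level vdev, allocation moves on to the next.

```python
MiB = 1024 ** 2

def assign_writes(write_sizes, num_vdevs, aliquot=1 * MiB):
    """Toy allocator: stay on the current top-level vdev until ~aliquot
    bytes have landed there, then rotate to the next vdev."""
    placements, vdev, written = [], 0, 0
    for size in write_sizes:
        placements.append(vdev)
        written += size
        if written >= aliquot:
            vdev = (vdev + 1) % num_vdevs
            written = 0
    return placements

# Six 512 KiB writes across 3 vdevs: two writes fill each 1 MiB aliquot,
# then the rotor advances
print(assign_writes([512 * 1024] * 6, 3))  # [0, 0, 1, 1, 2, 2]
```

A larger aliquot keeps more consecutive writes on one device (better sequential layout); a smaller one spreads load faster.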
Example 6: redundant_metadata

The LLM claims several things which are incorrect.
Firstly, it claims that only 2 copies of metadata are stored, which is not accurate. ZFS stores an extra copy of metadata, so that if a single block is corrupted, the amount of user data lost is limited. This extra copy is in addition to any redundancy provided at the pool level (e.g. by mirroring or RAID-Z), and in addition to any extra copies specified by the user via the copies property (up to a total of 3 copies). For example, if the pool is mirrored, copies=2, and redundant_metadata=most, then ZFS stores 6 copies of most metadata, and 4 copies of data and some metadata.
The LLM also claims that the "some" setting may risk pool operation. This is not correct: setting "some" will not risk the pool's ability to operate. With "some" set, if a single on-disk block is corrupt, at worst a single user file can be lost. In fact, even with this value set to "none", the pool's critical metadata is still redundant. While that could result in a lost dataset, the pool would continue to function.
The LLM claims the default is "most", which is also incorrect. The default in ZFS is "all".
Official documentation for redundant_metadata.
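The copy arithmetic in the mirrored example above can be verified in a few lines (our own bookkeeping, following the zfsprops description): the extra metadata ditto copy stacks on top of both the copies property and pool-level redundancy.

```python
def on_disk_copies(mirror_width, copies, is_metadata):
    """Physical copies of a block on a mirrored pool: logical copies
    (the `copies` property, +1 ditto copy for metadata, capped at 3)
    multiplied by the mirror width."""
    logical = copies + 1 if is_metadata else copies
    logical = min(logical, 3)
    return logical * mirror_width

# Mirrored pool (width 2), copies=2, redundant_metadata=most:
print(on_disk_copies(2, 2, is_metadata=True))   # 6 copies of most metadata
print(on_disk_copies(2, 2, is_metadata=False))  # 4 copies of data
```

This matches the worked example in the text: 6 copies of most metadata and 4 copies of data.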
Can LLMs Be Used For ZFS Performance And Tuning? No.
At first glance, a layman might well believe the answers given by the LLM. An individual who doesn't know better and makes decisions based on that information could end up with extremely poor performance and potentially suffer catastrophic data loss.
As shown with the examples above, even basic details about the ZFS parameters and their functionality were full of errors, mistakes, outdated information, and outright falsehoods. These answers are from one of the highest-quality LLMs available at the time, and you can plainly see how poorly it answered these simple questions.
While it may one day be possible to train an LLM specifically for ZFS, ZFS is an ever-evolving project with new features and capabilities being added constantly. An LLM made for ZFS would need to be continually retrained to keep up with the project's development.
A better solution, if you need help with your ZFS deployment, is to reach out and work with the engineers who contribute to making ZFS better every day.
Klara’s ZFS Services connect you directly with upstream contributors offering production-grade support, performance tuning, and architectural guidance—far beyond what any generic AI model can provide.

JT Pennington
JT Pennington is a ZFS Solutions Engineer at Klara Inc, an avid hardware geek, photographer, and podcast producer. JT is involved in many open source projects including the Lumina desktop environment and Fedora.