Considerations When Measuring Network Storage Performance
When setting up a Network Attached Storage (NAS) system using ZFS, it's essential to understand the performance users can expect when accessing files. A NAS on the office network, on a VPN, or in the cloud acts as a central hub for data. Its ability to handle multiple users reading and writing files efficiently is critical to its success.
Performance testing is a valuable step to gauge how well the NAS will serve its users, but it is not always a straightforward task. Different protocols – such as SMB, NFS, and iSCSI – handle file operations in unique ways, each bringing its own set of challenges. Workloads also vary significantly, and each demands different tuning: an application that accesses large volumes of small files will need different tuning than video editing software that streams a small number of very large files.
Each protocol has its benefits. SMB offers broad compatibility across operating systems, NFS provides rock-solid stability for Unix systems, and iSCSI provides block-level access for virtual machines. However, each also has unique details that need to be considered when seeking to maximize its performance. Before diving into testing, it's important to consider these factors to ensure a meaningful and accurate assessment of your NAS's capabilities.
Benchmarking SMB over the network
Understanding SMB and Its Testing Challenges
Using ZFS as the NAS for Windows clients, or for a mixed client infrastructure, is typically accomplished with SMB shares.
SMB, or Server Message Block, is a network file-sharing protocol that allows applications and users to access files and other resources on a networked computer. Originally developed by IBM and further enhanced by Microsoft, SMB is widely used in Windows environments, though it is compatible with other operating systems. SMB enables users to interact with files and directories on a remote server as if they were on their local machine. Its versatility makes it a popular choice for file sharing in both home and business networks.
CrystalDiskMark is a popular Windows I/O testing tool and can be a tempting way to test SMB shares. The tool has some configuration options that can be adjusted to influence how the test is run. However, it was designed for testing local disks, and its lack of precise control over how I/Os are issued makes its results less reliable over the network. It is a great utility for gaining a quick estimate of a system's potential performance, but it should not be relied upon for high-accuracy results.
Note that CrystalDiskMark cannot be used to test network SMB shares when run by a user with administrator privileges on a Windows machine. The utility must be started by a non-privileged user, as detailed in the CrystalDiskMark FAQ.
Tools and Best Practices for SMB Performance Testing
Another quick and easy utility for testing SMB shares on Windows is the Blackmagic Design RAW Speed Test. It is especially useful for simulating a media-heavy workload, but it is less representative of other workloads.
For testing SMB, we recommend the industry-standard `fio`, which we have covered in other articles. However, there are some special considerations when running `fio` on Windows systems.
If you are testing an SMB share from PowerShell on a Windows machine, you will need to use the `--direct=1` flag to avoid interference from the Windows I/O caching subsystem.
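As a minimal sketch, assuming the share is mapped to drive Z: (the drive letter, file name, sizes, and queue depths below are placeholders you would adjust to mirror your workload), a direct-I/O random-write test might look like this:

```sh
# Hypothetical fio job: 64 KiB random writes to a file on a mapped SMB share (Z:).
# --direct=1 bypasses the Windows cache and windowsaio is fio's native async engine.
# The colon in the drive letter must be escaped for fio's filename parser.
fio --name=smb-randwrite --ioengine=windowsaio --direct=1 --filename=Z\:\smbtest.bin --size=4G --rw=randwrite --bs=64k --iodepth=16 --numjobs=4 --runtime=120 --time_based --group_reporting
```

Varying the block size, queue depth, and read/write mix across runs will give a much fuller picture than any single job.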
Klara recommends avoiding running `fio` in a WSL environment. The mix of emulation, virtualization, and caching within WSL can produce very unreliable results.
Key Metrics and Factors for Network Benchmarking
When benchmarking over the network, just as with local benchmarking, it is important to evaluate performance at every layer of the stack. In this case, that involves looking at:
- IOPS,
- Latency,
- Throughput from the benchmark,
- CPU usage by the SMB client and the OS network stack on the device under test, as well as on the server.
In addition, monitoring network utilization during the test can reveal possible bottlenecks. If the network is saturated or very unevenly loaded, it may impact the results more than the storage performance itself.
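As a rough sketch of that monitoring on a Linux server under test (the tools below come from the sysstat package; equivalents such as systat exist on FreeBSD), you might keep these running in separate terminals while the benchmark is active:

```sh
# Per-interface network throughput, sampled every second
sar -n DEV 1

# Per-core CPU usage, to spot a single saturated core (network stack or SMB server)
mpstat -P ALL 1

# Disk-level activity on the server, to compare with what the benchmark reports
iostat -x 1
```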
Optimizing Performance with SMB Multi-Channel Support
The performance of an SMB server and client can vary significantly based on the network's capabilities and the type of workload. Enabling multi-channel support may lead to much greater performance in these scenarios. With multi-channel, the client makes multiple TCP connections to the server, allowing the transfer to be spread over multiple network flows. This can enhance performance in several ways, including:
- Spreading the load over multiple queues on the network controller (allowing the driver to take advantage of multiple CPU cores)
- Using multiple network interfaces concurrently
- Allowing the SMB server to use multiple threads to parallelize I/O operations.
If the network is using Link Aggregation (such as IEEE 802.3ad LACP), multiple flows can also allow the load to be balanced across multiple links and switch ports, offering greater bandwidth.
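As a hedged example for Samba-based servers (the option and tool names below come from the Samba documentation; your server software and version may differ), multi-channel is controlled by a single smb.conf parameter, and the effective setting can be confirmed on the server:

```sh
# smb.conf (global section) -- enable SMB multi-channel on a Samba server:
#   server multi channel support = yes
#
# Confirm the effective setting, including defaults, on the server:
testparm -sv 2>/dev/null | grep -i "multi channel"
```

On the Windows client side, the PowerShell cmdlet Get-SmbMultichannelConnection shows whether multiple connections are actually in use while the benchmark is running.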
To benchmark multi-channel, enough concurrent I/O must be generated to keep multiple flows busy. It is also important to monitor the network to determine how many flows are in use, and to check which links are carrying traffic, to ensure multi-channel is configured correctly. The required configuration can be specific to the workload and the network environment. If you are looking to optimize your storage and network to achieve maximum throughput, consider Klara’s Performance Analysis Solution.
Benchmarking NFS over a network
Matching NFS and ZFS Block Sizes
When sharing a ZFS dataset over NFS, it is important to ensure that the NFS block size and the ZFS record size are matched. Mismatching these can result in catastrophic performance penalties. The first section in our "ZFS Performance Optimization Success Stories" outlines one such case.
Understanding Read and Write Amplification
The two primary performance issues that can occur are read/write amplification and write inflation.
Write amplification occurs when the physical write to disk must be larger than the logical write (the data actually being changed). With a default ZFS configuration of 128 KiB records, any change smaller than the record size, such as a 32 KiB write, requires rewriting the entire 128 KiB record, even though only 32 KiB of data has been modified. As a result, the system uses four times the disk bandwidth compared to a scenario where the record size matches the size of the changes.
A similar issue can occur during reads. If the record size is 1 MiB and an application requests only 4 KiB of data, ZFS must read the entire 1 MiB record, verify its checksum, and then return just the 4 KiB the application needs. Subsequent requests for the next 4 KiB will typically be served from the ZFS ARC, making the impact less significant as the data was prefetched. However, if the application accesses a random subset of blocks, ZFS might end up reading 256 times the amount of data actually required to fulfill those requests.
Challenges with NFS Block Sizes
NFS deployments commonly encounter a version of this issue. I/O performance suffers if the NFS block size is smaller than the ZFS dataset record size, due to read and write amplification. The default NFS block size in most Linux distributions is 64 KiB. With the default ZFS record size of 128 KiB, many writes will result in a 2x amplification.
When NFS issues a write of 64 KiB (often referred to as a logical write), ZFS must first read the old 128 KiB block from disk (a physical read). It then modifies 64 KiB of the block, recalculates all metadata and the checksum, and writes the full 128 KiB block back to disk (a physical write). This can make the benchmark report much lower performance than the system is truly capable of; it can be spotted when the raw disk activity outpaces the values reported by the benchmark.
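One hedged way to spot this from the server side (the pool name is a placeholder) is to watch the pool while the benchmark runs and compare its bandwidth with what the benchmark reports:

```sh
# Per-vdev bandwidth and IOPS, refreshed every second. If the pool's write
# bandwidth is a multiple of the throughput the NFS client measures,
# read/write amplification is likely at work.
zpool iostat -v tank 1
```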
Write Inflation and Its Impact
Similarly, write inflation is when a single block of a file is updated, ZFS must update the indirect block that points to it. Due to copy-on-write, it also updates the parent blocks all the way to the root of the tree.
For each single block change, ZFS updates metadata blocks that track it, sometimes multiple times for data redundancy. So, a 32 KiB change could trigger 3-5 additional 128 KiB writes, especially if changes happen far apart within the file. This inflation increases disk usage and slows down performance, particularly when changes are scattered across a file.
Optimizing Record Sizes and Block Sizes
It is important to select the correct recordsize as well as NFS block size to gain the best possible performance for any given workload.
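A minimal sketch of aligning the two, assuming a dataset named tank/share exported over NFS (dataset, server, and mount paths are placeholders):

```sh
# On the server: check and, if appropriate, adjust the dataset record size.
# Note that recordsize only affects newly written files.
zfs get recordsize tank/share
zfs set recordsize=64K tank/share

# On the client: request a matching NFS block size at mount time
# (65536 bytes = 64 KiB; use 131072 if you keep the 128 KiB record size).
mount -t nfs -o rsize=65536,wsize=65536 server:/tank/share /mnt/share
```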
Configuring NFS Threads for Better Performance
There are a few other details to keep in mind when testing an NFS share.
The default number of NFS threads can vary across Unix-Like operating systems. On NetBSD, the default is 4 threads; FreeBSD has 8 threads per CPU core; on OmniOS, the default is 16; lastly, most Linux distributions have a default of 8 threads.
There are a number of reasons this can be important. Beyond spreading the workload across more CPU cores, a single NFS server thread may be blocked while performing fsync() or other system calls; if all threads are busy, incoming work is queued for the next available thread. More threads may be required to drive enough I/O to saturate the disks, and ZFS also benefits from having a larger volume of work it can aggregate, so more threads are often better. However, this only holds until the CPU becomes saturated and turns into the bottleneck; beyond that point, additional threads will degrade performance.
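On Linux, for example, the running thread count can be inspected and raised for the duration of a test like this (making the change persistent varies by distribution, so treat this as a sketch):

```sh
# Current number of kernel NFS server threads
cat /proc/fs/nfsd/threads

# Raise the thread count to 32 for this test run
rpc.nfsd 32
```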
Key Differences Between NFS v3 and v4
There are also differences between NFS v3 and v4. With NFS v3, each dataset will be its own mount, requiring additional mountpoints to be configured. By contrast, NFS v4 allows you to operate the entire share through one mountpoint. Another difference is that NFS v3 can be used over UDP, whereas v4 cannot; it always uses TCP. If you are testing from a Windows machine, it is important to remember that the native Windows NFS client does not support NFSv4. Lastly, if you are utilizing the pNFS feature from NFS 4.2+, be sure to design your testing plan accordingly. For more information on pNFS, see our article Deploying pNFS file sharing with FreeBSD.
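When benchmarking, it is worth pinning the protocol version explicitly so you know exactly what is being measured. A hedged example on a Linux client (server and export names are placeholders):

```sh
# Force NFSv3 over TCP
mount -t nfs -o vers=3,proto=tcp server:/tank/share /mnt/v3

# Force NFSv4.1
mount -t nfs -o vers=4.1 server:/tank/share /mnt/v41
```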
Benchmarking iSCSI over a network
Similar to NFS, it is important to ensure that the iSCSI block size and the ZFS record size (or volblocksize, if using a zvol) are matched. The same read and write amplification and inflation issues can occur if these are mismatched. With iSCSI the situation can be much worse, as the iSCSI block size will typically be 512 bytes or 4 KiB (the sector size).
In general, a ZFS record or volblock size of 16 KiB is a good trade-off for iSCSI sectors. While perfectly matching the block sizes has many benefits, using a very small volblocksize of 4 KiB effectively disables ZFS’s transparent compression feature, as the minimum allocation is one sector. Overall performance is often better with a modestly larger record size. A larger record size also reduces the amount of metadata, helping to minimize read and write inflation.
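A minimal example of creating such a zvol (pool name, zvol name, and size are placeholders; configuring the iSCSI target itself depends on whether you use ctld, targetcli, or another stack):

```sh
# Create a 100 GiB zvol with a 16 KiB volume block size to back an iSCSI LUN.
# volblocksize cannot be changed after creation, so it must be chosen up front.
zfs create -V 100G -o volblocksize=16K tank/iscsi-lun0
```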
There are a few other minor considerations to keep in mind. Jumbo frames, while critical to performance in the past and still helpful in very specific situations, are these days often more trouble than they are worth. With modern NICs offering TSO (TCP Segmentation Offload, which passes data to the NIC in 64 KiB chunks), iSCSI-specific offload features, and other hardware acceleration, you can usually reach the desired performance without reconfiguring your entire network.
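If you want to confirm what your NICs are already doing before reaching for jumbo frames, a quick check on a Linux host might look like this (the interface name is a placeholder):

```sh
# Show the current MTU on the interface
ip link show eth0

# Show whether segmentation and receive offloads are enabled
ethtool -k eth0 | grep -E "tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload"
```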
Similar to the other protocols, a good benchmark will look at the conditions across the entire system, that being the client, the network, and the server. With iSCSI simulating a disk, the workload often involves a higher volume of small operations. As a result, the impact of latency becomes greater, and the need for additional concurrency and proper queuing is keenly felt.
Final Considerations for Optimizing Network Storage
Most good network storage benchmarks start by determining the performance of the storage without the network, directly on the server. This baseline helps determine how much performance is being lost to the network, and helps identify configuration or topology changes that could reduce the network's impact on storage workloads.
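A sketch of establishing that baseline directly on the server before testing over the wire (the directory and job parameters are placeholders, chosen to roughly mirror the job you will later run across the network):

```sh
# Run a comparable fio job locally on the server, against the dataset that
# backs the share, to establish a no-network baseline.
fio --name=local-baseline --directory=/tank/share --size=4G --rw=randwrite --bs=64k --iodepth=16 --numjobs=4 --runtime=120 --time_based --group_reporting
```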
One of the most important factors to consider when benchmarking over the network is the additional latency. Packets traversing the network increase both the time each operation takes and the variance in that time (jitter). Some of this can be overcome with additional concurrency, keeping more data in flight at once, but depending on the workload that may not be possible. If you are reading from a database, you cannot know which rows to read next until the results of the current read operation have arrived.
With the right design, planning, and tuning, the network can be made to achieve your performance targets. As with all things, it is just a series of trade-offs between throughput, latency, complexity, and cost. If you are looking to design your next storage and network infrastructure, consider Klara’s Storage Design & Implementation Solution.

JT Pennington
JT Pennington is a ZFS Solutions Engineer at Klara Inc, an avid hardware geek, photographer, and podcast producer. JT is involved in many open source projects including the Lumina desktop environment and Fedora.