If you’re at all interested in the performance of your hard drives or SSDs, you quickly come across three common metrics: IOPS, latency, and throughput. The problem is, nobody ever expends much effort explaining what each metric actually means, and how it relates to your human perception of the “overall speed” of your storage!
Today, we’re going to do our best to educate you on how to think about, measure, and choose your storage based on the real workloads that you need to get through.
Throughput Isn’t Everything
If you only see a single metric being advertised for a drive–particularly a consumer drive–throughput is the one. And more specifically, the “throughput” metric being advertised is maximum throughput–more on that in a bit.
Storage throughput is most commonly measured in MiB/sec, or MebiBytes per second. Even most storage metrics labeled “MB/sec” are actually MiB/sec!
Hold On, What’s a “Mebibyte”?
A proper MB–MegaByte–is a decimal (SI) unit, referring to precisely 1,000,000 bytes. But that’s not what the computing term “megabyte” originally referred to. When computer engineers first began referring to “kilo” and “mega” bytes, they were actually referring to multiples of 1,024, not multiples of 1,000.
Eventually, unscrupulous marketers realized this offered them an opportunity: they could sell a “100 megabyte” drive that was, in fact, 100 proper power-of-ten megabytes… while only offering 95.4 MiB in the power-of-two (1,024 is 2^10) multipliers that computers actually used.
This discrepancy gets more obnoxious as the numbers get larger: a few years later, it meant that a 1 terabyte drive only offers 931GiB of storage… again, despite technically giving you exactly what it says on the label: one terabyte.
This situation is further complicated by the occasional operating system vendor that decides to match the storage marketers’ craziness, by switching to actual powers-of-ten MB and GB themselves. None of those OS vendors do so consistently–for example, Apple shows MB and GB in Finder, but most of its command line tools in the Terminal use MiB and GiB.
It is, in our opinion, a well-intentioned mistake for OS vendors to try to adopt the storage marketers’ own nonsense and pretend data is stored in powers-of-ten units. While there may be any number of sectors on a drive, the sectors themselves are power-of-two units: typically, either 512 bytes per sector or 4,096 bytes (4KiB) per sector.
At some point, one must depart from the marketers’ fantasies, because the computer memory which the storage loads data into is, itself, necessarily a powers-of-two affair, because that’s what native binary addressing must look like. Attempts to use non-binary addressing schemes are simply inefficient kludges added on after the fact.
For this reason, we’re very pedantic about MB vs MiB. The mismatch between units messes up performance measurements just as much as it does the raw storage measurements–after all, 1MB/sec is only 0.95MiB/sec, just as 1MB is only 0.95MiB!
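If you’d rather let the computer do the pedantry for you, the conversions are trivial. Here’s a quick Python sketch of the same figures discussed above:

```python
MB, GB, TB = 1000**2, 1000**3, 1000**4   # power-of-ten marketing units
MiB, GiB   = 1024**2, 1024**3            # power-of-two units the computer actually uses

print(f"100 MB   = {100 * MB / MiB:.1f} MiB")    # 95.4 MiB
print(f"1 TB     = {1 * TB / GiB:.0f} GiB")      # 931 GiB
print(f"1 MB/sec = {1 * MB / MiB:.2f} MiB/sec")  # 0.95 MiB/sec
```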
What Marketers Mean by “Throughput”
When marketers refer to throughput or speed, they generally only offer a single number–maximum throughput, which means the amount of data a drive can read or write per second under the easiest possible workload for that drive.
This, in turn, usually means sequential reads or sequential writes, with no other activity on the drive while the single sequential read or write workload is applied. Modern conventional hard drives generally produce anywhere from 120MiB/sec to 220MiB/sec, when hit with this workload. (Modern solid state drives have considerably more variance–we’ll get to that later.)
The problem with maximum throughput as a single metric is that it doesn’t relate very strongly to either the way that people use hard drives and solid state drives, or to the problems that people normally face with storage performance.
To illustrate this, think about the last desktop or laptop computer you owned which had a single conventional hard drive in it. Do you remember that feeling when you heard the hard drive start to sound like a coffee grinder, and the system became so sluggish that it might take several seconds between clicking a button and seeing a menu open?
That is not a problem which occurs due to insufficient maximum throughput, nor is it one that can be addressed by increasing maximum throughput.
The reason that conventional hard drive began to sound like a coffee grinder is that it had to perform a lot of head seeks, each of which produces a tiny noise. Each seek introduces additional latency between operations, and as a result, your “200MiB/sec” drive can easily only be producing 800KiB/sec throughput when you get mad at it!
With solid state drives, you lose the head seeks and you do improve performance under the exact same conditions–but you may still only be seeing 50MiB/sec from a drive which advertises itself as a “650MiB/sec” drive. The reason for the slowdown is different–now we’re looking at a lack of parallelization between discrete media banks inside the drive, instead of looking at head seeks–but the result is the same.
In theory, we could just talk about throughput on every workload one might throw at a drive–but it’s more productive to track separate metrics which can bottleneck throughput, instead. Let’s talk about one of those next.
IOPS
IOPS is an acronym for Input/output Operations Per Second, and can be considered a sort of “parent” metric to throughput under strictly academic terms. If you hit a storage queue with 200 requests to read exactly 1MiB of data each, and that drive completes those requests in precisely one second, you’re looking at 200MiB/sec and 200 IOPS.
However, if you change the operation size, you will change your results right along with it. 200 IOPS and 200MiB/sec for 1MiB random I/O is a pretty typical result for a modern conventional hard drive–and, importantly, it’s bottlenecking on IOPS.
If we ask the same conventional drive to load the same data–but this time we ask for it in slightly more than fifty-one thousand 4KiB operations, instead of only two hundred 1MiB operations–we’re still going to hit 200 IOPS, but now we’re only looking at about 800KiB/sec throughput!
This is because the conventional drive bottlenecks on IOPS across its entire performance range. The real-world limiting factor here is drive head seeks–every time the drive needs to change what track a head is reading from or writing to, a few milliseconds are lost. Even without the seeks, the drive is limited to the number of sectors passing beneath the stationary head per second, which in turn is determined by the spindle speed of the drive.
If we instead examine the performance profile of a modern solid state drive, we might see 600MiB/sec (and therefore 600 IOPS) with 1MiB operations, and 50MiB/sec (and therefore 12,800 IOPS) with 4KiB operations.
The SSD is a more interesting case than the conventional drive, because it experiences different bottlenecks at different operation sizes. With 1MiB operations, this drive bottlenecks at 600MiB/sec because that’s as much data as the SATA controller it’s connected to can handle–which, in turn, is usually because the SATA controller only has a single PCIe lane feeding it!
On the lower end, with 4KiB operations, the same drive is bottlenecking on IOPS instead, which causes it to produce much less than its maximum throughput. In this case, the problem driving the bottleneck is decreased parallelization between physical banks of media inside the drive–an SSD is more like a RAID array in a very small box than it is like a single drive, and it can only achieve its maximum performance if the controller inside the SSD can evenly split a large workload among all available internal media banks.
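Since the two metrics are just each other multiplied or divided by the operation size, it’s easy to sanity-check the figures above. A quick Python sketch–the drive numbers are the same illustrative, typical figures used above, not measurements of any particular model:

```python
KiB = 1024
MiB = 1024 * KiB

def throughput_mib(iops, op_size_bytes):
    """Throughput is just IOPS multiplied by the operation size."""
    return iops * op_size_bytes / MiB

def iops(throughput_mib_per_sec, op_size_bytes):
    """...and IOPS is just throughput divided by the operation size."""
    return throughput_mib_per_sec * MiB / op_size_bytes

# Conventional hard drive: stuck at ~200 IOPS across the board.
print(throughput_mib(200, 1 * MiB))   # 200.0 MiB/sec with 1MiB operations
print(throughput_mib(200, 4 * KiB))   # ~0.78 MiB/sec (800KiB/sec) with 4KiB operations

# SATA SSD: interface-bound at the top end, parallelization-bound at the bottom.
print(iops(600, 1 * MiB))             # 600.0 IOPS at 600MiB/sec with 1MiB operations
print(iops(50, 4 * KiB))              # 12800.0 IOPS at 50MiB/sec with 4KiB operations
```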
Implied Metrics
Now that we’ve learned about the difference between “top end” performance when using large individual operations, and “low end” performance when using small individual operations, we can talk about how professionals casually use throughput and IOPS as individual metrics.
From a strictly academic perspective, there is no difference between throughput and IOPS–IOPS is simply throughput divided by operation size, and without specifying the workload, neither tells you much.
A conventional drive producing 200 IOPS might be giving you 800KiB/sec, or 200MiB/sec. Similarly, a solid state drive “only” delivering 2,000 IOPS is awful if those are 4KiB operations, but spectacular if it’s delivering 1MiB operations that quickly!
As a shortcut, storage professionals tend to refer to “throughput” when they’re talking about top-end performance with large operations and minimal seeks or other bottlenecks, and “IOPS” when they’re talking about small operations and otherwise maximally-bottlenecked workloads.
While there’s technically no real difference between the two measurements, having an implied shortcut to refer to either “low-end” or “high-end” performance without typing out entire paragraphs the way we’re doing here has a real appeal.
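If you want to see both ends of that spectrum on your own hardware, a purpose-built benchmark tool like fio is the right answer–but even a crude sketch like the following shows the gap between large sequential operations and small random ones. The file path is hypothetical, the loop runs at a queue depth of one, and unless the test file is much larger than your RAM, the operating system’s page cache will flatter the results considerably:

```python
import os, random, time

PATH = "/tmp/bigfile.bin"   # hypothetical test file -- ideally much larger than RAM
MiB = 1024 * 1024

def measure(op_size, ops, sequential):
    fd = os.open(PATH, os.O_RDONLY)
    size = os.fstat(fd).st_size
    offset = 0
    start = time.perf_counter()
    for _ in range(ops):
        if not sequential:
            offset = random.randrange(0, size - op_size)
        os.pread(fd, op_size, offset)
        if sequential:
            offset = (offset + op_size) % (size - op_size)
    elapsed = time.perf_counter() - start
    os.close(fd)
    iops = ops / elapsed
    kind = "sequential" if sequential else "random"
    print(f"{kind} {op_size}B reads: {iops:,.0f} IOPS, {iops * op_size / MiB:.1f} MiB/sec")

measure(1 * MiB, 500, sequential=True)      # the "throughput" end of the spectrum
measure(4 * 1024, 5000, sequential=False)   # the "IOPS" end of the spectrum
```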
Latency
Finally, we have latency. Latency is, simply put, the time elapsed between the beginning of an operation and its completed result. On the surface, it might seem pointless to worry about it separately from throughput–after all, if you know how many MiB/sec you’re getting, you also know how many seconds it takes to move 1MiB, so why bother?
There are a few answers to this. The first is that latency is the one metric that directly affects human perception of performance. Let’s take a moment to unpack that.
Human Perception of Performance
One of the first performance metrics most of us encounter is the simple progress bar, such as the one you might see when copying a file. A typical modern file-copy progress bar offers several pieces of related information: the size of the file(s) being copied, how much remains to be copied, the current throughput, and the estimated time to completion.
Although that progress bar offers a relative wealth of information, it all relates to a single event: when the operation will complete. This brings us right back to latency–the time it takes in between beginning an operation, and its completion. Although you might stare at a long-running progress bar and fret about the currently displayed throughput, the only reason you care about that throughput is its bearing on–you guessed it–when the task will complete!
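Put another way, every number on that progress bar feeds a single estimate. A trivial sketch, with made-up figures purely for illustration:

```python
def eta_seconds(bytes_remaining, bytes_per_second):
    # Size, progress, and current throughput all collapse into the one number
    # the user actually cares about: how long until this operation completes?
    return bytes_remaining / bytes_per_second

MiB = 1024 * 1024
print(f"{eta_seconds(700 * MiB, 120 * MiB):.1f}s remaining")  # ~5.8s at 120MiB/sec with 700MiB to go
```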
Now, let’s talk about the various types of latency one might encounter or measure.
Network Latency
Anyone who plays multiplayer online games knows at least a little something about latency–network latency, specifically, as measured by “ping times.” This is a measure of how long it takes to receive a single-packet reply from a single-packet message sent to a remote target.
If you’ve got a high ping on a particular server, you know that gaming on that server isn’t going to be the best. The extra round-trip delay across the network means there’s a lag in between when you click your mouse and when your character performs the action–and a few milliseconds here or there can be crucial, depending on the type of game you’re playing.
Storage Latency
Latency isn’t just important for network applications–it’s crucial to storage as well.
On mechanical hard drives, seek latency is a frequently-discussed statistic–it refers to the amount of time it takes a hard drive head to move from one track to another track in order to address the desired sector.
Even when the head of a mechanical drive is already positioned on the right track, there is still some latency involved in waiting for the rotation of the platter to bring the correct sector underneath the head for reading or writing. This is a much smaller value, but still contributes to both higher latency and lower total throughput!
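To put rough numbers on it: a 7,200 RPM platter completes one revolution in about 8.3ms, so on average you wait half a revolution for the right sector to arrive–on top of the seek itself. The seek times in the sketch below are assumed, typical values rather than measurements, but they show why random IOPS on conventional drives tops out in the low hundreds:

```python
# Back-of-the-envelope numbers for a 7,200 RPM conventional drive.
rpm = 7200
avg_rotational_ms = (60_000 / rpm) / 2   # half a revolution on average: ~4.2ms

# Assumed, typical seek times -- real drives vary, so check your drive's datasheet.
for label, seek_ms in (("short track-to-track seek", 1.0), ("long average seek", 8.5)):
    per_op_ms = seek_ms + avg_rotational_ms
    print(f"{label}: ~{per_op_ms:.1f}ms per op, ~{1000 / per_op_ms:.0f} IOPS ceiling")
# => roughly 80-190 random IOPS, regardless of how much data each operation moves
```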
Solid state drives don’t have physical heads to move around, and thus don’t suffer from seek latency specifically–but they have latency issues of their own. It still takes a finite amount of time to address a particular cell, read (or write) its value, and verify the result. There can also be added latency when the flash translation layer (FTL) needs to perform garbage collection or other background operations, which can inject seemingly random latency spikes at inopportune times.
Naively, one might expect that fragmentation is thus not an issue on solid state media–if you don’t have to reposition a head, it doesn’t matter where the data is stored, so fragmented data should read as quickly as unfragmented, right?
Unfortunately, no–underneath the shell, a NAND flash SSD is more like a RAID array that you aren’t allowed to manage than it is like a single drive. And in order to get the maximum throughput out of the drive, any given write must be parallelized across multiple banks of physical media.
When you write small, heavily fragmented data to an SSD–particularly when you write it synchronously, demanding it be committed to the metal as rapidly as possible–you run a much greater risk of writing that data unevenly distributed across those internal physical banks that you have no control over. And when that happens, it’s like reading (or writing) all of your data to a single disk in an eight-drive array: you get much less throughput than you naively expected.
Application Latency
And now, we get to my favorite part–application latency is a higher-level metric than we’ve discussed so far, but it’s the only one that you and your users are likely to actually see and care about when you aren’t actively trying to take measurements.
Before attempting to define application latency, let’s look at an example: loading and rendering a web page in a browser.
At time zero, we issue the command to browse to a URL. 50ms later, we begin receiving data from remote sources. At the 500ms mark, we have enough data for the browser to begin rendering the page, and at the 1,000ms mark all elements have finished loading and rendering inside the page.
The application latency of that web page load and render is precisely one second–the difference between the moment we asked for the page and the moment the page finished rendering in the browser.
But application latency is a much higher-order function than simple network latency. We can guess that our network latency is on the order of 50ms, since data began arriving 50ms after we issued the request–although some of that could be our request sitting in a queue at the remote server.
But our application latency of 1000ms is the one that we actually see, experience, and get frustrated by–and it’s a function of the network latency, the network throughput (depending on the size of the remote resources requested), and the parallelization or serialization of the total number of requests that must be issued.
Depending on what web page we asked for, our “simple, single” URL load request will typically require fetching hundreds or even thousands of much smaller individual resources, often from multiple remote servers–for example, huge numbers of websites use fonts and javascript libraries hosted by Google, rather than copying those resources to the local webserver and serving them locally.
In many cases, those requests can all be issued simultaneously–in which case, network latency only costs you a single round-trip delay. But in many other cases, you don’t know which resources to request until after the first few have already been loaded, and/or some logic checks have been performed. This is what we mean when we talk about serialized requests.
If you request 500 remote resources at the same time, with a network latency of 50ms, that 50ms is all the impact that your raw network latency has. But if you can’t make request 2 until request 1 has finished loading, or request 500 until 499 has finished… now your 50ms raw network latency alone pushes your application latency well over 25 full seconds!
Similarly, if you’re trying to load a lot of data–perhaps you want to download a 5GiB Linux ISO–your network throughput may be the bottleneck increasing your application latency. On the same network with 50ms network latency, you might have a maximum throughput of 500Mbps. 500Mbps equates to roughly 50MiB/sec, so you’ll need 50ms (network latency) plus about 102 seconds (5GiB × 1,024MiB/GiB ÷ 50MiB/sec) to download that ISO.
The most important thing to note about this final example is that even though the operation was primarily throughput constrained, we can still talk most meaningfully about its application latency. The direct reason for a frustrated user to cancel this download is not because the throughput is low–it’s because the download is taking too long.
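The arithmetic behind both of those examples is simple enough to sketch in a few lines of Python–the request count, latency, and throughput figures are the same illustrative ones used above:

```python
MS = 0.001

# Scenario 1: 500 remote resources, 50ms network latency, payload transfer time not counted.
requests, latency_s = 500, 50 * MS
parallel   = latency_s              # all requests issued at once: pay the round trip once
serialized = requests * latency_s   # each request waits for the previous one: pay it 500 times
print(f"parallel: {parallel:.2f}s   serialized: {serialized:.1f}s")  # 0.05s vs 25.0s

# Scenario 2: a 5GiB ISO over a link doing roughly 50MiB/sec.
iso_gib, throughput_mib_per_s = 5, 50
download_s = latency_s + (iso_gib * 1024) / throughput_mib_per_s
print(f"download: {download_s:.1f}s")   # ~102s: throughput-bound, but experienced as latency
```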
Conclusions
To manage any constrained computing resource effectively, an administrator must understand both the technical and human factors relevant to those resources.
Although we most frequently discuss raw metrics like throughput and IOPS when discussing storage, we must remember that end users don’t actually care about them–nor do they care about raw storage latency.
As administrators, our ultimate goal is to create a reliable, performant environment which enables end users to do what they need or want to do on the computer–which means not only understanding low-level, raw metrics like IOPS, storage throughput, and network latency, but also understanding how they combine to determine the application latency that our users (and we ourselves) actually experience and get frustrated by.
To get expert assistance monitoring and optimizing your storage system, consider the Klara ZFS Performance Analysis solution. A comprehensive review of workload, environment, and tuning can greatly improve perceived performance from the users’ perspective, and observing the most important metrics can provide key insights into how to improve user experience.

Jim Salter
Jim Salter (@jrssnet) is an author, public speaker, mercenary sysadmin, and father of three—not necessarily in that order. He got his first real taste of open source by running Apache on his very own dedicated FreeBSD 3.1 server back in 1999, and he's been a fierce advocate of FOSS ever since. He's the author of the Sanoid hyperconverged infrastructure project, and co-host of the 2.5 Admins podcast.