OpenZFS – Auditing for Storage Performance
Storage is a complex and important part of any project’s architecture, and it should be planned thoughtfully—ideally, ahead of time! In this article, we’ll talk about how to understand, measure, and plan for your storage performance needs.
What Bottlenecks Storage Performance?
Consider a bottle of soda: the bottle itself is much wider than the neck, where the cap screws on. But in order to get the soda out, you must pour it through the neck—and making the bottle wider won’t improve flow if the neck remains the same diameter.
This concept of “bottlenecking” applies to nearly any computing performance discussion, very much including storage. Typically, one of several major factors limits your performance, and that factor is what you must improve in order to get higher performance. Improving the others will have little or no effect, much like widening the base of a bottle but leaving its neck the same.
Understanding Throughput
Throughput is by far the most commonly talked-about storage metric—though it’s rarely the most important one. Typically measured in MiB/sec, throughput is a simple measurement of how rapidly your storage system can provide requested data to you.
Technically, any measurement of data in MiB/sec is a measure of throughput. In practice, however, this metric is most commonly used as a measure of top end throughput—how quickly your system can provide data under the most favorable conditions possible.
For this reason, although throughput is the simplest metric to understand, it’s usually the least useful one. A hard drive rated for 200MiB/sec of throughput only achieves that rating for a large block of data, stored contiguously on disk, and requested all at once.
If your conditions were always that ideal, you likely wouldn’t be bothering with a performance audit in the first place!
How to Look at IOPS
IOPS is an acronym for Input/Output Operations Per Second. Technically, IOPS could be viewed as a specialized subset of throughput—you take your throughput measured in MiB/sec, divide it by your operational block size in MiB, and you get operations per second.
In real usage, IOPS is typically used to represent the “low end” performance of a storage system. Remember how throughput is typically measured under the best possible conditions? IOPS is typically measured under the worst: small blocks of data, typically not located in any particular order or grouping.
The hard drive we used as an example above achieves 200MiB/sec of throughput when we sequentially access data stored with a 1MiB blocksize—but only 0.8MiB/sec (or less!) of throughput when asked to store or retrieve non-contiguous data in single sector (4KiB) operations. Divide that 0.8MiB/sec by 4KiB/operation, and you get just under 205 IOPS.
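The conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the example drive's figures from the text; the function name is made up for this sketch.

```python
KIB_PER_MIB = 1024


def iops_from_throughput(throughput_mib_s: float, block_size_kib: float) -> float:
    """Operations per second = throughput divided by block size (in matching units)."""
    return throughput_mib_s * KIB_PER_MIB / block_size_kib


# Sequential access with a 1MiB blocksize at 200MiB/sec
sequential_iops = iops_from_throughput(200, 1024)

# Non-contiguous 4KiB operations at 0.8MiB/sec
random_4k_iops = iops_from_throughput(0.8, 4)

print(f"sequential, 1MiB blocks: {sequential_iops:.1f} IOPS")
print(f"random, 4KiB blocks:     {random_4k_iops:.1f} IOPS")
```

The same drive delivers roughly the same number of operations per second in both cases; what changes is how much data each operation carries.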
What About Latency?
Latency is, roughly speaking, the inverse of throughput. Instead of asking “how much data can we move per second,” latency asks the question “how long will it take to retrieve (or store) a piece of data once I’ve asked you to?”
Although we typically ask “speed” questions in terms of throughput, latency is the way in which we experience it. Users don’t really care if data moves through the system at 100MiB/sec or 1GiB/sec—they care about how often they must stare at a “wait” icon, and for how long they must stare at it before getting what they asked for.
Much like throughput and IOPS, latency is frequently referred to in a very specific way—and in storage terms, “latency” is most commonly a reference to application latency. In other words, not just “how long does it take to pull 1MiB of data off disk” but “how long does it take for my database to return 1MiB of results from a query I submitted.”
When used to refer specifically to hardware, latency most commonly refers to seek latency of rotational hard drives—the time it takes the head to skip from one track to another and wait for the target sector to rotate under the head, when reading non-contiguous sectors.
And Then There Was Networking
For projects of any real size or scale, there’s one last bottleneck to talk about: the network. Accessing storage on another computer or device across a network adds the latency and throughput limitations of the network itself to those of the storage.
The network increases latency of each operation due to the time it takes to move a packet across the network, and bottlenecks throughput to that of the network (which is usually lower than the high-end throughput of any storage device itself).
For binding small block operations—meaning a sequence of operations which must be fulfilled in sequence, from first to last—network latency also decreases the apparent IOPS.
Putting It All Together
Remember that hard drive that offered 205 IOPS on 4KiB I/O? If we invert 205 operations per second, we get about 5 milliseconds per operation. But if we put that hard drive on the other side of a network with 1ms of latency in each direction (a 2ms ping time), we add 2ms to each operation: 1ms to request the operation, and another 1ms to receive the result.
That brings us to seven milliseconds per op instead of five—so even if we’ve got a full symmetrical gigabit network, we’re down from 205 IOPS to 143 IOPS… and from 0.8 MiB/sec to 0.56MiB/sec.
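As a rough sketch of that arithmetic, the snippet below adds the network round trip to the per-operation service time and converts back to IOPS and throughput. It uses the rounded 5ms figure from the text rather than the exact inverse of 205 IOPS, and the function names are illustrative only.

```python
def effective_iops(device_ms_per_op: float, network_round_trip_ms: float) -> float:
    """IOPS for operations issued strictly one after another."""
    return 1000.0 / (device_ms_per_op + network_round_trip_ms)


def throughput_mib_s(iops: float, block_size_kib: float = 4) -> float:
    """Throughput implied by an IOPS figure at a given block size."""
    return iops * block_size_kib / 1024


remote_iops = effective_iops(device_ms_per_op=5, network_round_trip_ms=2)
remote_throughput = throughput_mib_s(remote_iops)

print(f"{remote_iops:.0f} IOPS")            # about 143
print(f"{remote_throughput:.2f} MiB/sec")   # about 0.56
```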
If our individual 4KiB operations are non-binding—meaning it’s okay to request them all at once and receive them in any order—our network stops bottlenecking this particular example, since we only experience 2ms of delay at the very beginning of issuing our string of operations and receiving their results, rather than tacking an additional 2ms onto each request.
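To make the difference concrete, here is a deliberately simplified model comparing the two cases for a batch of operations. It assumes the drive still services one operation at a time at about 5ms each, that non-binding requests are all issued up front, and that the network adds a fixed 2ms round trip; real systems with queuing and caching will behave less tidily.

```python
def binding_total_ms(n_ops: int, device_ms: float, round_trip_ms: float) -> float:
    """Each operation waits for the previous result, so every op pays the round trip."""
    return n_ops * (device_ms + round_trip_ms)


def non_binding_total_ms(n_ops: int, device_ms: float, round_trip_ms: float) -> float:
    """All requests are in flight at once, so the round trip is paid only once."""
    return round_trip_ms + n_ops * device_ms


n = 1000  # hypothetical batch size, chosen only for illustration
print(binding_total_ms(n, 5, 2))      # 7000 ms
print(non_binding_total_ms(n, 5, 2))  # 5002 ms
```

In this model the non-binding batch finishes in nearly the same time as it would on a local drive, which is why the network stops being the bottleneck for that case.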
The network will still bottleneck the drive on top end throughput, though. The drive’s 200MiB/sec rating for contiguous data accessed in large blocks comes out to about 1.7Gbps, which is significantly faster than the 1Gbps our network is capable of moving.
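The unit conversion behind that comparison is easy to get wrong, since drives are usually rated in binary units (MiB) while network links are rated in decimal bits per second. A minimal sketch, ignoring protocol overhead entirely:

```python
BYTES_PER_MIB = 1024 * 1024
BITS_PER_BYTE = 8


def mib_s_to_gbps(mib_per_s: float) -> float:
    """Convert MiB/sec (binary) to gigabits per second (decimal)."""
    return mib_per_s * BYTES_PER_MIB * BITS_PER_BYTE / 1e9


def gbps_to_mib_s(gbps: float) -> float:
    """Convert gigabits per second (decimal) to MiB/sec (binary)."""
    return gbps * 1e9 / BITS_PER_BYTE / BYTES_PER_MIB


drive_gbps = mib_s_to_gbps(200)   # the example drive's sequential rating
network_gbps = 1.0                # gigabit link
bottleneck = "network" if network_gbps < drive_gbps else "drive"

print(f"drive: {drive_gbps:.2f} Gbps, link: {network_gbps:.2f} Gbps")
print(f"link ceiling: {gbps_to_mib_s(network_gbps):.1f} MiB/sec")
print(f"bottleneck: {bottleneck}")
```

Under these assumptions the gigabit link tops out a little under 120MiB/sec, well below what the drive can deliver for large sequential reads.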
Predicting Your Needs By Understanding Your Workload
Although we’ve shown you how you can always convert IOPS, latency, and throughput back and forth, you’ve hopefully gotten an idea of when each metric is more useful as a direct measurement of performance.
In short:
- Throughput most commonly refers to the maximum speed of data transfer, in MiB/sec
- IOPS most commonly refers to the minimum speed of data transfer, in 4KiB operations per second
- Latency refers to the time it takes for a single storage request to be fulfilled—typically, whatever request a user or application actually makes, as opposed to an arbitrary maximum or minimum
We can further identify the common workloads which bottleneck on each metric’s most common expression. For databases, latency is everything—the database makes enormous numbers of very small requests which must be fulfilled in sequential order, so any increase in latency sharply decreases throughput as well.
For virtual machines, we typically target IOPS—most requests can be fulfilled at least somewhat out of sequence, but we’ll still need a very large number of small blocksize operations. Since small blocksize operations go much slower than large blocksize operations, that means we want to talk IOPS!
For most fileservers, our top metric is typically throughput. Most data stored on fileservers (pictures, audio, video, office documents, etc.) is well over 1MiB in size, so as long as it’s not heavily fragmented on-disk, we should be able to get fairly close to our highest possible performance.
What about our fourth bottleneck, the network? The database is significantly slowed down by the additional latency. The fileserver, on the other hand, is bottlenecked by the network’s throughput limit—and the virtual machine will generally see a mix of both bottlenecks, depending in large part on the virtual machine’s own internal workload.
Measuring Your Needs Directly
Ideally, in addition to knowing how each metric impacts your system, you’ll know what value to target for each. This, dear reader, is the storied and arcane art of benchmarking!
Throughput is, once again, the most commonly talked about and measured metric—and the least directly useful one. If all you need to do is move very large files across a network, and your storage is in relatively good health, you’re probably already saturating (using the full capacity of) your network, and happy as a clam.
For more challenging workloads, we need to talk about the observable throughput, IOPS, and latency metrics of your current system, at the application level.
The go-to tool for this task is iostat, or, if you’re using the OpenZFS storage system, zpool iostat. If we invoke iostat on this production VM system with the arguments -x 1 1 (extended detail, a one-second interval, only one report), followed by the names of the devices we want to see, we get the following output:
root@prod0:/# iostat -x 1 1 sda sdb sdc sdd
Linux 5.4.0-90-generic (prod0) 01/17/2023 _x86_64_ (12 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
30.06 0.00 8.84 0.57 0.00 60.53
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
sda 44.81 1253.39 0.17 0.38 1.45 27.97 104.90 4706.03 0.59 0.56 0.01 44.86 0.01 12.40 0.00 0.00 0.89 1781.65 0.09 5.89
sdb 44.95 1256.98 0.17 0.37 1.44 27.96 105.36 4711.55 0.61 0.58 0.00 44.72 0.01 12.40 0.00 0.00 0.89 1781.59 0.09 5.88
sdc 44.91 1255.96 0.16 0.36 1.46 27.97 105.31 4711.55 0.59 0.55 0.02 44.74 0.01 12.40 0.00 0.00 0.89 1781.61 0.09 5.93
sdd 44.73 1250.56 0.16 0.35 1.48 27.96 104.66 4706.03 0.62 0.59 0.03 44.96 0.01 12.40 0.00 0.00 0.94 1781.69 0.09 6.01
Without a wide enough terminal, these results are clearly a mess—but you need the extended results to get a clear picture which includes latency and IOPS, not just throughput. If you can’t get enough columns in your terminal to display them all without wrapping, you might consider using awk to filter just the ones you want.
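If awk isn't to your taste, the same column filtering can be sketched in Python. The snippet below selects columns by header name rather than by position; the sample text is copied from the report above (two devices only), and the helper name and the choice of columns are just for illustration.

```python
SAMPLE = """\
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
sda 44.81 1253.39 0.17 0.38 1.45 27.97 104.90 4706.03 0.59 0.56 0.01 44.86 0.01 12.40 0.00 0.00 0.89 1781.65 0.09 5.89
sdb 44.95 1256.98 0.17 0.37 1.44 27.96 105.36 4711.55 0.61 0.58 0.00 44.72 0.01 12.40 0.00 0.00 0.89 1781.59 0.09 5.88
"""

WANTED = ["Device", "r/s", "rkB/s", "r_await", "w/s", "wkB/s", "w_await"]


def filter_columns(report: str, wanted: list[str]) -> list[list[str]]:
    """Return the header plus device rows, keeping only the named columns."""
    lines = [line.split() for line in report.splitlines() if line.strip()]
    # Skip any preamble (kernel banner, avg-cpu block) before the device table.
    start = next(i for i, fields in enumerate(lines) if fields[0] == "Device")
    header, rows = lines[start], lines[start + 1:]
    indexes = [header.index(name) for name in wanted]
    return [[fields[i] for i in indexes] for fields in [header] + rows]


for row in filter_columns(SAMPLE, WANTED):
    print("".join(f"{cell:>10}" for cell in row))
```

Selecting by name rather than position is a deliberate choice here: the set of columns that iostat prints can vary between versions, so positional field numbers are more fragile.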
Just as importantly, this single-result iostat command doesn’t show us the current statistics; it shows us the averages since the last system boot. While this can be useful information, you generally want to monitor these metrics from moment to moment, while the system’s most demanding operations are being performed. (To get this information, add the -y argument, which skips the initial since-boot summary.)
The throughput for each disk in this four-drive system is displayed in the rkB/s and wkB/s columns—reads and writes in KiB/sec, fairly obviously. IOPS, similarly, are shown in r/s and w/s—read and write operations issued per second.
Finally, we can see latency in the r_await and w_await columns, expressed in milliseconds. The busier the system gets, the higher those numbers will go—which won’t make much impact on a fileserver, but could easily cripple a database!
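The throughput and IOPS columns are tied together by the average request size, just as described earlier: divide KiB/sec by operations per second and you get KiB per operation. A quick check against the sda row from the report above, with an illustrative helper name, shows the result lining up with the rareq-sz and wareq-sz columns:

```python
def avg_request_kib(kib_per_s: float, ops_per_s: float) -> float:
    """Average request size in KiB; returns 0.0 for an idle device."""
    if ops_per_s == 0:
        return 0.0
    return kib_per_s / ops_per_s


# Values taken from the sda row: rkB/s, r/s, wkB/s, w/s
read_size = avg_request_kib(1253.39, 44.81)
write_size = avg_request_kib(4706.03, 104.90)

print(f"average read size:  {read_size:.2f} KiB")   # rareq-sz reports 27.97
print(f"average write size: {write_size:.2f} KiB")  # wareq-sz reports 44.86
```

Knowing the typical request size tells you which end of the performance range your workload sits nearer to: requests of a few tens of KiB, as here, are much closer to the small-block IOPS case than to large sequential transfers.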
Remember, however, that these numbers change as the system’s workload itself changes: the best way to use iostat to measure your storage needs is to run it while the system is working its hardest, and your users are at their unhappiest.
Conclusions
In order to design a successful storage deployment, you need to have a solid grasp of your own workload and how the major storage metrics—throughput, IOPS, and latency—impact it. Armed with this knowledge, you can select your individual drives, storage topology, and tunables to match.
At Klara, we believe the vast majority of storage workloads are best served with OpenZFS—the open source storage platform that gives you the maximum configurability, data integrity, and maintainability to meet almost any storage performance need while keeping your precious data safe and secure.
If you’re already an OpenZFS expert, you’ve already got all the tools you need to design your perfect storage solution—but if you’d like expert guidance, we can help you benchmark and understand your current storage system, as well as offer configuration and design assistance for maximum performance and reliability.