Storage is a complex and
important part of any project’s architecture, and it should be planned thoughtfully—ideally,
ahead of time! In this article, we’ll talk about how to understand, measure, and plan for
your storage performance needs.
What Bottlenecks Storage Performance?
Consider a bottle of soda: the bottle itself is much wider than the
neck, where the cap screws on. But in order to get the soda out, you must pour it through the neck—and
making the bottle wider won’t improve flow if the neck remains the same diameter.
This concept of “bottlenecking” applies to nearly any computing performance discussion, very much
including storage. Typically, one of several major factors limits your performance, and that factor is the
one you must improve in order to get higher performance; improving the others will have little or no effect,
much like widening the base of a bottle while leaving its neck the same diameter.
Understanding Throughput
Throughput is by far the most commonly talked-about storage metric—though it’s rarely the most important one.
Typically measured in MiB/sec, throughput is a simple measurement of how rapidly your storage system can
provide requested data to you.
Technically, any measurement of data in
MiB/sec is a measure of throughput. In practice, however, this metric is most commonly used as a measure
of top end throughput—how quickly your system can provide data under the most favorable conditions
possible.
For this reason, although throughput is the simplest metric to understand, it’s usually the least useful one.
A hard drive rated for 200MiB/sec of throughput only achieves that rating for a large block of data, stored
contiguously on disk, and requested all at once.
If your conditions were always that ideal, you likely wouldn’t be bothering with a performance audit in the
first place!
How to Look at IOPS
IOPS is an acronym for Input/Output Operations Per Second.
Technically, IOPS could be viewed as a specialized subset of throughput—you take your throughput measured in
MiB/sec, divide it by your operational block size in MiB, and you get operations per second.
In real usage, IOPS is typically used to represent the “low end”
performance of a storage system. Remember how throughput is typically measured under the best possible
conditions? IOPS is typically measured under the worst: small blocks of data, typically not located in any
particular order or grouping.
The hard drive we used as an example above achieves 200MiB/sec of
throughput when we sequentially access data stored with a 1MiB blocksize—but only 0.8MiB/sec (or less!) of
throughput when asked to store or retrieve non-contiguous data in single sector (4KiB) operations. Divide
that 0.8MiB/sec by 4KiB/operation, and you get just under 205 IOPS.
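To make that conversion concrete, here is a quick back-of-the-envelope check you can do at the shell with bc; the throughput and block size figures are simply the example drive's numbers from above:
# Throughput divided by block size gives operations per second.
# Random 4KiB workload: 0.8 MiB/sec is 819.2 KiB/sec, at 4 KiB per operation:
echo "scale=1; (0.8 * 1024) / 4" | bc    # prints 204.8, i.e. just under 205 IOPS
# Sequential 1MiB workload: 200 MiB/sec at 1 MiB per operation:
echo "200 / 1" | bc                      # prints 200, i.e. about 200 IOPS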
What About Latency?
Latency is the inverse of throughput. Instead of asking “how much data
can we move per second,” latency asks the question “how long will it take to retrieve (or store) a piece of
data once I’ve asked you to?”
Although we typically ask “speed” questions in terms of throughput,
latency is the way in which we experience it. Users don’t really care if data moves through the system at
100MiB/sec or 1GiB/sec—they care about how often they must stare at a “wait” icon, and for how long they
must stare at it before getting what they asked for.
Much like throughput and IOPS, latency is frequently referred to in a
very specific way—and in storage terms, “latency” is most commonly a reference
to application latency. In other words, not just “how long does it take to pull 1MiB of data off
disk” but “how long does it take for my database to return 1MiB of results from a query I submitted.”
When used to refer specifically to hardware, latency most commonly
refers to seek latency of rotational hard drives—the time it takes the head to skip from one track
to another and wait for the target sector to rotate under the head, when reading non-contiguous sectors.
And Then There Was Networking
For projects of any real size or scale, there’s one last bottleneck to
talk about: the network. Accessing storage on another computer or device across a network adds the latency
and throughput limitations of the network itself to those of the storage.
The network increases latency of each operation due to the time it
takes to move a packet across the network, and bottlenecks throughput to that of the network (which is
usually lower than the high-end throughput of any storage device itself).
For binding small block operations—meaning a sequence of operations
which must be fulfilled in sequence, from first to last—network latency also decreases the apparent IOPS.
Putting It All Together
Remember that hard drive that offered 205 IOPS on 4KiB I/O? If we
invert 205 operations per second, we get about 5 milliseconds per operation. But if we put that hard drive
on the other side of a network with a 1ms ping time, we add 2ms to each operation—1ms to request the
operation, and another 1ms to receive the result.
That brings us to seven milliseconds per op instead of five—so even if
we’ve got a full symmetrical gigabit network, we’re down from 205 IOPS to 143 IOPS... and from 0.8 MiB/sec
to 0.56MiB/sec.
If our individual 4KiB operations are non-binding—meaning
it’s okay to request them all at once and receive them in any order—our network stops bottlenecking this
particular example, since we only experience 2ms of delay at the very beginning of issuing our string of
operations and receiving their results, rather than tacking an additional 2ms onto each request.
The network will still bottleneck the drive on top end throughput,
though. The drive’s 200MiB/sec rating for contiguous data accessed in large blocks comes out to about
1.6Gbps—significantly faster than the 1Gbps our network is capable of moving.
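If you'd like to verify that arithmetic yourself, the same conversions are easy to reproduce at the shell; the 5ms seek time, 1ms ping, 4KiB block size, and 200MiB/sec figures below are just the example numbers from above:
# About 5ms per local operation, plus 1ms out and 1ms back across the network:
echo "scale=3; 1000 / (5 + 2)" | bc              # 142.857 ops/sec, i.e. roughly 143 IOPS
# 143 IOPS at 4KiB per operation, expressed in MiB/sec:
echo "scale=3; 143 * 4 / 1024" | bc              # .558, i.e. about 0.56 MiB/sec
# Top-end check: 200 MiB/sec of sequential throughput, expressed in Gbps:
echo "scale=3; 200 * 1.048576 * 8 / 1000" | bc   # 1.677, more than a 1Gbps link can carry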
Predicting Your Needs By Understanding Your Workload
Although we’ve shown you how you can always convert IOPS, latency, and
throughput back and forth, you’ve hopefully gotten an idea of when each metric is more useful as a direct
measurement of performance.
In short:
- Throughput most commonly refers to the maximum speed of data transfer, in MiB/sec
- IOPS most commonly refers to the minimum speed of data transfer, in 4KiB operations per second
- Latency refers to the time it takes for a single storage request to be fulfilled—typically, whatever request a user or application actually makes, as opposed to an arbitrary maximum or minimum
We can further identify the common workloads which bottleneck on each
metric’s most common expression. For databases, latency is everything—the database makes
enormous numbers of very small requests which must be fulfilled in sequential order, so any increase in
latency sharply decreases throughput as well.
For virtual machines, we typically target IOPS—most
requests can be fulfilled at least somewhat out of sequence, but we’ll still need a very large number of
small blocksize operations. Since small blocksize operations go much slower than large blocksize operations,
that means we want to talk IOPS!
For most fileservers, our top metric is typically
throughput. Most data stored on fileservers (pictures, audio, video, office documents, etc) is well over
1MiB in size—so as long as it’s not heavily fragmented on-disk, we should be able to get fairly close to our
highest possible performance.
What about our fourth bottleneck, the network?
The database is significantly slowed down by the additional latency.
The fileserver, on the other hand, is bottlenecked by the network’s throughput limit—and
the virtual machine will generally see a mix of both bottlenecks, depending in large part
on the virtual machine’s own internal workload.
Measuring Your Needs Directly
Ideally, in addition to knowing how each metric impacts your system,
you’ll know what value to target for each. This, dear reader, is the storied and arcane art
of benchmarking!
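The rest of this article measures a live system with iostat, but if you need a target number before a workload even exists, a synthetic benchmark can supply one. As a hedged sketch (fio isn't mentioned in this article; it's simply one widely used option, and /tmp/fio-testfile is a hypothetical scratch path), you might exercise the two extremes we've described like so:
# Small-block random reads: stresses IOPS and latency, the worst case for spinning disks
fio --name=randread --filename=/tmp/fio-testfile --size=1G --ioengine=libaio --rw=randread --bs=4k --iodepth=16 --runtime=30 --time_based
# Large-block sequential reads: stresses top-end throughput instead
fio --name=seqread --filename=/tmp/fio-testfile --size=1G --ioengine=libaio --rw=read --bs=1M --iodepth=4 --runtime=30 --time_based
Bear in mind that without steps to defeat caching, numbers like these can look far better than the underlying disks deserve.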
Throughput is, once again, the most commonly talked about and measured
metric—and the least directly useful one. If all you need to do is move very large files across a network,
and your storage is in relatively good health, you’re probably already saturating (using the full capacity
of) your network, and happy as a clam.
For more challenging workloads, we need to talk about the observable
throughput, IOPS, and latency metrics of your current system, at the application level.
The go-to tool for this task is iostat (or, if you're running OpenZFS, zpool iostat). If we invoke iostat on this production VM system
with the arguments -x 1 1 (extended detail, one report per
second, only one report), followed by the names of the four drives we're interested in, we get the following output:
root@prod0:/# iostat -x 1 1 sda sdb sdc sdd
Linux 5.4.0-90-generic (prod0) 01/17/2023 _x86_64_ (12 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
30.06 0.00 8.84 0.57 0.00 60.53
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
sda 44.81 1253.39 0.17 0.38 1.45 27.97 104.90 4706.03 0.59 0.56 0.01 44.86 0.01 12.40 0.00 0.00 0.89 1781.65 0.09 5.89
sdb 44.95 1256.98 0.17 0.37 1.44 27.96 105.36 4711.55 0.61 0.58 0.00 44.72 0.01 12.40 0.00 0.00 0.89 1781.59 0.09 5.88
sdc 44.91 1255.96 0.16 0.36 1.46 27.97 105.31 4711.55 0.59 0.55 0.02 44.74 0.01 12.40 0.00 0.00 0.89 1781.61 0.09 5.93
sdd 44.73 1250.56 0.16 0.35 1.48 27.96 104.66 4706.03 0.62 0.59 0.03 44.96 0.01 12.40 0.00 0.00 0.94 1781.69 0.09 6.01
Without a wide enough terminal, these results are clearly a mess—but
you need the extended results to get a clear picture which includes latency and IOPS, not just throughput.
If you can’t get enough columns in your terminal to display them all without wrapping, you might consider
using awk to filter just the ones you want.
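As a hedged example of that awk approach (the field numbers below match the header shown above, but column order can shift between sysstat versions, so check your own header first), this keeps just the device name, read/write IOPS, read/write throughput, and read/write latency:
# Device=1, r/s=2, rkB/s=3, r_await=6, w/s=8, wkB/s=9, w_await=12 in the output above
iostat -x 1 1 sda sdb sdc sdd | awk '$1 == "Device" || $1 ~ /^sd/ {print $1, $2, $3, $6, $8, $9, $12}' | column -t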
Just as importantly, this
single-result iostat command doesn't show us the current statistics; it shows us
averages accumulated since the last system boot. While that can be useful information, you generally want
to monitor these metrics from moment to moment, while the system's most demanding operations are being
performed. (To get this information, add the -y argument, which skips the initial since-boot summary.)
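For instance, to watch live per-second numbers instead of the since-boot averages, and to get the equivalent view directly from OpenZFS, something like the following should work; the zpool iostat flags shown (-v for per-vdev detail, -l for latency columns, -y to skip the since-boot summary) exist in reasonably modern OpenZFS releases, but check the man page on your own system:
# Extended statistics, one report per second, skipping the since-boot summary
iostat -x -y 1
# The OpenZFS equivalent: per-vdev breakdown, latency columns, no since-boot summary
zpool iostat -vly 1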
The throughput for each disk in this four-drive system is displayed in
the rkB/s and wkB/s columns—reads and writes in KiB/sec, fairly obviously. IOPS, similarly, are shown in r/s
and w/s—read and write operations issued per second.
Finally, we can see latency in the r_await and w_await columns,
expressed in milliseconds. The busier the system gets, the higher those numbers will go—which won’t make
much impact on a fileserver, but could easily cripple a database!
Remember, however, that these numbers change as the system’s workload
itself changes: the best way to use iostat to measure your storage needs is to run
it while the system is working its hardest, and your users are their unhappiest.
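One low-effort way to capture that busiest period, assuming you can leave a session (or a scheduled job) running through it, is simply to log a report every minute and review it afterwards; the log path here is just a placeholder:
# One extended report per minute, skipping the since-boot summary, appended to a log
iostat -x -y 60 >> /var/tmp/iostat-busy-period.log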
Conclusions
In order to design a successful storage deployment, you need to have a
solid grasp of your own workload and how the major storage metrics—throughput, IOPS, and latency—impact it.
Armed with this knowledge, you can select your individual drives, storage topology, and tunables to match.
At Klara, we believe the vast majority of storage workloads are best
served with OpenZFS—the open source storage platform that gives you the maximum configurability, data
integrity, and maintainability to meet almost any storage performance need while keeping your precious data
safe and secure.
If you’re already an OpenZFS expert, you’ve already got all the tools
you need to design your perfect storage solution—but if you’d like expert guidance, we can help you
benchmark and understand your current storage system, as well as offer configuration and design assistance
for maximum performance and reliability.
Jim Salter
Jim Salter (@jrssnet) is an author, public speaker, mercenary sysadmin, and father of three—not necessarily in that order. He got his first real taste of open source by running Apache on his very own dedicated FreeBSD 3.1 server back in 1999, and he's been a fierce advocate of FOSS ever since. He's the author of the Sanoid hyperconverged infrastructure project, and co-host of the 2.5 Admins podcast.