
Five-year storage design is where OpenZFS stops being “just a filesystem” and becomes a strategic control plane for cost, risk, and hardware independence. Over that time horizon, the real questions are how you will refresh media, rebalance pools, and change hardware generations without replatforming your data—and those are exactly the problems OpenZFS was built to solve and that Klara systematically addresses in its OpenZFS storage design work.

Why Think in Five-Year Horizons

Storage arrays are often bought with a three-year budget but live for five, seven, or more years. In that period, you will experience:

  • Capacity growth from backup, analytics, logs, and AI workloads landing on the same pool. 
  • At least one full drive-generation shift (larger HDDs, denser NVMe, new endurance profiles, changing prices). 
  • Real-world failures: failing drives, media defects, firmware regressions, exhausted endurance, and latent sector errors exposing bit-rot.

If you design only for day-one capacity and performance, you end up with brittle pools that are difficult to expand or refresh without disruptive, ad hoc migrations. A five-year storage design instead treats OpenZFS features—VDEV layout, send/receive, checksumming, replication—as tools to deliberately manage change, not just to survive it.

OpenZFS as Hardware‑Independent Control Plane 

OpenZFS organizes storage into pools made of VDEVs (virtual devices), and each VDEV groups one or more physical devices into a redundancy set (mirrors or RAIDZ). The pool stripes data across VDEVs; each VDEV defines its own failure domain and performance profile. However, you cannot change a VDEV from RAIDZ1 to RAIDZ2 later; those failure-domain decisions must be correct on day one.
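
A minimal sketch of that model, with hypothetical pool and device names (tank, bigtank, da0–da9) rather than a recommended layout:

    # Performance-oriented pool built from two 2-way mirror VDEVs
    zpool create tank mirror da0 da1 mirror da2 da3

    # Capacity-oriented pool built from a single 6-wide RAIDZ2 VDEV
    zpool create bigtank raidz2 da4 da5 da6 da7 da8 da9

    # Inspect the resulting topology and per-VDEV layout
    zpool status tank
    zpool list -v bigtank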

Five properties make this model ideal for five-year planning:

  • Redundancy lives at the VDEV level, so a pool can scale up both capacity and performance by adding more VDEVs over time. 
  • Pools can also be grown by replacing devices with larger ones, providing an incremental path for expansion and for both media and technology refresh. 
  • ZFS is portable at every layer. A pool can be moved to a new host simply by connecting the drives, even if the new host uses a different CPU architecture. OpenZFS is available for multiple operating systems and is storage technology agnostic. 
  • ZFS replication allows data to be copied or migrated to other devices easily and safely, ensuring the filesystems and objects stored on ZFS remain intact through multiple hardware lifecycles. 
  • Data is protected end-to-end by copy-on-write and checksumming, meaning every block is validated during scrubs and resilvers, which is crucial as disk capacities grow but URE (Uncorrectable Read Error) rates do not improve at the same pace.

This design deliberately separates data integrity and layout policy from any particular controller or chassis vendor, and Klara’s ZFS storage design offering builds directly on that separation to keep you out of proprietary corners. 

Mirrors vs RAIDZ Over a Five-Year Horizon

The mirror-versus-RAIDZ decision is about capacity efficiency, performance, and how you want expansion and resilver behavior to look several years out.

Mirrors 

Two-way or three-way mirrors are often used for performance-sensitive workloads (VMs, databases, latency-sensitive services). Properties:

  • Strong random I/O performance and predictable latency. 
  • Simple failure behavior: losing one side of a mirror keeps the VDEV online. 
  • Flexible expansion: you can add mirror VDEVs as small, predictable units. 
  • Resilvers are efficient because only in-use blocks need to be copied to the replacement device, and the rebuild can be done sequentially.

The tradeoff is space efficiency: you pay more raw capacity per usable terabyte. Over five years, mirrors pay for themselves where operational risk and performance matter more than the last bit of capacity efficiency. 

RAIDZ2 / RAIDZ3 

RAIDZ2 and RAIDZ3 provide parity-based redundancy and better usable capacity, well suited to streaming and capacity-oriented workloads (backup, archive, media, some analytics). Properties:

  • Higher usable capacity per raw TB than mirrors. 
  • Multi-disk fault tolerance per VDEV.
  • Good sequential throughput for large I/O. 
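
As a rough, hypothetical illustration of the capacity difference: eight 20 TB drives arranged as four 2-way mirrors yield roughly 80 TB of usable space, while the same eight drives in a single 8-wide RAIDZ2 VDEV yield roughly 120 TB (six data plus two parity) before padding and metadata overhead.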

Tradeoffs: 

  • Expansion is coarse-grained: you add full RAIDZ VDEVs, not single disks.
  • With large disks (24 TB, 30 TB and beyond), resilvers take longer, keeping the VDEV in a degraded state for extended periods and stressing remaining disks. This pushes you toward RAIDZ2/3 rather than RAIDZ1 and toward moderate VDEV widths. 

Over a five-year window, a well-proven pattern is:

  • Mirrors for VM and database pools. 
  • RAIDZ2 or RAIDZ3 with moderate VDEV width for capacity pools. 
  • Consistent VDEV geometry inside each pool to keep performance and resilver behavior predictable. 

Designing Pools for Expansion and Refresh

The most important five-year decision is not the pool size on day one—it is the VDEV geometry and pool topology that control how you will grow and refresh.

Key practices: 

  • Fix VDEV width based on resilver risk, not just capacity efficiency (for example, 6–8 disks in a RAIDZ2 VDEV rather than 12+ wide). 
  • Use multiple similar VDEVs instead of a few giant ones; this adds parallelism, shortens resilvers, and makes future expansion increments smaller and more predictable. 
  • Separate workloads with very different I/O profiles into distinct pools instead of mixing everything into one huge pool. 

For example: 

  • tank_vm: 10–12 × 2-way mirrored VDEVs of NVMe for virtualization and databases. 
  • tank_cap: 8 × 8-wide RAIDZ2 VDEVs of large HDDs, possibly with SSD special VDEVs for metadata and small files, for backup and file workloads.

This gives you two independent levers: scale performance pools by adding mirrors; scale capacity pools by adding RAIDZ VDEVs or entire disk shelves. 
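
A sketch of how these two pools might be created; the device names and the mirrored special VDEV are illustrative assumptions, not a prescription:

    # Performance pool: NVMe mirrors for virtualization and databases
    zpool create tank_vm \
        mirror nvme0n1 nvme1n1 \
        mirror nvme2n1 nvme3n1

    # Capacity pool: an 8-wide RAIDZ2 VDEV of large HDDs, plus a
    # mirrored SSD special VDEV for metadata and small files
    zpool create tank_cap \
        raidz2 da0 da1 da2 da3 da4 da5 da6 da7 \
        special mirror ssd0 ssd1

    # Route blocks up to 32K to the special VDEV (optional)
    zfs set special_small_blocks=32K tank_cap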

Media Refresh as a First-Class Lifecycle Operation

Over five years, media refresh is not a rare event; it is a recurring operation. You will: 

  • Replace failed disks. 
  • Proactively retire aging devices. 
  • Introduce larger disks or newer NVMe generations. 
  • Refresh mirrors. 

Mirrors give you the cleanest refresh story. A typical refresh workflow per disk: 

  • Add a new disk to the system. 
  • Perform an online replacement of the old disk at the pool level: zpool replace tank <old-disk-id> <new-disk-id> 
  • Monitor zpool status and system metrics during the resilver to verify there are no read errors or unexpected dips in performance. 
  • Once resilver completes and a scrub passes cleanly, schedule the next replacement. 

As you replace all members of a mirror with larger drives, OpenZFS will expand the VDEV’s usable size once every device in that VDEV is larger (automatically if the pool’s autoexpand property is enabled, or on demand with zpool online -e). Over years 3–5 this gives you a staged capacity uplift without disruptive migrations.
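
A condensed sketch of that per-disk workflow; the pool name and placeholders mirror the example above:

    # Let the pool grow automatically once every member of a VDEV is larger
    zpool set autoexpand=on tank

    # Replace the old disk online; ZFS resilvers onto the new device
    zpool replace tank <old-disk-id> <new-disk-id>

    # Watch resilver progress and check for read or checksum errors
    zpool status -v tank

    # After the resilver completes, verify with a scrub before the next swap
    zpool scrub tank
    zpool status tank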

Refreshing RAIDZ 

RAIDZ refresh requires more planning: 

  • All devices in a VDEV must be replaced with larger ones before the VDEV’s logical capacity increases. 
  • Each resilver reads the entire allocated space of the VDEV, so resilver times increase with both disk size and occupancy. 

For large disks and multi-year operation, practical safeguards are:

  • Use RAIDZ2 or RAIDZ3 on large drives; RAIDZ1 increasingly exposes you to double-failure risk, especially during long resilvers. 
  • Keep VDEV width moderate to keep resilvers within acceptable windows. 
  • Refresh in small, controlled batches (for example, rotating replacements across multiple VDEVs) so you never have too many disks in a single VDEV simultaneously degraded, as sketched below.
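
A minimal sketch of one such controlled batch, assuming one disk per RAIDZ2 VDEV is swapped at a time; device names are placeholders:

    # Swap a single member of the first RAIDZ2 VDEV
    zpool replace tank_cap <old-disk-vdev1> <new-disk-vdev1>

    # Block until the resilver finishes before touching the next VDEV
    zpool wait -t resilver tank_cap

    # Confirm the pool is healthy, then proceed to the next disk in the batch
    zpool status -x tank_cap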

Newer OpenZFS features, including ongoing work on improved resilver algorithms and sequential replacement, are aimed at making these workflows safer on very large disks, which will only become more relevant over a five-year horizon.

Rebalancing and Growth Without Downtime 

When you add VDEVs to a pool, OpenZFS does not retroactively redistribute all existing data to achieve a perfect balance. New writes will bias toward emptier VDEVs, but older data stays where it is. Over several years of incremental growth, this can leave early VDEVs heavily utilized while later VDEVs are comparatively empty. 
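
One way to see this skew, assuming a pool named tank: per-VDEV statistics make older, fuller VDEVs stand out.

    # Per-VDEV capacity, free space, and fragmentation
    zpool list -v tank

    # Per-VDEV I/O distribution while the pool is under load
    zpool iostat -v tank 5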

There are two complementary ways to handle this. 

Symmetric Expansion 

Grow the pool in regular, symmetric increments. For example: 

  • Always add two identical RAIDZ2 VDEVs at a time. 
  • Always add mirror VDEVs in sets that match your chassis or fault domain layout. 

If you maintain reasonable free space (for example, staying under 70–80% pool fullness), you generally do not need perfect balance; consistent increments keep I/O and capacity skew within acceptable bounds. 
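
A sketch of one symmetric increment, assuming the capacity pool uses 8-wide RAIDZ2 VDEVs; device names are placeholders:

    # Add two identical 8-wide RAIDZ2 VDEVs in a single step
    zpool add tank_cap \
        raidz2 da16 da17 da18 da19 da20 da21 da22 da23 \
        raidz2 da24 da25 da26 da27 da28 da29 da30 da31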

Intentional Rebalancing Via Send/Receive 

For larger adjustments, use ZFS’ replication primitives: 

  • Use zfs send -R pool/fs@snap | zfs receive pool/newfs to relocate datasets or entire filesystem hierarchies. 
  • Validate, then swap mounts or update consumers to point at the new datasets and retire the old ones. 

This dataset-centric approach depends on sane dataset boundaries and naming from day one. When datasets map cleanly to workloads (for example, one top-level dataset per application or tenant), rebalancing and cross-pool migrations over time become routine operations, not bespoke projects.
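
A minimal end-to-end sketch of such a migration; the dataset names and snapshot labels are illustrative:

    # Take a recursive snapshot of the hierarchy to be moved
    zfs snapshot -r tank/app1@migrate

    # Replicate the datasets, their snapshots, and their properties
    zfs send -R tank/app1@migrate | zfs receive -u tank_cap/app1

    # Send a final incremental to catch up recent changes
    zfs snapshot -r tank/app1@migrate2
    zfs send -R -i @migrate tank/app1@migrate2 | zfs receive -uF tank_cap/app1

    # Validate, point consumers at the new location, then retire the old copy
    zfs set mountpoint=/app1 tank_cap/app1
    zfs destroy -r tank/app1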

With OpenZFS 2.4, the zfs rewrite command was introduced, which allows an administrator to instruct ZFS to rewrite a specific file (or recursively rewrite an entire directory). This will cause the data to be rebalanced across the free space on all VDEVs and ensure that when this data is read later, all VDEVs contribute to the read performance of that data. 
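
A brief sketch, assuming the recursive option follows the usual conventions (the paths are hypothetical; check the zfs rewrite man page in your release for the exact flags):

    # Rewrite a single large file so its blocks spread across all current VDEVs
    zfs rewrite /tank_cap/backups/archive.img

    # Recursively rewrite an entire directory tree
    zfs rewrite -r /tank_cap/backups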

Hardware Independence by Design 

A fiveyear plan must assume hardware churn: controllers change, drive models are discontinued, supply chains shift, and sometimes whole vendors exit a product line. The design objective is to treat hardware as largely interchangeable beneath the OpenZFS layer. 

Key principles: 

  • Avoid hardware RAID under ZFS. Present raw disks to ZFS via HBAs so ZFS can see errors, manage redundancy, and control scrubs and resilvers. 
  • Prefer commodity controllers and backplanes with good mainline OS support over proprietary abstractions that hide drives or require vendorspecific tooling. 
  • Design pools and service layout so you can mix generations: introduce a new chassis with newer drives, stand up a new pool, and migrate datasets over time using send/receive. 

This is where OpenZFS’ separation of logical layout and physical devices pays off: you can change vendors, change media, and even change entire nodes while keeping the logical representation of your storage (datasets, snapshots, replication topology) consistent. Refresh your hardware without having to replatform your data. 
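
Moving a pool between hosts in this model reduces, at its core, to an export and an import; the pool name is illustrative and the new host can run any OpenZFS-capable operating system.

    # On the old host: cleanly detach the pool
    zpool export tank_cap

    # Move or re-cable the drives, then on the new host:
    zpool import tank_cap

    # If device paths changed, point the import at stable identifiers
    zpool import -d /dev/disk/by-id tank_cap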

When you are ready to plan your next storage refresh, contact Klara to discuss your requirements and ensure your storage will serve you well for the next five years and beyond.
