Pool and VDEV topology for Proxmox workloads
ZFS has become a strategic building block for modern infrastructure teams that want enterprise-class data services without inheriting the classic enterprise vendor lock-in. It delivers the core primitives that allow us to out-class proprietary arrays and hyperconverged stacks—checksummed storage, snapshots, clones, inline data services, and replication—and exposes them as open, tunable components rather than opaque features hidden behind licensing tiers.
Klara continues to work to turn ZFS from "powerful but sharp-edged" into a predictable, engineered platform that can be used confidently to achieve your desired storage outcomes. ZFS gives architects extremely strong guarantees, but only if the pool layout, VDEV topology, SLOG, ARC/L2ARC, and recordsize are matched to workload profiles.
Proxmox is the logical virtualization substrate in this architecture because it treats ZFS as a first-class storage backend rather than an afterthought. VM disks can be provisioned directly on ZFS datasets or zvols, Proxmox snapshots are implemented as native ZFS snapshots, and replication jobs are literally zfs send/zfs receive pipelines driven by the cluster.
That means the same OpenZFS pool can simultaneously serve as the hypervisor datastore, a snapshot/clone factory for Dev/Test, and the replication fabric for DR, without new licensing or hardware dependencies. In that context, optimizing the pool and VDEV topology for VM workloads stops being an internal storage concern and becomes an infrastructure-level design decision that directly defines performance, resiliency, and operability.
Why Pool and VDEV Topology Matter for Proxmox
ZFS organizes storage into pools (zpool) composed of one or more VDEVs; each VDEV is a redundancy group (mirror, RAID-Z1/2/3, dRAID) or an accelerator (special, log, cache). At the pool level, performance is essentially the aggregate of all VDEVs, while reliability is bounded by the weakest VDEV. For Proxmox, where each dataset or zvol typically backs a VM disk, that means:
- IOPS and latency are dominated by VDEV type and count.
- Failure domains are defined at the VDEV level, not at individual disks.
- Expansion happens by adding VDEVs, not by "growing a RAID group" in place (RAID-Z expansion exists, but it is a higher-complexity operation).
In other words, pool design is the storage architecture for your Proxmox cluster; everything else (recordsize, compression, caching) is secondary.
Mirrors vs RAID-Z for Proxmox Workloads
The starting point is choosing between mirror VDEVs and RAID-Z for the primary VM pool.
Mirror VDEVs
Mirror VDEVs (e.g., mirror sdb sdc, mirror sdd sde, …) are generally preferred for high-IOPS, latency-sensitive Proxmox workloads:
- Each mirror VDEV can service read requests from either side, so read IOPS scale roughly with the number of disks and mirror groups.
- Write IOPS are bounded by the slowest member of each mirror, but writes can be spread across multiple mirror VDEVs, delivering predictable scaling as you add VDEVs.
- Rebuilds (resilvering) are fast and localized, since data is copied from the surviving mirror member rather than reconstructed from parity across the whole pool.
A typical design for a mixed VM cluster node with six or eight SSDs is a pool like:
zpool create tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde mirror /dev/sdf /dev/sdg
This gives three mirror VDEVs, good random I/O characteristics, and simple expansion: add another mirror pair when you add more drives.
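Expansion of that layout can be sketched as follows. The device names are placeholders; these commands assume a live pool and should be adapted before use:

```shell
# Grow the pool by one more mirror VDEV (devices /dev/sdh and
# /dev/sdi are illustrative; prefer /dev/disk/by-id/ paths in
# production so device names survive reboots and re-cabling).
zpool add tank mirror /dev/sdh /dev/sdi

# Confirm the new VDEV appears alongside the existing mirrors.
zpool status tank
```

ZFS stripes new writes across all VDEVs, so the added pair contributes IOPS immediately, though existing data is not rebalanced.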
RAID-Z VDEVs
RAID-Z (Z1/Z2/Z3) is more capacity-efficient but delivers fewer random IOPS per spindle and slower rebuild behavior, making it better suited to:
- Capacity-oriented workloads: bulk VM hosting, archives, or backup targets.
- Sequential-heavy workloads: large streaming reads/writes, backup jobs, media workloads.
A classic pool for capacity might look like:
zpool create bulk raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
In a Proxmox context, Klara typically recommends aligning with your workload:
- Mirror pools for "hot" VM and database storage.
- RAID-Z pools for bulk VM repositories, ISO/templates, and backup/archival datasets.
On Proxmox, you then register each pool as a separate storage class (e.g., fast-vmstore, bulk-vmstore) so administrators can place workloads according to latency and capacity needs.
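A minimal sketch of that registration, using the Proxmox `pvesm` tool; the storage IDs and dataset names here are illustrative assumptions, not fixed conventions:

```shell
# Register the two pools as distinct Proxmox storage classes.
# "fast-vmstore" / "bulk-vmstore" and the dataset paths are
# placeholders; adjust them to your own pool layout.
pvesm add zfspool fast-vmstore --pool tank/vmstore --content images,rootdir
pvesm add zfspool bulk-vmstore --pool bulk/vmstore --content images,rootdir

# Verify both storages are visible and active.
pvesm status
```

With both classes defined, disk placement becomes an explicit choice at VM creation time rather than a post-hoc migration.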
Topology Patterns: OS vs Data and Special VDEVs
A robust Proxmox design usually separates the host OS from the VM storage pool:
- A small mirror of SSDs (or even a RAID-1 boot device) for Proxmox VE itself.
- Independent ZFS pool(s) for VM/datastore workloads.
That isolation reduces risk during upgrades and simplifies lifecycle operations.
On top of that, ZFS supports specialized VDEV classes that can be used carefully in Proxmox designs:
- Special VDEV: stores metadata and small blocks on fast NVMe, accelerating metadata-heavy operations and freeing up IOPS on the primary data VDEVs. For VM workloads with many small blocks and snapshots, this can significantly improve responsiveness, but it introduces a critical dependency: losing a special VDEV destroys the entire pool.
- Dedup VDEV: places the deduplication table (DDT) on a dedicated device, so the DDT gets that device's full IOPS and does not contend with other workloads. The DDT is memory-hungry and can be redirected to a dedup (or, failing that, special) VDEV. Even so, Klara generally discourages dedup for generic VM stores: unless there is a very strong, proven dedup use case, the modest storage savings rarely justify the cost in RAM and CPU.
These advanced topologies require explicit operational acceptance of the new failure modes; they are powerful tools, not defaults.
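A special VDEV, if adopted, might be added along these lines; the device names and the 16K small-block threshold are assumptions to adapt:

```shell
# Add a mirrored special VDEV for metadata and small blocks.
# Mirroring is essential: losing the special VDEV loses the pool.
zpool add tank special mirror /dev/nvme2n1 /dev/nvme3n1

# Optionally route small data blocks (here, <=16K) to the special
# VDEV as well; the default of 0 stores metadata only. Keep this
# threshold below the dataset recordsize, or all data blocks will
# land on the special VDEV.
zfs set special_small_blocks=16K tank/vmstore
```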
Integrating SLOG and Synchronous I/O Semantics
For Proxmox workloads, the critical behavior is how ZFS treats synchronous writes, which are common when:
- The guest filesystem is mounted with sync semantics.
- Databases, NFS servers, or journaling filesystems inside VMs issue fsync()/O_DSYNC aggressively.
ZFS writes synchronous I/O to the ZIL (ZFS Intent Log). By default, ZIL records live on the main pool VDEVs; adding a SLOG (separate log VDEV) moves them to a dedicated device.
Key points for Proxmox designs:
- A SLOG only affects synchronous writes; asynchronous writes bypass it.
- For HDD-backed mirror or RAID-Z pools that host sync-heavy VM workloads, a mirrored, power-loss-protected NVMe SLOG is effectively mandatory for acceptable latency.
- A typical configuration:
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
This isolates ZIL traffic onto very low-latency devices, reducing write stalls and minimizing fragmentation on the main data VDEVs.
Klara's guidance is to treat SLOG devices as critical infrastructure: use only enterprise-grade NVMes with power-loss protection, mirror them, and size them to the peak synchronous write rate rather than raw capacity (since ZIL is a log, not a full data copy).
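Whether the SLOG is actually absorbing sync traffic can be checked under load. The dataset name below is hypothetical:

```shell
# Watch per-VDEV I/O: under a sync-heavy workload, write activity
# should appear against the "logs" section rather than data VDEVs.
zpool iostat -v tank 5

# For a controlled experiment, force all writes on a scratch
# dataset through the ZIL. Do not leave sync=always enabled in
# production unless the workload's durability model requires it.
zfs set sync=always tank/slogtest
```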
ARC, L2ARC, and VDEV-Aware Capacity Planning
Although ARC/L2ARC are not "topology" in the VDEV sense, they are tightly coupled to VDEV design because they determine how much I/O reaches the disks:
- ARC lives in RAM and will opportunistically consume memory; in a Proxmox node, you must leave headroom for KVM guests while still giving ZFS enough cache to avoid thrashing.
- L2ARC extends ARC onto SSD/NVMe cache devices added as cache VDEVs. Each record stored in L2ARC requires a small header entry in ARC, so L2ARC must be sized relative to available memory; otherwise you sacrifice fast RAM to index slower secondary storage.
From a topology standpoint:
- For small and medium Proxmox nodes, Klara typically prefers more mirror VDEVs of SSDs and a well-sized ARC over aggressive L2ARC, because L2ARC entries consume ARC metadata and can backfire if RAM is constrained.
- L2ARC becomes attractive when:
- You have abundant RAM (so ARC hit ratios are already high).
- Working sets are larger than RAM but remain cache-friendly.
- You can dedicate fast devices purely to read caching.
Example configuration:
zpool add tank cache /dev/nvme2n1
The sizing rule of thumb is to keep L2ARC to a modest multiple of ARC size and to treat it as an optimization to a sound pool design, not a fix for an underprovisioned VDEV layout.
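As a back-of-the-envelope check on that rule of thumb, the ARC memory consumed by L2ARC headers can be estimated from the cache size and average block size. The ~70-byte-per-record figure below is an assumption; the exact overhead varies by OpenZFS release:

```shell
# Rough estimate of ARC memory consumed by L2ARC headers,
# assuming ~70 bytes of header per cached record.
l2arc_bytes=$((512 * 1024 * 1024 * 1024))   # 512 GiB cache device
avg_block=$((32 * 1024))                    # 32K average block size
records=$((l2arc_bytes / avg_block))
overhead_bytes=$((records * 70))
echo "L2ARC records:       $records"
echo "ARC header overhead: $((overhead_bytes / 1024 / 1024)) MiB"
```

Roughly 1 GiB of ARC headers to index 512 GiB of L2ARC at 32K blocks: tolerable on a 256 GiB node, painful on a 32 GiB one.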
Recordsize and Its Interaction with VDEVs
Recordsize is a dataset-level property, but its value interacts with topology:
- Small records on RAID-Z amplify parity overhead and can fragment I/O patterns; large records under random-I/O workloads waste bandwidth and capacity through read/write amplification.
- For Proxmox VM images on mirror pools, a 32K recordsize is a good general default:
zfs set recordsize=32K tank/vmstore
- For database datasets (either raw zvols or filesystems on ZFS shared to guests), smaller recordsize (e.g., 16K) aligns with common page sizes and improves efficiency, without sacrificing compression:
zfs set recordsize=16K tank/pgdata
- For sequential/archival datasets on RAID-Z pools, 1M recordsize maximizes throughput:
zfs set recordsize=1M tank/archive
Klara's practice is to partition Proxmox storage into multiple ZFS datasets with recordsize tuned per workload type, all sitting on top of a pool whose VDEV topology is chosen first for reliability and baseline I/O behavior.
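That partitioning can be sketched as a handful of datasets on the pools above; the dataset names are illustrative:

```shell
# One pool per tier, multiple datasets, recordsize tuned per
# workload (dataset names are placeholders).
zfs create -o recordsize=32K -o compression=lz4 tank/vmstore
zfs create -o recordsize=16K -o compression=lz4 tank/pgdata
zfs create -o recordsize=1M  -o compression=lz4 bulk/archive

# Confirm the effective properties per dataset.
zfs get recordsize,compression tank/vmstore tank/pgdata bulk/archive
```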
Why Proxmox + ZFS + Klara Is a Coherent Stack Choice
In combination, Proxmox, ZFS, and Klara's engineering methodology form a coherent alternative to proprietary HCI and storage array stacks:
- Proxmox provides the virtualization and cluster control plane, with native awareness of ZFS snapshots and replication via zfs send/receive.
- ZFS provides the data plane, including integrity, RAID, snapshots, clones, compression, deduplication, and replication.
- Klara provides the design and implementation expertise: mapping VDEV topologies, SLOG and cache devices, and dataset parameters to Proxmox workload profiles, and documenting the operational playbooks for resilvering, expansion, and disaster recovery.
The result is an infrastructure where "pool and VDEV topology" is not an afterthought but a consciously chosen architecture element, delivering predictable behavior under failure and load, and an open platform that customers can understand, audit, and evolve over time.

Allan Jude
Principal Solutions Architect and co-Founder of Klara Inc., Allan has been a part of the FreeBSD community since 1999 (that’s more than 25 years ago!), and an active participant in the ZFS community since 2013. In his free time, he enjoys baking, and obviously a good episode of Star Trek: DS9.