In this article, we look into how to leverage iSCSI with ZFS to implement a redundant storage solution that allows for rapid failover between systems. In particular, we consider the case of running an NFS service to provide network access to your data.
In previous articles, we have discussed other highly available storage solution options. For example, in High Availability with Asynchronous Replication and zrepl, we considered the approach of using ZFS replication. While simple and fast, it allows a write to complete on the primary system before being copied to the replica. This approach has limitations for NFS services: after a failover, clients may see a changed filesystem and will mark their mounts as stale. We also looked at a more native NFS solution in Efficient File Sharing with pNFS. Support for pNFS in operating systems is still relatively new, and its use involves compromises in how data is stored on the server.
Reliable shared block storage makes for a great foundation on which to build a highly available service. This is typically associated with Storage Area Networks (SANs) accessed via Fibre Channel, but it can also be built from commodity hardware using standard IP networking for connectivity. The Small Computer System Interface (SCSI) is a protocol for communicating with attached I/O devices; while once used for pretty much all external peripherals, including things like scanners, it is now primarily associated with storage devices. iSCSI is a well-supported SAN protocol that sends SCSI commands over an IP network, allowing a physically remote disk to resemble a locally attached disk. Other similar options include FreeBSD’s ggated(8) utility, which is part of the GEOM subsystem, and NVMe over Fabrics (NVMe-oF), which will be available in FreeBSD 15.
The Basic Concept
A common pattern when using ZFS in combination with iSCSI is to create volumes in a ZFS pool that are then exported over iSCSI. This is particularly common where the volume is used for VM images. We will instead flip this and build a ZFS pool on top of iSCSI block devices. With ZFS mirroring, we create a mirror that spans physical servers.
For simplicity, we will describe a setup with just two servers, each containing a locally attached disk. The ZFS mirror will consist of one local disk and one remote disk.
For the provision of the actual service, NFS in this example, we will use one or the other of the servers with a virtual IP. It is convenient to package this up in a form that is easy to start and stop. A bhyve VM is a tempting choice, but FreeBSD 14 added support for running the kernel NFS server in a lightweight VNET jail (container), so we’ll use that instead. It would also be possible to run the NFS service directly on the host system.
Server and Network Setup
We’ll assume that each of our servers has three available network interfaces: one for normal use by the host system, one dedicated to the VNET jail, and a third used for a private interconnect to the other server.
As an aid to clarity, we can give the network interfaces meaningful names. So, for the interface dedicated to the NFS jail, we might use the following in /etc/rc.conf:
ifconfig_mce1="name nfs0"
Having a dedicated private network connecting the servers is not a hard requirement from a technical perspective, but it is good practice. iSCSI traffic is unencrypted, so providing it with an isolated network is a way to protect it. If using separate physical switches is not an option, a dedicated non-routable VLAN is also a good choice.
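As a rough sketch, assuming the interconnect has to share a parent interface named mce2 and uses a VLAN tag of 100 (both illustrative choices), the rc.conf configuration could look like this instead of the dedicated-interface setup shown below:
# Hypothetical VLAN-based interconnect: tag 100 on parent interface mce2
vlans_mce2="100"
ifconfig_mce2_100="inet 192.0.2.1/29"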
For iSCSI traffic, enabling jumbo frames (larger network packets) can help performance. For this to work, every device on the link must have them enabled, which is another advantage of the dedicated network link. If in doubt, use ping -D -s 8900 to test the link. For the interface associated with our private interconnect, let’s identify the name of the peer system in the interface name and set the MTU to 9000 for jumbo frames:
ifconfig_mce2="name privb0 mtu 9000 up"
ipv4_addrs_privb0="192.0.2.1/29"
Unlike the nfs0 interface, the host system uses this link, so we also specify “up” to make the interface administratively up and provide an IP address. It’s also helpful to give such additional IP addresses names in /etc/hosts, for example:
192.0.2.2 serverb-priv
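Using the serverb-priv name just defined, you can then verify that jumbo frames make it across the private link:
# -D sets the don't-fragment flag; -s 8900 sends a payload that only fits
# in a single packet when jumbo frames work end to end
ping -D -s 8900 serverb-priv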
In a real deployment, you will likely have to adapt the network setup. You may be using link aggregation. If you don’t have a separate interface available for the NFS service, you may want to share the host’s interface with a bridge or via SR-IOV. Or with more than two servers, you may need additional private links. If you have many network interfaces on a server, it can be helpful to install and enable lldpd. This uses a Layer 2 network protocol to identify the peer on a network link, which can be helpful when diagnosing issues with connectivity.
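A minimal sketch of setting lldpd up from packages and querying the neighbors it discovers:
pkg install lldpd
sysrc lldpd_enable=YES
service lldpd start
# show which peer is seen on each physical interface
lldpcli show neighbors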
iSCSI Configuration
The iSCSI implementation in the operating system consists of two components - the target, which is the server component, and the initiator, which is the client. We’re going to configure these on both of the servers.
Target
To begin with, we’re concerned with the target, which involves the ctld(8) daemon on FreeBSD.
The configuration for ctld comes from /etc/ctl.conf. We first define the auth-group and portal-group sections to hold authentication and network options. Then we define a target that associates LUNs (logical unit numbers, i.e. numbered disk devices) with those two groups. You can configure more than one target if needed, but if you just want to export multiple disks, a simpler approach is to list additional LUNs under a single target.
An example configuration is as follows:
auth-group ag0 {
    chap iscsiuser secret
}

portal-group pg0 {
    discovery-auth-group ag0
    listen 192.0.2.1
    listen localhost
}

target iqn.2025-01.com.klarasystems:servera {
    portal-group pg0 ag0
    lun 1 {
        path /dev/nda1
    }
}
You can drop the auth-group and specify no-authentication instead of ag0 to disable the use of CHAP for validating the initiator. You may have come across CHAP in other contexts, such as PPP; it uses a challenge and response to avoid sending passwords over the network, but relies on MD5 as the hash algorithm, which falls short of modern security practice. It is also possible to configure mutual CHAP so that the initiator validates the target. Since the configuration files contain a password, set their permissions to 0600.
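For example (the same applies to /etc/iscsi.conf, configured below):
chmod 600 /etc/ctl.conf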
Given the earlier recommendation of a dedicated network link, the more relevant security measure is the use of the listen directives in the portal-group to limit the network links on which the ctld daemon will accept connections. We intentionally list localhost in this example because we run an initiator locally. For testing, you may find it helpful to specify 0.0.0.0 to listen on all IPv4 interfaces.
iSCSI targets follow a specific naming convention: names start with iqn, followed by a year and month, a reversed domain name, and a free-form final component. For each exported LUN, you will need to specify the path to the disk device.
As is customary for services on FreeBSD, use sysrc ctld_enable=YES to enable the service on startup, and service ctld start to start it immediately.
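That is, on both servers:
sysrc ctld_enable=YES
service ctld start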
Initiator
FreeBSD’s iSCSI initiator daemon is iscsid(8). This is a kernel-level facility, so it can’t run in the jail. The configuration requires the names of the hosts and targets and, if authentication is enabled, a username and password. An example for /etc/iscsi.conf is as follows:
servera {
    targetaddress = localhost;
    targetname = iqn.2025-01.com.klarasystems:servera;
    chapiname = iscsiuser;
    chapsecret = secret;
}
serverb {
    targetaddress = serverb-priv;
    targetname = iqn.2025-01.com.klarasystems:serverb;
    chapiname = iscsiuser;
    chapsecret = secret;
}

It may seem pointless to use iSCSI to export a disk to the local host, but this keeps the path to the disk consistent regardless of which server accesses it. As with ctld, we need to enable the initiator service, iscsid. Furthermore, we need to arrange for iscsictl, the management tool, to run at startup to add the iSCSI sessions. This entails the following lines in /etc/rc.conf:
iscsid_enable="YES"
iscsictl_enable="YES"
iscsictl_flags="-Aa"
The -Aa flags here add everything from the config file. This can also be done manually – when making changes to the configuration file, you can run iscsictl -Ra followed by iscsictl -Aa to remove the sessions and add them anew.
A final tweak for iSCSI configuration is a sysctl tunable. If one of the systems fails, we want the ZFS pool to quickly transition to a degraded state. By default, the iSCSI initiator is rather patient with the failed system, so operations hang. As we’ve got a mirror, it is preferable to just move on and leave ZFS to worry about bringing the failed half of the mirror in line when the other system returns. There’s a sysctl tunable that changes this behavior, so add kern.iscsi.fail_on_disconnection=1 to /etc/sysctl.conf.
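To apply the setting immediately and persist it across reboots:
sysctl kern.iscsi.fail_on_disconnection=1
echo 'kern.iscsi.fail_on_disconnection=1' >> /etc/sysctl.conf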
Disk Setup
Once the initiator is operating, run iscsictl without arguments to see a list of connected targets. The following is an example:
Target name                              Target portal    State
iqn.2025-01.com.klarasystems:servera     localhost        Connected: da0
iqn.2025-01.com.klarasystems:serverb     serverb-priv     Connected: da1
The output shows the local SCSI device names associated with the exported LUNs. Adding labels to these makes them easier to reference, for example:
geom label label -v data1-a da0
geom label label -v data1-b da1
We’re now ready to create a ZFS pool to mirror the two devices:
zpool create -o multihost=on data1 mirror label/data1-a label/data1-b
The multihost property protects the pool against being imported by more than one host at a time: periodic writes to the pool signal that it is in active use. We do have the iSCSI LUNs simultaneously attached to both servers, which is harmless so long as only one server uses them at a time.
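To confirm the property and check the health of the new pool:
zpool get multihost data1
zpool status data1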
The ZFS fault management daemon (zfsd) is especially convenient where connectivity to disks might be prone to temporary outages. It automatically brings disks back online when they reappear. This should be enabled and started on both servers:
service zfsd enable
service zfsd start
Jail Setup
As we mentioned, the iSCSI initiator doesn’t run in a jail, but we don’t need it to. It is also useful to be able to put the jail image on the mirrored pool so that it is available on either host. This also ensures that any state data is mirrored. Some parts of the jail could use a local-only filesystem if you prefer, in particular /tmp and /var/tmp.
There are management tools for jails, but for the purposes of this article, we’ll stick with functionality in the base system. The following steps are one approach to creating an initial jail. The additional use of the ZFS snapshot and clone reflects my personal preferences.
zfs create -o mountpoint=/jail data1/jail
zfs create data1/jail/14.2
bsdinstall jail /jail/14.2
zfs snapshot data1/jail/14.2@installed
zfs clone data1/jail/14.2@installed data1/jail/iscsinfs
Let’s also prepare a ZFS dataset for the NFS exports. The jailed property gives control of the dataset to the jail, which is not strictly necessary.
zfs create -o mountpoint=/export/nfs1 -o jailed=on data1/nfs1
Before we can start the jail, it needs a configuration file. For running nfsd, this will need to be a VNET jail, and we hand it control of the network interface that we named nfs0. There are options to grant the jail access to functionality needed by the NFS server and for mounting the ZFS dataset. It also needs a /dev mount using the usual VNET jail ruleset. We need to install the /etc/jail.conf file on both servers:
iscsinfs {
    path = "/jail/$name";
    host.hostname = "$name.example.com";
    vnet;
    vnet.interface = "nfs0";
    devfs_ruleset = 5;
    mount.devfs;
    exec.consolelog = "/var/log/jail_${name}_console.log";
    exec.clean;
    exec.prepare += "! arping -q -c 1 iscsinfs";
    exec.prepare += "zpool list -H -o name,health data1 2>/dev/null || zpool import -f -o cachefile=none data1";
    exec.created += "zfs jail $name data1/nfs1";
    exec.start = "/bin/sh /etc/rc";
    exec.prestop += "zfs unjail $name data1/nfs1";
    exec.prestop += "ifconfig nfs0 -vnet ${name}";
    exec.stop = "/bin/sh /etc/rc.shutdown";
    exec.release += "zpool export -f data1";
    allow.nfsd;
    allow.mount;
    allow.mount.zfs;
    enforce_statfs = 1;
}
Of particular note are the exec options, which specify commands run as part of starting and stopping the jail. arping serves as an initial guard, checking whether the virtual IP is already in use so that we don’t accidentally start the jail on both servers; arping is available from ports. The next step imports the pool, but only if it isn’t already imported. Setting cachefile=none ensures that the pool won’t be imported automatically if the system reboots. At the end, the steps are unrolled in reverse. With VNET jails, it is especially important to return the network interface to the host system. The final exec.release command exports the ZFS pool; the -f option to force this action is unfortunately necessary when the jail itself is stored on the pool.
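arping can usually be installed from the package repository:
pkg install arping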
The jail’s /etc/rc.conf needs settings to configure the network and enable NFS. The example below will need substantial tweaking for options such as the NFS version and whether you want features like ID mapping, delegations, quotas, and Kerberos.
hostname="iscsinfs.example.com"
ifconfig_nfs0="up"
ipv4_addrs_nfs0="198.51.100.4"
defaultrouter="198.51.100.1"
zfs_enable="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfs_server_flags="-t"
nfsuserd_enable="YES"
As part of adapting the network settings, don’t forget to copy /etc/resolv.conf into the jail for DNS.
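Given the jail path used above, that can be as simple as:
cp /etc/resolv.conf /jail/iscsinfs/etc/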
NFS also needs /etc/exports populated with the exported filesystems, for example:
/export/nfs1 -alldirs -sec=sys
V4: /export -sec=sys
Note that you can’t use the ZFS sharenfs property on datasets that also have the jailed property. Even if that were possible, you would also have the problem that it generates /etc/zfs/exports on the host.
For specific guidance on running NFS in a VNET jail, see the NFS Server in a VNET Jail Setup document from when the feature was added.
We’re now ready to start the jail with jail -c iscsinfs, after which it should be possible to mount the NFS export from a client system. To stop the jail, run jail -r iscsinfs, at which point you can switch over to the second server and start the jail there. The client should pick up where it left off with no more than a brief pause.
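From a FreeBSD client, a test mount might look like the following; the mount point is arbitrary, and the NFSv4 path is relative to the V4: root (/export) declared in the exports file above:
mkdir -p /mnt/nfs1
mount -t nfs -o nfsv4 198.51.100.4:/nfs1 /mnt/nfs1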
Variations
The setup we’ve described is an example to demonstrate a basic concept. There is scope to vary it to better match your requirements. In this section, we’ll consider some options.
To add further data storage redundancy, the simplest path is to add systems: three systems with a three-way ZFS mirror, for example. Alternatively, each server might contribute two disks to the mirror. This may sound rather wasteful, with the usable capacity reduced to only a quarter of the total disk capacity, but evaluate the tradeoff based on how much storage space you need and the value of your data. Disk capacities have grown and may now exceed your needs. In such a setup, if all the disks are added to the ZFS mirror, every write is sent twice over the network link to the remote server.
You may come across the multipath feature of iSCSI to allow traffic to make use of separate network paths. FreeBSD’s target implementation supports this, but the initiator does not. To add redundancy to the network link, the easiest approach is to use link aggregation via a lagg(4) interface. For more complex cases, it may be possible to make use of gmultipath(8), which handles multiple paths to a disk via the GEOM subsystem.
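A minimal rc.conf sketch for the private interconnect, assuming two spare interfaces named mce2 and mce3 and a peer or switch that supports LACP (all assumptions for illustration):
ifconfig_mce2="up"
ifconfig_mce3="up"
cloned_interfaces="lagg0"
# bundle both ports into a single LACP aggregate for the interconnect
ifconfig_lagg0="laggproto lacp laggport mce2 laggport mce3"
ipv4_addrs_lagg0="192.0.2.1/29"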
The setup relies on the manual intervention of an administrator to fail the NFS jail over between servers. It is possible to set up automatic failovers to minimize the time it takes to recover from common failures, but this adds complexity.
With iSCSI being a standard protocol, it is possible to use a mix of operating systems; perhaps one of the iSCSI targets could run Linux. For a discussion of the benefits of OS diversity, see our webinar on the subject, The Case for OS Diversity and Independence. To use iSCSI on Linux, look for packages named targetcli for the target and open-iscsi for the initiator.
We have already alluded to the fact that we could have used a virtual machine instead of a jail for the NFS server. This would be necessary if you want to run Linux on the hardware and FreeBSD in the VM. A disadvantage of using a VM is that in order to have the VM image available, you need to run an iSCSI initiator on both host systems and the VM. This makes the network setup more complex because the private interconnects need to be available to both.
Conclusion
The ability to export disk devices over the network with iSCSI is a useful capability that can form part of the solution to a variety of problems, especially when combined with other components like ZFS and FreeBSD’s GEOM subsystem. We demonstrated that this can enable ZFS mirrors to span multiple systems. In practical experiments during which test servers bounced up and down, I was impressed by how quickly and seamlessly ZFS handled the unreliable connections and brought the mirrors back up to date.
The lightweight nature of jails means that they start and stop quickly. With the jail residing on the shared storage, we can quickly migrate a service to another system, provided that, as with NFS, restarting the processes is not overly disruptive.

Oliver Kiddle
Oliver Kiddle has many years of professional experience in Unix system administration, along with build management and maintaining legacy software. He has contributed to many open source projects over the years, especially the Z Shell. In his free time, he enjoys hiking and skiing.