Managing Disk Arrays on FreeBSD/TrueNAS Core

There are many different ways that media can be attached to a machine to provide storage, but in all cases the administrator needs to be able to monitor and manage those devices to ensure their health and to facilitate their replacement when they eventually fail.

In this article we will discuss some strategies and tools to make managing disk arrays on FreeBSD (and related platforms like TrueNAS Core) much easier. These concepts also apply to other operating systems, but the tools might differ slightly.

Understanding Storage Protocols

There are many types of storage devices, with the most popular being magnetic (“spinning rust”) hard drives, and solid state flash devices. These can be broken down further by the protocol used to connect them to the computer.

Serial ATA (SATA) is the familiar interface used for non-enterprise storage, and is an extension of the original ATA interface dating from the 1980s. SATA was introduced twenty years ago, coupled with the Advanced Host Controller Interface (AHCI). SATA+AHCI improved data transfer speeds, simplicity of communication, and included abilities that we today take for granted, such as “hot swap” and command queueing. This has long been the interface bus used by most home users to connect their hard drives, and is supported by nearly every motherboard.

Serial Attached SCSI (SAS) is the most common interface for enterprise storage, first appearing in 2004. It too was an extension on an existing interface bus which offered greatly improved performance. SAS can also support SATA devices with some limitations. SAS provides many more features than SATA does—including full duplex operations, advanced error recovery, multipath, and disk reservations. SAS disk reservations provide the ability to connect to the disk redundantly—or even across multiple machines—while ensuring it is only used by one of them at a time.

Non-Volatile Memory Express (NVMe) is a newer storage interface that is becoming very popular for flash storage devices. NVMe connects storage devices directly to the PCIe bus, offering extremely low latency and high throughput. It also overcomes one of the primary limitations of SATA and SAS: the inability to perform more than a single command at a time.

While both SATA and SAS allow multiple commands to be issued at once to the device, these commands cannot actually be executed concurrently; instead, they are queued for sequential operation. NVMe, on the other hand, supports multiple queues (often 64 queues, but the official specification allows for up to 65,536 queues), allowing many commands to run concurrently. This both greatly reduces latency and increases maximum throughput. NVMe storage comes in many form factors, from small M.2 devices to U.2 and other hot-swappable formats intended for servers.

The NVMe interface is also extensible to allow operating over the network (where it is known as NVMe over Fabrics, or NVMe-oF). NVMe-oF allows storage devices and arrays in remote chassis to be connected to local motherboards.

Other interfaces for remote storage include iSCSI, Fibre Channel, InfiniBand, RoCE, and others, but those specialized solutions are beyond the scope of this article.

Types of Disk Arrays

When building a storage system, there are many different ways the disks might be connected to the system. We are going to focus on some of the most popular for SATA and SAS drives.

AHCI Attached

For smaller numbers of drives, and for most home systems, the most common way to attach disks is to the SATA controllers built into the motherboard. SATA disks plugged directly into the motherboard use an interface called AHCI, which does not provide much in the way of advanced management features. But if the number of ports on the motherboard is sufficient for your needs, this is the easiest way to connect the drives to the system. Often you may only have a single HDD activity LED for the entire system, with no other status information, but this is typically sufficient for builds of this scale.

Direct Attached

At somewhat larger scales, a number of drives can be connected directly to a SAS (or SATA) controller PCIe card. Some server motherboards contain an embedded controller of this type.

Direct Attached deployments require a bit more hardware and cabling. For example, direct attachment of disks usually requires the use of breakout cables which allow each drive to be connected by a SAS (or SATA) interface for each of the available lanes (typically four per connector, with two connectors) in the interface.

It is also possible to use a direct attachment backplane. In these configurations, your system may or may not support features like individual “locate” and “fault” LEDs. When using SATA disks, direct attach may be preferred to using expanders (see the next section) as it avoids some potential problems with a failing disk causing issues for all of the drives that share the same communication lane back to the controller.

Most common direct attach controllers—such as the popular LSI 9207-8i or 9300-8i—only feature two connectors for a total of 8 lanes. An eight lane controller can only directly attach to 8 disks, requiring more controllers (consuming additional PCI-E slots) to connect more drives.

SAS Expanders

For chassis with larger numbers of drives, or when connecting external JBOD chassis, it is common for the drives to connect to a specialized board that provides power and routing for the SATA/SAS signals to the controller.

These special boards, called SAS Expanders, reduce the total cabling required to provide power and signal pathways to all connected disks. A typical SAS connector carries four lanes, each of which can directly attach a single drive, but with an expander up to 255 devices are possible. The total throughput possible from the connected disks is still limited by the number of lanes available, but this is likely the best approach in systems with more than a dozen disks.
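
To make the lane limit concrete, here is a rough back-of-the-envelope sketch. The numbers are assumptions chosen for illustration, not measurements: an eight-lane controller, roughly 600 MB/s of usable bandwidth per 6 Gb/s lane, and 24 disks behind the expander. The helper name per_disk_mbps is hypothetical.

```shell
#!/bin/sh
# Rough per-disk bandwidth when every disk behind an expander is busy at once.
# All three inputs are illustrative assumptions, not measurements.
per_disk_mbps() {
    lanes=$1        # lanes between the controller and the expander
    lane_mbps=$2    # usable MB/s per lane (assumed ~600 for a 6 Gb/s lane)
    disks=$3        # number of disks sharing those lanes
    echo $(( lanes * lane_mbps / disks ))
}

per_disk_mbps 8 600 24
```

With these assumed figures each disk is left with about 200 MB/s when all of them stream at once. That is comfortable for most spinning disks, but it would be a bottleneck for SSDs.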

Best Practices

Though a truism, it bears emphasizing that with a little planning, management and maintenance of storage systems can be made easier and safer. Keeping an accurate inventory of your storage media, knowing which disks are in which slots, their models and serial numbers, their remaining warranty durations, and other information of this type will save you from confusion, annoyance, and needless extra effort when a disk inevitably fails.

The first step is to map out the relationship between the physical chassis where the disks reside, and the logical devices enumerated by the operating system. Below we will discuss exactly how to do this with FreeBSD’s sesutil or the management tools for your HBA. In these examples, we are going to assume you are using ZFS (because why wouldn’t you be?).

While the operating system typically provides device aliases based on the disk’s serial number, WWN, or some other static identifier, this does not provide all of the information you might want. Professionals tend to prefer something like: e<enclosure#>s<slot#>-<serial#> as this gives us all of the information we need to locate, confirm the identity of, and replace the failing disk, while still being concise and easy to read. See the sesutil section later in this article for details on how to find this information and create such labels.
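
As a minimal sketch of that naming scheme, the helper below assembles a label from an enclosure number, a slot number, and a serial number. The name make_label is hypothetical, and zero-padding the slot to two digits is an assumption chosen so that labels sort naturally.

```shell
#!/bin/sh
# Build a label of the form e<enclosure#>s<slot#>-<serial#>.
# make_label is a hypothetical helper name, not a FreeBSD tool.
make_label() {
    enc=$1
    slot=$2
    serial=$3
    # Strip leading zeros so printf does not read "08" as invalid octal.
    slot=${slot#"${slot%%[!0]*}"}
    printf 'e%ds%02d-%s\n' "$enc" "${slot:-0}" "$serial"
}

make_label 3 02 ZGY07YC3
make_label 0 6 WQP46GLG
```

The two calls print e3s02-ZGY07YC3 and e0s06-WQP46GLG, the same style of label used in the examples throughout this article.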

Experienced enterprise storage managers also keep extensive notes including the model number, SKU and/or URL for reordering, purchase order information, warranty end date, warranty URL, and any other useful information about each drive. Klara recommends embedding these details directly into the ZFS vdev properties of each disk—a feature Klara created, which will become generally available in the upcoming OpenZFS 2.2 release.

zpool set systems.klara:disk-model="ST8000VN004-2M21" mypool e0s06-WQP46GLG
zpool set systems.klara:disk-warranty="2024-12-01" mypool e0s06-WQP46GLG
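
With more than a handful of disks, typing these commands by hand becomes tedious. One possible approach is to keep a small inventory file and generate the commands from it. The sketch below only prints the commands so they can be reviewed first. The helper name emit_vdev_props and the three-column inventory format are assumptions made for illustration.

```shell
#!/bin/sh
# Print (not run) the zpool commands that would record model and warranty
# details for each disk. The pool name and inventory row are examples.
emit_vdev_props() {
    pool=$1
    # Input rows on stdin: <vdev label> <model> <warranty end date>
    while read -r vdev model warranty; do
        [ -n "$vdev" ] || continue
        printf 'zpool set systems.klara:disk-model="%s" %s %s\n' "$model" "$pool" "$vdev"
        printf 'zpool set systems.klara:disk-warranty="%s" %s %s\n' "$warranty" "$pool" "$vdev"
    done
}

emit_vdev_props mypool <<'EOF'
e0s06-WQP46GLG ST8000VN004-2M21 2024-12-01
EOF
```

Once the printed commands look right, the output can be piped to sh to apply them.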

You may also want to label the hot-swap bay itself with the serial number to make identification even easier—which is good practice, as long as you make sure the labels don’t impede airflow.

Another important aspect of managing your storage system is configuring notifications. If you rely on manually checking on your storage periodically, you will regret it. If you’d feel safer with a team of experts monitoring your storage, consider a ZFS Support Subscription. Configuring your system to notify you when a disk has errors, or when the filesystem reports a degraded device, will ensure your system gets prompt attention when something goes wrong. For ZFS users, automating fault responses with tools like ZED (ZFS Event Daemon) can simplify disk replacement and minimize downtime. Learn more in our guide to ZFS Storage Fault Management on Linux.

At a minimum, configure the daily ZFS status check by adding this line in /etc/periodic.conf:

daily_status_zfs_enable="YES"

And ensure that mail directed to the local root user is forwarded to your inbox by editing the corresponding line in /etc/aliases. Once you’ve done so, you must test delivery to your “real” inbox—you don’t want to learn that delivery isn’t working after your storage has already become unavailable!

You should also configure smartd to monitor your disks and send you alerts, which may give you advance notice when a drive is starting to fail.

pkg install smartmontools
service smartd enable
cp /usr/local/etc/smartd.conf.sample /usr/local/etc/smartd.conf
# Edit /usr/local/etc/smartd.conf:
#   1. Comment out the line: DEVICESCAN
#   2. Add a line for each disk, for example:
#        /dev/ada1 -d removable -a -n never -m email@address
#        /dev/da0 -d removable -a -n never -m email@address
service smartd start
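
On systems with many disks, the per-disk lines can be generated rather than typed. This is a minimal sketch: smartd_lines is a hypothetical helper, the address is a placeholder, and the directives are simply the ones used in the example lines above.

```shell
#!/bin/sh
# Emit one smartd.conf line per disk, using the directives shown above.
# smartd_lines is a hypothetical helper; the address is a placeholder.
smartd_lines() {
    email=$1
    shift
    for disk in "$@"; do
        printf '/dev/%s -d removable -a -n never -m %s\n' "$disk" "$email"
    done
}

# On FreeBSD the disk list could come from: sysctl -n kern.disks
smartd_lines email@address ada1 da0
```

Review the generated lines before appending them to /usr/local/etc/smartd.conf, since a list taken from kern.disks may include devices you do not want monitored.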

Better yet, consider configuring some kind of active monitoring with push notifications, rather than relying on often-unreliable email delivery (and your ability to keep on top of your inbox). A classic Nagios installation is ideal for this, especially when paired with the aNag app or something like PagerDuty.

With disk metadata embedded into your pool and monitoring in place to notify you when there is an issue, you can keep track of your disks and may be able to replace them before they fail, getting the most out of whatever warranty they might have.

SES

Many backplanes include support for SCSI Enclosure Services (SES). SES provides a mechanism to query information from the enclosure, including temperature, fan speed, and status of power supplies. It also provides information about each slot in the enclosure (even if empty), including a flag to indicate if the device has recently been swapped.

In addition to the above query types, SES also supports a number of commands, including activating the “locate” and “fault” LEDs if present, and the ability to individually power off drives.

Of course, all of this chassis management technology isn’t very effective without tools to make it usable. Rather than being subject to the whims of the vendor, you can use the tool built into FreeBSD, called sesutil.

SESUtil

FreeBSD’s sesutil is a tool to interface with the SES devices on your system. It features the main commands you might need: map, show, fault, locate, and status.

sesutil map

The map command displays all of the SES devices and each element (this is the nomenclature in SES) connected to them.

Looking at a few items from the output, we can see the device names (/dev/da0 and /dev/da7 respectively) of the disks in Slot00 and Slot07. We can also see that the disk in Slot07 was recently swapped, and that Slot08 does not contain a disk and its locate LED is activated.

We can also examine the various sensors for temperature and voltage included in the backplane.

ses0:
        Enclosure Name: SMC SC846P 0c1f
        Enclosure ID: 500304801820593f
        Element 0, Type: Array Device Slot
                Status: Unsupported (0x00 0x00 0x00 0x00)
                Description: ArrayDevicesInSubEnclsr0
        Element 1, Type: Array Device Slot
                Status: OK (0x01 0x00 0x00 0x00)
                Description: Slot00
                Device Names: da0,pass0
        Element 8, Type: Array Device Slot
                Status: OK (0x11 0x00 0x00 0x00)
                Description: Slot07
                Device Names: da7,pass7
                Extra status:
                -  Swapped
        Element 9, Type: Array Device Slot
                Status: OK (0x01 0x00 0x00 0x00)
                Description: Slot08
                Extra status:
                - LED=locate
        Element 60, Type: Temperature Sensor
                Status: OK (0x01 0x00 0x57 0x00)
                Description: ChipDie
                Extra status:
                - Temperature: 67 C
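
The map output is verbose, so it can help to boil it down to a slot-to-device list. The following is a sketch only: it assumes the text layout shown above, slot_devices is a hypothetical helper name, and other enclosures may describe their slots differently.

```shell
#!/bin/sh
# Summarise "sesutil map" text as: <slot description> <device names or ->.
# Reads the map output on stdin, so it can be tried on saved output.
slot_devices() {
    awk '
        function emit() { if (desc != "") print desc, (devs == "" ? "-" : devs) }
        /Element [0-9]+, Type:/ {
            emit(); desc = ""; devs = ""
            inslot = ($0 ~ /Type: Array Device Slot/)
        }
        # Skip the placeholder element that describes the whole sub-enclosure.
        inslot && /Status: Unsupported/ { inslot = 0 }
        inslot && /Description:/ {
            desc = $0; sub(/^[ \t]*Description:[ \t]*/, "", desc)
        }
        inslot && /Device Names:/ { devs = $3 }
        END { emit() }
    '
}

slot_devices <<'EOF'
        Element 1, Type: Array Device Slot
                Status: OK (0x01 0x00 0x00 0x00)
                Description: Slot00
                Device Names: da0,pass0
        Element 9, Type: Array Device Slot
                Status: OK (0x01 0x00 0x00 0x00)
                Description: Slot08
EOF
```

On a live system the same function would be fed with sesutil map piped into slot_devices. Empty slots are shown with a dash in place of the device names.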

sesutil show

The sesutil show subcommand provides an easy to read summary of the most commonly desired information:

ses3: <LSI SAS2X28 0e12>; ID: 5003048000b40b3f
Desc     Dev     Model                     Ident         Size/Status
Slot 01  da36    ATA ST4000DM005-2DP1      ZGY0XH87   4T
Slot 02  da35    ATA ST4000DM000-2AE1      ZGY07YC3   4T
Slot 03  da34    ATA ST4000DM000-2AE1      ZGY07VS1   4T
Slot 04  da37    ATA ST4000DM000-2AE1      ZGY06NB8   4T
Slot 05  da38    ATA ST4000DM005-2DP1      ZGY1YT0C   4T,LED=locate
Slot 06  da23    ATA ST8000VN004-2M21      WQP46GLG   8T,Swapped
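
Because this summary already holds the enclosure, slot, serial number, and device name, it can be turned into label and device pairs with a little awk. Treat this as a sketch: the column positions are an assumption based on the sample above, show_to_labels is a hypothetical helper name, and only da and ada devices are matched.

```shell
#!/bin/sh
# Turn "sesutil show" text into "<label> <device>" pairs, with labels of the
# form e<enclosure#>s<slot#>-<serial#>. Column positions are an assumption
# based on the sample output; check them against your own enclosure.
show_to_labels() {
    awk '
        /^ses[0-9]+:/ { enc = $1; sub(/^ses/, "", enc); sub(/:.*/, "", enc) }
        $1 == "Slot" && $3 ~ /^(da|ada)[0-9]+$/ {
            # $2 is the slot number, $3 the device, $(NF-1) the serial number.
            printf "e%ss%s-%s %s\n", enc, $2, $(NF - 1), $3
        }
    '
}

show_to_labels <<'EOF'
ses3: <LSI SAS2X28 0e12>; ID: 5003048000b40b3f
Desc     Dev     Model                     Ident         Size/Status
Slot 01  da36    ATA ST4000DM005-2DP1      ZGY0XH87   4T
Slot 06  da23    ATA ST8000VN004-2M21      WQP46GLG   8T,Swapped
EOF
```

For the two sample rows this prints e3s01-ZGY0XH87 da36 and e3s06-WQP46GLG da23, which is the information each labeling method needs.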

Labeling with GEOM Multipath

We can now use this information to label our disks. Each SAS Expander will present as a new /dev/ses# device, so your system may have more than one. FreeBSD supports a number of different ways to label the disk, depending on your use case.

If your system has multipath SAS, each disk will be present more than once, and you should use the gmultipath command both to deduplicate your disks and to label them.

gmultipath label e3s02-ZGY07YC3 da199
true > /dev/da324

This will write a GEOM Multipath label to the last sector of the disk. Running the no-op true command against the other paths to that disk causes GEOM to re-“taste” the disk, see the label, and automatically add the additional paths to the existing multipath.

You can also reboot, and GEOM will pick up the multipath when it first tastes the disks during boot. The device will be accessible as /dev/multipath/e3s02-ZGY07YC3.

Labeling with GUID Partition Table (GPT)

If you are going to partition the disks, you can use GPT labels:

gpart create -s gpt da36
gpart add -s 4g -a 1m -t freebsd-swap da36
gpart add -a 1m -t freebsd-zfs -l e3s01-ZGY0XH87 da36

This example creates a new GPT partition scheme on da36, creates a 4 GiB swap partition aligned to 1 MiB boundaries, and then adds a ZFS partition with the label e3s01-ZGY0XH87 using the remainder of the space on the disk. That partition will now be accessible as /dev/gpt/e3s01-ZGY0XH87.

Labeling with GEOM Labels

Lastly, you can use the GEOM Label system, similar to multipath, to store a small chunk of data at the end of the disk to persistently identify it:

glabel label e3s06-WQP46GLG da23

That disk will now be accessible as /dev/label/e3s06-WQP46GLG.

sesutil locate

sesutil can also be used to locate the disk in the physical array. While the SES data tells us that there is an 8 TB disk in Slot 06, it does not tell us which slot in the chassis corresponds to 06. Some chassis count from 0, others from 1, and there’s not even a set standard for whether labeling is left to right, top to bottom or bottom to top, left to right.

You can avoid any uncertainty by enabling the “locate” or “fault” LED for the drive you mean to replace. This greatly reduces the chance of getting it wrong when you (or the datacenter technician) physically pull the disk.

To disable the locate LED that is already activated on Slot 05 from above:

sesutil locate da38 off

However, if a disk has died entirely, or a slot is empty, it might not have a device name. Unnamed devices can be specified by their specific SES device and element number. These can be found with the sesutil map command.

Note that the element number usually is different than the slot number:

sesutil fault -u /dev/ses0 9 on

This will activate the fault LED for element 9 (Slot 08) on the first SES device. (Note: some chassis have separate LEDs for fault and locate, while others use a single LED with different color or blink patterns for the two different conditions.)

sesutil status

For a quick overview, the status command can be used to tell if there is anything that requires further investigation. This makes sesutil status a great summary to connect to your monitoring system.

# sesutil status
ses0: OK
ses1: INFO
ses2: OK
ses3: CRITICAL

If we examine ses3 more closely:

# sesutil -u /dev/ses3 map
        Element 38, Type: Power Supply
                Status: Not Available (0x47 0x80 0x00 0x20)
                Description: Power Supply 2
                Extra status:
                - Predicted Failure

We see it is predicting the failure of its number 2 power supply, whereas ses1 is just informing us that one of the locate LEDs is on:

# sesutil -u /dev/ses1 map
        Element 1, Type: Array Device Slot
                Status: OK (0x01 0x00 0x02 0x00)
                Description: Slot 01
                Device Names: da44,pass50
                Extra status:
                - LED=locate
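
One way to hook this into a monitoring system is a small wrapper that converts the status text into a Nagios-style exit code. This is a sketch under assumptions: only the OK, INFO, and CRITICAL states seen in the sample above are handled explicitly, any other state is reported as a warning, and check_ses_status is a hypothetical helper name.

```shell
#!/bin/sh
# Nagios-style check: read "sesutil status" text on stdin and map the worst
# state to an exit code (0 = OK, 1 = WARNING, 2 = CRITICAL). Treating INFO as
# OK, and any unrecognised state as WARNING, is a policy choice made here.
check_ses_status() {
    awk '
        NF < 2 { next }
        { name = $1; sub(/:$/, "", name) }
        $2 == "CRITICAL"           { crit = crit " " name; next }
        $2 != "OK" && $2 != "INFO" { warn = warn " " name }
        END {
            if (crit != "") { print "CRITICAL:" crit; exit 2 }
            if (warn != "") { print "WARNING:" warn; exit 1 }
            print "OK: all enclosures healthy"
        }
    '
}

check_ses_status <<'EOF' || echo "exit code: $?"
ses0: OK
ses1: INFO
ses2: OK
ses3: CRITICAL
EOF
```

In production the function would be fed with sesutil status piped into check_ses_status. You may prefer to treat INFO as a warning so that a forgotten locate LED gets noticed.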

sesutil JSON output

As with a number of tools in FreeBSD, sesutil supports outputting JSON via the libxo library. When combined with a JSON parser like jq, this can be used to automate tasks for each disk.

Consider the following example:

# Emit one "e<enc>s<slot>:<serial>:<device>" line per populated slot,
# then partition and label each disk accordingly.
sesutil show --libxo json,pretty |
        jq '
            .sesutil.enclosures[] | .enc as $enc |
            .elements[]|select(.type == "device_slot" and .model != "") |
            $enc + ":" + .description + ":" + .serial + ":" + .device_names
        ' |
        sed 's/ //g' | cut -d '"' -f 2 |
        sed -E 's/ses([[:digit:]]+):Slot([[:digit:]]+)/e\1s\2/g' |
        sh -c 'for line in $(cat -); do
                substring=${line#*:};
                slot=${line%%:*};
                serial=${substring%%:*};
                disk=${line##*:};
                gpart create -s gpt $disk;
                gpart add -t efi -s 256mb -a 4k $disk;
                gpart add -t freebsd-swap -s 6g -a 1m $disk;
                gpart add -t freebsd-zfs -l $slot-$serial $disk;
        done'

This partitions each disk and labels the ZFS partition with the enclosure, slot, and serial number of the corresponding disk.

Note: each enclosure is different, and you will likely need to make minor modifications to this example pipeline before it works for your specific configuration.

mpsutil / mprutil

If your system uses an LSI/Avago/Broadcom SAS Controller supported by the FreeBSD mps (SAS2xxx chip) or mpr (SAS3xxx chip) driver, then you can use the corresponding tool to manage your disks even without an SES device:

# mpsutil show adapters
Device Name           Chip Name        Board Name        Firmware
/dev/mps0             LSISAS2308       SAS9207-8i        14000700
/dev/mps1             LSISAS2308       SAS9207-8i        14000700
/dev/mps2             LSISAS2308       SAS9207-8i        14000700

If you have multiple adapters, mpsutil will default to the first logical adapter. You should specify which adapter to operate on:

# mpsutil -u 2 show all
Adapter:
mps2 Adapter:
       Board Name: SAS9207-8i
        Chip Name: LSISAS2308

Devices:
B____T    SAS Address      Handle  Parent    Device        Speed  Enc  Slot
00   11   5000cca25323147d 0009    0001      SAS Target    6.0    0001 03
00   09   5000cca253253cf1 000a    0002      SAS Target    6.0    0001 01
00   08   5000cca253254e1d 000b    0003      SAS Target    6.0    0001 00
00   15   5000cca253252a45 000c    0004      SAS Target    6.0    0001 07
00   14   5000cca2532527d1 000d    0005      SAS Target    6.0    0001 06
00   13   5000cca253252ddd 000e    0006      SAS Target    6.0    0001 05
00   12   5000cca25315a53d 000f    0007      SAS Target    6.0    0001 04
00   10   5000cca253254fd1 0010    0008      SAS Target    6.0    0001 02

Enclosures:
Slots      Logical ID     SEPHandle  EncHandle    Type
  08    500605b009d01dc0               0001     Direct Attached SGPIO

So, to activate the LED for the first disk displayed above, we first need to determine the enclosure handle number (0001), and then the slot number of the disk (03). The status field is a bitmask supporting a number of different options, but the main ones we care about are 1 (OK), and 2 (FAULTED). Set enclosure 1, disk 3 to the faulted (2) status:

# mpsutil -u 2 slot set status 1 3 2
Successfully set slot status

On my system, this command produces a bright red LED lit for that slot, physically highlighting the correct drive to replace. Setting the status back to 1 returns the activity light back to normal—on this system, a blinking blue.
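
Looking up the enclosure and slot by hand is error-prone, so the lookup can be scripted. The sketch below prints, rather than runs, the command for a given SAS address. It assumes Enc and Slot are the last two columns of the device listing, as in the output above. Because the Enc column is printed in hexadecimal, it deliberately skips rows whose Enc value contains hex letters, where a plain decimal reading would be wrong. The name fault_cmd_for is hypothetical.

```shell
#!/bin/sh
# Given "mpsutil show devices" text on stdin, an adapter unit, and a SAS
# address, print the command that would mark that disk's slot as faulted (2).
# Assumes Enc and Slot are the last two columns of the listing.
fault_cmd_for() {
    unit=$1
    sas_addr=$2
    awk -v unit="$unit" -v addr="$sas_addr" '
        # Only handle Enc values made of digits, where hex and decimal agree.
        $3 == addr && $(NF - 1) ~ /^[0-9]+$/ {
            # Adding 0 turns "0001" and "03" into the plain numbers 1 and 3.
            printf "mpsutil -u %s slot set status %d %d 2\n", unit, $(NF - 1) + 0, $NF + 0
        }
    '
}

fault_cmd_for 2 5000cca25323147d <<'EOF'
00   11   5000cca25323147d 0009    0001 SAS Target    6.0  0001 03
00   09   5000cca253253cf1 000a    0002 SAS Target    6.0  0001 01
EOF
```

For the first disk in the listing this prints mpsutil -u 2 slot set status 1 3 2. Printing the command instead of executing it leaves a chance to confirm the slot before any LED changes.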

mpsutil and mprutil can also be used to upgrade the firmware on the HBA from within the FreeBSD operating system, saving you from dealing with the horror that is megacli, the hassle of creating a USB image that pretends to be a floppy disk, and/or the pseudo-MSDOS of the EFI shell.

First, create a backup of the old firmware revision:

mprutil -u 1 flash save firmware file.img

Then, flash the new version to the HBA:

mprutil -u 1 flash update firmware SAS9300_8i_IT.bin

If you need more advanced functionality than mpsutil provides, LSI provides their native tools sas2ircu and sas3ircu for FreeBSD. Although they offer additional functionality, they are less user friendly and require agreeing to a long EULA.

Conclusion

Monitoring and maintaining your storage media is one of the most important parts of keeping your data safe.

With the tools presented here, the reader is well armed to react to failed disks and ensure that the wrong disk isn’t accidentally pulled.

With these types of best practices, we eliminate the confusion or even chaos that might cause a storage administrator to disconnect the wrong drive—which happens with disturbing frequency, and can result in degrading a redundant array beyond its ability to recover.

When dealing with critical data, you only get one chance to do it right. Experience is invaluable—so if you are unsure, consult an expert to make sure you get it right the first time, as that may well be the only chance you get.
