ZFS Storage Fault Management on Linux

The media used to store digital data will eventually degrade and fail, seemingly always at the worst possible time. It doesn't matter whether it's an old spinning hard disk drive or a brand-new NVMe U.3 drive: failure can strike at any moment, taking your data with it. One of the fundamental features of ZFS is the ability to withstand these failures and heal itself, so it is ready for the next one. To maintain the health of your storage, ZFS needs to replace failed disks. Temporarily offline disks must also be brought back in sync with the rest of the pool to restore full redundancy and optimal performance.

Introducing ZED for Fault Management

The ZED (ZFS Event Daemon) monitors for events generated by the ZFS kernel module. From disk failures to checksum errors and other pool-related activities, ZED helps automate the process of handling these events.

Its capabilities include:

Proactive Notifications: Keeping you informed about what’s happening in your storage pool.
Error Reporting: Providing detailed insights to help diagnose problems.
Recovery Actions: Taking steps to resolve issues before they escalate.

ZED can be a powerful tool for sysadmins managing ZFS pools. It automates the detection of, and response to, faults that would otherwise require manual intervention. Automatically replacing a failed disk with a hot spare as soon as it fails eliminates the delay of waiting for the sysadmin to discover the problem. That can mean the difference between completing the replacement before a second disk fails and a catastrophic concurrent failure that faults the pool and causes massive data loss.

Automating Fault Response with ZED

ZED monitors for ZFS events (zevents), which are generated by ZFS and queued for processing by the fault management system. A zevent is a report, made up of key-value pairs, that is generated in response to certain actions or conditions within a ZFS pool or dataset. These events represent changes, errors, or notifications about the state of the ZFS system. When a zevent occurs, ZED responds according to the type of zevent and any additional configuration applied by the sysadmin.
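The key-value pairs of recent zevents can be inspected with `zpool events -v`, which prints each event followed by indented `name = value` lines. As a small sketch, the hypothetical helper `zevent_field` (not part of ZFS) pulls one key out of that output. It assumes the indented `name = value` layout used in its test data; the exact output format may differ between OpenZFS versions.

```shell
#!/bin/bash
# zevent_field KEY: print the value of KEY for every event found in
# `zpool events -v` output on stdin, with surrounding double quotes removed.
# Hypothetical helper; assumes indented "name = value" lines.
zevent_field() {
    local key="$1"
    awk -v key="$key" '
        # Match lines such as:   class = "ereport.fs.zfs.io"
        $1 == key && $2 == "=" {
            val = $0
            # Strip the leading "name = " part, keeping only the value.
            sub(/^[ \t]*[^ \t]+[ \t]*=[ \t]*/, "", val)
            # Drop the quotes that surround string values.
            gsub(/^"|"$/, "", val)
            print val
        }
    '
}
```

For example, `zpool events -v | zevent_field class` lists the class of every queued event.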

ZED has built-in fault resolution capabilities to deal with some common scenarios. One of these is recognizing new disks attached to the system, including disks that were only recently detached. When a new drive appears, ZED queries it, and if the disk label matches an active pool, it can automatically run `zpool online $pool $disk` to bring the disk back online. ZFS can then use the DTL (Dirty Time Log) to resilver only the transactions that the disk missed, avoiding a lengthy full resilver. Getting a disk that temporarily lost its connection back online quickly shortens the window in which a further failure could exceed the pool's fault tolerance, while also reducing the total recovery time. Without ZED, when a drive is reinserted, ZFS waits for the sysadmin to manually flag the drive as available again.
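To illustrate the manual step this saves, here is a dry-run sketch. `print_online_cmds` is a hypothetical helper, not part of ZFS: it reads `zpool status` output on stdin and prints, without running, a `zpool online` command for each device listed as OFFLINE or REMOVED. It assumes the usual NAME/STATE column layout of the config section, which may vary between versions.

```shell
#!/bin/bash
# print_online_cmds POOL: read `zpool status POOL` output on stdin and print
# (but do not run) `zpool online` for each device shown as OFFLINE or REMOVED.
# Hypothetical dry-run helper illustrating what ZED automates.
print_online_cmds() {
    local pool="$1"
    awk -v pool="$pool" '
        # Config rows look like: NAME STATE READ WRITE CKSUM
        ($2 == "OFFLINE" || $2 == "REMOVED") && $1 != pool {
            printf "zpool online %s %s\n", pool, $1
        }
    '
}
```

Usage: `zpool status tank | print_online_cmds tank`. Whether `zpool online` is the right fix depends on why the device dropped out, which is why the sketch only prints the commands.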

Enhancing Pool Performance and Disk Replacement

If a drive is removed from a system, the resulting events allow ZED to flag the drive as disconnected and, if one is available, activate a hot spare in its place. Without ZED, ZFS may not notice a failed drive until it tries to use it, and only once that I/O fails will it flag the drive as faulted. ZFS would then wait for the sysadmin to manually run `zpool replace` to replace the disk and start the resilvering process.

ZED can also deal with entirely new disks. If a disk fails and a new drive is inserted into the same slot in the chassis, ZED can recognize the new disk as a replacement for the failed one and automatically start the resilvering. When combined with a ZEDLET (described below) that automatically activates the “locate” LED in the chassis, this feature can significantly simplify the process of replacing a failed drive. It eliminates the need for an operator to manually run any commands on the host.
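Both behaviors depend on pool configuration. As a hedged example using hypothetical pool and device names (`tank`, `sdb`, `sdc`): hot spares are added with `zpool add ... spare`, and automatic replacement of a disk found in the same physical slot is controlled by the `autoreplace` pool property, which is off by default.

```shell
# Hypothetical pool "tank" and disks sdb/sdc; substitute your own names.
# Add a hot spare that ZED can activate when a disk faults or is removed.
zpool add tank spare sdc
# Let a new disk inserted into the same physical slot replace the old one.
zpool set autoreplace=on tank
zpool get autoreplace tank
# The manual equivalent of activating the spare for failed disk sdb.
zpool replace tank sdb sdc
```

On Linux, recognizing "the same slot" works best when the pool was built from stable physical-path names, for example aliases defined in /etc/zfs/vdev_id.conf.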

Klara recently developed a new ZED capability for a customer and upstreamed it to OpenZFS, allowing ZED to react automatically to a slow disk. A disk that takes a significant amount of time to respond to requests is often a sign of impending failure, and it can reduce the performance of the entire pool. ZFS has a configurable threshold for marking an I/O as too slow; the default is 30 seconds. If a disk experiences more than 10 slow I/Os within 15 minutes (both values are configurable), ZED can automatically offline the disk. If a hot spare is available, the system then replaces the disk with it, preventing a single disk from dragging down the performance of the entire pool.
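The "more than N slow I/Os within T seconds" rule is a sliding-window count. The sketch below illustrates only that rule; `slow_io_check` is a hypothetical helper, and ZED's real diagnosis engine tracks these thresholds internally rather than with a script like this.

```shell
#!/bin/bash
# slow_io_check N T: read slow-I/O timestamps (epoch seconds, ascending,
# one per line) on stdin. Print "fault" if more than N of them fall within
# any T-second window, otherwise print "ok".
# Hypothetical illustration of the threshold rule, not ZED's code.
slow_io_check() {
    local n="$1" t="$2"
    local -a ts=()
    local line last start=0
    while read -r line; do
        [[ -z "$line" ]] && continue
        ts+=("$line")
        last=$(( ${#ts[@]} - 1 ))
        # Slide the window start past events more than T seconds older
        # than the newest one.
        while (( ts[last] - ts[start] > t )); do
            start=$(( start + 1 ))
        done
        # Events still inside the window: indices start..last.
        if (( ${#ts[@]} - start > n )); then
            echo "fault"
            return 0
        fi
    done
    echo "ok"
}
```

With the values from the text, the call is `slow_io_check 10 900` (15 minutes is 900 seconds). In OpenZFS itself, the slow-I/O cutoff is the `zio_slow_io_ms` module parameter, and in releases that include this feature the N and T thresholds are the `slow_io_n` and `slow_io_t` vdev properties; check `vdevprops(7)` on your system to confirm they are available.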

Customizing ZFS Event Responses with ZEDLETs

However, ZED is not limited to its innate capabilities. It can be extended to run custom scripts that a sysadmin creates, called ZEDLETs. ZEDLET stands for “ZFS Event Daemon Linkage for Executable Tasks”, and each ZEDLET is associated with specific types of zevents.

A ZEDLET is essentially a response script or routine that ZED invokes to handle a specific ZFS event, like a disk failure, checksum error, or a pool state change. ZEDLETs are designed to be lightweight and efficient, allowing administrators to define customized event responses without overloading the system.

ZEDLETs provide flexibility by letting sysadmins choose which actions are performed when certain events are triggered. Examples of these actions include:

Sending email alerts to system administrators
Running diagnostic commands or custom scripts to gather additional system information
Activating the “locate” LED in the disk chassis, so the sysadmin can identify the correct disk to swap with a new drive

When a ZFS event occurs, ZED runs the scripts or programs in its ZEDLET directory (typically /etc/zfs/zed.d) that are designated to handle that event; a ZEDLET's filename prefix, such as `statechange-` or `all-`, determines which events it handles. These scripts are executed with a set of environment variables (mostly prefixed with `ZEVENT_`) that provide context about the event, allowing the scripts to interpret the details and take appropriate action based on the event type and associated metadata.
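As a sketch of a minimal ZEDLET: ZED selects ZEDLETs by filename, matching the part before the first hyphen against the event subclass, with `all-` matching every event. The filename `all-example-log.sh` and the function `format_event` are hypothetical. The `ZEVENT_*` variables are ones ZED sets, but which ones are present depends on the event type.

```shell
#!/bin/bash
# all-example-log.sh (hypothetical name): the "all-" prefix makes ZED run it
# for every zevent. It logs a one-line summary of the event through syslog.

# Build the summary from variables ZED exports. Not every event sets all of
# them, so fall back to "-" when one is missing or empty.
format_event() {
    printf 'eid=%s class=%s pool=%s vdev=%s\n' \
        "${ZEVENT_EID:--}" "${ZEVENT_CLASS:--}" \
        "${ZEVENT_POOL:--}" "${ZEVENT_VDEV_PATH:--}"
}

# Only act when invoked by ZED, which sets ZEVENT_EID for each event.
if [[ -n "${ZEVENT_EID:-}" ]]; then
    format_event | logger -t zed-example
fi
```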

Examples of ZEDLET Actions

An example ZEDLET script that sends an email when there is an I/O failure:

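#!/bin/bash

# ZEDLET to email the admin when a pool reports an I/O failure.
# Load zed.rc so settings such as ZED_EMAIL_ADDR are available.
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
EVENT_CLASS="${ZEVENT_CLASS:-}"
POOL_NAME="${ZEVENT_POOL:-unknown}"
DEVICE_NAME="${ZEVENT_VDEV_PATH:-unknown}"
MAIL_TO="${ZED_EMAIL_ADDR:-root}"   # single recipient assumed

if [[ "$EVENT_CLASS" == "ereport.fs.zfs.io_failure" ]]; then
    echo "I/O failure detected in pool $POOL_NAME (device: $DEVICE_NAME)" \
        | mutt -s "ZFS I/O Failure" "$MAIL_TO"
fi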

Because ZEDLETs are individual scripts triggered by events, an administrator can create as many as needed, each written to respond in exactly the way desired. They can be added without interfering with, or needing to alter, the standard ZEDLETs provided by ZFS on Linux. A sysadmin's additions can be as simple or as complex as required. Following the Unix philosophy, each ZEDLET should handle one specific task, which keeps the system modular and easy to maintain or extend.
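As a hedged example of adding a custom ZEDLET, such as the email script above, alongside the stock ones: the filename is hypothetical, and the ZEDLET directory and service name may differ by distribution.

```shell
# Hypothetical filename: the "io_failure-" prefix matches the subclass of
# ereport.fs.zfs.io_failure events. Executable, not writable by others.
install -m 0755 io_failure-email.sh /etc/zfs/zed.d/
# Restart ZED so it rescans its ZEDLET directory.
systemctl restart zfs-zed
```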

Creating and maintaining a collection of ZEDLETs can offer many advantages for busy sysadmins. ZEDLETs enable automatic, real-time responses to critical events without human intervention, and they can be tailored to the system's design and to the exact needs of its users. By automating tasks such as disk offlining, email notifications, and logging, ZEDLETs reduce the overhead and manual work required of ZFS storage administrators.

ZFS ships with a set of ZEDLETs, such as “statechange-led.sh”, which will:

# Turn a vdev's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL.
# Turn its LED off when it's back ONLINE again.
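The rule in those comments can be sketched as follows. This is a simplified illustration, not the shipped script. `led_value_for_state` and `set_fault_led` are hypothetical names, and the sketch assumes the LED is exposed as a sysfs `fault` attribute inside a directory passed as an argument. For enclosure-managed disks, ZED provides such a directory in `ZEVENT_VDEV_ENC_SYSFS_PATH`.

```shell
#!/bin/bash
# led_value_for_state STATE: print 1 (LED on) for FAULTED, DEGRADED or
# UNAVAIL, 0 (LED off) for ONLINE, and nothing for any other state.
led_value_for_state() {
    case "$1" in
        FAULTED|DEGRADED|UNAVAIL) echo 1 ;;
        ONLINE) echo 0 ;;
    esac
}

# set_fault_led DIR STATE: write the LED value for STATE to DIR/fault.
# Leaves the LED untouched for uncovered states or a missing attribute.
set_fault_led() {
    local dir="$1" val
    val=$(led_value_for_state "$2")
    [[ -n "$val" && -w "$dir/fault" ]] || return 0
    echo "$val" > "$dir/fault"
}
```

Inside a ZEDLET this might be called as `set_fault_led "$ZEVENT_VDEV_ENC_SYSFS_PATH" "$ZEVENT_VDEV_STATE_STR"`, assuming your ZED version exports both variables.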

Another is “deadman-slot_off.sh”, which detects a hung I/O that has triggered the ZFS deadman timer (no progress for 5 minutes) and can power off the specific drive causing the problem, forcing the outstanding I/O to fail and allowing the system to recover.
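The deadman timeouts are ZFS module parameters. As a hedged sketch, the hypothetical helper `deadman_minutes` reads the per-I/O deadman timeout, which on Linux is exposed as `/sys/module/zfs/parameters/zfs_deadman_ziotime_ms`, and prints it in whole minutes. It assumes this per-I/O timeout is the one relevant to the hung-I/O case. The file path is an argument, so the helper can be pointed at a test file.

```shell
#!/bin/bash
# deadman_minutes [FILE]: print the ZFS per-I/O deadman timeout in whole
# minutes. FILE defaults to the Linux module parameter path.
deadman_minutes() {
    local file="${1:-/sys/module/zfs/parameters/zfs_deadman_ziotime_ms}"
    local ms
    ms=$(<"$file") || return 1
    # Expect a plain millisecond count.
    [[ "$ms" =~ ^[0-9]+$ ]] || return 1
    echo $(( ms / 60000 ))
}
```

A value of 300000 corresponds to the 5 minutes mentioned above.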

Streamlining ZFS Fault Management with ZED

Deploying a well-configured ZED as part of your storage environment provides a flexible, modular way to log and automatically respond to important ZFS events. With the combination of ZED's innate fault management capabilities, the provided library of existing ZEDLETs, and the ability to customize or create your own ZEDLETs, your data has never been safer. These capabilities make ZFS storage systems more resilient and easier to manage by offloading repetitive or time-critical tasks from system and storage administrators.

For further insights into monitoring and maintaining ZFS performance and health, consider reading Klara's article on using zpool iostat.


JT Pennington

JT Pennington is a ZFS Solutions Engineer at Klara Inc, an avid hardware geek, photographer, and podcast producer. JT is involved in many open source projects including the Lumina desktop environment and Fedora.
