Klara

Going From Recovery Mode to Normal Operations with OpenZFS

Manipulating a Pool from the Rescue System

We’ve all been there: that moment of panic when a system fails to boot back up. Perhaps there was a glitch with an upgrade. Maybe you’re wondering if you fumble-fingered a typo when you made that last change to loader.conf. But there you are, staring at single-user mode or, worse, the boot loader prompt, thinking “what now?”


Fortunately, FreeBSD and its built-in rescue mechanisms have you covered. Barring a truly catastrophic hardware failure, it is possible to quickly recover from most scenarios that prevent a system from booting into normal operation. And if you’re using OpenZFS, you can rest assured that your data is intact.

Let’s take a look at some common recovery scenarios.

Single-User Mode

If a system starts to boot normally but stops with this prompt after probing the disks, you’re in single-user mode:

Enter full pathname of shell or RETURN for /bin/sh:

In this mode, there is only one user (the superuser), no authentication, no networking, and no running daemons, and most filesystems are unmounted. While that sounds rather dire, this mode provides the tools needed to repair whatever is preventing the system from completing the boot process.

Start by pressing Enter. If you get a # prompt, you’re now in the Bourne (/bin/sh) shell. Next, see if your OpenZFS pools are mounted:

# mount
zroot/ROOT/default on / (zfs, local, noatime, read-only, nfsv4acls)

In this example, only the root dataset of the zroot pool is mounted, and it is mounted read-only. This means that a command such as vi /etc/rc.conf will fail with “read-only file system” errors. To remedy this, turn off the readonly property on the pool:

# zfs set readonly=off zroot

Then, mount all of the filesystems:

# zfs mount -a

Rerunning the mount command should show that all of the filesystems, including zroot/var, zroot/tmp, and zroot/usr, are mounted as read-write. You should now be able to make any configuration file edits as well as use any other utilities needed to investigate and fix the problem. When finished, type exit. If your changes were successful, the system will continue to boot into normal operation.
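To make the read-only check less error-prone on a system with many datasets, the mount output can be filtered. The following is a minimal sketch, not part of FreeBSD: the function name ro_zfs_mounts is made up for illustration, and it assumes mount prints lines in the format shown above. It uses only shell built-ins, so it should also work from a rescue shell.

```shell
# List ZFS filesystems that mount(8) reports as read-only.
# Reads mount-style lines on stdin and prints the dataset name of each
# line whose option list starts with "zfs" and contains "read-only".
# (ro_zfs_mounts is a hypothetical helper name.)
ro_zfs_mounts() {
    while IFS= read -r line; do
        case "$line" in
            *"(zfs,"*read-only*)
                # Strip everything from " on " onward, leaving the dataset.
                echo "${line%% on *}"
                ;;
        esac
    done
}
```

Run it as mount | ro_zfs_mounts. If it prints nothing after zfs mount -a, no ZFS filesystem is still mounted read-only.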

Using the Rescue Utilities

Since it is possible that the base utilities themselves (such as sh, mount, or vi) could become corrupt, FreeBSD provides a /rescue directory containing statically linked versions of these utilities.

This means that if a system in single-user mode is too damaged to enter the Bourne shell, you can type:

/rescue/sh

This should give you a shell prompt. As another example, if the single-user mode shell cannot run the vi command and you need to edit rc.conf, try this:

/rescue/vi /etc/rc.conf

If that fails, try the rescue version of the ed editor:

/rescue/ed /etc/rc.conf

Most of the commands you need to repair a system in single-user mode have rescue equivalents. You can see which utilities are available by typing ls /rescue.
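The "try the normal editor, then the rescue copies" sequence can be written as a small helper. This is only a sketch: first_usable is a hypothetical name, and checking the execute bit does not detect a binary that exists but is corrupt, so a manual fallback may still be needed.

```shell
# Print the first path in the argument list that exists and is executable.
# Returns non-zero if none of the candidates qualify.
# (first_usable is a hypothetical helper name.)
first_usable() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, editor=$(first_usable /usr/bin/vi /rescue/vi /rescue/ed) && "$editor" /etc/rc.conf opens rc.conf with the first editor found.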

Mount Root Prompt

If a system boot experiences an issue when mounting the root filesystem, it will stop at a prompt which looks like this:

mountroot>

If you suspect that the system is attempting to mount the wrong pool location, you can input the correct location at this prompt. This example points zfs to the location for a default FreeBSD installation, where zroot is the pool name, ROOT is the parent dataset, and default is the default boot environment:

mountroot> zfs:zroot/ROOT/default

However, that command will fail if the problem isn’t the location but rather that the required kernel modules were not loaded. To fix that situation, escape to the boot loader prompt.

Boot Loader Prompt

Perform a cold boot and press 3 at the FreeBSD boot menu to “Escape to loader prompt”. This will stop the boot and display:

Exiting menu!
Type '?' for a list of commands, 'help' for more detailed help.
OK

The commands available at this prompt differ from those available from single-user mode or a fully booted system. Start by unloading any loaded kernel and modules and reloading the kernel. You should get an OK after each command:

OK unload
OK load /boot/kernel/kernel

Then, load the opensolaris and zfs kernel modules (.ko) which are needed to successfully mount an OpenZFS pool:

OK load /boot/kernel/opensolaris.ko
OK load /boot/kernel/zfs.ko
OK

If you get errors with the load commands and the boot failure followed a system upgrade, you can instead try loading the previous version of the kernel by replacing the /boot/kernel/ portion with /boot/kernel.old/ in all 3 load commands.

Once the load commands complete without errors, you should be able to successfully boot with a mounted pool:

OK boot

Once the system is up, double-check that these lines exist and are typed correctly in /boot/loader.conf:

opensolaris_load="YES"
zfs_load="YES"

Those lines instruct the system to load those kernel modules for you at boot. If the boot was into kernel.old, you will want to investigate and fix the reason for the upgrade failure so that you don’t have to repeat going into the loader prompt whenever the system reboots.
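Checking loader.conf by eye is easy to get wrong, especially when a typo or a pasted "smart quote" is the cause. The following is a sketch of a stricter check. The function name check_loader_conf is made up for illustration, and it assumes the exact spelling shown above (plain double quotes and uppercase YES).

```shell
# Report which <module>_load="YES" lines are missing from a
# loader.conf-style file. Usage: check_loader_conf FILE MODULE...
# Returns 0 if every module has a correctly typed line, 1 otherwise.
# (check_loader_conf is a hypothetical helper name.)
check_loader_conf() {
    conf=$1
    shift
    missing=0
    for mod in "$@"; do
        # Match only plain ASCII double quotes around YES.
        if ! grep -Eq "^[[:space:]]*${mod}_load=\"YES\"" "$conf"; then
            echo "missing or mistyped: ${mod}_load=\"YES\""
            missing=1
        fi
    done
    return $missing
}
```

Run it as check_loader_conf /boot/loader.conf opensolaris zfs. No output and a zero exit status mean both lines are present as typed above.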

Mounting a Boot Environment

One of the quickest ways to recover from a boot failure due to a misconfiguration or failed update is to select a previous boot environment from the boot menu. (See our article on Managing Boot Environments for instructions on how to create and use boot environments, particularly its If Something Goes Wrong section, which covers repairing a system from a boot environment.)

If you’re in the habit of making a boot environment before performing an update or when testing configuration changes, the only down-time is the time it takes to reboot and select the previous boot environment. You can then mount the failing boot environment while the system is up and operational in order to fix the failure without disruption to the users of the system.
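Mounting the failing boot environment from the working one can be done with bectl. The following is a sketch under assumptions: the boot environment name is whatever bectl list shows on your system, /mnt is an arbitrary mount point, and the BECTL variable exists only so the function can be dry-run with BECTL=echo.

```shell
# Mount a non-active boot environment so its files can be inspected
# and edited. Usage: mount_be BE_NAME [MOUNTPOINT]
# BECTL may be overridden (for example BECTL=echo) for a dry run.
# (mount_be is a hypothetical helper name.)
BECTL=${BECTL:-bectl}
mount_be() {
    be=$1
    dir=${2:-/mnt}
    "$BECTL" mount "$be" "$dir"
}
```

After running mount_be with the name of the failing boot environment, its rc.conf would be at /mnt/etc/rc.conf. When the repair is done, bectl umount with the same name unmounts it.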

Live USB

If all else fails, booting the system from the FreeBSD installation disk (USB or CD/DVD image) offers an option to enter a shell where you can try to repair your system.

First, import the pool with an ‘altroot’ (a path prepended to all of the mountpoints, so as not to mount over top of the live system you are using). We also set the “do not automatically mount filesystems” flag, because we need to manually mount the boot environment.

# zpool import -R /mnt -N -f zroot

Once that completes, confirm that you can see all of your datasets with zfs list, then mount the root dataset of the default boot environment at /mnt:

# mount -t zfs zroot/ROOT/default /mnt

If you are not sure what is currently the default boot environment, you can use:

# zpool get bootfs zroot

This returns the name of the current default boot environment. Once you have mounted the root dataset, you can mount the rest of the ZFS datasets:

# zfs mount -a

Now your broken system is mounted with a prefix of /mnt, so rc.conf will be in /mnt/etc/rc.conf, and you can edit files as required to repair your system.

Then just reboot without the USB/CD image connected, and your system should boot as normal.
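The live-media steps above can be collected into one function so they run in the right order. This is a sketch rather than a tested recovery script: rescue_mount and RUN are names introduced here, the pool and dataset names are whatever applies to your system, and by default the function only prints the commands so they can be reviewed first.

```shell
# Import a pool under an altroot without mounting anything, mount the
# boot environment's root dataset, then mount the remaining datasets.
# Usage: rescue_mount POOL ROOT_DATASET [ALTROOT]
# RUN defaults to "echo" (dry run); set RUN to an empty string to execute.
# (rescue_mount and RUN are hypothetical names.)
RUN=${RUN-echo}
rescue_mount() {
    pool=$1
    be=$2
    altroot=${3:-/mnt}
    $RUN zpool import -R "$altroot" -N -f "$pool" &&
    $RUN mount -t zfs "$be" "$altroot" &&
    $RUN zfs mount -a
}
```

Calling rescue_mount zroot zroot/ROOT/default prints the three commands from this section. Setting RUN to an empty string beforehand executes them instead, stopping at the first failure.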

Takeaways

Part of an administrator’s toolkit for preparing a system for resiliency against boot failures is taking advantage of OpenZFS boot environments. They’re instantaneous to create: get in the habit of creating one before any operation that could potentially prevent the system from booting.

When attempting the first boot of a just-upgraded system, consider keeping the boot environment you created before you started as the default, and using the boot-once feature (`bectl activate -t newbootenvironment`) for the upgraded one. The system will override the default boot environment for the next boot only, allowing you to revert to the working system by just rebooting after a failed boot.
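The boot-once arrangement can be sketched as two bectl calls. The function name safe_first_boot and the boot environment names passed to it are assumptions for illustration, and BECTL exists only to allow a dry run with BECTL=echo.

```shell
# Keep a known-good boot environment as the permanent default and
# activate the upgraded one for the next boot only.
# Usage: safe_first_boot KNOWN_GOOD_BE UPGRADED_BE
# (safe_first_boot is a hypothetical helper name.)
BECTL=${BECTL:-bectl}
safe_first_boot() {
    known_good=$1
    upgraded=$2
    "$BECTL" activate "$known_good" &&
    "$BECTL" activate -t "$upgraded"
}
```

If the upgraded environment fails to boot, the next reboot should fall back to the known-good default. If it boots fine, running bectl activate on the upgraded environment makes it permanent.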

All is not lost if a FreeBSD administrator didn’t create a boot environment and the system isn’t booting. The built-in tools in single-user mode, the loader prompt, and the rescue utilities are capable of recovering a system from most boot failures.
