FreeBSD supports a variety of protocols for sharing files over a network. The two most well-known are NFS (Network File System) and SMB (Server Message Block). NFS originally came from Sun Microsystems and is widely used between Unix systems. SMB, by contrast, comes to us from the Windows world.

SMB is a good fit when sharing files with end-user client systems and relying primarily on user-based authentication. NFS is more appropriate when:

  1. the client machines are trusted,
  2. the clients are tied tightly to the local infrastructure, and
  3. a one-to-one mapping of file ownership and permissions between client and server is appropriate.

With the ability to map identities and the optional use of Kerberos, NFSv4 offers more flexibility than the earlier versions of NFS did. As a non-proprietary protocol that is widely supported by many operating systems, NFS remains a popular choice for network file sharing. 

One of the newer innovations in NFS is pNFS. The "p" stands for parallel and that is what we'll introduce in this article. 

Introducing pNFS 

Traditional network file systems typically rely on a single server to handle all data storage requests, which limits how far they can scale as the volume of data and the number of clients increase. pNFS enables operations to be distributed across multiple servers in a parallel but coordinated manner, and this distributed architecture allows for increased data throughput. Other solutions, such as the Lustre filesystem, take a similar approach; however, pNFS can leverage existing NFS infrastructure and supporting technologies.

The fundamental concept behind pNFS involves separating the control and data planes. While the control plane remains responsible for managing metadata and coordinating access to the distributed filesystem, the data plane handles actual data transfers.  

By decoupling these two functions, pNFS can support parallelism, with multiple clients able to access data simultaneously from different backend storage devices. Clients access data directly from the data servers, bypassing the controlling metadata server. This allows for a significantly greater scale. The metadata server does remain as a single point of failure, and a bottleneck for metadata operations, however. 

The NFS protocol standard allows for a variety of possible backend storage layouts for pNFS, including: 

  • Block-based storage (Disks, Fibre Channel, or iSCSI),  
  • Object-based storage, and  
  • Files stored directly onto a traditional filesystem. 

pNFS exposes aspects of how the data is stored and distributed (such as mirroring) to the client as layouts, which the client obtains via a request to the metadata server. The metadata server can delegate control of an area of storage to a client and recall it at any point if needed. The metadata server can also decide to fall back to normal NFS operation, either to improve performance for a particular request or to support an older client.

Deploying pNFS on FreeBSD 

FreeBSD 12.0 and later support file-based layouts for both the server and client. The metadata server and all data servers need to be FreeBSD systems, but clients can run any operating system with pNFS client support.

There are also ways to run multiple data servers on a single system, if you need to spread the data across different underlying file systems. Let's look at the setup of each of the component systems in turn. 

How to Configure a Data Server With pNFS 

The data server is essentially set up as a regular NFS file server. To start, we need to enable NFS in /etc/rc.conf. Although not strictly necessary, we can also disable the earlier NFS protocol versions by setting nfsv4_server_only, or further tweak nfsd options via nfs_server_flags.

nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsv4_server_only="YES"
nfs_server_flags="-t -n 32" 

Next, we must declare the root of the exported filesystem tree in /etc/exports. The metadata server needs the -maproot=root option, but this can be skipped for other clients. An example follows, though you may need to add a -sec option for your choice of security flavor and adjust the data path, network range and name of the metadata server. 

/data -maproot=root nfs-mds
/data -network 192.168.1 -mask 255.255.255.0
V4: /data 

If the NFS server was already running, restart mountd for the changes to be activated. If it isn’t already running, start it with service nfsd start. You should then be able to see the exported filesystem listed in the output of showmount -e. 
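
In practice, those steps might look like this on a data server:

# re-read /etc/exports if nfsd was already running
service mountd restart
# otherwise, start the NFS server
service nfsd start
# confirm the export is visible
showmount -e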

The data directory further needs to contain subdirectories named ds0 through to ds19. These steps are well covered in the pNFSserver manual page where the following command is suggested to create them: 

jot -w ds 20 0 | xargs mkdir -m 700 

All of these directories, including the top-level one, need to be owned by root. The actual number of subdirectories can be configured on the metadata server with the vfs.nfsd.dsdirsize sysctl tunable. The right value depends on how many files per subdirectory the underlying filesystem can handle before performance degrades, which is less of a problem on modern filesystems, and the value can also be increased later. If you only anticipate a moderate number of files, the default may be sufficient. While ZFS may cope fine even with millions of files per directory, you may find other utilities balk at such directories.
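
As a quick sanity check, you could verify the layout and query the tunable like this (the /data path matches the earlier example, and the sysctl name follows the pNFSserver manual page):

# on a data server: the directories should exist and be owned by root
ls -ld /data /data/ds*
# on the metadata server: the configured number of ds subdirectories
sysctl vfs.nfsd.dsdirsize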

How to Configure the pNFS Metadata Server 

The metadata server needs to mount the filesystems from each of the data servers locally. For NFSv4.2, the man page is very specific about the mount options to use. The example it gives for /etc/fstab is as follows:

nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0 

We must also enable NFS in /etc/rc.conf, just as we did on the data server(s), and add the -p option to nfs_server_flags.

Borrowing another example from the man page, the metadata server’s nfs_server_flags in /etc/rc.conf should look like this: 

nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3" 

Note: the paths specified in this line are not the paths as exported by the data server. They’re the local mountpoints on the metadata server itself.  

It is possible to assign particular data servers to exports on the metadata server by adding a hash symbol and the export path. For example, nfsv4-data0:/data0#/export1 
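
For example, the earlier flags could be adjusted as follows (the /export1 and /export2 paths are purely illustrative):

nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export2"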

The inclusion of the -u option to enable UDP clients may seem somewhat curious because NFSv4 requires a transport with congestion control such as TCP. But it does make sense; the metadata server can handle NFSv3 clients directly, without the benefits of pNFS.  

The -t flag is the corresponding option to enable TCP. We again see the -n option for the number of threads. 

In /etc/exports, we primarily need to enable NFSv4 with a line such as the following: 

V4: / -network 192.168.1 -mask 255.255.255.0 

Each of the exported directories can also be listed here or the sharenfs ZFS property can be set for each of them, either to "on" or to specific NFS export options. 
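
For example, assuming the exported directory lives on a hypothetical ZFS dataset named tank/data, either of the following would work:

zfs set sharenfs=on tank/data
zfs set sharenfs="-network 192.168.1 -mask 255.255.255.0" tank/data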

For things to work smoothly, the data block sizes need to match across the various NFS servers. On the metadata server:

  1. Check the rsize and wsize values of the NFS mounts by running nfsstat -m. These default to 65536, but they should match the value shown by sysctl vfs.nfsd.srvmaxio, which currently defaults to 131072.
  2. To correct a mismatch, add vfs.maxbcachebuf=131072 to /boot/loader.conf on the metadata server, as shown in the example after this list. This change requires a reboot, after which the NFS server should be started. If using ZFS, these values should also match the ZFS recordsize.
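
Put together, a minimal check-and-fix sequence on the metadata server might look like this:

# current read/write sizes of the data server mounts
nfsstat -m | grep -E 'rsize|wsize'
# maximum I/O size offered by the server
sysctl vfs.nfsd.srvmaxio
# in /boot/loader.conf, followed by a reboot:
vfs.maxbcachebuf=131072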

The documentation also recommends increasing vfs.nfsd.fhhashsize in /boot/loader.conf. 

How to Configure a Client 

Once the servers are set up, clients simply mount the filesystem from the metadata server in the normal way. 

Enabling a FreeBSD Client 

For a FreeBSD client, the nfscbd daemon should be enabled via /etc/rc.conf and started. The nfscbd daemon is used for delegations, a separate but related feature that allows control of a particular file to be temporarily delegated to a client.
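
In /etc/rc.conf on the client:

nfscbd_enable="YES"

Then start the daemon with service nfscbd start.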

Once nfscbd is enabled and started on the client, the NFS export can be directly mounted: 

mount -t nfs -o nfsv4,minorversion=2,pnfs nfs-mds:/data /mnt 

Enabling a Linux Client 

On a Linux system, the NFS client implementation is a kernel feature, yet you may find that key utilities such as mount.nfs4 are missing. If so, first run apt install nfs-common, dnf install nfs-utils, or the applicable command for your distribution. It isn't necessary to run the blkmapd daemon, as that deals with block layouts for pNFS. It is sufficient to use mount, for example:

mount -t nfs -o v4.2 nfs-mds:/data /mnt 

To verify that pNFS is working, use tcpdump to watch for traffic going directly between the client and the data server. 
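
For example, you might run something like the following on the client (the interface name and data server hostname are placeholders for your own environment):

tcpdump -ni em0 host nfsv4-data0 and port 2049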

Data Mirroring for Data Storage 

Data mirroring can be configured so that each data storage file is on two or more data servers. The pNFS service as a whole can be resilient to failures of a single data server in this configuration. However, mirroring may defeat some of the benefits of pNFS as data writes need to be sent to multiple servers.  

With a more traditional NAS gateway solution, mirrored writes can take advantage of going over dedicated links to the storage media. To enable mirroring, add the -m option to the nfs_server_flags on the metadata server. The option takes an argument specifying the number of copies. 
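
As a sketch, building on the earlier metadata server example, two-way mirroring could look like this (check nfsd(8) for the exact syntax on your release):

nfs_server_flags="-u -t -n 128 -m 2 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"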

When using pNFS mirroring, there are two utilities to control the cluster, run from the metadata server: 

  1. pnfsdskill allows a particular data server to be brought offline. 
  2. pnfsdscopymr is used to restore files onto a repaired data server. 

Local File Access 

If you look at the files as they appear on the servers, it quickly becomes clear that local access to them isn't very usable, since the local paths don't match the exported paths.

NFSv4 file delegations have a similar, subtler effect. You may not have end-user processes accessing files directly on your file server anyway, but this limitation has other unfortunate side effects: it is not possible to directly export the same set of files with a different protocol, for example by using Samba to make them available to Windows hosts. A solution is to configure Samba to serve from an NFS client mountpoint. Backups are also better done from an NFS client, because that is the best way to get a coherent view of the files. If it helps, the metadata server can itself act as a client by mounting its own NFS exports.

If you are accustomed to taking advantage of ZFS snapshots, there are caveats here too. You might well use ZFS as the backing store and can certainly create snapshots, but their use will be less convenient: below the .zfs/snapshot directory, you won't have the combination of both meaningful file names and data contents. However, if you synchronize the creation of snapshots across the metadata server and all the data servers, the combined snapshots can be used.

Files on the data servers do have their normal ownership assigned, so user disk quotas are usable on individual data servers, but they can't be combined across the NFS service as a whole. And if you are wondering whether it makes sense to use UFS on your data servers, note that other ZFS features (such as transparent compression and encryption) remain useful in conjunction with pNFS.

There is a tool named pnfsdsfile which can be used on the metadata server to find the files on the data servers associated with a given file. This information is stored in extended file attributes, so it works even for files in a ZFS snapshot. To retrieve a file from a snapshot, first locate the file in a snapshot on the metadata server; that file will have the correct name but will be empty. Passing its name to pnfsdsfile will output the name of the corresponding data file. pnfsdsfile also allows the extended attributes to be modified, which can be useful if a data server is renamed.
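
As a purely hypothetical example, for a file found in a snapshot named backup1 on the metadata server:

pnfsdsfile /data/.zfs/snapshot/backup1/report.txt

The output identifies the corresponding data file, which can then be retrieved from the matching snapshot on the relevant data server.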

Conclusion 

With the pNFS functionality, FreeBSD gains support for a distributed file system in a form that builds on the stable and familiar base of the existing NFS implementation. While there are some notable limitations stemming from the separation of data and control planes, pNFS offers improved performance and scalability and provides a useful additional tool for organizations wishing to optimize their storage infrastructure. 
