Cluster provisioning with Nomad and Pot on FreeBSD

Cluster Provisioning on FreeBSD

Although efforts have been made to bring Docker to FreeBSD, none of them is mature yet. Thankfully, the lack of Docker support on FreeBSD doesn't mean you are out of luck if you want DevOps workflows for managing clusters of computers.

Pot is FreeBSD’s answer to Docker

Pot is a jail abstraction framework and management tool that aims to replace Docker in your DevOps tool chest, and it supports using Nomad to orchestrate clustered services. The team behind Pot aims to provide modern container infrastructure on top of FreeBSD and has been making progress over the last three years. Their approach is to take the best ideas from Linux container management and create a new container model, one based on FreeBSD technologies and running on FreeBSD.

Pot is built on proven core FreeBSD technologies (jails, ZFS, VNET and pf) and uses rctl and cpuset to constrain the resources available to each container.

These tools are used to manage: 

  • Jail configuration
  • Dataset/Filesystem management
  • Network management
  • Resource limitation
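To give a feel for what resource limitation looks like at the base-system level, here is a minimal sketch that prints the rctl(8) and cpuset(1) commands for capping a jail's memory and pinning it to specific CPUs. The function name jail_limits and the example values are hypothetical, and this is not how Pot itself applies limits; it only illustrates the underlying tools. The commands are printed rather than executed so they can be reviewed first.

```sh
#!/bin/sh
# Hypothetical helper (not part of pot): print, but do not run, the
# base-system commands that cap a jail's memory with rctl(8) and restrict
# it to a CPU list with cpuset(1). rctl rules only take effect when
# kern.racct.enable=1 was set at boot.
jail_limits() {
    name=$1    # jail name or JID
    mem=$2     # memory cap, e.g. 512m
    cpus=$3    # CPU list, e.g. 0-1 or 0,2

    if [ -z "$name" ] || [ -z "$mem" ] || [ -z "$cpus" ]; then
        echo "usage: jail_limits NAME MEMORY CPULIST" >&2
        return 1
    fi

    # rctl rule syntax: subject:subject-id:resource:action=amount
    echo "rctl -a jail:${name}:memoryuse:deny=${mem}"
    # Bind the processes of the jail to the given CPUs.
    echo "cpuset -l ${cpus} -j ${name}"
}
```

Piping the output to sh as root, for example `jail_limits www 512m 0-1 | sh`, would apply both limits to a running jail named www.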

The success of Docker and similar tools came as a surprise to many FreeBSD sysadmins, because FreeBSD's core tools already made running relatively complex clusters quite straightforward.

Pot aims to keep things simple and uses core FreeBSD features to implement functionality whenever possible. For example, Pot didn't need to invent a new way to move images between hosts: in the FreeBSD world, OpenZFS snapshots and replication are universally available, extremely performant, and well understood.
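As an illustration of that workflow, here is a minimal sketch that prints the commands to snapshot a dataset and replicate it to another host. The helper name zfs_ship, the dataset and the ssh destination are hypothetical; it uses only standard zfs(8) and ssh(1) options and is not Pot's own image-transfer code.

```sh
#!/bin/sh
# Hypothetical helper (not part of pot): print the snapshot-and-replicate
# commands that copy a ZFS dataset, such as a jail image, to another host
# over ssh. Nothing is executed; pipe the output to sh to run it.
zfs_ship() {
    dataset=$1   # e.g. zroot/pot/jails/myimage
    tag=$2       # snapshot name, e.g. v1
    remote=$3    # ssh destination, e.g. admin@host2
    target=$4    # dataset name on the receiving host

    # Snapshot the dataset and all of its children atomically.
    echo "zfs snapshot -r ${dataset}@${tag}"
    # -R sends the dataset, its descendants and their snapshots as one
    # replication stream; -u leaves the received datasets unmounted.
    echo "zfs send -R ${dataset}@${tag} | ssh ${remote} zfs receive -u ${target}"
}
```

A full stream like this expects the target dataset not to exist yet on the receiving host. For later updates, an incremental stream (`zfs send -i`) transfers only the blocks changed since an earlier snapshot.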

Nomad, a cluster manager for FreeBSD

Nomad, developed by HashiCorp, is a cluster manager and scheduler that provides a common workflow for deploying applications across an infrastructure. A cluster manager handles distributing applications across a set of hosts based on load and cluster usage.

Nomad supports provisioning and managing workloads of many different types, so given Pot's goal of creating modern container infrastructure for FreeBSD, it was not a huge leap to add Nomad support for pot-style containers.

Pot support for Nomad is provided through the nomad-pot-driver package.

Setting up Minipot 

It can be difficult to experiment with cluster-based software, since it typically requires a lot of initial setup work to create a pool of usable nodes. Minipot alleviates this by giving us a single-node environment in which to test Pot and Nomad, avoiding much of the onerous preliminary setup.

Minipot handles setting up and configuring all of the services required to run a Nomad cluster, including the Consul service directory and the Traefik HTTP proxy.

Minipot is available from FreeBSD’s package repository and can be installed with:

# pkg install minipot 
 


Before minipot will run correctly, however, we need to configure Pot. Refer to the Pot installation instructions for a description of every Pot configuration option.

We need to configure two things for Pot: the network and storage layers. For storage, Pot depends on ZFS datasets.

Three configuration values need to be set so that Pot knows where to create datasets and how to configure the network. Edit /usr/local/etc/pot/pot.conf, uncomment the POT_ZFS_ROOT, POT_NETWORK and POT_EXTIF lines, and change their values to reflect your network and storage layout:

# pot configuration file                                         
                                                             
# All datasets related to pot use the same zfs dataset as parent
# With this variable, you can choose which dataset has to be used
POT_ZFS_ROOT=zroot/pot                                           
...
# Internal Virtual Network configuration

# IPv4 Internal Virtual network                                             
POT_NETWORK=10.192.0.0/10                                                   
                                                                        
# Internal Virtual Network netmask                                          
# POT_NETMASK=255.192.0.0
                                                                        
# The default gateway of the Internal Virtual Network                       
# POT_GATEWAY=10.192.0.1
                                                                        
# The name of the network physical interface, to be used as default gateway 
POT_EXTIF=igb0
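The commented-out POT_NETMASK and POT_GATEWAY defaults follow from the POT_NETWORK prefix. As a worked check, the sketch below derives the netmask, the first host address (conventionally the gateway) and the broadcast address from a CIDR using plain POSIX sh arithmetic. The function names are hypothetical and not part of pot.

```sh
#!/bin/sh
# Hypothetical helpers (not part of pot): derive the values implied by a
# CIDR such as POT_NETWORK, using only POSIX sh arithmetic.

# Convert a prefix length to a dotted-quad netmask.
prefix_to_mask() {
    bits=$1
    mask=""
    for i in 1 2 3 4; do
        if [ "$bits" -ge 8 ]; then
            octet=255
            bits=$((bits - 8))
        else
            # Set only the top $bits bits of this octet.
            octet=$((256 - (1 << (8 - bits))))
            bits=0
        fi
        mask="${mask:+$mask.}$octet"
    done
    echo "$mask"
}

# Print a 32-bit integer as a dotted-quad IPv4 address.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print "first-host broadcast" for a CIDR such as 10.192.0.0/10.
cidr_gateway_broadcast() {
    addr=${1%/*}
    bits=${1#*/}
    oldifs=$IFS
    IFS=.
    set -- $addr          # split the address into its four octets
    IFS=$oldifs
    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    hostmask=$(( (1 << (32 - bits)) - 1 ))
    net=$(( ip & ~hostmask ))     # clear the host bits
    gw=$(( net + 1 ))             # first usable host address
    bc=$(( net | hostmask ))      # all host bits set
    echo "$(int_to_ip "$gw") $(int_to_ip "$bc")"
}
```

For the configuration above, `prefix_to_mask 10` prints 255.192.0.0 and `cidr_gateway_broadcast 10.192.0.0/10` prints `10.192.0.1 10.255.255.255`. That matches the commented defaults, and also the netmask (0xffc00000) and broadcast address that bridge0 reports in the ifconfig output later in this article.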

The configuration can be tested by initializing pot and VNET, running a test ping, and then tidying it up:

# pot init 
# pot vnet-init 

The first two commands create the required pot ZFS datasets and configure pf for the pot network. The result can be tested by pinging the default bridge IP address:

# ping 10.192.0.1 

Then we tidy everything up using pot de-init: 

# pot de-init 
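If you repeat this check often, the manual steps can be wrapped in a small script. This is a minimal sketch with a hypothetical function name; POT and PING can be overridden for a dry run, and the defaults are the real pot and FreeBSD ping commands. It always runs `pot de-init` afterwards, whether or not the ping succeeds.

```sh
#!/bin/sh
# Hypothetical wrapper around the manual test above: initialize pot and
# VNET, ping the bridge address, then always tidy up with de-init.
# POT and PING can be overridden, e.g. POT=true PING=true for a dry run.
POT=${POT:-pot}
PING=${PING:-ping}
BRIDGE_IP=${BRIDGE_IP:-10.192.0.1}

pot_smoke_test() {
    if ! $POT init || ! $POT vnet-init; then
        echo "pot initialization failed" >&2
        $POT de-init
        return 1
    fi
    # FreeBSD ping: -c sends 3 packets, -t gives up after 5 seconds.
    if $PING -c 3 -t 5 "$BRIDGE_IP" >/dev/null; then
        status=0
    else
        echo "no reply from $BRIDGE_IP" >&2
        status=1
    fi
    $POT de-init
    return $status
}
```

Run it as root with `pot_smoke_test && echo OK`. Note that `-t` means a timeout in FreeBSD's ping, unlike some other systems.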

Using Minipot 

With Pot configured, we can experiment with Nomad using Minipot. Running minipot-init will create and configure a cluster for us:

# sudo minipot-init
0
Creating a backup of your /etc/rc.conf
/etc/rc.conf -> /etc/rc.conf.bkp-pot
syslogd_flags: -b 127.0.0.1 -b 10.192.0.1 -a 10.192.0.0/10 -> -b 127.0.0.1 -b 10.192.0.1 -a 10.192.0.0/10
Creating a backup of your /etc/pf.conf
/etc/pf.conf -> /etc/pf.conf.bkp-pot
auto-magically editing your /etc/pf.conf
Please, check that your PF configuration file /etc/pf.conf is still valid!
nomad_user:  -> root
nomad_env:  -> PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/sbin:/bin
nomad_args:  -> -config=/usr/local/etc/nomad/minipot-server.hcl
consul_enable:  -> YES
nomad_enable:  -> YES
traefik_enable:  -> YES
traefik_conf:  -> /usr/local/etc/minipot-traefik.toml

Pot can control a container's allocation of system resources. To do so, resource accounting must be enabled with the kern.racct.enable tunable. This is a boot-time tunable, so add kern.racct.enable=1 to /boot/loader.conf and reboot.
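Editing loader.conf by hand is fine, but if you script host setup, an idempotent helper avoids adding duplicate entries on repeated runs. The function below is a hypothetical sketch, not part of pot or minipot.

```sh
#!/bin/sh
# Hypothetical helper: add or update a tunable in a loader.conf-style file
# without duplicating it. Existing settings of the same key are replaced.
set_loader_tunable() {
    file=$1 key=$2 value=$3
    # Escape dots so grep matches the key literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line except existing settings of this key.
        grep -v "^[[:space:]]*${pattern}[[:space:]]*=" "$file" > "$tmp" || :
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Rewrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

As root, `set_loader_tunable /boot/loader.conf kern.racct.enable 1` followed by a reboot enables resource accounting; afterwards, `sysctl kern.racct.enable` should report 1.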

If you rebooted to enable resource limits, minipot can be restarted with:

# minipot-start         

Minipot ships with an example nginx web server job that can be used to test that the infrastructure works:

# cd /usr/local/share/examples/minipot
# nomad run nginx.job
Job Warnings:
1 warning(s):

* Group "group1" has warnings: 1 error occurred:
        * 1 error occurred:
        * Task "www1": task network resources have been deprecated as of Nomad 0.12.0. Please configure networking via group network block.

==> 2021-11-19T15:13:11Z: Monitoring evaluation "38349af2"
    2021-11-19T15:13:11Z: Evaluation triggered by job "nginx-minipot"
==> 2021-11-19T15:13:12Z: Monitoring evaluation "38349af2"
    2021-11-19T15:13:12Z: Evaluation within deployment: "a81f3323"
    2021-11-19T15:13:12Z: Allocation "bb57cb82" created: node "83947219", group "group1"
    2021-11-19T15:13:12Z: Evaluation status changed: "pending" -> "complete"
==> 2021-11-19T15:13:12Z: Evaluation "38349af2" finished with status "complete"
==> 2021-11-19T15:13:12Z: Monitoring deployment "a81f3323"
  ✓ Deployment "a81f3323" successful
    
    2021-11-19T15:13:42Z
    ID          = a81f3323
    Job ID      = nginx-minipot
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    group1      1        1       1        0          2021-11-19T15:23:40Z

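The warning is harmless here: the example job declares its network resources at the task level, which Nomad deprecated in 0.12.0 in favor of a group-level network block, and the deployment still completes successfully. Once the job has launched, the web server can be reached on port 8080 of localhost through the Traefik proxy, which routes requests according to their Host header: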

$ curl -H 'host: hello-web.minipot' 127.0.0.1:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
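In a script, the service may not answer immediately after the job is submitted, so it can help to retry the request for a while. Here is a minimal sketch of a generic retry helper; the function name and the attempt counts are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, trying at most
# ATTEMPTS times and sleeping DELAY seconds between attempts.
retry() {
    attempts=$1 delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For example, `retry 30 2 curl -fsS -o /dev/null -H 'host: hello-web.minipot' http://127.0.0.1:8080/` waits up to about a minute for the example site. The `-f` flag makes curl fail on HTTP error responses, so a proxy error page does not count as success.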

If we have a look around the system, we can get an idea of what minipot is doing. First, we can see that it has created an nginx jail:

# jls
JID  IP Address      Hostname                      Path
1                  nginx-minipotwww1_bb57cb82-64 /opt/pot/jails/nginx-minipotwww1_bb57cb82-6490-f8d3-cfbb-f12204c997e8/m
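The jail name embeds the Nomad allocation ID, so it changes with every deployment. Rather than copying it by hand, a small filter can pick it out by prefix. This is a minimal sketch with a hypothetical function name; it reads jail names one per line, as printed by `jls name`.

```sh
#!/bin/sh
# Hypothetical helper: print the first jail name (read from stdin, one per
# line) that starts with the given prefix, e.g. a Nomad job name.
# Exits non-zero when no jail matches.
jail_by_prefix() {
    awk -v p="$1" '
        index($0, p) == 1 { print; found = 1; exit }
        END { exit !found }
    '
}
```

For example, `name=$(jls name | jail_by_prefix nginx-minipot) && jexec "$name" ifconfig` runs ifconfig inside the example jail without typing its full name.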

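To support the container, a number of ZFS datasets that host the jail have been created. Note that the allocation's m dataset uses only 188K while referring to 93.9M, which indicates that it shares nearly all of its data with the FBSD120-nginx_1_2 image dataset: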

# zfs list | grep pot                    
NAME                                                                       USED    AVAIL    REFER  MOUNTPOINT
zroot/pot                                                                  132M   888G       96K  /opt/pot
zroot/pot/bases                                                             96K   888G       96K  /opt/pot/bases
zroot/pot/cache                                                           36.9M   888G     36.9M  /var/cache/pot
zroot/pot/fscomp                                                            96K   888G       96K  /opt/pot/fscomp
zroot/pot/jails                                                           95.0M   888G      128K  /opt/pot/jails
zroot/pot/jails/FBSD120-nginx_1_2                                         94.1M   888G       92K  /opt/pot/jails/FBSD120-nginx_1_2
zroot/pot/jails/FBSD120-nginx_1_2/m                                       93.9M   888G     93.9M  /opt/pot/jails/FBSD120-nginx_1_2/m
zroot/pot/jails/nginx-minipotwww1_bb57cb82-6490-f8d3-cfbb-f12204c997e8     804K   888G      616K  /opt/pot/jails/nginx-minipotwww1_bb57cb82-6490-f8d3-cfbb-f12204c997e8
zroot/pot/jails/nginx-minipotwww1_bb57cb82-6490-f8d3-cfbb-f12204c997e8/m   188K   888G     93.9M  /opt/pot/jails/nginx-minipotwww1_bb57cb82-6490-f8d3-cfbb-f12204c997e8/m 

Minipot has also created a bridge interface on the host, which allows our jails to connect to the outside world. Each jail gets its own epair attached to the bridge:

# ifconfig
igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
        ether a8:a1:59:95:87:60
        inet 192.168.100.50 netmask 0xffffff00 broadcast 192.168.100.255
        inet6 fe80::aaa1:59ff:fe95:8760%igb0 prefixlen 64 scopeid 0x1
        inet6 fddd:3c85:d32c:0:aaa1:59ff:fe95:8760 prefixlen 64 autoconf
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
epair0a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:4a:22:96:d3:0a
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 58:9c:fc:10:ff:a2
        inet 10.192.0.1 netmask 0xffc00000 broadcast 10.255.255.255
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 3 priority 128 path cost 2000
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>

The bridge allows pot jails to connect to each other. Inside the jail, the b side of the epair has been given an IP address, which we can check with jexec:

# sudo jexec nginx-minipotwww1_bb57cb82-6490-f8d3-cfbb-f12204c997e8 ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000 
        inet6 ::1 prefixlen 128 
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
        groups: lo 
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
epair0b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:4a:22:96:d3:0b
        inet 10.192.0.3 netmask 0xffc00000 broadcast 10.255.255.255 
        groups: epair 
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
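The bridge and epair layout shown above can be approximated by hand with ifconfig(8). The sketch below only prints the commands. The function name is hypothetical, the interface names (bridge0, epair0a/epair0b) and addresses are assumptions taken from the output in this article, the jail must have been created with vnet enabled, and pot's own implementation may differ in detail. The pf NAT rules that give jails outside access are not covered.

```sh
#!/bin/sh
# Hypothetical dry run: print ifconfig(8)/jexec(8) commands approximating
# the bridge + epair layout shown above. Nothing is executed.
# "ifconfig bridge create" and "ifconfig epair create" print the names of
# the new interfaces; they are assumed here to be bridge0 and epair0a/b.
vnet_plumbing() {
    jail=$1       # name of a running vnet jail
    bridge_ip=$2  # e.g. 10.192.0.1/10
    jail_ip=$3    # e.g. 10.192.0.3/10
    gw=${bridge_ip%/*}

    cat <<EOF
ifconfig bridge create
ifconfig bridge0 inet ${bridge_ip} up
ifconfig epair create
ifconfig bridge0 addm epair0a
ifconfig epair0a up
ifconfig epair0b vnet ${jail}
jexec ${jail} ifconfig epair0b inet ${jail_ip} up
jexec ${jail} route add default ${gw}
EOF
}
```

Calling `vnet_plumbing myjail 10.192.0.1/10 10.192.0.3/10` prints the eight commands for review. Moving epair0b into the jail with `vnet` is what makes it disappear from the host's ifconfig output and appear inside the jail instead.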

The pot configuration is derived from the defaults we configured, and the image itself came from a public registry of images. The registry is very clearly marked NOT FOR PRODUCTION USE and should be used for testing and demonstrations only. As of this writing, it only contains nginx images with different FreeBSD bases.

The Pot developers have stated that they are interested in running a Docker-like repository of images, but until that exists, you'll need to create your own images using Pot.

The documentation on the GitHub wiki covers how to make your own images using Pot tools and how to bundle them into a repository.

Next Steps

Minipot is a great way to experiment with Pot and Nomad, but it comes with clear warnings on the label. The next step is to experiment with larger, multi-node Nomad deployments.

The Pot developers have a three-part series of articles in which they walk through building a virtual data center using Pot and Nomad. The first part gives an overview and covers setting up potluck, the second discusses setting up Nomad, and the third shows how to test the example services. In the Q3 2021 FreeBSD quarterly status report, they acknowledged that the documentation is a little stiff and that they are working on improving it.

Pot is developed by Luca Pizzamiglio and about 10 other contributors. The nomad-pot-driver, which enables using pot from Nomad, was developed by Esteban Barrios.

If you use Pot and Nomad on FreeBSD and experience bugs or rough edges, both projects would love to receive feedback and patches.

Meet the author: Tom Jones

Tom Jones is an Internet Researcher and FreeBSD developer that works on improving the core protocols that drive the Internet. He is a contributor to open standards in the IETF and is enthusiastic about using FreeBSD as a platform to experiment with new networking ideas as they progress towards standardisation.


You might also be interested in

Get more out of your FreeBSD development

Kernel development is crucial to many companies. If you have a FreeBSD implementation or you’re looking at scoping out work for the future, our team can help you further enable your efforts.

More on this topic

FreeBSD Jails

Red Hat’s OpenShift vs FreeBSD Jails

Kubernetes has become a hot technology for managing clusters of applications, but it is famously difficult to use and understand. OpenShift is an enterprise cloud platform for running and managing Kubernetes without tying you to a single platform. FreeBSD is our favorite platform for running applications here at Klara, so how does OpenShift relate to the technologies that FreeBSD provides, and can we create similar environments on top of FreeBSD?

rc(8) Operating System

Your Comprehensive Guide to rc(8): FreeBSD Services and Automation

The FreeBSD rc(8) subsystem is a sensible and flexible service management framework that enables extensive automation as well as customizable start/stop scripts for your services. It's also deterministic, which means services always start in the same order on every boot, a critically important feature in service-critical environments. Take a deep dive into FreeBSD services and automation with this new article!

A Quick Look at the History of Package Management on FreeBSD

Pkgng became FreeBSD’s official package manager in FreeBSD 10 in 2014. Applications can be easily installed from either pkg—a system managing precompiled binary packages—or the ports tree, which automates building and installation of packages directly from their source code.
