A Somewhat Opinionated Guide to Effective ZFS Snapshots

Much too long ago, I was asked by a user of httm to describe the way I (over)use snapshots, and, though I said I would explain, I never thought I had anything important enough to say about best practices.

However, I'm increasingly seeing posts on r/ZFS asking how best to use the snapshot mechanisms of ZFS, and though I can only share my opinions regarding my perhaps very idiosyncratic snapshot setup, maybe these opinions could be helpful for those just starting out.

Of course, this Getting Started guide will make some assumptions that may not apply to your setup. This guide assumes you're using:

  • ZFS for both a data pool and a root pool
  • Ubuntu

But I'm certain at least some of this advice will be useful even if you're on a different setup.

I believe an effective snapshot scheme is composed of possibly three types of snapshots:

  1. Periodic Snapshots
  2. Triggered Snapshots
  3. Dynamic Snapshots

Let's discuss each.

Periodic Snapshots

Periodic snapshots, or snapshots taken at regular intervals, should be considered the base of any good snapshot scheme, and I wholeheartedly recommend sanoid as a periodic snapshot tool. For those that don't already know, sanoid describes itself as "a policy-driven snapshot management tool for ZFS filesystems". Although other tools exist and perhaps deserve mention, sanoid and syncoid, its replication tool, have made periodic snapshots and replication simple for me.

But note -- like any good tool -- sanoid isn't a perfect fit for every use case (and shouldn't be; see zsys, later, and the "Curse of Trying to Do Too Much"). What sanoid is, though, is simple and composable.

To my mind, what makes sanoid great is that it doesn't force you into using all its features, all the time. It doesn't force you into a complicated policy/scheme (even I can do it!), and since it includes such good documentation, I won't waste time discussing how to initially configure your pools here.

Instead, I'd like to highlight how, in contrast to some other tools, it allows you to compose it with your other Linux utilities, and doesn't break when you do something a little different. For instance, suppose you want to sleep your NAS drives occasionally and not invoke sanoid when your drives are sleeping. Here's how this user solved this particular problem.

First, you'll need to check if any of the spinning rust drives are sleeping:

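#!/bin/bash
# Walk the whole-disk entries in /dev/disk/by-id (skip partitions and wwn aliases)
find /dev/disk/by-id/ -type l | \
grep -v -e part -e wwn | \
while read -r disk; do
 # Only query rotational (spinning) drives
 if [[ $(lsblk -o rota "$disk" | grep -c "1") -gt 0 ]]; then
  # --nocheck=standby reports a sleeping drive without waking it
  smartctl -d sat --nocheck=standby "$disk"
 fi
done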

Then, it becomes simple to check that condition whenever you invoke sanoid:

...
[[ $( /usr/local/sbin/checkHDstatus | grep -i 'Device' | /usr/bin/grep -i -c 'STANDBY' ) -gt 0 ]] || \
/usr/sbin/sanoid --prune-snapshots --verbose --configdir=/etc/sanoid/
...

Now imagine you want to use an alternate configuration file when that spinning rust is asleep. Just change the configuration file path:

...
# A non-zero STANDBY count means at least one drive is asleep
if [[ $( /usr/local/sbin/checkHDstatus | grep -i 'Device' | /usr/bin/grep -i -c 'STANDBY' ) -gt 0 ]]; then
 /usr/sbin/sanoid --take-snapshots --verbose --configdir=/etc/sanoid/sleep/
else
 /usr/sbin/sanoid --take-snapshots --verbose --configdir=/etc/sanoid/awake/
fi
...
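
Since the standby test shows up in more than one place, one option is to factor it into a couple of small functions. This is only a sketch: standby_count and pick_configdir are names I made up for illustration, and it assumes sleeping drives should get the /etc/sanoid/sleep/ configuration.

```shell
# Count the drives smartctl reports as being in standby.
# Reads the checkHDstatus output on stdin and prints a number.
standby_count() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps that
    # from aborting callers that run under "set -e".
    grep -i 'Device' | grep -i -c 'STANDBY' || true
}

# Print the sanoid config directory to use.
# $1: number of drives currently in standby
pick_configdir() {
    if [ "$1" -gt 0 ]; then
        echo /etc/sanoid/sleep/
    else
        echo /etc/sanoid/awake/
    fi
}

# Possible use from cron (not executed here):
#   count="$( /usr/local/sbin/checkHDstatus | standby_count )"
#   /usr/sbin/sanoid --take-snapshots --verbose --configdir="$( pick_configdir "$count" )"
```

Keeping the decision in one function means the prune job and the snapshot job can't drift apart on what "asleep" means.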

sanoid handles all this in stride. It doesn't panic when you skip a few hours' worth of snapshots on a certain pool. It doesn't need to control heaven and earth. It just keeps trucking.

Replicating

Pop Quiz Hotshot: now that you've made a few snapshots, how would you replicate your rpool to your local datapool using your own custom zfs send/recv options?

Easy peasy, you say. Just like sanoid, syncoid has simple options, sane defaults, and an ability to cut a rug when you need to.

/usr/sbin/syncoid -r --sendoptions="L ec" --recvoptions="o recordsize=1M o compression=zstd" \
--force-delete --exclude=scratch --exclude=test --exclude=tmp rpool datapool/rpool 2>&1 | logger -t syncoid
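
One thing to keep in mind when piping to logger: the pipeline's exit status is logger's, not syncoid's, so a failed replication looks like a success to cron or to any script that checks the result. A minimal sketch of one way around that, assuming bash (it relies on PIPESTATUS); run_logged is a hypothetical helper name, not part of sanoid or syncoid:

```shell
#!/usr/bin/env bash

# run_logged TAG COMMAND [ARGS...]
# Runs COMMAND, sends its stdout and stderr to logger under TAG, and
# returns COMMAND's own exit status instead of logger's.
run_logged() {
    local tag="$1"
    shift
    "$@" 2>&1 | logger -t "$tag"
    # PIPESTATUS[0] is the exit status of the first command in the pipeline
    return "${PIPESTATUS[0]}"
}

# Possible use (not executed here):
#   run_logged syncoid /usr/sbin/syncoid -r rpool datapool/rpool \
#       || echo "replication failed" >&2
```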

Triggered Snapshots

Canonical's zsys promised snapshots of every system update and seamless rollback on boot as well as periodic snapshots of significant datasets/directories. It's unfortunate zsys wasn't ready for the Ubuntu 22.04 release, and may never be ready. I won't rehash "Why?" here, but one basic zsys premise is sound: Periodic Snapshots are not enough.

You may ask -- why? Because, for me, it's sometimes important to know that a snapshot was triggered on a date and time certain. Let's discuss a few examples of triggers you might like to use for a snapshot and how you might take those snapshots.

Before a System Upgrade

The first triggers we might consider are snapshots upon apt upgrade and kernel updates.

First, you'll need a snapshot script to execute (perhaps called /usr/local/sbin/snapPrepApt):

DATE="$( /bin/date +%F-%T )"
# FYI a user helpfully notes there may be some issue with snapshot-ing a bpool and GRUB
# See: https://github.com/kimono-koans/httm/issues/11#issuecomment-1860329869
#zfs snapshot -r bpool@snap_"$DATE"_prepApt
zfs snapshot -r bpool/BOOT@snap_"$DATE"_prepApt
zfs snapshot rpool@snap_"$DATE"_prepApt
zfs snapshot -r rpool/ROOT@snap_"$DATE"_prepApt
zfs snapshot -r rpool/USERDATA@snap_"$DATE"_prepApt
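
If you end up writing several trigger scripts like this one, a small shared helper can keep the naming consistent. What follows is a minimal sketch under my own assumptions: take_snapshot and SNAP_DATE are hypothetical names, and the helper quietly skips datasets that don't exist, since not every install has a bpool.

```shell
# One timestamp for the whole run, so every snapshot taken by a single
# trigger shares the same name. Same format as the script above.
SNAP_DATE="${SNAP_DATE:-$( date +%F-%T )}"

# take_snapshot SUFFIX DATASET [-r]
# Snapshots DATASET as DATASET@snap_<date>_<SUFFIX>; pass -r to recurse.
take_snapshot() {
    suffix="$1"
    dataset="$2"
    flag="${3:-}"
    # Skip datasets that are not present on this machine.
    if ! zfs list -H -o name "$dataset" >/dev/null 2>&1; then
        echo "skipping $dataset: no such dataset" >&2
        return 0
    fi
    # ${flag:+"$flag"} expands to nothing when no flag was given.
    zfs snapshot ${flag:+"$flag"} "${dataset}@snap_${SNAP_DATE}_${suffix}"
}

# Possible use (not executed here):
#   take_snapshot prepApt bpool/BOOT -r
#   take_snapshot prepApt rpool
#   take_snapshot prepApt rpool/ROOT -r
```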

Next, you'll need to execute such a script automatically upon apt upgrade (or any other apt operation that invokes dpkg). A simple configuration snippet in /etc/apt/apt.conf.d will suffice:

// Takes a snapshot of the system before package changes.
DPkg::Pre-Invoke {"[ -x /usr/local/sbin/snapPrepApt ] && /usr/local/sbin/snapPrepApt || true";};

And you will also probably want to execute a script each time you update your kernel. A script invoked from /etc/kernel/preinst.d might look something like:

[ -x /usr/local/sbin/snapPrepApt ] && /usr/local/sbin/snapPrepApt || true

Before Service Launch

Sometimes you will want to take a snapshot when a service starts up or shuts down. For instance, perhaps you have a service, with a database, that needs to be cleanly shut down so that its state can also be cleanly snapshotted.

Just add a little script to execute before or after startup via systemctl edit:

...
[Service]
ExecStartPre=/bin/bash -c "/usr/local/sbin/snapDataService"
...

On Network Mount

Perhaps you want to take a snapshot every time a network drive is mounted or unmounted. That way, when you or a program deletes something over the network, you have a snapshot of the state just prior to mount or just after unmount.

Your smb.conf allows you to execute scripts just like this:

...
[TM Volume]
path = "/srv/timemachine"
valid users = timemachine
read only = no
wide links = no
create mask = 0740
directory mask = 0750
root preexec = "/usr/local/sbin/TMpre"
root postexec = "/usr/local/sbin/TMpost"
...

Watch a Directory

Maybe you only want to take a snapshot of a folder when new files are added to it. inotifywait is a wonderful tool for just that use case (and many others!):

inotifywait -m -e moved_to "/srv/downloads/" | while read -r line; do
 snapDownloads
done
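
One wrinkle: inotifywait prints one line per event, so moving fifty files in at once means fifty calls to snapDownloads. A minimal sketch of one way to collapse a burst into a single snapshot, assuming bash (it uses read -t); debounce is a hypothetical helper name and the quiet period is whatever suits you:

```shell
#!/usr/bin/env bash

# debounce SECONDS COMMAND [ARGS...]
# Reads event lines on stdin. After the first event of a burst, keeps
# swallowing events until SECONDS pass with no new one, then runs
# COMMAND once.
debounce() {
    local quiet="$1"
    shift
    local _event
    while read -r _event; do
        # Drain the rest of the burst; read -t fails on timeout or end of input.
        while read -r -t "$quiet" _event; do
            :
        done
        # Keep COMMAND from eating the event stream.
        "$@" </dev/null
    done
}

# Possible use (not executed here):
#   inotifywait -m -e moved_to "/srv/downloads/" | debounce 5 snapDownloads
```

The trade-off is latency: the snapshot is taken only after things have been quiet for the chosen number of seconds.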

Cleanup

Now that you've made these snapshots, if you're anything like me, you must clean them up once they aren't needed anymore. I suggest zfs-prune-snapshots for this task.

Just run it daily from cron to clean up any triggered or dynamic snapshots that have outlived their usefulness:

/usr/local/sbin/zfs-prune-snapshots -s '_prepApt' 2w 2>&1 | logger -t sanoid
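
To make the "suffix plus age" idea concrete, here is a rough sketch of the selection logic. To be clear, this is not how zfs-prune-snapshots is implemented; age_to_seconds, is_expired and list_expired are names I made up, only the s/m/h/d/w units are handled, and the sketch only lists candidates rather than destroying anything. It expects the tab-separated output of zfs list -Hp -t snapshot -o name,creation, where -p prints the creation time as epoch seconds.

```shell
# Convert an age such as 30m, 12h, 3d or 2w to seconds.
age_to_seconds() {
    num="${1%?}"         # everything but the last character
    unit="${1#"$num"}"   # the last character
    case "$num" in
        ''|*[!0-9]*) echo "bad age: $1" >&2; return 1 ;;
    esac
    case "$unit" in
        s) echo "$num" ;;
        m) echo $((num * 60)) ;;
        h) echo $((num * 3600)) ;;
        d) echo $((num * 86400)) ;;
        w) echo $((num * 604800)) ;;
        *) echo "bad unit: $1" >&2; return 1 ;;
    esac
}

# is_expired CREATION_EPOCH MAX_AGE [NOW_EPOCH]
# Succeeds when the snapshot is older than MAX_AGE.
is_expired() {
    max="$( age_to_seconds "$2" )" || return 2
    now="${3:-$( date +%s )}"
    [ $((now - $1)) -gt "$max" ]
}

# list_expired SUFFIX MAX_AGE [NOW_EPOCH]
# Reads "name<TAB>creation" lines on stdin, prints the expired names
# that end in SUFFIX.
list_expired() {
    while IFS="$( printf '\t' )" read -r name created; do
        case "$name" in
            *"$1")
                if is_expired "$created" "$2" "${3:-}"; then
                    echo "$name"
                fi
                ;;
        esac
    done
}

# Possible use (not executed here):
#   zfs list -Hp -t snapshot -o name,creation | list_expired _prepApt 2w
```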

Dynamic Snapshots

"Dynamic Snapshots" is a term I'll credit myself for, at least, popularizing. Dynamic Snapshots are very similar to Triggered Snapshots. In fact, I think it is appropriate to think of Dynamic Snapshots as simply a breed of Triggered Snapshots. The key difference is how ad-hoc a Dynamic Snapshot feels.

I'll try to give a few examples.

Imagine -- you're in a folder and you realize you're about to change a bunch of files, and you want a snapshot of the state of the folder before you make any edits. You don't know precisely which dataset your working directory resides on. And you're not really in the mood to think about it. Of course, you could just copy that folder, rename it, and make edits in the new folder. Or you could do the same with each file as you go. Or you could determine the dataset upon which this folder is located and manually do a snapshot yourself.

All of these things you could do feel like leg work. A Dynamic Snapshot shouldn't feel like leg work. A Dynamic Snapshot should just work by determining the ZFS mount of the $PWD and taking a snapshot immediately.

For instance, with httm, you might invoke a Dynamic Snapshot like so:

➜ httm -S .
httm took a snapshot named: rpool/ROOT/ubuntu_tiebek@snap_2022-12-14-12:31:41_httmSnapFileMount
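
If you're curious what "determine the ZFS mount of the $PWD" amounts to, it can be approximated with findmnt. This is a rough sketch and not how httm does it; dataset_for, snap_here and the _manualSnap suffix are names I made up for illustration.

```shell
# dataset_for PATH
# Prints the ZFS dataset whose mount contains PATH, or fails if PATH
# is not on ZFS.
dataset_for() {
    fstype="$( findmnt -n -o FSTYPE --target "$1" )" || return 1
    if [ "$fstype" != "zfs" ]; then
        echo "$1 is not on ZFS" >&2
        return 1
    fi
    # For a ZFS mount, findmnt's SOURCE column is the dataset name.
    findmnt -n -o SOURCE --target "$1"
}

# snap_here [PATH]
# Snapshots the dataset containing PATH (default: the working directory).
snap_here() {
    dataset="$( dataset_for "${1:-$PWD}" )" || return 1
    zfs snapshot "${dataset}@snap_$( date +%F-%T )_manualSnap"
}

# Possible use (not executed here):
#   cd ~/project && snap_here
```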

In order for the above to work you will need permissions. The most elegant way to give a user ZFS permissions is through the zfs allow subcommand. If you have httm installed on Ubuntu, running ounce --give-priv, as an ordinary user, will execute something like the block below to give an ordinary user (you!) permission to snapshot on all pools.

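# -H drops the header line; read one pool name per line so that
# multiple pools are not lumped together into a single argument
sudo zpool list -H -o name | while read -r pool; do
 sudo zfs allow "$( whoami )" mount,snapshot "$pool"
done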

Speaking of ounce, ounce is a script I wrote which wraps a target executable, can trace its system calls, and will execute snapshots before you do something silly. ounce is my canonical example of a dynamic snapshot script. When I type ounce nano /etc/samba/smb.conf (I actually alias 'nano'='ounce --trace nano'), ounce knows that it's smart and I'm dumb, so -- it traces each file open call, sees that I just edited /etc/samba/smb.conf a few short minutes ago. Once ounce sees I have no snapshot of those file changes, it takes a snapshot of the dataset upon which /etc/samba/smb.conf is located, before I edit and save the file again.

We can check that ounce worked as advertised via httm:

➜ httm /etc/samba/smb.conf
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Fri Dec 09 07:45:41 2022  17.6 KiB  "/.zfs/snapshot/autosnap_2022-12-13_18:00:27_hourly/etc/samba/smb.conf"
Wed Dec 14 12:58:10 2022  17.6 KiB  "/.zfs/snapshot/snap_2022-12-14-12:58:18_ounceSnapFileMount/etc/samba/smb.conf"
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Wed Dec 14 12:58:10 2022  17.6 KiB  "/etc/samba/smb.conf"
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
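
Notice that the newest snapshot version and the live file share the same modification time, which is the whole point: the current state is now captured. The question ounce asks ("is there a snapshot of this file's latest changes?") can be roughly approximated by comparing modification times against the copies under .zfs/snapshot. This is only a sketch of the idea, not ounce's actual implementation, and needs_snapshot is a hypothetical name.

```shell
# needs_snapshot FILE MOUNTPOINT
# Succeeds when FILE is newer than every copy of it found under
# MOUNTPOINT/.zfs/snapshot/*, i.e. when no snapshot holds its current state.
needs_snapshot() {
    file="$1"
    mount="${2%/}"                 # "/" becomes "", "/srv/" becomes "/srv"
    rel="${file#"$mount"/}"        # path of FILE relative to the mount
    for copy in "$mount"/.zfs/snapshot/*/"$rel"; do
        # The glob matched nothing, or this snapshot lacks the file.
        [ -e "$copy" ] || continue
        if ! [ "$file" -nt "$copy" ]; then
            return 1               # a snapshot is at least as new as FILE
        fi
    done
    return 0
}

# Possible use (not executed here):
#   needs_snapshot /etc/samba/smb.conf / && echo "take a snapshot first"
```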

Conclusion

Good luck with your future snapshot adventures! I'd love to know your thoughts on how you do it better/more cleverly!