ZFS Dataset Hierarchy | Data Hoarder Edition

ZFS is flexible and will let you name and organize datasets however you choose–but before you start building datasets there are some ways to make management easier in the long term.  I’ve found the following convention works well for me.  It’s not “the” way by any means, but I hope you’ll find it helpful.  I wish tips like this had been written when I built my first storage system four years ago.

Here are my personal ZFS best practices and naming conventions to structure and manage ZFS data sets.

ZFS Pool Naming

I never give two zpools the same name, even if they’re in different servers, on the off-chance that sometime down the road I’ll need to import two pools into the same system.  I generally like to name my zpools tank[n], where n is an incremental number that’s unique across all my servers.

So if I have two servers, say stor1 and stor2, I might have two zpools:

stor1.b3n.org: tank1
stor2.b3n.org: tank2
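A sketch of how those pools might be created. The mirror layout and device names here are hypothetical examples, not something the convention requires:

```shell
# On stor1: create tank1 (disk names are placeholders for your own)
zpool create tank1 mirror /dev/sda /dev/sdb

# On stor2: create tank2 the same way
zpool create tank2 mirror /dev/sdc /dev/sdd

# Because the names are unique, either pool can later be moved and
# imported on the other server without a naming conflict:
zpool import tank2
```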

Top Level ZFS Datasets for Simple Recursive Management

Create a top-level dataset called ds[n], where n is a unique number across all your pools, just in case you ever have to bring two separate datasets onto the same zpool.  The reason I like to create one main top-level dataset is that it makes it easy to manage high-level tasks recursively on all sub-datasets (such as snapshots, replication, backups, etc.).  If you have more than a handful of datasets you really don’t want to be configuring replication on every single one individually.  So on my first server I have:

 | - tank1/ds1
I usually mount tank/ds1 as readonly from my CrashPlan VM for backups.  You can configure snapshot tasks, replication tasks, backups, all at this top level and be done with it.
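For example, a recursive snapshot taken at the top level covers every sub-dataset in one command (the snapshot name below is just an example):

```shell
# Snapshot ds1 and every dataset beneath it in one atomic operation
zfs snapshot -r tank1/ds1@daily-2016-01-01

# List the snapshots that were just created across the whole tree
zfs list -r -t snapshot tank1/ds1
```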

ZFS snaps and pruning recursively managed at the top level dataset

Name ZFS Datasets for Replication

One of the reasons to have a top level dataset is if you’ll ever have two servers…

   | - tank1/ds1

   | - tank2/ds2

I replicate them to each other for backup.  Having that top level ds[n] dataset lets me manage ds1 (the primary dataset on the server) completely separately from the replicated dataset (ds2) on stor1.

stor1.b3n.org:
 | - tank1/ds1
 | - tank1/ds2 (replicated)

stor2.b3n.org:
 | - tank2/ds2
 | - tank2/ds1 (replicated)
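The cross-replication above can be done with zfs send/receive.  A minimal sketch, assuming SSH access between the servers (hostnames and snapshot names are examples):

```shell
# On stor1: take a recursive snapshot of the whole ds1 tree
zfs snapshot -r tank1/ds1@repl-1

# Send the tree to stor2, where it lands under tank2 as its own
# top-level dataset, kept separate from stor2's primary ds2
zfs send -R tank1/ds1@repl-1 | ssh stor2.b3n.org zfs receive -u tank2/ds1

# Subsequent runs send only the changes since the last snapshot
zfs snapshot -r tank1/ds1@repl-2
zfs send -R -i repl-1 tank1/ds1@repl-2 | ssh stor2.b3n.org zfs receive -u tank2/ds1
```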

Advice for Data Hoarders.  Overkill for the Rest of Us


The ideal is to back up everything.  But in reality storage costs money, and WAN bandwidth isn’t always available to back everything up remotely.  I like to structure my datasets such that I can manage them by importance.  So under the ds[n] dataset, create sub-datasets:

 | - tank1/ds1/kirk – very important – family pictures, personal files
 | - tank1/ds1/spock – important – ripped media, ISO files, etc.
 | - tank1/ds1/redshirt – scratch data, tmp data, testing area
 | - tank1/ds1/archive – archived data
 | - tank1/ds1/backups – backups
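Creating those tiers is just a handful of zfs create commands (a sketch; name them however you like):

```shell
zfs create tank1/ds1/kirk      # very important: family pictures, personal files
zfs create tank1/ds1/spock     # important: ripped media, ISO files, etc.
zfs create tank1/ds1/redshirt  # expendable: scratch, tmp, testing area
zfs create tank1/ds1/archive   # archived data
zfs create tank1/ds1/backups   # backups of other machines
```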

Kirk – Very Important.  Family photos, home videos, journal, code, projects, scans, crypto-currency wallets, etc.  I like to keep four to five copies of this data using multiple backup methods and multiple locations.  It’s backed up to CrashPlan offsite, rsynced to a friend’s remote server, snapshots are replicated to a local ZFS server, plus an annual backup to a local hard drive for cold storage.  That’s 3 copies onsite, 2 copies offsite, 2 different file-system types (ZFS, XFS), and 3 different backup technologies (CrashPlan, rsync, and ZFS replication).  I do not want to lose this data.

Multiple Backup Locations Across the World
Important data is backed up to multiple geographic locations

Spock – Important.  Data that would be a pain to lose, might cost money to reproduce, but isn’t catastrophic.  If I had to go a few weeks without it I’d be fine.  For example, rips of all my movies, downloaded Linux ISO files, Logos library and index, etc.  If I lost this data and the house burned down I might have to repurchase my movies and spend a few weeks ripping them again, but I can reproduce the data.  For this dataset I want at least 2 copies: everything is backed up offsite to CrashPlan, and if I have the space, local ZFS snapshots are replicated to a 2nd server, giving me 3 copies.


Redshirt – This is my expendable dataset.  It might be a staging area to store MakeMKV rips until they’re transcoded, or a place to do video editing or test out VMs.  This data doesn’t get backed up, though I may run snapshots with a short retention policy.  Losing this data would mean losing no more than a day’s worth of work.  I might also set sync=disabled to get maximum performance here.  And typically I don’t do ZFS snapshot replication to a 2nd server.  In many cases it will make sense to pull this out from under the top-level ds[n] dataset and have it be by itself.
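ZFS properties are per-dataset and inherited downward, so relaxing durability on redshirt doesn’t affect its siblings.  A sketch:

```shell
# Disable synchronous writes on the scratch dataset for speed.
# In-flight data can be lost on power failure, which is acceptable here.
zfs set sync=disabled tank1/ds1/redshirt

# Verify the property took effect (and that siblings are untouched)
zfs get sync tank1/ds1/redshirt tank1/ds1/kirk
```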

Backups – This dataset contains backups of workstations, servers, and cloud services.  I may back up the backups to CrashPlan or some online service, and usually that is sufficient since I already have multiple copies elsewhere.

Archive – This is data I no longer use regularly but don’t want to lose.  Old school papers that I’ll probably never need again, backup images of old computers, etc.  I set this dataset to compression=gzip-9, back it up to CrashPlan plus a local backup, and try to have at least 3 copies.
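Setting heavier compression on just the archive tier might look like the following.  gzip-9 trades CPU time for space, which suits data that is rarely read:

```shell
# Maximum gzip compression for rarely-accessed archives
zfs set compression=gzip-9 tank1/ds1/archive

# Check how well the data is compressing
zfs get compressratio tank1/ds1/archive
```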

Now, you don’t have to name the datasets Kirk, Spock, and Redshirt… but the idea is to identify importance so that you’re only managing a few datasets when configuring ZFS snapshots, replication, etc.  If you have unlimited cheap storage and bandwidth it may not be worth it to do this–but it’s nice to have the option to prioritize.

Now… once I’ve established that hierarchy I start defining my datasets that actually store data which may look something like this:

| - tank1/ds1/kirk/photos
| - tank1/ds1/kirk/git
| - tank1/ds1/kirk/documents
| - tank1/ds1/kirk/vmware-kirk-nfs
| - tank1/ds1/spock/media
| - tank1/ds1/spock/vmware-spock-nfs
| - tank1/ds1/spock/vmware-iso
| - tank1/ds1/redshirt/raw-rips
| - tank1/ds1/redshirt/tmp
| - tank1/ds1/archive
| - tank1/ds1/archive/2000
| - tank1/ds1/archive/2001
| - tank1/ds1/archive/2002
| - tank1/ds1/backups
| - tank1/ds1/backups/incoming-rsync-backups
| - tank1/ds1/backups/windows
| - tank1/ds1/backups/windows-file-history


With this ZFS hierarchy I can manage everything at the top level of ds1 and just set up the same automatic snapshots, replication, and backups for everything.  Or, if I need to be more precise, I have the ability to handle Kirk, Spock, and Redshirt differently.


Journey to Facebook

Week 1:

Number of Friends: 6.  (That’s probably enough)
Number of Likes: 0.
Species: Kind of like the Borg.

Defender (Star Trek USS Enterprise) of Freedom vs Facebook (Borg ship)

I see my home, b3n.org, getting further into the distance.  My blog is in one of the most beautiful locations nestled in the mountains between the Tech and Conservative Blogs, definitely more on the Tech side and well away from the Bay of Flame.  I can see the tech blogging area I’m most familiar with getting smaller and smaller.  A few minutes later I see Lifehacker passing by and I’m flying over the Sea of Opinions.   And then it hit me.   I’ve left the Blogosphere.

After a long flight I stop for a layover at Reddit, then I was back in the air and landed just north of Data Mines, Facebook.  And I joined Facebook.  The reason for my travel?  I’m looking for information locked away in a closed Facebook group.

That was last week.

Map of Social Networks showing my travel from the Blogosphere to Facebook

Most of my friends left the Blogosphere for MySpace, and then moved further north to Facebook years ago (and I’ve reunited with six of them so far).  My impression of Facebook so far: it’s like a bunch of mindless drones all talking at once–well, let me start over.  It’s like a bunch of ads all talking at once and mindless drones trying to shout above them.

Facebook is a land I’ve always avoided–It’s basically what AOL or Geocities should have become–a step back from freedom and individuality.

It’s Not Social Networking That’s the Problem

When you join Facebook, you have to abide by their rules and subject yourself to their censorship.  If you disagree with Facebook, you either comply or you’re out.  There’s no alternative.

Websites, blogging, and email, on the other hand, are based on what the internet should be–open protocols.  If I run my own email server I can send an email to anybody else no matter what provider they use!  This blog runs on a server I control.  Currently it’s rented from DigitalOcean because I no longer have the bandwidth at my house to run it, but in the past I’ve run it from my dorm room, my bedroom closet, from right under my desk, and from Jeff’s house.  And the thing is, anybody can set up their own server–but they don’t have to.  They can use a provider like Blogger or Gmail if they prefer–and if they can get better service somewhere else, they can migrate to a different provider at will and not lose anything.

But Facebook isn’t open and federated.  Facebook users can only talk to other Facebook users and as long as you want to talk to your Facebook friends the only way is to be on Facebook yourself.  The content is all stored on their servers so you are at their mercy for control and privacy of your content.  Or is it your content?  On Facebook, you are not your own individual, or your own community.  You are part of the Borg.

I’m not against social networking, but Facebook is designed in a very centralized manner, which isn’t consistent with how internet services should be–more distributed and federated.  Some social networks I might be more interested in are Friendica and Diaspora, but I don’t think they have much traction yet.

One More Thing

One particularly concerning thing about Facebook is that you don’t pay for it–which means you’re not Facebook’s customer.  No, indeed.  You, my liked Friend, are the product being sold.