ZFS Dataset Hierarchy | Data Hoarder Edition

OpenZFS LogoZFS is flexible and will let you name and organize datasets however you choose–but before you start building datasets there’s some ways to make management easier in the long term.  I’ve found the following convention works well for me.  It’s not “the” way by any means, but I hope you will find it helpful, I wish some tips like this had been written when I built my first storage system 4 years ago.

Here are my personal ZFS best practices and naming conventions to structure and manage ZFS data sets.

ZFS Pool Naming

I never give two zpools the same name even if they’re in different servers in case there is the off-chance that sometime down the road I’ll need to import two pools into the same system.  I generally like to name my zpool tank[n] where is an incremental number that’s unique across all my servers.

So if I have two servers, say stor1 and stor2 I might have two zpools :

stor1.b3n.org: tank1
stor2.b3n.org: tank2

Top Level ZFS Datasets for Simple Recursive Management

Create a top level dataset called ds[n] where n is unique number across all your pools just in case you ever have to bring two separate datasets onto the same zpool.  The reason I like to create one main top-level dataset is it makes it easy to manage high level tasks recursively on all sub-datasets (such as snapshots, replication, backups, etc.).  If you have more than a handful of datasets you really don’t want to be configuring replication on every single one individually.  So on my first server I have:

tank1/ds1

I usually mount tank/ds1 as readonly from my CrashPlan VM for backups.  You can configure snapshot tasks, replication tasks, backups, all at this top level and be done with it.

freenas_snapshot_pruning
ZFS snaps and pruning recursively managed at the top level dataset

Name ZFS Datasets for Replication

One of the reasons to have a top level dataset is if you’ll ever have two servers…

stor1.b3n.org
   | - tank1/ds1

stor2.b3n.org
   | - tank2/ds2

I replicate them to each other for backup.  Having that top level ds[n] dataset lets me manage ds1 (the primary dataset on the server) completely separately from the replicated dataset (ds2) on stor1.

stor1.b3n.org
 | - tank1/ds1
 | - tank2/ds2 (replicated)

stor2.b3n.org
 | - tank2/ds2
 | - tank1/ds1 (replicated)

Advice for Data Hoarders.  Overkill for the Rest of Us

supermicro_zfs

The ideal is we backup everything.  But in reality storage costs money, WAN bandwidth isn’t always available to backup everything remotely.  I like to structure my datasets such that I can manage them by importance.  So under the ds[n] dataset create sub-datasets.

stor1.b3n.org
 | - tank1/ds1/kirk – very important – family pictures, personal files
 | - tank1/ds1/spock – important – ripped media, ISO files, etc.
 | - tank1/ds1/redshirt – scratch data, tmp data, testing area
 | - tank1/ds1/archive – archived data
 | - tank1/ds1/backups – backups

Kirk – Very Important.  Family photos, home videos, journal, code, projects, scans, crypto-currency wallets, etc.  I like to keep four to five copies of this data using multiple backup methods and multiple locations.  It’s backed up to CrashPlan offsite, rsynced to a friend’s remote server, snapshots are replicated to a local ZFS server, plus an annual backup to a local hard drive for cold storage.  3 copies onsite, 2 copies offsite, 2 different file-system types (ZFS, XFS) and 3 different backup technologies (CrashPlan, Rsync, and  ZFS replication) .  I do not want to lose this data.

Multiple Backup Locations Across the World
Important data is backed up to multiple geographic locations

Spock – Important.  Important data that would be a pain to lose, might cost money to reproduce, but it isn’t catastrophic.  If I had to go a few weeks without it I’d be fine.  For example, rips of all my movies, downloaded Linux ISO files, Logos library and index, etc.  If I lost this data and the house burned down I might have to repurchase my movies and spend a few weeks ripping them again, but I can reproduce the data.  For this dataset I want at least 2 copies, everything is backed up offsite to CrashPlan and if I have the space local ZFS snapshots are replicated to a 2nd server giving me 3 copies.

redshirt_startrek

Redshirt – This is my expendable dataset.  This might be a staging area to store MakeMKV rips until they’re transcoded, I might do video editing here or test out VMs.  This data doesn’t get backed up… I may run snapshots with a short retention policy.  Losing this data would mean losing no more than a days worth of work.  I might also run zfs sync=disabled to get maximum performance here.  And typically I don’t do ZFS snapshot replication to a 2nd server.  In many cases it will make sense to pull this out from under the top level ds[n] dataset and have it be by itself.

Backups – Dataset contains backups of workstations, servers, cloud services–I may backup the backups to CrashPlan or some online service and usually that is sufficient as I already have multiple copies elsewhere.

Archive – This is data I no longer use regularly but don’t want to lose. Old school papers that I’ll probably never need again, backup images of old computers, etc.  I set set this dataset to compression=gzip9, and back it up to CrashPlan plus a local backup and try to have at least 3 copies.

Now, you don’t have to name the datasets Kirk, Spock, and Redshirt… but the idea is to identify importance so that you’re only managing a few datasets when configuring ZFS snapshots, replication, etc.  If you have unlimited cheap storage and bandwidth it may not worth it to do this–but it’s nice to have the option to prioritize.

Now… once I’ve established that hierarchy I start defining my datasets that actually store data which may look something like this:

stor1.b3n.org
| - tank1/ds1/kirk/photos
| - tank1/ds1/kirk/git
| - tank1/ds1/kirk/documents
| - tank1/ds1/kirk/vmware-kirk-nfs
| - tank1/ds1/spock/media
| - tank1/ds1/spock/vmware-spock-nfs
| - tank1/ds1/spock/vmware-iso
| - tank1/ds1/redshirt/raw-rips
| - tank1/ds1/redshirt/tmp
| - tank1/ds1/archive
| - tank1/ds1/archive/2000
| - tank1/ds1/archive/2001
| - tank1/ds1/archive/2002
| - tank1/ds1/backups
| - tank1/ds1/backups/incoming-rsync-backups
| - tank1/ds1/backups/windows
| - tank1/ds1/backups/windows-file-history

 

With this ZFS hierarchy I can manage everything at the top level of ds1 and just setup the same automatic snapshot, replication, and backups for everything.  Or if I need to be more precise I have the ability to handle Kirk, Spock, and Redshirt differently.

 

VMware vs bhyve Performance Comparison

Playing with bhyve

Here’s a look at Gea’s popular All-in-one design which allows VMware to run on top of ZFS on a single box using a virtual 10Gbe storage network.  The design requires an HBA, and a CPU that supports VT-d so that the storage can be passed directly to a guest VM running a ZFS server (such as OmniOS or FreeNAS).  Then a virtual storage network is used to share the storage back to VMware.

vmware_all_in_one_with_storage_network
VMware and ZFS: All-In-One Design

bhyve, can simplify this design since it runs under FreeBSD it already has a ZFS server.  This not only simplifies the design, but it could potentially allow a hypervisor to run on simpler less expensive hardware.  The same design in bhyve eliminates the need to use a dedicated HBA and a CPU that supports VT-d.

freebsd_bhyve
Simpler bhyve design

I’ve never understood the advantage of type-1 hypervisors (such as VMware and Xen) over Type-2 hypervisors (like KVM and bhyve).  Type-1 proponents say the hypervisor runs on bare metal instead of an OS… I’m not sure how VMware isn’t considered an OS except that it is a purpose-built OS and probably smaller.  It seems you could take a Linux distribution running KVM and take away features until at some point it becomes a Type-1 hypervisor.  Which is all fine but it could actually be a disadvantage if you wanted some of those features (like ZFS).  A type-2 hypervisor that supports ZFS appears to have a clear advantage (at least theoretically) over a type-1 for this type of setup.

In fact, FreeBSD may be the best visualization / storage platform.  You get ZFS and bhyve, and also jails.  You really only need to run bhyve when virtualizing a different OS.

bhyve is still pretty young, but I thought I’d run some tests to see where it’s at…

Environments

This is running on my X10SDV-F Datacenter in a Box Build.

In all environments the following parameters were used:

  • Supermico X10SDV-F
  • Xeon D-1540
  • 32GB ECC DDR4 memory
  • IBM ServerRaid M1015 flashed to IT mode.
  • 4 x HGST Ultrastar 7K300 HGST 2TB enterprise drives in RAID-Z
  • One DC S3700 100GB over-provisioned to 8GB used as the log device.
  • No L2ARC.
  • Compression = LZ4
  • Sync = standard (unless specified).
  • Guest (where tests are run): Ubuntu 14.04 LTS, 16GB, 4 cores, 1GB memory.
  • OS defaults are left as is, I didn’t try to tweak number of NFS servers, sd.conf, etc.
  • My tests fit inside of ARC.  I ran each test 5 times on each platform to warm up the ARC.  The results are the average of the next 5 test runs.
  • I only tested an Ubuntu guest because it’s the only distribution I run in (in quantity anyway) addition to FreeBSD, I suppose a more thorough test should include other operating systems.

The environments were setup as follows:

1 – VM under ESXi 6 using NFS storage from FreeNAS 9.3 VM via VT-d

  • FreeNAS 9.3 installed under ESXi.
  • FreeNAS is given 24GB memory.
  • HBA is passed to it via VT-d.
  • Storage shared to VMware via NFSv3, virtual storage network on VMXNET3.
  • Ubuntu guest given VMware para-virtual drivers

2 – VM under ESXi 6 using NFS storage from OmniOS VM via VT-d

  • OmniOS r151014 LTS installed under ESXi.
  • OmniOS is given 24GB memory.
  • HBA is passed to it via VT-d.
  • Storage shared to VMware via NFSv3, virtual storage network on VMXNET3.
  • Ubuntu guest given VMware para-virtual drivers

3 – VM under FreeBSD bhyve

  • bhyve running on FreeBSD 10.1-Release
  • Guest storage is file image on ZFS dataset.

4 – VM under FreeBSD bhyve sync always

  • bhyve running on FreeBSD 10.1-Release
  • Guest storage is file image on ZFS dataset.
  • Sync=always

Benchmark Results

MariaDB OLTP Load

This test is a mix of CPU and storage I/O.  bhyve (yellow) pulls ahead in the 2 threaded test, probably because it doesn’t have to issue a sync after each write.  However, it falls behind on the 4 threaded test even with that advantage, probably because it isn’t as efficient at handling CPU processing as VMware (see next chart on finding primes).
sysbench_oltp

Finding Primes

Finding prime numbers with a VM under VMware is significantly faster than under bhyve.

sysbench_primes

Random Read

byhve has an advantage, probably because it has direct access to ZFS.

sysbench_rndrd

Random Write

With sync=standard bhyve has a clear advantage.  I’m not sure why VMware can outperform bhyve sync=always.  I am merely speculating but I wonder if VMware over NFS is translating smaller writes into larger blocks (maybe 64k or 128k) before sending them to the NFS server.

sysbench_rndwr

Random Read/Write

sysbench_rndrw

Sequential Read

Sequential reads are faster with bhyve’s direct storage access.

sysbench_seqrd

Sequential Write

What not having to sync every write will gain you..

sysbench_seqwr

Sequential Rewrite

sysbench_seqrewr

 

Summary

VMware is a very fine virtualization platform that’s been well tuned.  All that overhead of VT-d, virtual 10gbe switches for the storage network, VM storage over NFS, etc. are not hurting it’s performance except perhaps on sequential reads.

For as young as bhyve is I’m happy with the performance compared to VMware, it appears to be a slower on the CPU intensive tests.   I didn’t intend on comparing CPU performance so I haven’t done enough variety of tests to see what the difference is there but it appears VMware has an advantage.

One thing that is not clear to me is how safe running sync=standard is on bhyve.  The ideal scenario would be honoring fsync requests from the guest, however I’m not sure if bhyve has that kind of insight from the guest.  Probably the worst case under this scenario with sync=standard is losing the last 5 seconds of writes–but even that risk can be mitigated with battery backup. With standard sync there’s a lot of performance to be gained over VMware with NFS.  Even if you run bhyve with sync=always it does not perform badly, and even outperforms VMware All-in-one design on some tests.

The upcoming FreeNAS 10 may be an interesting hypervisor + storage platform, especially if it provides a GUI to manage bhyve.

 

Best Hard Drives for ZFS Server (Updated 2016)

Today’s question comes from Jeff….

Q. What drives should I buy for my ZFS server? 

Answer: Here’s what I recommend, considering a balance of cost per TB, performance, and reliability.  I prefer NAS class drives since they are designed to run 24/7 and also are better at tolerating vibration from other drives.  I prefer SATA but SAS drives would be better in some designs (especially when using expanders).  For a home or small business storage server with 8 or fewer drives I think these are the best options.

Updated: July 19, 2015 – Added quieter HGST, and updated prices.
Updated: July 30, 2016 – Updated prices, and added WL drives

2TB HGST OEM Drives – $23/TB

They won’t carry the HGST 5-year warranty but you can usually get a 1-year warranty from the seller.  HGST drives are reliable so the lower cost probably justifies the lack of a warranty.  2TB HGST drives also boast a MTBF of 2 million hours!

HGST Deskstar 7K4000 2TB 64MB Cache 7200RPM SATA III (DS724020ALE640)  Desktop/NAS grade.  1-year Warranty.  $46 / $23/TB.  This Deskstar is nearly silent.  It isn’t an enterprise class drive, however the internals are nearly identical (and might be identical).  TLER is disabled by default but, unlike most desktop drives it can be enabled manually (see notes on enabling TLER belolw).

5TB, and 6TB White Label Drives $23/TB to $27/TB

A great way to save money is to get White Label Drives.  These are NAS class drives with branding removed.  Seller usually provides 1-year warranty.  A great deal is this 6TB WhiteLabel Drive for $160 or 5TB WhiteLabel Drive for $115. These are most likely re-branded Western Digital Reds.   I hate dealing with paperwork and warranty returns–I’d much rather just buy an extra drive to have sitting on the shelf in case one fails than pay more for a warranty.

3TB, 4TB, 5TB, 6TB, and 8TB Drives $37/TB to $50/TB

I’d purchase either HGST Deskstar NAS or the WD RED series.  Both are designed for 24-7 operation and for use in systems with up to 8-bays.

hgst_deskstar_nas

HGST Deskstar NAS 64MB Cache 7200RPM SATA III 3-year Warranty.  The main advantage of this drive is it’s faster at 7200RPM and as a result it significantly outperforms the WD Red.  See StorageReview’s benchmarks on the 4TB Deskstar.  Also at 5TB and 6TB the cache doubles to 128MB.  In general if the price is the same or pretty close I’d prefer the HGST drive.

 

WD RED NAS 64MB ~5400RPM SATA III 3-year Warrantywd_red.  The WD drive runs a little cheaper, if the price is less than the HGST by more than $5/TB I would consider this drive to save a little money.

Careful with 8TB+ drives

Some drives 8TB and larger are using SMR (Shingled Magnetic Recording) which should not be used with ZFS if you care about performance until drivers are developed.  Be careful about any drive that says it’s for archiving purposes.

SLOG and L2ARC

For SLOG and L2ARC see my comparison of SSDs.

Capcity Planning for Failure

Most drives running 24/7 start having a high failure rate after 3-years, you might be able to squeeze 4 or 5 years out of them if you’re lucky.  So a good rule of thumb is to estimate your growth and buy drives big enough that you will start to outgrow them in 4 to 5 years.  The price of hard drives is always dropping so you don’t really want to buy more much than you’ll need before they start failing.  Consider that in ZFS you shouldn’t run more than 70% full (with 80% being max) for your typical NAS applications including VMs on NFS.  But if you’re planning to use iSCSI you shouldn’t run more than 50% full.

ZFS Drive Configurations

My preference is almost always RAID-Z2 (RAID-6) with 6 to 8 drives which provides a storage efficiency of .66 to .75.  This scales pretty well as far as capacity is concerned and with double-parity I’m not that concerned if a drive fails.  6 drives in RAID-Z2 would net 8TB capacity all the way up to 24TB with 6TB drives.  For larger setups use multiple vdevs.  E.g. with 60 bays use 10 six drive RAID-Z2 vdevs (each vdev will increase IOPS).  For smaller setups I run 3 or 4 drives in RAID-Z (RAID-5).  In all cases it’s essential to have backups… and I’d rather have two smaller servers with RAID-Z mirroring to each other than one server with RAID-Z2.  The nice thing about smaller setups is the cost of upgrading 4 drives isn’t as bad as 6 or 8!

Enabling CCTL/TLER

Time-Limited Error Recovery (TLER) or Command Completion Time Limit (CCTL).

On desktop class drives such as the HGST Deskstar, they’re typically not run in RAID mode so by default they are configured to take as long as needed (sometimes several minutes) to try to recover a bad sector of data.  This is what you’d want on a desktop, however performance grinds to a halt during this time which can cause your ZFS server to hang for several minutes waiting on a recovery.  If you already have ZFS redundancy it’s a pretty low risk to just tell the drive to give up after a few seconds, and let ZFS rebuild the data.

The basic rule of thumb.  If you’re running RAID-Z, you have two copies so I’d be a little cautious about enabling TLER.  If you’re running RAID-Z2 or RAID-Z3 you have three or four copies of data so in that case there’s very little risk in enabling it.

Viewing the TLER setting:

Enabling TLER

Disabling TLER

(TLER should always be disabled if you have no redundancy).

FreeNAS vs. OmniOS / Napp-It

freenas      OmniOS_logo_200px

2015-01-07: I’ve updated this post to to reflect changes in FreeNAS 9.3. 

I’ve been using OpenIndiana since late 2011, and switched to OmniOS in 2013.  Lately I started testing FreeNAS, what drove me to do this is I use  CrashPlan to backup my pool but recently Code 42 announced they’ll be discontinuing Solaris support for Crashplan so I needed to start looking for an alternative OS or an alternative backup solution.  I decided to look at FreeNAS because it has a CrashPlan plugin that runs in a jail using Linux emulation.  After testing it out for awhile I am likely going to stay on OmniOS since it suits my needs better and instead switch out CrashPlan for ZnapZend for my backup solution.  But after running FreeNAS for a few months here are my thoughts on both platforms and their strengths and weaknesses as a ZFS storage server.

Update: 2015-01-07: After a lot of testing ZnapZend ended up not working for me, this is not it’s fault, but because I have limited and bandwidth the snapshots don’t catch up and it gets further and further behind so for now I’m continuing with Crashplan on OmniOS.  I am also testing FreeNAS and may consider a switch at some point.

CIFS / SMB Performance for Windows Shares

FreeNAS has a newer implementation of SMB, supporting SMB3, I think OmniOS is at SMB1.  FreeNAS can actually function as an Active Directory Domain Controller.

OmniOS is slightly faster, writing a large file over my LAN gets around 115MBps vs 98MBps on FreeNAS.  I suspect this is because OmniOS runs NFS SMB at the kernel level and FreeNAS runs it in user space.  I tried changing the FreeNAS protocol to SMB2, and even SMB1 but couldn’t get past 99MBps.  This is on a Xeon E3-1240V3 so there’s plenty of CPU power, Samba on FreeNAS just can’t keep up.

CIFS / SMB Snapshot Integration with Previous Versions

Previous Versions Snapshot Integration with Windows is far superior in OmniOS.   I always use multiple snapshot jobs to do progressive thinning of snapshots.  So for example I’ll setup monthly snaps with a 6 month retention, weekly with two month retention, daily with two week, hourly with 1 week, and every 5 minutes for two days.   FreeNAS will let you setup the snap jobs this way, but in Windows Previous Versions it will only show the snapshots from one of the snap jobs under Previous Versions (so you may see your every 5 minute snaps but you can’t see the hourly or weekly snaps).  OmniOS handles this nicely.  As a bonus Napp-It has an option to automatically delete empty snapshots sooner than their retention expiration so I don’t see them in Previous Versions unless some data actually changed.

previous_versions

Enclosure Management

Both platforms struggle here, FreeNAS has a bit of an edge here… probably the best thing to do is write down the serial number of each drive with the slot number.  In FreeNAS drives are given device names like da0, da1, etc. but unfortunately the numbers don’t seem to correspond to anything and they can even change between reboots.  FreeNAS does have the ability to label drives so you could insert one drive at a time and label them with the slot they’re in.

OmniOS drives are given names like c3t5000C5005328D67Bd0 which isn’t entirely helpful.

For LSI controllers the sas2irc utility (which works on FreeBSD or Solaris) will map the drives to slots.

Fault Management

The ZFS fault management daemon will automatically replace a failed drive with a hot spare… but it hasn’t been ported to FreeBSD yet so FreeNAS really only has warm spare capability.  Update: FreeNAS added hot spare capability on Feb 27, 2015.   To me this is a minor concern… if you’re going to use RAID-Z with a hot spare why not just configure the pool with RAID-Z2 or RAID-Z3 to begin with?  However, I can see how the fault management daemon on OmniOS would reduce the amount of work if you had several hundred drives and failures were routine.

SWAP issue on FreeNAS

While I was testing I actually had a drive fail (this is why 3-year old Seagate drives are great to test with) and FreeNAS crashed!  The NFS pool dropped out from under VMware.  When I looked at the console I saw “swap_pager: I/O error – pagein failed”   I had run into FreeNAS Bug 208 which was closed a year ago but never resolved.  The default setting in FreeNAS is to create a 2GB swap partition on every drive which acts like striped swap space (I am not making this up, this is the default setting).  So if any one of the drives fails it can take FreeNAS down.  The argument from FreeNAS is that you shouldn’t be using swap–and perhaps that’s true but I had a FreeNAS box with 8GB memory and running only one jail with CrashPlan bring my entire system down because a single drive failed.  That’s not an acceptable default setting.  Fortunately there is a way to disable automatically creating swap partitions on FreeNAS, it’s best to disable the setting before initializing any disks.

In my three years of running an OpenSolaris / Illumos based OS I’ve never had a drive failure bring the system down

Running under VMware

FreeNAS is not supported running under a VM but OmniOS is.  In my testing both OmniOS and FreeNAS work well under VMware under the best practices of passing an LSI controller flashed into IT mode to the VM using VT-d.  I did find that OmniOS does a lot better virtualized on slower hardware than FreeNAS.  On an Avaton C2750 FreeNAS performed well on bare metal, but when I virtualized it using vmdks on drives instead of VT-d FreeNAS suffered in performance but OmniOS performed quite well under the same scenario.

Both platforms have VMXNET3 drivers, neither has a Paravirtual SCSI driver.

Encryption

Unfortunately Oracle did not release the source for Solaris 11, so there is no encryption support on OpenZFS directly.

FreeNAS can take advantage of FreeBSD’s GELI based encryption.  FreeBSD’s implementation can use the AES instruction set, last I tested Solaris 11 the AES instruction set was not used so FreeBSD/FreeNAS probably has the fastest encryption implementation for ZFS.

There isn’t a good encryption option on OmniOS.

ZFS High Availability

Neither systems supports ZFS high availability out of the box.  OmniOS can use a third party tool like RSF-1 (paid) to accomplish this.  The commercially supported TrueNAS uses RSF-1 so it should also work in FreeNAS.

ZFS Replication & Backups

FreeNAS has the ability to easily setup replication as often as every 5 minutes which is a great way to have a standby host to failover to.  Replication can be done over the network.  If you’re going to replicate over the internet I’d say you want a small data set or a very fast connection–I ran into issues a couple of times where the replication got interrupted and it needed to start all over from scratch.  On OmniOS Napp-It does not offer a free replication solution, but there is a paid replication feature, however there are also numerous free ZFS replication scripts that people have written such as ZnapZend.

I did get the CrashPlan plugin to work under FreeNAS, however I found that after a reboot the CrashPlan jail sometimes wouldn’t auto-mount my main pool so it ended up not being a reliable enough solution for me to be comfortable with.  I wish FreeNAS made it so that it wasn’t in a jail.

Memory Requirements

FreeNAS is a little more power hungry than OmniOS.  For my 8TB pool a bare minimum for FreeNAS is 8GB while OmniOS is quite happy with 4GB, although I run it with 6GB to give it a little more ARC.

Hardware Support

FreeNAS supports more hardware than OmniOS.  I generally virtualize my ZFS server so it doesn’t matter too much to me but if you’re running bare metal and on obscure or newer hardware there’s a much better chance that FreeNAS supports it.  Also in 9.3 you have the ability to configure IPMI from the web interface.

VAAI (VMware vSphere Storage API’s — Array Integration)

FreeNAS now has VAAI support for iSCSI.  OmniOS has no VAAI support.  As of FreeNAS 9.3 and Napp-It 0.9f4 both control panels have the ability to enable VMware snapshot integration / ESXi hot snaps.  The way this works is before every ZFS snapshot is taken FreeNAS has VMware snap all the VMs, then the ZFS snapshot is taken, then the VMware snapshots are released.  This is really nice and allows for proper consistent snapshots.

 

GUI

The FreeNAS GUI looks a little nicer and is probably a little easier for a beginner.  The background of the screen turns red whenever you’re about to do something dangerous.  I found you can setup just about everything from the GUI, where I had to drop into the command line more often with OmniOS.  The FreeNAS web interface seems to hang for a few seconds from time to time compared to Napp-It, but nothing major.  I believe FreeNAS will have an asynchronous GUI in version 10.

One frustration I have with FreeNAS is it doesn’t quite do things that are compatible with CLI.  For example, if you create a pool via CLI FreeNAS doesn’t see it, you actually have to import it using the GUI to use it there.  Napp-it is essentially an interface that runs CLI commands so you can seamlessly switch back and forth between managing things on CLI and Napp-It.  This is a difference in philosophy.   Napp-It is just a web interface meant to run on top of an OS, where FreeNAS is more than just a webapp on top of FreeBSD, FreeNAS is it’s own OS.

I think most people experienced with the zfs command line and Solaris are going to be a little more at home with Napp-It’s control panel, but it’s easy enough to figure out what FreeNAS is doing.  You just have to be careful what you do in the CLI.

On both platforms I found I had to switch into CLI from time to time to do things right (e.g. FreeNAS can’t set sync=always from the GUI, Napp-It can’t setup networking).

As far as managing a ZFS file system both have what I want.. email alerts when there’s a problem, scheduling for data scrubs, snapshots, etc.

FreeNAS has better security, it’s much easier to setup an SSL cert on the management interface, in fact you can create an internal CA to sign certificates from the GUI.  Security updates are easier to manage from the web interface in FreeNAS as well.

Community

FreeNAS and OmniOS both have great communities.  If you post anything at HardForum chances are you’ll get a response from Gea and he’s usually quite helpful.  Post anything on the FreeNAS forums and Cyberjock will tell you that you need more RAM and that you’ll lose all your data.  There is a lot of info on the FreeNAS forums and the FreeNAS Redmine project is open so you can see all the issues, it’s great way to see what bugs and feature requests are out there and when they were or will be fixed.  OmniOS has an active OmniOS Discuss mailman list and Gea, the author of Napp-It is active on various forums.  He has answered my questions on several occasions over at HardForum’s Data Storage subforum.  I’ve found the HardForum community a little more helpful…I’ve always gotten a response there while several questions I posted on the FreeNAS forums went unanswered.

Documentation

FreeNAS documentation is great, like FreeBSD’s.  Just about everything is in the FreeNAS Guide

OmniOS isn’t as organized.  I found some howtos here, not nearly as comprehensive as FreeNAS.  Most of what I find from OmniOS I find in forums or the Napp-It site.

Mirrored ZFS boot device / rpool

OmniOS can boot to a mirrored ZFS rpool.

FreeNAS does not have a way to mirror the ZFS boot device.  FreeBSD does have this capability but it turns out FreeNAS is built on NanoBSD.  The only way to get FreeNAS to have redundancy on the boot device that I know of is to set it up on a hardware RAID card.

FreeNAS 9.3 can now install to a mirrored ZFS rpool!

Features / Plugins / Extensions

Napp-It’s extensions include:

  • AMP (Apache, MySQL, PHP stack)
  • Baikal CalDAV / CardDAV Server
  • Logitech MediaServer
  • MediaTomb (DLNA / UPnP server)
  • Owncloud (Dropbox alternative)
  • PHPvirtualbox (VirtualBox interface)
  • Pydio Sharing
  • FTP Server
  • Serviio Mediaserver
  • Tine Groupware

FreeNAS plugins:

  • Bacula (Backup Server)
  • BTSync (Bittorrent Sync)
  • CouchPotato (NZB and Torrent downloader)
  • CrashPlan (Backup client/server)
  • Cruciblewds (Computer imaging / cloning)
  • Firefly (media server for Roku SoundBridge and Apple iTunes)
  • Headphones (automatic music downloader for SABnzbd)
  • HTPC-Manager
  • LazyLibrarian (follow authors and grab metadata for digital reading)
  • Maraschino (web interfrace for XBMC HTPC)
  • MineOS (Minecraft control panel)
  • Mylar (Comic book downloader)
  • OwnCloud (Dropbox alternative)
  • PlexMediaServer
  • s3cmd
  • SABnzbd (Binary newsreader)
  • SickBeard (PVR for newsgroup users)
  • SickRage (Video file manager for TV shows)
  • Subsonic (music streaming server)
  • Syncthing (Open source cluster synchronization)
  • Transmission (BitTorrent client)
  • XDM (eXtendable Download Manager)

All FreeNAS plugins run in a jail so you must mount the storage that service will need inside the jail… this can be kind of annoying but it does allow for some nice security–for example CrashPlan can mount the storage you want to backup as read-only.

Protocols and Services

Both systems offer a standard stack of AFP, SMB/CIFS, iSCSI, FTP, NFS, RSYNC, TFTP
FreeNAS also has WebDAV and few extra services like Dynamic DNS, LLDP, and UPS (the ability to connect to a UPS unit and shutdown automatically).

Performance Reporting and Monitoring

Napp-It does not have reports and graphs in the free version.  FreeNAS has reports and you can look back as far as you want to see historical performance metrics.

freenas_stats

As a Hypervisor

Both systems are very efficient running guests of the same OS.  OmniOS has Zones, FreeNAS can run FreeBSD Jails.  OmniOS also has KVM which can be used to run any OS.  I suspect that FreeNAS 10 will have Bhyve.  Also both can run VirtualBox.

Stability vs Latest

Both systems are stable, OmniOS/Napp-It seems to be the most robust of the two.  The OmniOS LTS updates are very minimal, mostly security updates and a few bug fixes.  Infrequent and minimal updates are what I like to see in a storage solution.

FreeNAS is pushing a little close to the cutting edge.  They have frequent updates pushed out–sometimes I think they are too frequent to have been thoroughly tested.  On the other hand if you come across an issue or feature request in FreeNAS and report it chances are they’ll get it  in the next release pretty quickly.

Because of this, OmniOS is behind FreeNAS on some things like NFS and SMB protocol versions, VAAI support for iSCSI, etc.

I think this is an important consideration.  With FreeNAS you’ll get newer features and later technologies while OmniOS LTS is generally the better platform for stability.  The commercial TrueNAS soltution is also going to robust.  For FreeNAS you could always pick a stable version and not update very often–I really wish FreeNAS had an LTS, or at least a slower moving stable branch that maybe only did quarterly updates except for security fixes.

ZFS Integration

OmniOS has a slight edge on ZFS integration.  As I mentioned earlier OmniOS has multi-tiered snapshot integration into the the Windows Previous versions feature where FreeNAS can only pick one snap-frequency to show up there.  Also, in OmniOS NFS and SMB shares are stored as properties on the datasets so you can export the pool, import it somewhere else and the shares stay with the pool so you don’t have to reconfigure them.

Commercial Support

OmniOS offers Commercial Support if you want it.

iX Systems offers supported TrueNAS appliances.

Performance

On an All-in-one setup, I setup VMware ESXi 6.0, a virtual storage network and tested FreeNAS and OmniOS using iSCSI and NFS.  On all tests MTU is set to 9000 on the storage network, and compression is set to LZ4.  iSCSI volumes are sparse ZVOLs.  I gave the ZFS server 2 cores and 8GB memory, and the guest VM 2 cores and 8GB memory.  The guest VM is Windows 10 running Crystal Benchmark.

Environment:

  • Supermicro X10SL7-F with LSI 2308 HBA flashed to IT firmware and passed to ZFS server via VT-d (I flashed the P19 firmware for OmniOS and then re-flashed to P16 for FreeNAS).
  • Intel Xeon E3-1240v3 3.40Ghz.
  • 16GB ECC Memory.
  • 6 x 2TB Seagate 7200 drives in RAID-Z2
  • 2 x 100GB DC S3700s striped for ZIL/SLOG.  Over-provisioned to 8GB.

Latest stable updates on both operating systems:

FreeNAS 9.3  update 201503200528
OmniOS r151012 omnios-10b9c79

On Crystal Benchmark I ran 5 each of the 4000MB, 1000MB, and 50MB size tests, the results are the average of the results.

On all tests every write was going to the ZIL / SLOG devices.  On NFS I left the default sync=standard (which results in every write being a sync with ESXi).  On iSCSI I set sync=always, ESXi doesn’t honor sync requests from the guest with iSCSI so it’s not safe to run with sync=standard.

Update: 7/30/2015: FreeNAS has pushed out some updates that appear to improve NFS performance.  See the newer results here: VMware vs bhyve Performance Comparison.  Original results below.

Sequential Read MBps

seqrd

Sequential Write MBps

seqwr2

Random Read 512K MBps

rndrd512

Random Write 512K MBps

rndwr512

Random Read 4K IOPS

randrd4kqd1

Random Write 4K IOPS

rndwr4kqd1

Random Read 4K QD=32 IOPS

rdndrd4kqd32

Random Write 4K QD=32 IOPS

rndwr4kqd32

Performance Thoughts

So it appears, from these unscientific benchmarks that OmniOS on NFS is  the fastest configuration, iSCSI performs pretty similarly on both FreeNAS and OmniOS depending on the test.  One other thing I should mention, which doesn’t show up in the tests is latency.  With NFS I saw latency on the ESXi storage as high as 14ms during the tests,  while latency never broke a millisecond with iSCSI.

One major drawback to my benchmarks is it’s only one guest hitting the storage. It would be interesting to repeat the test with several VMs accessing the storage simultaneously, I expect the results may be different under heavy concurrent load.

I chose 64k iSCSI block size because the larger blocks result in a higher LZ4 compression ratio, I did several quick benchmarks and found 16K and 64K performed pretty similarly, 16K did perform better at random 4K write QD=1, but otherwise 64K was close to a 16K block size depending on the test.   I saw significant drop in random performance at 128k.  Once again under different scenarios this may not be the optimal block-size for all types of workloads.

NAS Server Build – ASRock C2750D4I & Silverstone DS380

The other day I got a little frustrated with my Gen 8 Microserver, I was trying to upgrade ESXi to 5.5 but the virtual media feature kept disconnecting in the middle of the install due to not having an ILO4 license–I actually bought an ILO4 enterprise license but I have no idea where I put it!  What’s the point of IPMI when you get stopped by licensing?  I hate having to physically plug in a USB key to upgrade VMware so much that I decided I’d just build a new server–which I honestly think is faster than messing around with getting an ISO image on a USB stick.

Warning: I’m sorry to say that I cannot recommend this motherboard that I reviewed earlier:  I ended up having to RMA this board twice to get one that didn’t crash.  The Marvell SATA Controller was never stable long term under load even after multiple RMAs so I ran it without using those ports which sort of defeated the reason I got the board in the first place.  Then in 2017 the board died shy of 3 years old, the shortest I have ever had a motherboard last me.  Generally I have been pretty happy with ASRock desktop boards but this server board isn’t stable enough for business or home use.  I have switched to Supermicro X10SDV Motherboards for my home server builds.

Build List

ASRock C2750D4I Motherboard / CPU

C2750D4I-1(L)

Update: 2014-05-11.  Here’s a great video review on the motherboard…

12 SATA ports!  This motherboard is perfect for ZFS which loves having direct access to JBOD disks.  The Marvell SATA controllers did not show up in VMware initially,  however Andreas Peetz provides a package for adding unsupported drivers in VMware, and this worked perfectly.  It took me a couple minutes to realize all you need to do is run these three commands:

Update November 16, 2014 .. it turned out the below issue was caused by a faulty Marvell controller on the motherboard, I ran FreeBSD (a supported OS) and the fault also occurred there so I RMAed the motherboard … I ended up getting a bad motherboard again but after a second RMA everything is stable in VMware… so you can disregard the below warning.

Update March 12, 2015.  My board continues to function okay, but some people are having issues with the drives working under VMware ESXi.  Read the comments for details.

Update August 23, 2014 ** WARNING Read this before you run the command below **  I had stability issues using the below hack to get the Marvell controllers to show up.  VMware started hanging as often as several times a day requiring a system reboot.  This is the entry in the motherboard’s event log: Critical Interrupt – I/O Channel Check NMI Asserted.  I swapped the Kingston memory out for Crucial on ASRock’s HCL list but the issue still persisted so I can’t recommend this drive for VMware.  After heavy I/O tests ZFS also detected data corruption on two drives connected to the Marvell controllers.  I am pretty sure this is because VMware does not officially support these drives so this issue likely doesn’t exist for operating systems that officially support the Marvell controller.

asrock_kvm_over_ipIPMI (allows for KVM over IP).  After being spoiled by this on a Supermicro board IPMI with KVM over IP is a must have feature for me, I’ll never plug a keyboard and monitor into a server again.

Avoton Octa-Core processor.  Normally I don’t even look at Atom processors, but this is not your grandfather’s Atom.  The Avoton processor supports VT-x, ECC memory,  AES instructions, and is a lot more powerful and at only 20 W TDP.  This CPU Boss benchmark says it will probably perform similarly to the Xeon E3-1220L.  The Avoton can also go up to 64GB memory where the E3 series is limited to 32GB making it a good option for VMware or for a high performance ZFS NAS.  The Avoton does not support VT-d so there is no passing devices directly to VMs.

My only two disappointments are no internal USB header on the board (I always install VMware on a USB stick so right now there’s a USB stick hanging on the back) and I wish they had used SFF-8087 mini-SAS connectors instead of individual SATA ports on the board to cut down on the number of SATA cables.

Overall I am very impressed with this board and it’s server-grade features like IPMI.

Instead of going into more detail here, I’ll just reference Patrick’s review of the ASRock C2750D4I

Alternative Avoton Boards

There are a few other options worth looking at.  The ASRock C2550D4I is the same board but Quad core instead of Octa Core.  I actually almost bought this one except I got the 2750 at a good price on SuperBiiz.

Also the SuperMicro A1SAi-2750F (Octa core) and A1SAi-2550F (Quad core) are good options if you don’t need as many SATA ports or you’re going to use a PCI-E SATA/SAS controller.  Supermicro’s motherboards have the advantage of Quad GbE ports, an internal USB header (not to mention USB 3.0), while sacrificing the number of SATA ports–only 2 SATA3 ports and 4 SATA2 ports.  These Supermicro boards use the smaller SO-DIMM memory.

 Silverstone DS-380: 8 hot-swap bay chassis

ds-380-and-asrock

 

The DS-380 has 8 hot-swap bays, plus room for four fixed 2.5″ drives for up to 12 drives.  As I started building this server I found the design was very well thought out.  Power button lockout (a necessity if you have kids), locking door, dust screens on fan intakes, etc.  The case is practical in that the designers cut costs where they could (like not painting the inside) but didn’t sacrifice anything of importance.

gen8_microserver_and_ds380_silverstone
HP Gen 8 Microserver (Left) next to Silverstone DS-380 (right)

A little larger than the HP Gen8 Microserver, but it can hold more than twice as many drives.  Also the Gen8 Microserver is a bit noisier.

ds-380-open

ds-380-tray-with-ssdYou’ll notice above from the top there is a set of two drives, then one drive by itself, and a set of five drives.  This struck me as odd at first, but this is actually that way by design.  If you have a tall PCI card plugged into your motherboard  (such as a video card) you can forfeit the 3rd drive from the top to make room for it.

The drive trays are plastic, obviously not as nice as a metal tray but not too bad either.  One nice feature is screw holes on the bottom allow for mounting a 2.5″ drive such as an SSD!  That’s well thought out!  Also there’s a clear plastic piece that runs alongside the left of each tray that carries the hard drive activity LED light to the front of the case (see video below).

Here’s the official Silverstone DS-380 site, and here’s a very detailed review of the DS-380 with lots of pictures by Lawrence Lee.

Storage

Using 4TB drives 8 bays would get you to 24TB using RAID-Z2 or RAID-6.  Plus have 4 2.5″ fixed bays left for SSDs.

Virtual NAS

I run a virtualized ZFS server on OmniOS following Gea’s Napp-in-one guide.  I deviate from his design slightly because I run on top of VMDKs instead of Passing the controllers to the guest VM (because I don’t have VT-d on the Avoton).

ZIL – Seagate SSD Pro

120GB Seagate Pro SSD.  The ZIL (ZFS Intent Log) is the real trick to high performance random writes, by being able to cache writes on capacitor backed cache the SSD can guarantee a write to the requesting application before it is transferred out of RAM and onto spindles.

So far…

I’m pretty happy with the custom build.  I think the Gen 8 HP Microserver looks more professional compared to the DS-380 which looks more like a DIY server.  But what matters is on the inside, and having access to IPMI when I need it without having to worry about licensing is worth something in my book.

Restore Zpool with CrashPlan

I messed up my zpool with a stuck log device, and rather than try to fix it I decided to wipe out the zpool and restore the zfs datasets using CrashPlan.

CrashPlan Restore

It took CrashPlan roughly a week to restore all my data (~1TB) which is making me consider local backups.  All permissions intact.  Dirty VM backups seemed to boot just fine (I also have daily clean backups using GhettoVCB just in case but I didn’t need them).  One nice thing is I could prioritize certain files or folders on the restore by creating multiple restore jobs.  I had a couple of high priority VMs and then wanted kid movies to come in next and then the rest of the data and VMs.

Installing CrashPlan on OmniOS

OmniOSCrashPlanI’m in the progress of switching my ZFS Server from OpenIndiana to OmniOS, mainly because OmniOS is designed only to be a server system so it’s a little cleaner, and they have a stable production release that’s commercially supported, and it has become Gea’s OS of choice for Napp-It.  One of the last things I had to do was get CrashPlan up and running, here’s a quick little howto…

Unfortunately, I got the error below:

So I went to look at the checkinstall script…

I’m not entirely sure how I fixed it, I modified the checkinstall script to look for java in /usr/java/bin, but then ran pkgadd and the CrashPlan installer refused to run because it detected the file had been modified, so I undid my change and re-ran pkgadd and it worked…

Or if you have a secure network you can set serviceHost in /opt/sfw/crashplan/conf from 127.0.0.1 to 0.0.0.0 and then on your client change the serviceHost in C:\Program Files\CrashPlan\conf\ui.properties to your OmniOS IP Address.
I also like to move my CrashPlan install and config onto the main pool where all my storage is… I had already created a dataset called /tank/crashplan:

On a side note my ZFS server CrashPlan backup passed the 1TB mark today!
Here’s a video…