Best Hard Drives for ZFS Server (Updated Apr 2019)

Today’s question comes from Jeff…

Q. What drives should I buy for my ZFS server? 

Answer: Here’s what I recommend, considering a balance of cost per TB, performance, and reliability.  I prefer NAS class drives since they are designed to run 24/7 and also are better at tolerating vibration from other drives.  I prefer SATA but SAS drives would be better in some designs (especially when using expanders).

For a home or small business FreeNAS storage server I think these are the best options, and I’ve also included some enterprise class drives.

Updated: July 19, 2015 – Added quieter HGST, and updated prices.
Updated: July 30, 2016 – Updated prices, and added WL drives.
Updated: July 15, 2017 – Updated prices, added larger drives, removed drives no longer being sold.
Updated: September 17, 2018 – Added WD Gold drives.
Updated: April 27, 2019 – Removed WL and HGST drives, added Seagate, updated all product lines.

Western Digital 3TB, 4TB, 5TB, 6TB, 8TB, 10TB, 12TB, and 14TB Drives

The highest rated and consistently available NAS class drives on the market today are made by Western Digital.  The 3 product lines are:

WD Red are tried and true NAS class drives designed to run 24/7.  Very stable and popular in FreeNAS systems.

    • 5400RPM
    • Supported in up to 8 drive bays.
    • Workload: 180TB/year
    • 3-year warranty

WD Red Pro drives are designed for larger deployments and are suitable for small/medium businesses.

    • 7200RPM
    • Supported in up to 24 drive bays
    • Workload: 300TB/year
    • 5-year warranty

WD HGST Ultrastar DC are datacenter class hard drives designed for heavy workloads (this lineup replaces WD Golds).

    • 7200RPM
    • Supported in unlimited drive bays
    • Workload: 550TB/year
    • 5-year warranty

Seagate IronWolf – up to 14TB drives

Seagate had a bad reputation because of high failure rates in the past, but the newer offerings are more reliable, and given the competitive prices they’re worth another look.  I would consider them again if building a new server.  Seagate has 3 product lines suitable for ZFS, most running at 7200RPM:

Seagate IronWolf (up to 14TB) are NAS class drives targeted at smaller deployments.

    • 5900-7200RPM
    • Supported in up to 8 drive bays
    • Workload: 180TB/year
    • 3-year warranty

Seagate IronWolf Pro are the next step up…

    • 7200RPM
    • Supported in configurations up to 16-24 bays
    • Workload: 300TB/year
    • 5-year warranty

Seagate Exos is the enterprise offering designed for enterprise workloads.

    • 7200RPM
    • Supports unlimited bays
    • Workload: 550TB/year
    • 5-year warranty

Buying Tips:

  • When reading reviews about failures, I discount negative reviews about DOAs or drives that fail within the first few days.  You’re going to be able to return those rather quickly.  What you want to avoid is a drive that fails a year or two in, leaving you with the hassle of a warranty claim.
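  • Higher RPMs and larger disks are typically going to be faster: higher RPMs reduce rotational latency, and higher-capacity disks usually have denser platters, which helps sequential throughput.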
  • Gone are the days when you need a 24-bay server for large amounts of storage.  It’s far simpler to get a 4-bay chassis with 14TB drives.  If you don’t need more capacity or IOPS keep it simple.

Or buy a TrueNAS Storage Server from iXsystems

I’m cheap and tend to go with a DIY approach most of the time, but when I’m recommending ZFS systems in environments where availability is important I like the TrueNAS servers from iXsystems, which will of course come with drives in configurations that have been well tested.  The prices on a TrueNAS are very reasonable compared to other storage systems and it can be set up in an HA cluster.  Even a FreeNAS Certified Server is probably not going to cost much more than doing it yourself (more often than not it ends up being less expensive than DIY).  And of course for a small server you can grab the 4-bay FreeNAS Mini (which ships with WD Reds).

Careful with “archival” drives

If you don’t get one of the drives above, be aware that some larger hard drives use SMR (Shingled Magnetic Recording), which should not be used with ZFS if you care about performance, at least until drivers are developed.  Be careful about any drive that says it’s for archiving purposes.

The ZIL / SLOG and L2ARC

The ZFS Intent Log (ZIL) should be on an SSD with power-loss protection (a battery or capacitor) that can flush out the cache in case of a power failure.  I have done quite a bit of testing and like the Intel DC SSD series drives and also HGST’s S840Z.  These are rated to have their data overwritten many times and will not lose data on power loss.  These run on the expensive side, so for a home setup I typically try to find them used on eBay.  From a ZIL perspective there’s no reason to get a large drive, but keep in mind that you get better performance with larger drives.  In my home I use 100GB DC S3700s and they do just fine.

I generally don’t use an L2ARC (SSD read cache) and instead opt to add more memory.  There are a few cases where an L2ARC makes sense, such as when you have very large working sets.

For SLOG and L2ARC see my comparison of SSDs.

Capacity Planning for Failure

Most drives running 24/7 start having a high failure rate after 5 years, though you might be able to squeeze 6 or 7 years out of them if you’re lucky.  So a good rule of thumb is to estimate your growth and buy drives big enough that you will start to outgrow them in 5+ years.  The price of hard drives is always dropping, so you don’t really want to buy much more than you’ll need before they start failing.  Consider that in ZFS you shouldn’t run more than 70% full (with 80% being the max) for your typical NAS applications, including VMs on NFS.  But if you’re planning to use iSCSI you shouldn’t run more than 50% full.
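The sizing rule above can be sketched as a small calculation.  This is only a rough illustration: the function names are made up for this example, the compound growth model and the sample numbers (8TB today, 15% growth per year) are assumptions, and the fill limits are the 70% and 50% figures mentioned above.

```python
def projected_data_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Data size after `years` of compound growth (0.15 means 15% per year)."""
    return current_tb * (1.0 + annual_growth) ** years


def pool_size_needed_tb(data_tb: float, max_fill: float) -> float:
    """Usable pool capacity needed to hold data_tb without exceeding max_fill."""
    if not 0.0 < max_fill <= 1.0:
        raise ValueError("max_fill must be in the range (0, 1]")
    return data_tb / max_fill


if __name__ == "__main__":
    # Hypothetical example: 8TB of data today, growing 15% per year for 5 years.
    data = projected_data_tb(current_tb=8.0, annual_growth=0.15, years=5)
    for label, fill in (("NAS / NFS", 0.70), ("iSCSI", 0.50)):
        needed = pool_size_needed_tb(data, fill)
        print(f"{label}: plan for about {needed:.1f} TB usable")
```

The result is usable capacity after parity, so the raw drive capacity you buy has to be larger still.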

ZFS Drive Configurations

My preference at home is almost always RAID-Z2 (RAID-6) with 6 to 8 drives, which provides a storage efficiency of .66 to .75.  This scales pretty well as far as capacity is concerned, and with double-parity I’m not that concerned if a drive fails.  6 drives in RAID-Z2 would net 8TB of capacity with 2TB drives, all the way up to 24TB with 6TB drives.  For larger setups use multiple vdevs.  E.g. with 60 bays use 10 six-drive RAID-Z2 vdevs (each vdev will increase IOPS).  For smaller setups I run 3 or 4 drives in RAID-Z (RAID-5).  In all cases it’s essential to have backups… and I’d rather have two smaller servers with RAID-Z mirroring to each other than one server with RAID-Z2.  The nice thing about smaller setups is the cost of upgrading 4 drives isn’t as bad as 6 or 8!  For enterprise setups I like ZFS mirrored pairs (RAID-10) for fast rebuild times and performance at a storage efficiency of 0.50.
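The capacity and efficiency numbers above come from simple arithmetic: each RAID-Z vdev gives up one drive’s worth of space per parity level.  Here is a minimal sketch of that math, with function names invented for this example.  It ignores metadata, padding and the TB vs TiB difference, so real pools will show somewhat less.

```python
def raidz_usable_tb(drives_per_vdev: int, drive_tb: float, parity: int, vdevs: int = 1) -> float:
    """Approximate usable capacity: data drives times drive size, summed over vdevs."""
    if parity not in (1, 2, 3):
        raise ValueError("parity must be 1 (RAID-Z), 2 (RAID-Z2) or 3 (RAID-Z3)")
    if drives_per_vdev <= parity:
        raise ValueError("a vdev needs more drives than parity levels")
    return vdevs * (drives_per_vdev - parity) * drive_tb


def raidz_efficiency(drives_per_vdev: int, parity: int) -> float:
    """Fraction of raw capacity left for data."""
    return (drives_per_vdev - parity) / drives_per_vdev


if __name__ == "__main__":
    print(raidz_usable_tb(6, 2, parity=2))   # six 2TB drives in RAID-Z2
    print(raidz_usable_tb(6, 6, parity=2))   # six 6TB drives in RAID-Z2
    print(raidz_efficiency(8, parity=2))     # eight-drive RAID-Z2
    # 60 bays as ten six-drive RAID-Z2 vdevs, assuming 14TB drives
    print(raidz_usable_tb(6, 14, parity=2, vdevs=10))
```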

Enabling CCTL/TLER on Desktop Drives

Time-Limited Error Recovery (TLER) or Command Completion Time Limit (CCTL).

If you must run desktop drives…  Desktop class drives such as the HGST Deskstar are typically not run in RAID mode, so by default they are configured to take as long as needed (sometimes several minutes) to try to recover a bad sector of data.  This is what you’d want on a desktop; however, performance grinds to a halt during this time, which can cause your ZFS server to hang for several minutes waiting on a recovery.  If you already have ZFS redundancy it’s a pretty low risk to just tell the drive to give up after a few seconds, and let ZFS rebuild the data.

The basic rule of thumb: if you’re running RAID-Z, you only have single parity to fall back on, so I’d be a little cautious about enabling TLER.  If you’re running RAID-Z2 or RAID-Z3 you have double or triple parity, so in that case there’s very little risk in enabling it.

Viewing the TLER setting:

Enabling TLER

Disabling TLER

(TLER should always be disabled if you have no redundancy).
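The commands for the three steps above are not shown here, so the following is only a sketch of one common way to do it, assuming smartmontools is installed and the drive supports SCT Error Recovery Control.  It builds `smartctl -l scterc` command lines; the helper name, the device path `/dev/ada0` and the 7 second timeout are placeholders chosen for illustration.

```python
def scterc_command(device: str, seconds=None) -> list:
    """Build a smartctl command line for SCT Error Recovery Control.

    seconds=None -> view the current setting
    seconds=0    -> disable the time limit
    seconds>0    -> enable it for both reads and writes
    """
    if seconds is None:
        return ["smartctl", "-l", "scterc", device]
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    # smartctl takes the limits in tenths of a second (70 means 7.0 seconds).
    deciseconds = int(round(seconds * 10))
    return ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", device]


if __name__ == "__main__":
    device = "/dev/ada0"  # placeholder device name
    print(" ".join(scterc_command(device)))      # view
    print(" ".join(scterc_command(device, 7)))   # enable with a 7 second limit
    print(" ".join(scterc_command(device, 0)))   # disable
```

The script only prints the commands; run them as root yourself (or pass the list to `subprocess.run`).  On many drives the setting reportedly does not survive a power cycle, so it may need to be reapplied at boot.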

9 thoughts on “Best Hard Drives for ZFS Server (Updated Apr 2019)”

  1. Is there a benefit in performance to sticking with 512n (native not emulated sector size) disks?

    Yes, I know you can do 4K alignment with “ashift”, but I believe you will incur less overhead for small updates. I have never benchmarked this assumption.

    I always try to use 512n when possible. Because of this I have recently used the 4TB 512n ST4000NM0033 (128MB cache, SATA) as my go-to large drive, at about $56 to $75 per TB.

    1. Hi, Jon. I haven’t tested native vs emulated sector disks. I am curious so if you come across any benchmarks let me know.

      I have a 6-disk RAID-Z2 of 2TB Seagate Barracudas on my main pool and I haven’t been very happy with them. Now these are not enterprise grade because I didn’t know any better when I first built my pool. Last year I had two fail out of warranty. A couple of weeks ago another one failed; this time I got smart and replaced it with an HGST. Of the three I haven’t replaced, 2 are reporting SMART issues and the other one is corrupting data (ZFS keeps reporting read errors on it, and on every scrub I see it being resilvered). That’s a 100% failure rate within 4 years. So I went ahead and ordered 5 more HGST 7K4000s to replace them all and hopefully be done with it.

      The 4TB Seagates are a lot more reliable than 2TB. I feel like if Seagate wants more of my business they should be giving me a partial credit on those 6 drives since they didn’t last very long. Last week I sat down to write a letter to Seagate to see if they’d be interested in earning back my business by giving me a partial credit or a couple of free drives but they don’t have a mailing address listed on their website.

      1. Ben,

        I always try to buy with a five (5) year warranty and have a spare drive lying around (even if I’m burning the warranty). For that matter I always buy computers and servers in threes just so I have some “swap the part” capability. Sorry about your experience with Seagate.

        I too was less than enthused about ZFS RAID (I stopped using them years ago and went to mirrors). At first they seemed to perform like “magic”, but then after they got to 70% full or above they really started to crawl (due to inherent COW fragmentation). The lousy thing is you can’t really get rid of it, as there is no (and probably never will be) block pointer rewrite (BPR). Once I had the pain of dealing with a Sun X4500 “thumper” with two 10 disk ZFS RAIDs which became fragmented and useless – I had to move the data off the chassis, rebuild smaller pools and then move the data back (and that took a long long time). That was the final straw which caused me to switch to mirrored pairs, and I keep them relatively empty. If I want to refresh performance I copy the data to a freshly minted mirrored pair. In general I’m pretty happy with this arrangement.

        As to benchmarks I did a little more digging (no benchmarks on the 512 v. 4K sector by me):

        This is not my benchmark but it is relevant to your use case of RAID-Z2: it seems like you gain some space with ashift=9 BUT lose 1/3 of your write performance, at least in this test (I always use mirrors) http://louwrentius.com/zfs-performance-and-capacity-impact-of-ashift9-on-4k-sector-drives.html

        This also is not my benchmark – albeit for Windows – the big issue here is misalignment – the unaligned setup results in a performance impact of up to 50%, depending on the workload. However workloads that do not involve write operations, such as the Web server test pattern, don’t show any disadvantage at all in I/O testing.

        The PCMark Vantage application test shows decreased performance for the new drive with its 4 KB sector size in popular Windows scenarios. … However, we cannot control the way data writes are actually executed and organized. In the case of PCMark Vantage, the benchmark was never tweaked to minimize the number of smallest-size write requests in favor of larger chunks of data (this is something ZFS does). http://www.tomshardware.com/reviews/advanced-format-4k-sector-size-hard-drive,2759.html

        FYI in ZFS RAID scenarios (which I stopped using years ago), if 4K bytes is the minimum block of data that can be written or read, data blocks smaller than 4K will also be padded to form the 4K. This article says that this can hurt the most when parity blocks have to be created for small chunks of data. http://www.docs.cloudbyte.com/knowledge-base-articles/implications-of-using-4k-sector-size-zfs-pools/
        a) So as a rule of thumb the ZFS record size should be a multiple of the number of data disks in the RAIDZ x sector size.
        b) For example for 4+1, the record size should be 16K (4 x 4096) and for 2+1 it should be 8K (2 x 4096).
        c) CloudByte recommends 32K as the sector size for least space overhead.

        BTW a nice article/tutorial about forcing the ashift (9 or 12) value http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/

        Summary: Seems like maybe I should bite the bullet and switch to 4K drives and always make sure I use ashift=12 in my new rigs – as performance seems to be similar if the drives are aligned.

        1. Well, you’re wise to get a 5-year warranty. Mine were bought shortly after the Thailand flooding so that may have played a role in the shorter lifespan.

          You’ve done quite a bit of research there. For heavy I/O I agree that mirrors are the way to go, mostly for the extra IOPS. At home I use 6 x RAID-Z2 and my performance is fine for my needs… I have a few VMware VMs on NFS but the ARC and SLOG are more than enough to run them on RAID-Z2, and the bulk of my data is movies and pictures.

          I think those articles on the record size might be a little dated. Nowadays everyone runs with compression=on (which for FreeNAS and OmniOS is LZ4) which changes the theory on record sizes and keeping the data disks in powers of 2 for RAID-Z/Z2/Z3. Here are some newer articles you may find interesting:

          Matt makes the point that you don’t need to worry about the record size being a multiple of data disks: See: http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/ With LZ4 compression (which at worst doesn’t hurt and at best saves space and increases performance) it doesn’t really matter that much since most of the time it will compress into a smaller record anyway.

          Max Bruning has an interesting article about how parity works with RAID-Z. https://www.joyent.com/blog/zfs-raidz-striping He states that in the case of a small write, RAID-Z will only put the data on enough disks to get the required redundancy. So if you have 4+1 RAID-Z and write a block less than 4K (assuming ashift=12) RAID-Z will place the data on only 2 disks (effectively mirroring the data) instead of wasting space on all 5 drives.

          So correct me if I’m interpreting this wrong, but I think the best record size from a storage efficiency standpoint is the largest possible (1M). Smaller writes are going to use the smallest record size they can anyway. Using a large record size will help throughput but could come at the cost of IOPS, so there may be performance reasons to force a smaller record size.

      2. I talked to somebody from iXsystems … and apparently they are using 4K Native sector drives, specifically HGST; they told me these provide the best performance.

        Also … I had some major issues with recent WD Red drives … they would time out and ABRT block read commands randomly under high usage, so I switched to HGST NAS for now. But HGST also has the HE series drives with 4KN, and the non-HE 4KN drives are also available.

  2. Could you provide some examples of ‘archival’ drives? Would you consider WD Red as an archival drive?

    1. The Seagate ST8000AS0002 is a good example of what you want to stay away from. SMR does not play well with CoW filesystems like ZFS.

      WD Reds are not archive drives and will work great with ZFS.

  3. Update for 2018? The HGST Helium drives (HE6 to HE12) are revolutionary: they come in both SAS and SATA interfaces, with longer life, less power, less heat, and better speed than most competing models, at ~$28/TB. Backblaze is reporting market-leading reliability ratings for these. For ZIL, the Intel Optane 900P is a new winner for budget users; it has no cache, so there is no need to worry about cache power caps, which means writes always go directly to media. Like your posts, bless you and yours.
