Best Hard Drives for ZFS Server (Updated 2017)

Today’s question comes from Jeff….

Q. What drives should I buy for my ZFS server? 

Answer: Here’s what I recommend, considering a balance of cost per TB, performance, and reliability. I prefer NAS-class drives since they’re designed to run 24/7 and better tolerate vibration from neighboring drives. I prefer SATA, but SAS drives are better in some designs (especially when using expanders). For a home or small business FreeNAS storage server with 8 or fewer drives, I think these are the best options.

Updated: July 19, 2015 – Added quieter HGST, and updated prices.
Updated: July 30, 2016 – Updated prices, and added WL drives.
Updated: July 15, 2017 – Updated prices, added larger drives, removed drives no longer being sold.

2TB Hitachi Drives – $24/TB – Budget

They won’t carry the HGST 5-year warranty, but you can usually get a 1-year warranty from the seller. HGST drives are reliable, so the lower cost probably justifies the shorter warranty. 2TB HGST drives also boast an MTBF of 2 million hours!

Hitachi 2TB 7200RPM 64MB Cache. Certified Refurbished. Desktop/NAS grade. 1-year Warranty. This drive is nearly silent. It isn’t an enterprise-class drive; however, the internals are nearly identical (and might be identical). TLER is disabled by default but, unlike on most desktop drives, it can be enabled manually (see notes on enabling TLER below).

4TB and 6TB White Label Drives – $23/TB to $27/TB – Budget

A great way to save money is to get White Label drives. These are NAS-class drives made by the same manufacturers, with the branding removed, and the seller usually provides a 1-year warranty: 4TB WhiteLabel Drive or 6TB WhiteLabel Drive. These are what I buy for my home and I’ve yet to have one fail. They are most likely re-branded Western Digital Reds. I hate dealing with paperwork and warranty returns; I’d much rather pay a little less and keep an extra drive on the shelf in case one fails than pay more for a warranty and deal with the hassle of exchanging it.

3TB, 4TB, 5TB, 6TB, 8TB, and 10TB Drives – $37/TB to $40/TB

I’d purchase either the HGST Deskstar NAS or the WD RED NAS model. Both brands’ NAS-class hard drives are designed for 24/7 operation and for use in systems with up to 8 bays. The difference is that the HGST is 7200RPM and the WD REDs are ~5400RPM, so it’s a performance vs. cost/energy/heat trade-off.


HGST Deskstar NAS 64-128MB Cache 7200RPM SATA III 3-year Warranty. The main advantage of this drive is that it’s faster at 7200RPM, and as a result it significantly outperforms the WD Red; see StorageReview’s benchmarks on the 4TB Deskstar. Also, at 5TB and 6TB the cache doubles to 128MB. In general, if the price is the same or pretty close, I’d prefer the HGST drive.


WD RED NAS 64MB-128MB Cache ~5400RPM SATA III 3-year Warranty
This drive is available from 1TB to 10TB. It runs a little cheaper than the HGST and is a fantastic drive; if the price is lower than the HGST by more than $5/TB, I would consider it to save a little money.


Or buy a TrueNAS Storage Server from iXsystems

I’m cheap and tend to go with a DIY approach most of the time, but when I’m recommending ZFS systems in environments where availability is important I like the TrueNAS servers from iXsystems, which of course come with drives in configurations that have been well tested. The prices on a TrueNAS are very reasonable compared to other storage systems, and it can be set up in an HA cluster. Even a FreeNAS Certified Server is probably not going to cost much more than doing it yourself (more often than not it ends up being less expensive than DIY). And of course for a small server you can grab the 4-bay FreeNAS Mini (which ships with WD REDs).

Careful with “archival” drives (usually 8TB+)

If you don’t get one of the drives above, be aware that some 8TB and larger hard drives use SMR (Shingled Magnetic Recording), which should not be used with ZFS if you care about performance, at least until better driver support is developed. Be careful with any drive that says it’s for archival purposes.

The ZIL / SLOG and L2ARC

The ZFS Intent Log (ZIL) should be on an SSD with a battery- or capacitor-backed cache that can flush out in-flight writes in case of power loss. I have done quite a bit of testing and like Intel’s DC S35xx, S36xx, or S37xx series drives and also HGST’s S840Z. These are rated to have their data overwritten many times and will not lose data on power loss. They run on the expensive side, so for a home setup I typically try to find them used on eBay. From a ZIL perspective there’s no reason to get a large drive, but keep in mind that larger drives generally give you better performance. In my home I use 100GB DC S3700s and they do just fine.
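As a rough sketch of what adding one looks like (the pool name tank and the device names da6/da7 are placeholders, not from my setup), a mirrored SLOG can be attached to an existing pool like this:

# Attach two power-loss-protected SSDs as a mirrored SLOG (separate log device)
zpool add tank log mirror da6 da7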

I generally don’t use an L2ARC (SSD read cache) and instead opt to add more memory. There are a few cases where an L2ARC makes sense, such as when you have a very large working set that won’t fit in RAM.
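If you do decide an L2ARC is worth it, adding one is a single command (again, tank and da8 are placeholder names):

# Add an SSD as an L2ARC read cache device
zpool add tank cache da8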

For SLOG and L2ARC see my comparison of SSDs.

Capacity Planning for Failure

Most drives running 24/7 start having a high failure rate after 3 years; you might be able to squeeze 4 or 5 years out of them if you’re lucky. So a good rule of thumb is to estimate your growth and buy drives big enough that you will start to outgrow them in 4 to 5 years. The price of hard drives is always dropping, so you don’t really want to buy much more than you’ll need before they start failing. Consider that in ZFS you shouldn’t run more than 70% full (with 80% being the max) for typical NAS applications, including VMs on NFS. If you’re planning to use iSCSI you shouldn’t run more than 50% full.
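As a rough worked example: a 6-drive RAID-Z2 of 4TB drives yields about 16TB of usable space (4 data drives x 4TB, before ZFS overhead), so at the 70% guideline you’d plan on roughly 11TB of data, or about 8TB at the 50% iSCSI guideline. One way to keep yourself honest is to put a quota on the top-level dataset (tank is a placeholder pool name):

# Cap the pool’s top-level dataset at roughly 70% of usable capacity
zfs set quota=11T tank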

ZFS Drive Configurations

My preference is almost always RAID-Z2 (similar to RAID-6) with 6 to 8 drives, which provides a storage efficiency of 0.66 to 0.75. This scales pretty well as far as capacity is concerned, and with double parity I’m not that concerned if a drive fails. Six drives in RAID-Z2 nets anywhere from 8TB of usable capacity with 2TB drives up to 24TB with 6TB drives. For larger setups use multiple vdevs; e.g. with 60 bays use ten 6-drive RAID-Z2 vdevs (each additional vdev increases IOPS). For smaller setups I run 3 or 4 drives in RAID-Z (similar to RAID-5). In all cases it’s essential to have backups… and I’d rather have two smaller servers with RAID-Z replicating to each other than one server with RAID-Z2. The nice thing about smaller setups is the cost of upgrading 4 drives isn’t as bad as 6 or 8!
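For illustration, here’s roughly what those layouts look like at pool creation time (pool and device names are placeholders):

# 6-drive RAID-Z2 (double parity)
zpool create tank raidz2 da0 da1 da2 da3 da4 da5

# Larger setup: stripe multiple 6-drive RAID-Z2 vdevs in one pool
zpool create bigtank \
  raidz2 da0 da1 da2 da3 da4 da5 \
  raidz2 da6 da7 da8 da9 da10 da11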

Enabling CCTL/TLER

Time-Limited Error Recovery (TLER) or Command Completion Time Limit (CCTL).

Desktop-class drives such as the HGST Deskstar typically aren’t run in RAID, so by default they are configured to take as long as needed (sometimes several minutes) trying to recover a bad sector. That’s what you’d want on a desktop; however, performance grinds to a halt during the recovery attempt, which can cause your ZFS server to hang for several minutes. If you already have ZFS redundancy it’s a pretty low risk to tell the drive to give up after a few seconds and let ZFS rebuild the data.

The basic rule of thumb: if you’re running RAID-Z you can only lose one drive, so I’d be a little cautious about enabling TLER. If you’re running RAID-Z2 or RAID-Z3 you can lose two or three drives, so there’s very little risk in enabling it.

Viewing the TLER setting:
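On FreeNAS/FreeBSD you can check it with smartctl from smartmontools (replace /dev/ada0, used here as an example, with your drive):

smartctl -l scterc /dev/ada0

If the drive supports SCT Error Recovery Control this reports the current read and write timeouts, or shows it as disabled.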

Enabling TLER
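For example, to tell the drive to give up after 7 seconds (the two values are the read and write timeouts in tenths of a second; /dev/ada0 is a placeholder):

# 70 tenths of a second = 7.0 seconds for reads and writes
smartctl -l scterc,70,70 /dev/ada0

On many drives this setting does not survive a power cycle, so you may want to re-apply it from a startup script.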

Disabling TLER
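Setting both timeouts back to 0 tells the drive to retry for as long as it needs (again, /dev/ada0 is a placeholder):

smartctl -l scterc,0,0 /dev/ada0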

(TLER should always be disabled if you have no redundancy).

8 thoughts on “Best Hard Drives for ZFS Server (Updated 2017)”

  1. Is there a performance benefit to sticking with 512n (native, not emulated, sector size) disks?

    Yes, I know you can do 4K alignment with “ashift”, but I believe you will incur less overhead for small updates; I have never benchmarked this assumption though.

    In the past I have always tried to use 512n when possible; because of this, the 4TB 512n ST4000NM0033 (128MB cache, SATA) has recently been my go-to large drive, at about $56 to $75 per TB.

    1. Hi, Jon. I haven’t tested native vs emulated sector disks. I am curious so if you come across any benchmarks let me know.

      I have a 6-disk RAID-Z2 of 2TB Seagate Barracudas on my main pool and I haven’t been very happy with them. Now, these are not enterprise grade because I didn’t know any better when I first built my pool. Last year I had two fail out of warranty. A couple of weeks ago another one failed; this time I got smart and replaced it with an HGST. Of the three I haven’t replaced, two are reporting SMART issues and the other one is corrupting data (ZFS keeps reporting read errors on it, and on every scrub I see it being resilvered). That’s a 100% failure rate within 4 years. So I went ahead and ordered 5 more HGST 7K4000s to replace them all and hopefully be done with it.

      The 4TB Seagates are a lot more reliable than the 2TB models. I feel like if Seagate wants more of my business they should give me a partial credit on those 6 drives since they didn’t last very long. Last week I sat down to write a letter to Seagate to see if they’d be interested in earning back my business with a partial credit or a couple of free drives, but they don’t have a mailing address listed on their website.

      1. Ben,

        I always try to buy with a five (5) year warranty and have a spare drive lying around (even if I’m burning the warranty). For that matter, I always buy computers and servers in threes just so I have some swap-the-part capability. Sorry about your experience with Seagate.

        I too was less than enthused about ZFS RAID-Z (I stopped using it years ago and went to mirrors). At first it seemed to perform like “magic”, but after the pools got to 70% full or above they really started to crawl (due to inherent COW fragmentation). The lousy thing is you can’t really get rid of it, as there is no (and probably never will be) block pointer rewrite (BPR). Once I had the pain of dealing with a Sun X4500 “thumper” with two 10-disk ZFS RAID-Z groups which became fragmented and useless; I had to move the data off the chassis, rebuild smaller pools, and then move the data back (and that took a long, long time). That was the final straw that caused me to switch to mirrored pairs, and I keep them relatively empty; if I want to refresh performance I copy the data to a freshly minted mirrored pair. In general I’m pretty happy with this arrangement.

        As to benchmarks, I did a little more digging (no 512 vs. 4K sector benchmarks by me):

        This is not my benchmark, but it is relevant for your RAID-Z2 use case: it seems you gain some space with ashift=9 BUT lose 1/3 of your write performance, at least in this test (I always use mirrors): http://louwrentius.com/zfs-performance-and-capacity-impact-of-ashift9-on-4k-sector-drives.html

        This also is not my benchmark (and it’s for Windows), but the big issue here is misalignment: the unaligned setup results in a performance impact of up to 50%, depending on the workload. However, workloads that do not involve write operations, such as the Web server test pattern, don’t show any disadvantage at all in I/O testing.

        The PCMark Vantage application test shows decreased performance for the new drive with its 4 KB sector size in popular Windows scenarios. … However, we cannot control the way data writes are actually executed and organized. In the case of PCMark Vantage, the benchmark was never tweaked to minimize the number of smallest-size write requests in favor of larger chunks of data (this is something ZFS does). http://www.tomshardware.com/reviews/advanced-format-4k-sector-size-hard-drive,2759.html

        FYI, in ZFS RAID-Z scenarios (which I stopped using years ago), if 4k is the minimum block of data that can be written or read, data blocks smaller than 4k will be padded out to 4k. This article says this hurts the most when parity blocks have to be created for small chunks of data. http://www.docs.cloudbyte.com/knowledge-base-articles/implications-of-using-4k-sector-size-zfs-pools/
        a) So as a rule of thumb, the ZFS record size should be a multiple of (number of data disks in the RAID-Z) x (sector size).
        b) For example, for 4+1 the record size should be 16K (4 x 4096) and for 2+1 it should be 8K (2 x 4096).
        c) CloudByte recommends 32K as the record size for the least space overhead.

        BTW, a nice article/tutorial about forcing the ashift value (9 or 12): http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/

        Summary: seems like maybe I should bite the bullet and switch to 4K drives, and always make sure I use ashift=12 in my new rigs, as performance seems to be similar as long as the drives are aligned.
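        For what it’s worth, checking and forcing the value is straightforward (the pool and device names below are just examples):

        # See what ashift the existing vdevs were created with
        zdb -C tank | grep ashift

        # ZFS on Linux: force 4K alignment at pool creation
        zpool create -o ashift=12 tank mirror da0 da1

        # FreeBSD/FreeNAS: raise the minimum ashift before creating the pool
        sysctl vfs.zfs.min_auto_ashift=12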

        1. Well, you’re wise to get a 5-year warranty. Mine were bought shortly after the Thailand flooding so that may have played a role in the shorter lifespan.

          You’ve done quite a bit of research there. For heavy I/O I agree that mirrors are the way to go, mostly for the extra IOPS. At home I use a 6-drive RAID-Z2 and my performance is fine for my needs… I have a few VMware VMs on NFS, but the ARC and SLOG are more than enough to run them on RAID-Z2; the bulk of my data is movies and pictures.

          I think those articles on the record size might be a little dated. Nowadays everyone runs with compression=on (which for FreeNAS and OmniOS is LZ4), which changes the theory on record sizes and on keeping the number of data disks a power of 2 for RAID-Z/Z2/Z3. Here are some newer articles you may find interesting:

          Matt makes the point that you don’t need to worry about the record size being a multiple of the number of data disks; see http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/ With LZ4 compression (which at worst doesn’t hurt and at best saves space and increases performance) it doesn’t really matter that much, since most of the time the record will compress down to a smaller size anyway.

          Max Bruning has an interesting article about how parity works with RAID-Z: https://www.joyent.com/blog/zfs-raidz-striping He states that in the case of a small write, RAID-Z will only put the data on enough disks to get the required redundancy. So if you have a 4+1 RAID-Z and write a block smaller than 4K (assuming ashift=12), RAID-Z will place the data on only 2 disks (effectively mirroring it) instead of wasting space on all 5 drives.

          So correct me if I’m interpreting this wrong, but I think the best record size from a storage efficiency standpoint is the largest possible (1M). Smaller writes are going to use the smallest record size they can anyway. Using a large record size will help throughput but could come at the cost of IOPS, so there may be performance reasons to force a smaller one.
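          In practice that just means something like this on a bulk-data dataset (tank/media is a placeholder name, and recordsize=1M assumes the pool’s large_blocks feature is enabled):

          # LZ4 compression plus large records for sequential bulk data
          zfs set compression=lz4 tank/media
          zfs set recordsize=1M tank/media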

      2. I talked to somebody from iXsystems … and apparently they are using 4K native sector drives, specifically HGST; they told me those provide the best performance.

        Also … I had some major issues with recent WD RED drives … they would time out and ABRT block read commands randomly under high usage, so I switched to HGST NAS for now. But HGST also has the He series drives in 4Kn, and non-He 4Kn drives are available as well.

  2. Could you provide some examples of ‘archival’ drives? Would you consider WD Red as an archival drive?

    1. The Seagate ST8000AS0002 is a good example of what you want to stay away from. SMR does not play well with CoW filesystems like ZFS.

      WD Reds are not archive drives and will work great with ZFS.
