Best Hard Drives for ZFS Server (Updated 2016)

Today’s question comes from Jeff….

Q. What drives should I buy for my ZFS server? 

Answer: Here’s what I recommend, considering a balance of cost per TB, performance, and reliability.  I prefer NAS class drives since they are designed to run 24/7 and also are better at tolerating vibration from other drives.  I prefer SATA but SAS drives would be better in some designs (especially when using expanders).  For a home or small business storage server with 8 or fewer drives I think these are the best options.

Updated: July 19, 2015 – Added quieter HGST, and updated prices.
Updated: July 30, 2016 – Updated prices, and added WL drives

2TB HGST OEM Drives – $23/TB

They won’t carry the HGST 5-year warranty, but you can usually get a 1-year warranty from the seller.  HGST drives are reliable, so the lower cost probably justifies the lack of a warranty.  2TB HGST drives also boast an MTBF of 2 million hours!

HGST Deskstar 7K4000 2TB 64MB Cache 7200RPM SATA III (HDS724020ALE640).  Desktop/NAS grade.  1-year Warranty.  $46 / $23/TB.  This Deskstar is nearly silent.  It isn’t an enterprise class drive, however the internals are nearly identical (and might be identical).  TLER is disabled by default but, unlike most desktop drives, it can be enabled manually (see the notes on enabling TLER below).

5TB and 6TB White Label Drives – $23/TB to $27/TB

A great way to save money is to buy White Label drives.  These are NAS class drives with the branding removed.  The seller usually provides a 1-year warranty.  A great deal is this 6TB White Label drive for $160 or this 5TB White Label drive for $115.  These are most likely re-branded Western Digital Reds.  I hate dealing with paperwork and warranty returns–I’d much rather just buy an extra drive to have sitting on the shelf in case one fails than pay more for a warranty.

3TB, 4TB, 5TB, 6TB, and 8TB Drives – $37/TB to $50/TB

I’d purchase either the HGST Deskstar NAS or the WD Red series.  Both are designed for 24/7 operation and for use in systems with up to 8 bays.


HGST Deskstar NAS 64MB Cache 7200RPM SATA III, 3-year Warranty.  The main advantage of this drive is that it spins at 7200RPM, and as a result it significantly outperforms the WD Red.  See StorageReview’s benchmarks on the 4TB Deskstar.  Also, at 5TB and 6TB the cache doubles to 128MB.  In general, if the price is the same or pretty close I’d prefer the HGST drive.

 

WD Red NAS 64MB Cache ~5400RPM SATA III, 3-year Warranty.  The WD drive usually runs a little cheaper; if the price is less than the HGST by more than $5/TB I would consider this drive to save a little money.

Careful with 8TB+ drives

Some drives 8TB and larger use SMR (Shingled Magnetic Recording), which should not be used with ZFS if you care about performance, at least until SMR-aware support is developed.  Be careful with any drive that says it’s for archiving purposes.

SLOG and L2ARC

For SLOG and L2ARC see my comparison of SSDs.

Capacity Planning for Failure

Most drives running 24/7 start having a high failure rate after 3 years; you might be able to squeeze 4 or 5 years out of them if you’re lucky.  So a good rule of thumb is to estimate your growth and buy drives big enough that you will only start to outgrow them in 4 to 5 years.  The price of hard drives is always dropping, so you don’t really want to buy much more than you’ll need before they start failing.  Consider that in ZFS you shouldn’t run more than 70% full (with 80% being the max) for typical NAS applications, including VMs on NFS.  But if you’re planning to use iSCSI you shouldn’t run more than 50% full.
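As a quick worked example (my numbers, just to illustrate the math): a 6-drive RAID-Z2 of 4TB disks gives roughly 4 x 4TB = 16TB of usable space; at the 70% guideline plan on storing about 11TB, or only about 8TB if the pool will serve iSCSI at the 50% guideline.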

ZFS Drive Configurations

My preference is almost always RAID-Z2 (RAID-6) with 6 to 8 drives, which provides a storage efficiency of 0.66 to 0.75.  This scales pretty well as far as capacity is concerned, and with double parity I’m not that concerned if a drive fails.  6 drives in RAID-Z2 nets 8TB of usable capacity with 2TB drives, all the way up to 24TB with 6TB drives.  For larger setups use multiple vdevs.  E.g. with 60 bays use 10 six-drive RAID-Z2 vdevs (each additional vdev increases IOPS).  For smaller setups I run 3 or 4 drives in RAID-Z (RAID-5).  In all cases it’s essential to have backups… and I’d rather have two smaller RAID-Z servers mirroring to each other than one server with RAID-Z2.  The nice thing about smaller setups is the cost of upgrading 4 drives isn’t as bad as 6 or 8!
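As a rough sketch (the pool name and device names below are placeholders, not from any particular build), a 6-drive RAID-Z2 pool, plus a second vdev added later for a bigger chassis, would look something like this:

zpool create tank raidz2 da0 da1 da2 da3 da4 da5
zpool add tank raidz2 da6 da7 da8 da9 da10 da11

Each additional RAID-Z2 vdev added this way increases both capacity and IOPS, since ZFS stripes writes across vdevs.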

Enabling CCTL/TLER

Time-Limited Error Recovery (TLER) or Command Completion Time Limit (CCTL).

Desktop class drives such as the HGST Deskstar typically aren’t run in RAID, so by default they are configured to take as long as needed (sometimes several minutes) to try to recover a bad sector of data.  This is what you’d want on a desktop; however, performance grinds to a halt during recovery, which can cause your ZFS server to hang for several minutes waiting on the drive.  If you already have ZFS redundancy it’s a pretty low risk to just tell the drive to give up after a few seconds and let ZFS rebuild the data.

The basic rule of thumb: if you’re running RAID-Z you only have single parity, so I’d be a little cautious about enabling TLER.  If you’re running RAID-Z2 or RAID-Z3 you have double or triple parity, so in that case there’s very little risk in enabling it.

Viewing the TLER setting:
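On SATA drives that support SCT ERC you can read the current recovery time limits with smartctl (the device name here is just an example):

smartctl -l scterc /dev/ada0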

Enabling TLER:
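This sets a 7-second limit for both read and write recovery (the values are in tenths of a second).  Note that on many drives the setting does not survive a power cycle, so you may want to apply it from a boot script:

smartctl -l scterc,70,70 /dev/ada0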

Disabling TLER:
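Setting both values to zero tells the drive to keep retrying for as long as it needs:

smartctl -l scterc,0,0 /dev/ada0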

(TLER should always be disabled if you have no redundancy).

5 thoughts on “Best Hard Drives for ZFS Server (Updated 2016)”

  1. Is there a performance benefit to sticking with 512n (native, not emulated, sector size) disks?

    Yes, I know you can do 4K alignment with “ashift”, but I believe you will incur less overhead for small updates; I have never benchmarked this assumption.

    In the past I’ve always tried to use 512n when possible; recently, because of this, the 4TB 512n ST4000NM0033 (128MB cache, SATA) has been my go-to large drive at about $56 to $75 per TB.

    1. Hi, Jon. I haven’t tested native vs emulated sector disks. I am curious so if you come across any benchmarks let me know.

      I have a 6-disk RAID-Z2 of 2TB Seagate Barracudas on my main pool and I haven’t been very happy with them. Now, these are not enterprise grade because I didn’t know any better when I first built my pool. Last year I had two fail out of warranty. A couple of weeks ago another one failed; this time I got smart and replaced it with an HGST. Of the three I haven’t replaced, two are reporting SMART issues and the other one is corrupting data (ZFS keeps reporting read errors, and on every scrub I see it being resilvered). That’s a 100% failure rate within 4 years. So I went ahead and ordered 5 more HGST 7K4000s to replace them all and hopefully be done with it.

      The 4TB Seagates are a lot more reliable than 2TB. I feel like if Seagate wants more of my business they should be giving me a partial credit on those 6 drives since they didn’t last very long. Last week I sat down to write a letter to Seagate to see if they’d be interested in earning back my business by giving me a partial credit or a couple of free drives but they don’t have a mailing address listed on their website.

      1. Ben,

        I always try to buy with a five (5) year warranty and have a spare drive lying around (even if I’m burning the warranty). For that matter I always buy computers and servers in threes just so I have some “swap the part” capability. Sorry about your experience with Seagate.

        I too was less than enthused about ZFS RAID-Z (I stopped using it years ago and went to mirrors). At first it seemed to perform like “magic”, but after the pools got to 70% full or above they really started to crawl (due to inherent COW fragmentation). The lousy thing is you can’t really get rid of it, as there is no (and probably never will be) block pointer rewrite (BPR). Once I had the pain of dealing with a Sun X4500 “thumper” with two 10-disk ZFS RAIDs which became fragmented and useless; I had to move the data off the chassis, rebuild smaller pools, and then move the data back (and that took a long, long time). That was the final straw that caused me to switch to mirrored pairs, and I keep them relatively empty; if I want to refresh performance I copy the data to a freshly minted mirrored pair. In general I’m pretty happy with this arrangement.

        As to benchmarks, I did a little more digging (no 512 vs. 4K sector benchmarks by me):

        This is not my benchmark, but it’s relevant for your RAID-Z2 use case: it seems like you gain some space with ashift=9 BUT lose 1/3 of your write performance, at least in this test (I always use mirrors): http://louwrentius.com/zfs-performance-and-capacity-impact-of-ashift9-on-4k-sector-drives.html

        This also is not my benchmark (albeit for Windows); the big issue here is misalignment: the unaligned setup results in a performance impact of up to 50%, depending on the workload. However, workloads that do not involve write operations, such as the Web server test pattern, don’t show any disadvantage at all in I/O testing.

        The PCMark Vantage application test shows decreased performance for the new drive with its 4 KB sector size in popular Windows scenarios. … However, we cannot control the way data writes are actually executed and organized. In the case of PCMark Vantage, the benchmark was never tweaked to minimize the number of smallest-size write requests in favor of larger chunks of data (this is something ZFS does). http://www.tomshardware.com/reviews/advanced-format-4k-sector-size-hard-drive,2759.html

        FYI, in ZFS RAID-Z scenarios (which I stopped using years ago), if 4k bytes is the minimum block of data that can be written or read, data blocks smaller than 4k will be padded out to 4k. This article says that this can hurt the most when parity blocks have to be created for small chunks of data. http://www.docs.cloudbyte.com/knowledge-base-articles/implications-of-using-4k-sector-size-zfs-pools/
        a) So as a rule of thumb, the ZFS record size should be a multiple of (number of data disks in the RAID-Z) x (sector size).
        b) For example, for 4+1 the record size should be 16K (4 x 4096), and for 2+1 it should be 8K (2 x 4096).
        c) CloudByte recommends 32K as the record size for the least space overhead.

        BTW, a nice article/tutorial about forcing the ashift value (9 or 12): http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/

        Summary: seems like maybe I should bite the bullet, switch to 4K drives, and always make sure I use ashift=12 in my new rigs, as performance seems to be similar as long as the drives are aligned.
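        For what it’s worth, a minimal sketch of forcing it (pool and device names are placeholders): on ZFS on Linux the ashift is set at pool creation, while FreeBSD/FreeNAS uses the vfs.zfs.min_auto_ashift sysctl instead.

        zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb
        zdb -C tank | grep ashift

        The zdb output should show ashift: 12 on each vdev.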

        1. Well, you’re wise to get a 5-year warranty. Mine were bought shortly after the Thailand flooding so that may have played a role in the shorter lifespan.

          You’ve done quite a bit of research there. For heavy I/O I agree that mirrors are the way to go, mostly for the extra IOPS. At home I use a 6-disk RAID-Z2 and my performance is fine for my needs… I have a few VMware VMs on NFS, but the ARC and SLOG are more than enough to run them on RAID-Z2; the bulk of my data is movies and pictures.

          I think those articles on the record size might be a little dated. Nowadays everyone runs with compression=on (which for FreeNAS and OmniOS is LZ4), which changes the theory on record sizes and on keeping the number of data disks a power of 2 for RAID-Z/Z2/Z3. Here are some newer articles you may find interesting:

          Matt makes the point that you don’t need to worry about the record size being a multiple of the number of data disks; see http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/. With LZ4 compression (which at worst doesn’t hurt and at best saves space and increases performance) it doesn’t really matter that much, since most of the time a block will compress into a smaller record anyway.

          Max Bruning has an interesting article about how parity works with RAID-Z: https://www.joyent.com/blog/zfs-raidz-striping He states that in the case of a small write, RAID-Z will only put the data on enough disks to get the required redundancy. So if you have 4+1 RAID-Z and write a block smaller than 4K (assuming ashift=12), RAID-Z will place the data on only 2 disks (effectively mirroring the data) instead of wasting space on all 5 drives.

          So correct me if I’m interpreting this wrong, but I think the best record size from a storage efficiency standpoint is the largest possible (1M). Smaller writes are going to use the smallest record size they can anyway. Using a large record size will help throughput but could come at the cost of IOPS, so there may be performance reasons to force a smaller record size.
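          For example (the dataset names below are just placeholders), both are per-dataset properties; note that recordsize=1M requires the large_blocks pool feature on newer ZFS releases:

          zfs set compression=lz4 tank/media
          zfs set recordsize=1M tank/media
          zfs set recordsize=16K tank/vms

          The 16K value on a hypothetical VM dataset is just to show that individual datasets can be tuned down for small random I/O.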

      2. I talked to somebody from iXsystems… and apparently they are using 4K native sector drives, specifically HGST, which they told me provide the best performance.

        Also… I had some major issues with recent WD Red drives… they would time out and ABRT block read commands randomly under high usage, so I switched to HGST NAS drives for now. HGST also has the He series drives with 4Kn, and the non-He 4Kn drives are also available.
