My Proxmox server was at 85% capacity, and the daily email from Proxmox reminded me of two drives throwing SMART errors. It was time to replace the array. I like to think hard drives can last 5 years, so it's best to proactively replace drives around that age, before the failure rate starts to climb.
I ask myself: what's the most I will need 5 years from now?
The answer is always lots of terabytes.
Here's what 56TB looks like. I am amazed how much data you can store in a 4-bay NAS these days. With RAID-Z (like RAID-5, which dedicates one drive's worth of capacity to parity), this comes to 42TB usable.
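The capacity math is simple enough to sketch in a couple of lines of shell. This is back-of-the-envelope only: real usable space comes in a bit lower once ZFS metadata and reserved slop space are accounted for.

```shell
# Rough RAID-Z1 capacity: one drive's worth of space goes to parity,
# so usable ≈ (N - 1) * drive_size
drives=4        # 4-bay NAS
size_tb=14      # 14TB drives
raw=$((drives * size_tb))
usable=$(( (drives - 1) * size_tb ))
echo "raw: ${raw}TB  usable: ${usable}TB"
```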
I referenced my own ZFS hard drive guide (I eat my own dogfood) and, looking at prices, decided on the 14TB Seagate Exos X16. They're 7200RPM enterprise-class drives, and starting at 14TB they get a 256MB cache. These are CMR (not SMR), which is what you want for ZFS or RAID systems.
Upgrading a ZFS Zpool by replacing disks with larger drives
RAID-Z VDEVs can be expanded in size by replacing and resilvering all of the drives in the array, one at a time.
The first step is to make sure backups are good. I would not proceed without them.
Next, for safety, run a zpool scrub. This makes ZFS read every block of data on every drive and verify it against its checksum, confirming that all of the data is readable and free of corruption. Hitting corrupt data during a rebuild would be a problem: with a drive out of the array there wouldn't be enough parity to reconstruct it, and I'd have to recover from backup.
I should note that, at the cost of another drive dedicated to parity, a RAID level like RAID-Z2 (RAID-6) gives you double parity and so carries less risk of failure during resilvering. But I didn't want to lose half my drives to parity.
zpool scrub pvepool1
This is a great time to go to bed and check on things in the morning.
In the morning, check the status.
zpool status pvepool1
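If you'd rather not eyeball the output repeatedly, the check can be scripted. Here's a minimal sketch that polls zpool status until the scan line no longer reports a scrub in progress; note the "scrub in progress" string matches current OpenZFS output but isn't a stable interface, and newer OpenZFS (2.0+) has `zpool wait -t scrub` for exactly this.

```shell
# Poll until the scrub finishes, then print the final status.
wait_for_scrub() {
  pool=$1
  while zpool status "$pool" | grep -q "scrub in progress"; do
    sleep 60
  done
  zpool status "$pool"
}
# Usage: wait_for_scrub pvepool1
```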
If the scrub had found a drive with any errors, I would replace that disk first. In my case zpool status showed no errors, so I'll just replace the first disk (number 0). There's no need to offline anything; just physically pull the first tray and wait for the drive to spin down.
For those wondering, this is my AMD 8-Core EPYC Home Server Build.
Swap out the drive for a new one (don't over-torque the screws; snug is fine).
Re-insert the tray. Run zpool status to see which drive needs to be replaced. You can also run dmesg to see the device name of the drive you just inserted. In my case it was /dev/sda.
zpool status pvepool1
dmesg
zpool replace pvepool1 sda
Run the status command to see how the resilver is progressing. In my case I was rebuilding a nearly full RAID-Z array of four 2TB drives, and each drive took a bit less than 5 hours. It's also a good idea to run an extra scrub after replacing each drive, to verify that all of the data can be read back before moving on to the next one.
zpool scrub pvepool1
zpool status pvepool1
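For planning, the old rebuild rate gives a rough estimate for the new drives. This is back-of-the-envelope only: it assumes throughput stays similar, and resilver time actually tracks allocated data rather than raw capacity, so a mostly empty 14TB drive finishes much faster.

```shell
# ~2TB resilvered in ~5 hours ≈ 400 GB/h (≈110 MB/s)
rate_gb_per_h=400
drive_gb=14000
hours=$((drive_gb / rate_gb_per_h))
echo "~${hours} hours per 14TB drive, once the pool fills up"
```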
I repeated the process for sdb, sdc, and sdd to replace all 4 drives.
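The repeated swap-resilver-scrub cycle could be wrapped in a small helper, run once per drive after physically swapping the tray. This is a sketch, not my exact procedure; it assumes OpenZFS 2.0+ for `zpool wait`, and the pool name is from my setup.

```shell
# One replacement cycle: resilver the new drive, then scrub before the next swap.
replace_and_verify() {
  pool=$1; dev=$2
  zpool replace "$pool" "$dev"     # start resilvering onto the new drive
  zpool wait -t resilver "$pool"   # block until the resilver completes
  zpool scrub "$pool"              # re-read everything as a final check
  zpool wait -t scrub "$pool"
  zpool status "$pool"
}
# Usage, once per drive: replace_and_verify pvepool1 sda
```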
The whole process took a few days. Finally, after replacing and resilvering the last drive, I turned on auto-expand for the pool and expanded all of the devices.
zpool set autoexpand=on pvepool1
zpool online -e pvepool1 sda
zpool online -e pvepool1 sdb
zpool online -e pvepool1 sdc
zpool online -e pvepool1 sdd
And now, Proxmox can see the additional space.
The new Seagate Exos X16 drives are not silent, but fairly quiet for Enterprise-class drives. I can hear them thumping during data writes but it’s not loud. It kind of reminds me of my first IBM computer.
I'd much rather have a little noise and character from my computers than the soulless silence of an SSD. As far as noise level goes, they're fine in an office, basement, or closet, but I wouldn't run these drives in a bedroom.
Happy Halloween, and Happy Reformation Day!