pfSense Firewall HA Failover Cluster

Last night I was attending a LAN party remotely, and between games I noticed my pfSense router needed an update. Of course, an update brings down my internet for 30 seconds while it reboots, which I didn’t want to do. Then I thought, I should really cluster this. I found a straightforward guide, pfSense HA (Hardware/Device Failover) Configuration using CARP, written by Michael Holloway, and after following it I ended up with something like this:

pfSense HA Diagram

I run pfSense under VMware, so networking hardware is free. I deployed a second pfSense VM more or less identical to the master. I don’t run pfSense on my ZFS storage because I want my networking to come up before storage, but I did put each pfSense server on a separate hard drive.

A word of caution: I do not recommend running pfSense under VMware unless you know what you are doing. If you do know what you are doing, be sure to enable promiscuous mode on the VM switches (if you didn’t know that, perhaps you shouldn’t do this). You can end up in a circular dependency situation, so be sure you have an alternate way to get into VMware to troubleshoot pfSense in case it dies for some reason. There are several ways to do this: you can set up a backup VMkernel port with management enabled on a vSwitch connected to another physical adapter, or set VMware’s management interface to a static IP and set a workstation to another static IP on that subnet.

Essentially I set up a sync interface (10.99.0.1 and .2) where the pfsense-master syncs everything in real time to the pfsense-slave. My “WAN Gateway” is a CradlePoint router. The pfsense-master WAN IP is 10.1.0.11 and the pfsense-slave WAN IP is 10.1.0.12. I then set up a WAN-CARP virtual IP of 10.1.0.10, which is where all the WAN traffic goes out, and the master holds it. If the master goes down, the slave takes it over and the CradlePoint router is none the wiser. It’s pretty much the same for the LAN and DMZ: the CARP virtual IPs are 10.2.0.1 and 10.3.0.1 respectively, and if the master goes down the slave assumes those IPs.
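To make the addressing plan easier to sanity-check, here is a minimal Python sketch that writes down the WAN and sync addresses from above and verifies that each interface's addresses are unique and share a subnet. The /24 prefix length, the `PLAN` layout, and the `check_interface` helper are my own assumptions for illustration; the subnet masks aren't stated above and none of this is part of pfSense itself.

```python
import ipaddress

# Assumption: /24 subnets. The post gives addresses but not masks.
ASSUMED_PREFIX = 24

# Addresses taken from the text. The sync interface has no CARP VIP.
PLAN = {
    "WAN": {"master": "10.1.0.11", "slave": "10.1.0.12", "carp_vip": "10.1.0.10"},
    "SYNC": {"master": "10.99.0.1", "slave": "10.99.0.2", "carp_vip": None},
}


def check_interface(name, addrs, prefix=ASSUMED_PREFIX):
    """Return a list of problems for one interface (empty list if none)."""
    problems = []
    # Parse only the roles that actually have an address assigned.
    present = {role: ipaddress.ip_address(a) for role, a in addrs.items() if a}

    # Node addresses and the virtual IP must all be distinct.
    if len(set(present.values())) != len(present):
        problems.append(f"{name}: duplicate address")

    # Node addresses and the virtual IP must sit in the same subnet.
    networks = {
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        for ip in present.values()
    }
    if len(networks) > 1:
        problems.append(f"{name}: addresses span multiple /{prefix} subnets")

    return problems


if __name__ == "__main__":
    for iface, addrs in PLAN.items():
        issues = check_interface(iface, addrs)
        print(iface, "OK" if not issues else issues)
```

The same check could be extended to the LAN and DMZ once the per-node addresses on those interfaces are filled in.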

pfSense actually syncs the connection states. I established an SSH connection to a remote server, hard powered off the master, and didn’t lose the connection! I was also pinging a remote host and didn’t drop a single packet. This is good: now I can upgrade each pfSense router independently with no downtime. If the pfsense-master goes down, or somehow gets disconnected from the WAN or LAN, the pfsense-slave will assume the virtual IPs. I’ve tested powering off the master, disconnecting the WAN port, disconnecting the LAN port, etc. As long as the 10.99.0.1->10.99.0.2 link stays up, the slave will assume the role of the master during those scenarios, and as soon as the real master recovers it re-assumes the role of master.
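The failover behavior I observed can be summed up in a toy Python model. It is only a sketch of the outcomes described above (master holds the virtual IPs while healthy, slave takes over otherwise, master reclaims them on recovery), not an implementation of the CARP protocol. The `Node` class and `vip_holder` function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one pfSense box and the links I tested."""
    name: str
    powered_on: bool = True
    wan_link: bool = True
    lan_link: bool = True

    def healthy(self) -> bool:
        # Any of the failures tested above makes the node give up the VIPs.
        return self.powered_on and self.wan_link and self.lan_link


def vip_holder(master: Node, slave: Node):
    """Return the name of the node holding the virtual IPs, or None."""
    if master.healthy():
        return master.name  # the master always reclaims the role when healthy
    if slave.healthy():
        return slave.name   # otherwise the slave assumes the virtual IPs
    return None             # both down: nobody answers on the virtual IPs


if __name__ == "__main__":
    master = Node("pfsense-master")
    slave = Node("pfsense-slave")

    print("normal:", vip_holder(master, slave))
    master.wan_link = False            # simulate pulling the WAN port
    print("master WAN unplugged:", vip_holder(master, slave))
    master.wan_link = True             # master recovers
    print("master recovered:", vip_holder(master, slave))
```

The model deliberately leaves out the sync link and state synchronization; it only captures which box ends up holding the virtual IPs in each scenario.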