pfSense Firewall HA Failover Cluster

Last night I was attending a LAN party remotely, and between games I noticed my pfSense router needed an update. Of course, an update takes my internet down for 30 seconds while the router reboots, which I didn't want. Then I thought: I should really cluster this. I found a straightforward guide, "pfSense HA (Hardware/Device Failover) Configuration using CARP" by Michael Holloway, and after following it I ended up with something like this:

pfSense HA Diagram

I run pfSense under VMware, so networking hardware is free, and I deployed a second pfSense VM more or less identical to the master. I don't run pfSense on my ZFS storage because I want my networking to come up before storage, but I did put each pfSense VM on a separate hard drive.

I do not recommend running pfSense under VMware unless you know what you are doing. If you do, be sure to enable promiscuous mode on the VM switches (and if you didn't know that, perhaps you shouldn't do this). You can also end up in a circular dependency, so be sure you have an alternate way to get into VMware to troubleshoot pfSense in case it dies for some reason. There are several ways to do this: you can set up a backup VMkernel port with management enabled on a vSwitch connected to another physical adapter, or set VMware's management interface to a static IP and give your workstation another static IP on that subnet.

Essentially, I set up a sync interface (10.99.0.1 and 10.99.0.2) over which the pfsense-master syncs everything in real time to the pfsense-slave. My "WAN Gateway" is a CradlePoint router. The pfsense-master WAN IP is 10.1.0.11 and the pfsense-slave WAN IP is 10.1.0.12. I then set up a WAN-CARP virtual IP of 10.1.0.10, which is the address all WAN traffic goes out on. The master normally holds 10.1.0.10; if the master goes down, the slave takes it over and the CradlePoint router is none the wiser. The LAN and DMZ work pretty much the same way: their CARP virtual IPs are 10.2.0.1 and 10.3.0.1 respectively, and if the master goes down the slave assumes those IPs.
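To keep the address plan straight, here is a minimal Python sketch that writes it down and sanity-checks it. The IPs are the ones from this post; the /24 prefix lengths and the names `PLAN`, `check_segment`, and `check_plan` are assumptions made up for illustration, and the per-node LAN and DMZ addresses are left as `None` because they are not listed above. It does not talk to pfSense at all.

```python
import ipaddress

# Addresses from the post. The /24 prefix lengths are an assumption,
# and per-node LAN/DMZ addresses are unknown here, so they stay None.
PLAN = {
    "WAN":  {"net": "10.1.0.0/24",  "vip": "10.1.0.10", "master": "10.1.0.11", "slave": "10.1.0.12"},
    "LAN":  {"net": "10.2.0.0/24",  "vip": "10.2.0.1",  "master": None,        "slave": None},
    "DMZ":  {"net": "10.3.0.0/24",  "vip": "10.3.0.1",  "master": None,        "slave": None},
    "SYNC": {"net": "10.99.0.0/24", "vip": None,        "master": "10.99.0.1", "slave": "10.99.0.2"},
}


def check_segment(name, seg):
    """Return a list of problems found in one network segment."""
    net = ipaddress.ip_network(seg["net"])
    # Only look at the addresses that are actually filled in.
    known = {role: seg[role] for role in ("vip", "master", "slave") if seg[role]}
    problems = []
    for role, addr in known.items():
        # The CARP virtual IP and both node IPs must share one subnet.
        if ipaddress.ip_address(addr) not in net:
            problems.append(f"{name}: {role} {addr} is outside {net}")
    # The virtual IP must not collide with either node's own address.
    if len(set(known.values())) != len(known):
        problems.append(f"{name}: duplicate address among {sorted(known)}")
    return problems


def check_plan(plan):
    """Collect the problems from every segment in the plan."""
    return [p for name, seg in plan.items() for p in check_segment(name, seg)]


if __name__ == "__main__":
    for line in check_plan(PLAN) or ["address plan looks consistent"]:
        print(line)
```

The point of the check is the one rule that matters for CARP: each virtual IP lives in the same subnet as the two real interface addresses behind it, and is distinct from both.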

pfSense actually syncs the connection states. I established an SSH connection to a remote server, hard powered off the master, and didn't lose the connection! I was also pinging a remote host and didn't drop a single packet. This is good: now I can upgrade each pfSense router independently with no downtime. If the pfsense-master goes down, or somehow gets disconnected from the WAN or LAN, the pfsense-slave will assume the virtual IPs. I've tested powering off the master, disconnecting the WAN port, disconnecting the LAN port, etc. As long as the 10.99.0.1->10.99.0.2 link stays up, the slave assumes the role of master in those scenarios, and as soon as the real master recovers it re-assumes the role of master.
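The failover and fail-back behavior can be pictured with a toy model of the CARP election. This is a simplified sketch, not how pfSense implements it: in CARP the node that advertises most frequently wins, and the advertisement interval is `advbase + advskew / 256` seconds. The `Node` class, the `elect_master` function, the `healthy` flag, and the skew values 0 and 100 are all assumptions for illustration (a real node demotes itself by advertising less often rather than by flipping a flag).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    advbase: int = 1      # base advertisement interval in seconds
    advskew: int = 0      # 0..254; a higher skew means a less preferred node
    healthy: bool = True  # toy stand-in for "up, with all interfaces linked"

    def interval(self) -> float:
        """Seconds between CARP advertisements for this node."""
        return self.advbase + self.advskew / 256


def elect_master(nodes):
    """Return the name of the healthy node that advertises most often."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return min(candidates, key=Node.interval).name


if __name__ == "__main__":
    master = Node("pfsense-master", advskew=0)    # assumed skew
    slave = Node("pfsense-slave", advskew=100)    # assumed skew
    cluster = [master, slave]

    print(elect_master(cluster))  # normal operation
    master.healthy = False        # power off the master or pull a cable
    print(elect_master(cluster))  # the slave takes over the virtual IPs
    master.healthy = True         # the master recovers
    print(elect_master(cluster))  # the master preempts and takes them back
```

Because the master has the lower skew, it always wins the election when it is healthy, which matches what I saw: the slave holds the virtual IPs only while the master is down or disconnected.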

6 thoughts on “pfSense Firewall HA Failover Cluster”

  1. I’m living on the edge with a single pfSense instance as a VM on VMware ESXi that has been up and running for over two years now. I use it as a firewall and VPN server to secure that physical server itself, so it’s a little hairy. I’m afraid to restart the server or upgrade pfSense for fear it wouldn’t come back up just right automatically, and thank god it has been rock solid and stable for so long! I did test it thoroughly when I set it up, and restarting it seemed to work just fine. I have physical Ethernet ports mapped to the internal VM network, so worst case I can have the data center guys take a crash cart over there to get in somehow or another. When I set it up I was local, so it wasn’t as hairy as it is now that I’m several states away.

  2. I don’t know if you will see this, but what kind of throughput are you getting doing this? Any latency? I am considering removing my physical router and wiring the incoming Ethernet directly into my vSphere cluster on its own VLAN, then setting up a cluster of pfSense machines as above. I worry that I am going to increase latency and reduce my 150/150 throughput, as I have not found a lot of documented experiences doing this.

    Thanks!

  3. I’m attempting to set up CARP for the first time, and the transparent failover part is NOT working. I can pull the WAN connection on the master, and it immediately communicates to the slave and the slave status changes to master… BUT hosts on the LAN side lose their connectivity and never regain it.

    I do note that there’s a brief note in the official setup instructions that says, without further elaboration:
    “CARP utilizes multicast, so care must be taken that the switches properly handle and do not block, filter, or limit multicast.”

    Could this have something to do with it? The LAN side uses a Cisco 2950 switch, right-out-of-the-box with no other settings made to it. How do I even tell if it’s blocking multicast?
