I switched to Duplicati for Windows Backups and Restic for Linux Servers

So long, CrashPlan! After I had used it for 5 years, CrashPlan decided, with less than a day’s notice, to delete many of the files I had backed up. Once again, the deal got altered. Deleting files with no advance notice is something I might expect from a totalitarian leader, but it isn’t acceptable for a backup service.

Darth Vader altering the deal
I am altering the deal. Pray I don’t alter it any further.

CrashPlan used to be the best offering for backups by far, but those days are gone. I needed to find something else. To start with, I noted my requirements for a backup solution:

  1. Fully Automated. I am not going to remember to do something like take a backup on a regular basis. Between the demands from all aspects of life, I already have trouble doing the thousands of things I should be doing, and I don’t need another thing to remember.
  2. Should alert me on failure. If my backups start failing, I want to know. I don’t want to check on the status periodically.
  3. Efficient with bandwidth, time, and price.
  4. Protect against my backup threat model (below).
  5. Not Unlimited. I’m tired of “unlimited” backup providers like CrashPlan not being able to handle unlimited storage and then going out of business or altering the deal. I either want to provide my own hardware or pay by the GB.

Backup Strategy

Relayed Backups

This also gave me a good opportunity to review my backup strategy. I had been using a relayed strategy where all local and cloud devices backed up to a NAS on my network, and then those backups were relayed to a remote (formerly CrashPlan) backup service. The other model is a direct backup, where each device backs up straight to its destinations. I like this a little better because, living in North Idaho, I don’t have a good upload speed; in several cases my remote backups from the NAS would never complete because I didn’t have enough bandwidth to keep up.

Now, if Ting could get permission to run fiber under the railroad tracks and to my house, I’d have gigabit upload speed, but until then the less I have to upload from home the better.

Direct Backups

Backup Threat Model

It’s best practice to think through all the threats you are protecting against. If you don’t do this exercise, you may not think about something important… like keeping your only backup in the same location as your computer. Here is my backup threat model (the threats my backups should protect against):

  1. Disasters. If a fire sweeps through North Idaho burning every building but I somehow survive, I want my data, so I must have offsite backups in a different geo-location. We can assume that all keys and hardware tokens will be lost in a disaster, so those must not be required for a restore. At least one backup should be in a geographically separate area from me.
  2. Malware or ransomware. Must have an unavailable or offline backup.
  3. Physical theft or data leaks. Backups must be encrypted.
  4. Silent Data Corruption. Data integrity must be verified regularly and protected against bitrot.
  5. Time. I do not ever want to lose more than a day’s worth of work, so backups must run on a daily basis and must not consume too much of my time to maintain.
  6. Fast and easy targeted restores. I may need to recover an individual file I have accidentally deleted.
  7. Accidental Corruption. I may have a file corrupted or accidentally overwrite it and not realize it until a week later or even a year later. Therefore I need versioned backups so I can restore a file from points in time going back several years.
  8. Complexity. If something were to happen to me, the workstation backups must be simple enough that Kris would be able to get to them. It’s okay if she has to call one of my tech friends for help, but it should be simple enough that they could figure it out.
  9. Non-payment of backup services. Backups must persist on their own in the event that I am unaware of failed payments or unable to pay for backups. If I’m traveling and my CC gets compromised, I don’t want to be left without backups.
  10. Bad backup software. The last thing you need is your backup software corrupting all your data because of some bug (I have seen this happen with rsync), so it should be stable. Looking at the git history, I should see minor fixes and infrequent releases instead of major rewrites and data corruption bug fixes.
Raspberry Pi and 4TB drive on wooden shelf
Raspberry Pi 4TB WD Backup

My friend Meredith had contacted me about swapping backup storage. We’re geographically separated, so that covers local disasters. So that’s what we did: each of us set up an SSH/SFTP server for the other to back up to. I had plenty of space in my Proxmox environment, so I created a VM for him and put it in an isolated DMZ. He had a Raspberry Pi and bought a new 4TB Western Digital external USB drive that he set up at his house for me.

Duplicati Backup Solution for Workstations

For Windows desktops I chose Duplicati 2. It also works on Mac and Linux, but for my purposes I only evaluated it on Windows.

Duplicati screenshot of main page

Duplicati has a nice local web interface. It’s simple and easy to use. Adding a new backup job is straightforward and gives plenty of options for my backup sets and destinations (this allows me to back up not only to a remote SFTP server, but also to any cloud service such as Backblaze B2 or Amazon S3).

Animation of setting up a duplicati backup job

Duplicati 2 has status icons in the system tray that quickly indicate any issues. On the first few runs I was seeing a red icon indicating the backup had an error. Looking at the log, I saw it was because I had left programs open that were locking files it was trying to back up. I like that it warns about this instead of silently not backing up those files.

Green play icon
Grey paused icon
Black idle icon
Red error icon

Green=In Progress, Grey=Paused, Black=Idle, Red=Error on the last backup.

Duplicati 2 seems to work well. I have tested restores and they come back pretty quickly. I can back up to my NAS as well as a remote server and a cloud server.

Two things I don’t care for about Duplicati 2:

  1. It is still labeled Beta. That said, it is a lot more stable than some GA software I’ve used.
  2. There are too many projects with similar names. Duplicati, Duplicity, Duplicacy. It’s hard to keep them straight.

Other considerations for workstation backups:

  • rsync – no GUI
  • Restic – no GUI
  • Borg backup – Windows not officially supported
  • Duplicacy – license only allows personal use

Restic Backup for Linux Servers

I settled on Restic for Linux servers. I have used Restic on several small projects over the years and it is a solid backup program. Once the environment variables are set, it’s one command to back up or restore, and that command can be run from cron.
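
For example, a daily cron job boils down to something like this (the repository URL, password file, and paths below are placeholders, not my actual setup):

    # Tell restic where the repository is and how to unlock it (placeholder values)
    export RESTIC_REPOSITORY="sftp:backup@backup.example.com:/backups/$(hostname)"
    export RESTIC_PASSWORD_FILE="/root/.restic-password"

    # One command to back up...
    restic backup /etc /home /var/www

    # ...and one to restore the latest snapshot to an alternate location
    restic restore latest --target /tmp/restore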

Screenshot of restic animation

It’s also easy to mount any point in time snapshot as a read-only filesystem.
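
For instance, with the same environment variables set (the mount point here is arbitrary):

    # Browse every snapshot as a read-only filesystem
    restic mount /mnt/restic
    ls /mnt/restic/snapshots/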

Borg backup came in pretty close to Restic; the main reason I chose Restic is its support for backends other than SFTP. The cheapest storage these days is object storage such as Backblaze B2 and Wasabi. If Meredith’s server goes down, with Borg backup I’d have to redo my backup strategy entirely. With Restic I have the option to quickly add a new cloud backup target.

Looking at my threat model there are two potential issues with Restic:

  1. A compromised server would have access to delete its own backups. This can be mitigated by storing the backup on a VM that is backed by storage configured with periodic immutable ZFS snapshots.
  2. Because restic uses a push instead of a pull model, a compromised server would also have access to other servers’ backups, increasing the risk of data exfiltration. At the cost of some deduplication benefits, this can be mitigated by setting up one backup repository per host, or at the very least by creating separate repos for groups of hosts (e.g. one restic repo for Minecraft servers and a separate restic repo for web servers); see the sketch after this list.
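
As a rough illustration of that per-host layout (the hostnames, paths, and ZFS dataset below are made up):

    # Each host backs up to its own repository under the same SFTP account,
    # so a compromised web server cannot read the minecraft backups
    restic -r sftp:backup@backup.example.com:/backups/web01 init
    restic -r sftp:backup@backup.example.com:/backups/minecraft01 init

    # On the storage side, periodic read-only ZFS snapshots of the backing
    # dataset protect the repositories from being deleted by the clients
    zfs snapshot tank/backups@$(date +%F)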

Automating Restic Deployment

Obviously it would be ridiculous to configure 50 servers by hand. To automate this, I used two Ansible Galaxy roles. I created https://galaxy.ansible.com/ahnooie/generate_ssh_keys, which automatically generates SSH keys and copies the key IDs to the restic backup target. The second role, https://galaxy.ansible.com/paulfantom/restic, automatically installs and configures a restic job on each server to run from cron.

Utilizing the above roles, I wrote an Ansible playbook to configure restic backups across all my servers. It sets things up so that each server is backed up once a day at a random time.
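
A minimal sketch of such a playbook might look like the following. The role names come from the Galaxy links above, but the variable names are illustrative assumptions; check each role’s README for the ones it actually expects.

    ---
    # backups.yml -- illustrative sketch, not the exact playbook
    - hosts: all
      become: true
      vars:
        # variable names below are assumptions; see the role documentation
        restic_repository: "sftp:backup@backup.example.com:/backups/{{ inventory_hostname }}"
        restic_backup_paths:
          - /etc
          - /home
        # pick a stable pseudo-random hour/minute per host so jobs spread out
        restic_cron_hour: "{{ 23 | random(seed=inventory_hostname) }}"
        restic_cron_minute: "{{ 59 | random(seed=inventory_hostname) }}"
      roles:
        # generate an SSH key pair and copy it to the backup target
        - ahnooie.generate_ssh_keys
        # install restic and set up the cron job
        - paulfantom.restic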

Manual Steps

I’ve minimized manual steps but some still must be performed:

  1. Backup to cold storage. This is archiving everything to an external hard drive and then leaving it offline. I do this manually once a year on World Backup Day and also after major events (e.g. doing taxes, taking awesome photos, etc.). This is my safety net in case online backups get destroyed.
  2. Test restores. I do this once a year on World Backup Day.
  3. Verify backups are running. I have a reminder set to do this once a quarter. With Duplicati I can check in the web UI, and with Restic a single command lists each host with its most recent backup date (see the sketch below).
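
That check is something along these lines (assuming the repository environment variables are already set):

    # Show only the newest snapshot for each host in the repository
    restic snapshots --latest 1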

Cast your bread upon the waters,
for you will find it after many days.
Give a portion to seven, or even to eight,
for you know not what disaster may happen on earth.

Solomon
Ecclesiastes 11:1-2 ESV

Ansible Role Rdiff-Backup Script | World Backup Day

Happy World Backup Day!  Here’s a quick little Ansible role I wrote to automate backup configuration for hordes of servers using Rdiff-Backup, driven by an Ansible inventory file.  If you have no idea what I just said, you may want to skip to “I’m Confused” at the very bottom of this post.

Ansible Rdiff-Backup Configuration Diagram

What does the Rdiff-Backup Ansible Role do?

  • Creates a folder on the backup storage server to store backups.
  • Creates a backup script on the backup server.  This script will use rdiff-backup over SSH to back up every server on the list (below) and prune backups older than 1 year (default); see the sketch after this list.
  • Adds/removes servers from an Ansible inventory file to a backup list which the backup script reads as servers are provisioned/decommissioned (the script will not delete backups on decommission, only stop taking new ones).
  • Installs the rdiff-backup program on both the client and backup server.
  • Generates an SSH key pair on the backup server and adds the public key to the authorized keys file on each client to allow the backup server to SSH into the clients.
  • Scans the SSH host key of each client and adds it to the known hosts file on the backup server.
  • Creates a cron job on the backup server to run the backup script once a day.
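
The core of what the generated script does per client amounts to something like this (the hostname and paths below are examples; the real script loops over the backup list):

    # Pull a backup of the client over SSH into a per-host directory,
    # then prune increments older than one year
    rdiff-backup root@web01.example.com::/ /backup/web01
    rdiff-backup --remove-older-than 1Y /backup/web01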

The rdiff-backup-script role is available on Ansible Galaxy and I’ve uploaded the source to GitHub under the MIT license.

Requirements

You will need:

  1. Ansible (best installed on its own Linux server).
  2. A Linux server with lots of disk storage to serve as a backup server.
  3. Lots of Linux servers in your Ansible inventory that need to be backed up.

I have tested this on Ubuntu 16.04, but it should work with any distribution (CentOS, RedHat, Debian, etc.) as long as rdiff-backup is in the repositories available to the package manager.

Install the Rdiff-Backup Script Ansible Role

On your Ansible server install the rdiff-backup-script role with:
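
For example (the exact Galaxy role name is an assumption here; confirm it on the role’s Galaxy page):

    ansible-galaxy install ahnooie.rdiff-backup-script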

Create a playbook file, rdiff-backup-script.yml, with the contents:
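
A sketch of what that playbook could contain (variables are left at the role’s defaults here; see the role’s README for what it expects, such as which host acts as the backup server):

    ---
    # rdiff-backup-script.yml -- illustrative sketch
    - hosts: all
      become: true
      roles:
        - ahnooie.rdiff-backup-script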

Your Ansible inventory file would look something like this:
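
For example (group and host names are made up; the role’s README describes the group names it actually expects):

    # inventory -- illustrative only
    [backupserver]
    backup01.example.com

    [webservers]
    web01.example.com
    web02.example.com

    [dbservers]
    db01.example.com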

Run the playbook with:
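
For example (adjust the inventory path to match your setup):

    ansible-playbook -i inventory rdiff-backup-script.yml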

Once the playbook has run, all servers will be configured for backup, which will occur at the next cron job run (defaults to 01:43 AM).

The above playbook should be added to your site config so it is run automatically with the rest of your Ansible playbooks.  It would also be wise to have something like Nagios or Logcheck watch the logs and alert on failures or stale log last modified dates.

The backup script does not try to create an LVM snapshot and then back up the snapshot.  That would certainly be cleaner and I may add that ability later.  The default settings exclude quite a few files from the backup, so make sure those exclusions are what you want.  One thing I excluded by default is a lot of LXC files.  If you’re using LXC you may want them.  Also, always test a restore before relying on it.

Obviously, test it in a test environment and make sure you understand what it does before trying it on anything important.

Check your backup strategy

This is a good day to check your backup strategy.  A few things to consider:

  1. System backups are important, not just the data files.  You never know what you’re missing in your document-only backups, and restoring service from system backups is much faster than rebuilding systems.
  2. Frequency.  If you can’t afford to lose an hour of work, back up at least every hour.
  3. Geographic redundancy.  Fires, hurricanes, and earthquakes can wipe out multiple locations in a city all at once.  Keep at least one backup in a separate part of the globe.
  4. Versioned backups.  On Monday you took a backup.  On Tuesday your file got corrupted.  On Wednesday you overwrote Monday’s backups with Wednesday’s backup.  Enough said.
  5. Test restoring from your backups (it’s good to test at least once a year on World Backup Day) to make sure they work.
  6. Encrypt.  Make sure your backups to cloud services and insecure locations are encrypted (but also make sure you have provisions to decrypt them when needed).
  7. Cold storage.  Keep at least one backup offline.  When a bug in your backup program deletes all your live data and your backups you’ll be glad you did.
  8. Keep at least 3 copies of data you don’t want to lose.  Your live version (obviously), one onsite backup that will allow you to restore quickly, and one offsite backup in a faraway state or country.

I’m Confused

You might want to back up your computer.  I’d suggest looking at CrashPlan, SpiderOak, or Backblaze, which are all reputable companies that offer automatic cloud backup services for your computer.  The main thing to look at for pricing is how much data you have vs. the number of computers you have.  CrashPlan and Backblaze charge by the computer but offer unlimited data, so they would be ideal if you have a lot of data but few computers.  SpiderOak lets you have unlimited computers but charges by how much space you use, making it ideal if you have little data and many devices.