date: 2023-07-06T14:32:00-05:00
title: Backups with Restic
draft: true
toc: false

For the longest time, my backup setup has been a fairly dumb script that I run manually, with no features other than encryption. After getting my feet wet with btrfs somewhat recently and seeing the magic of deduplication, compression, and snapshots, I was all-in on these features and wanted them for my backups too.

TL;DR

  • Install restic on the backupper (with the sftp backend, the backuppee only needs to accept SSH/SFTP logins)
  • Create a restic user on the backuppee, set up a directory for backups with the right permissions, and add the backupper's public key to that user's authorized_keys
  • Run restic -r sftp:restic@backuppee:/backups init to set up a repo, and put the password in a secret place accessible only to the backupper user
  • for d in $DIRS; do RESTIC_PASSWORD_COMMAND="load secret restic-key" restic -r sftp:restic@backuppee:/backups backup "$d"; done
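
As a rough sketch, the loop from the last step could be wrapped in a small function so that one failing directory doesn't silently stop the rest. The repo URL and the "load secret restic-key" command are the ones from the list above; the function name backup_dirs and the example paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the TL;DR loop as a reusable function.
# REPO and "load secret restic-key" mirror the post; backup_dirs is a
# hypothetical name chosen for this example.

REPO="sftp:restic@backuppee:/backups"
# restic runs this command and reads the repository password from its stdout.
export RESTIC_PASSWORD_COMMAND="load secret restic-key"

# Back up each directory passed as an argument. A failure is reported on
# stderr and remembered, but the remaining directories are still attempted.
backup_dirs() {
  local d status=0
  for d in "$@"; do
    if ! restic -r "$REPO" backup "$d"; then
      echo "backup failed: $d" >&2
      status=1
    fi
  done
  return "$status"
}

# One-time setup, run once before the first backup:
#   restic -r "$REPO" init
# Example call (paths are placeholders):
#   backup_dirs /home/me/photos /etc
```

The function returns non-zero if any directory failed, which makes it easy to hook into whatever alerting you already have.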

Planning

The most important thing to think about when it comes to backups is what you are protecting. It's easy enough to just back up everything, and I know plenty of folks who do! However, I'm not that type. I like to keep things pretty minimal, so I'll evaluate which things are truly worth backing up.

In my case, the only things I really want to back up are unique data that cannot be easily reproduced, such as the following:

  • Family photos and videos
  • Secrets, keys, and anything else that provides access to other important assets
  • Communications and their context (emails, texts, etc.)
  • Backups of devices for restoration in case of bricking (phones and Nintendo consoles come to mind)
  • Source code for projects

My current solutions for these are varied:

  • Family Pictures: Google Photos
    • I would love to look into a self-hosted solution, but Google Photos is unfortunately too easy and cheap.
  • Secrets: These go into a combination of password-store and Vaultwarden.
    • The password-store database is backed up pretty much automatically via git to a handful of places and the data is encrypted at rest.
    • The Vaultwarden database is part of the "mission critical" server backup that happens manually. These backups are untested, but anything in here should ultimately be recoverable from redundancies in the password-store database via "forgot my password" mechanisms.
  • Communications: These are handled by whatever cloud services the communications happen over. Email is all Gmail at the moment, chats are varied, etc.
  • Device Backups: These have been simple enough. Copy the backups to the various backup locations and forget about them.
  • Code: I have pretty good git habits after almost 15 years of version control'ing, so I push code all the time to a server, which in turn backs up the code.

So where am I putting this data? I have a few larger disks here at my house, but I also host a sprinkling of machines tucked out of the way at friends' and family members' houses. We share space as needed, so all of us get redundant remote backups. That said, my machines there are not the most robust. Here are the things I'm concerned about:

  • Running out of space
    • No deduplication means this will happen eventually.
  • Bitrot
    • They say it's rare, and perhaps I'm confusing disk damage with bitrot, but I have definitely been bitten by this in some form or another. I want my backup system to combat this as much as possible (checksums and error correction via btrfs), but also to regularly and automatically let me know if and when it occurs.
  • Not automated
    • I would have a lot more peace of mind if I knew I could just back up everything nightly and not worry about it.
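
To make those three concerns concrete, here is a minimal sketch of what a nightly job might look like with restic: back up, apply a retention policy so the repo doesn't grow forever, and verify a slice of the stored data so corruption gets noticed. RESTIC_REPOSITORY and RESTIC_PASSWORD_COMMAND are environment variables restic reads itself; the retention numbers, the function names, and the one-seventh-per-night verification schedule are assumptions for illustration, not something I've settled on.

```shell
#!/usr/bin/env bash
# Sketch of a nightly job: back up, apply a retention policy, verify a slice.
# The retention numbers and the function names (check_subset, nightly) are
# assumptions; the repo URL and password command mirror the post.

export RESTIC_REPOSITORY="sftp:restic@backuppee:/backups"
export RESTIC_PASSWORD_COMMAND="load secret restic-key"

# Map an ISO weekday (1-7) to an "n/7" subset, so that a week of nightly
# runs reads every part of the repository's data once.
check_subset() {
  printf '%s/7\n' "$1"
}

# nightly <dir>...
# Stops at the first failing step and returns non-zero.
nightly() {
  local weekday
  weekday="$(date +%u)"
  restic backup "$@" || return 1
  # Drop snapshots outside the policy and reclaim the space they held.
  restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune || return 1
  # Re-read one seventh of the stored data and verify it against its checksums.
  restic check --read-data-subset="$(check_subset "$weekday")" || return 1
}

# Example crontab line (the script path is a placeholder):
#   0 3 * * * /usr/local/bin/nightly-backup.sh
```

Since the function returns non-zero on any failed step, the wrapping script can print an error, and cron will mail that output if mail delivery is configured on the box.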

Backing up everything nightly is not currently an option, since I have ~1TB of backed-up data and I just sync over everything in the local backup directory via rsync. I know, I've probably got the wrong flags, since rsync should be just fine for this, but I also wanted deduplication and a system that would let me pull out individual files if I wanted.
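
For the "pull out individual files" part, restic's restore command accepts an include filter. A minimal sketch, assuming the same repo and password command as elsewhere in this post; restore_file and the example paths are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: pull a single file (or directory) out of the most recent snapshot.
# restore_file and the example paths are hypothetical.

export RESTIC_REPOSITORY="sftp:restic@backuppee:/backups"
export RESTIC_PASSWORD_COMMAND="load secret restic-key"

# restore_file <path-inside-snapshot> <target-dir>
# The file is recreated under <target-dir> with its original path appended.
restore_file() {
  local path="$1" target="$2"
  if [ -z "$path" ] || [ -z "$target" ]; then
    echo "usage: restore_file <path> <target-dir>" >&2
    return 2
  fi
  restic restore latest --target "$target" --include "$path"
}

# Example (paths are placeholders):
#   restore_file /srv/photos/2019/cat.jpg /tmp/restore
```

Running restic snapshots first shows which snapshots exist, in case "latest" is not the one you want.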

Enter Restic

Restic pretty much seemed perfect. It seemed simple enough to set up and manage, so I gave it a shot.

My current setup is certainly "good enough", but the lack of automation and the terribly inefficient use of bandwidth with my remote hosts are not ideal.

Setup

I aimed to keep things pretty secure, so I set up a user specifically for my backups on my backuppee devices, the machines with big hard disks (calm down) that would hold the backed-up data.
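
Roughly, the backuppee side could look like the sketch below. The user name restic and the /backups directory match the TL;DR; the authorize_key helper, the useradd and install flags, and the backupper.pub filename are assumptions about a typical Linux box, so adjust them for your distro.

```shell
#!/usr/bin/env bash
# Sketch of the backuppee-side setup. The user name "restic" and /backups come
# from the post; authorize_key and the commands in the comments below are
# assumptions about a typical Linux system.

# authorize_key <home-dir> <public-key-line>
# Adds the key to <home-dir>/.ssh/authorized_keys unless it is already there,
# and applies the strict permissions sshd expects by default.
authorize_key() {
  local home="$1" key="$2"
  local file="$home/.ssh/authorized_keys"
  mkdir -p "$home/.ssh" || return 1
  chmod 700 "$home/.ssh" || return 1
  touch "$file" || return 1
  chmod 600 "$file" || return 1
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet.
  grep -qxF -- "$key" "$file" || printf '%s\n' "$key" >> "$file"
}

# As root on the backuppee (left as comments, since they change system state):
#   useradd --create-home --shell /bin/sh restic
#   install -d -o restic -m 700 /backups
#   authorize_key /home/restic "$(cat backupper.pub)"
#   chown -R restic /home/restic/.ssh
```

The helper is safe to run more than once: a key that is already present is not appended again.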

My backupper machines would be the ones pushing data, which, for now, is really just one box: my server, which houses the actually important data.

All my other machines really just interface with that server to push and pull code and data via git. Pretty much anything else should be lose-able in my situation. I use Google Photos for backing up pictures, so I don't really worry about those for now. The only other data I want backed up is older backups (giant tarballs).