A couple of weeks ago, I decided to set up a backup system for all my data. The goal was an automated system that needs no intervention.
I self-host a lot of things and have a lot of personal data. Popular tools like Google Drive are good for syncing, not for backup. The difference matters: a sync tool mirrors the current state of your files, including accidental deletions, while a backup keeps point-in-time copies you can restore from.
Backup software
Searching the internet led me to many open source backup tools, notably Restic and Borg Backup. For remote backups, Borg works best when Borg itself is also installed on the machine at the backup location. This was a deal breaker since I wanted to save the backup to multiple locations, with or without a server.
Restic seemed to fit the bill perfectly, so I decided to give it a try. It provided all the features I wanted:
- open source
- actively maintained
- a single binary
- supports a lot of backends: S3-compatible storage, SFTP, and anything Rclone can reach (including WebDAV)
- has encrypted backups
- has deduplication
- has good documentation and community
- is CLI based
Backup location
The next step was to decide on a backup location. I wanted to keep two copies of the backup, one on-site and one off-site. For on-site, I went with a simple external SSD. For off-site, the decision required more effort.
There are a lot of popular cloud storage options: S3-based ones like Amazon S3, Backblaze B2, and IDrive e2, and others like Hetzner Storage Box, Google Drive, and Dropbox.
I decided to go with Hetzner Storage Box. It seemed to fit the bill because:
- I wanted to stay away from Big Tech (Google Drive, etc.). Maybe a personal bias.
- I didn’t want the egress costs of S3-based storage.
- Hetzner supports a lot of protocols and tools: SFTP, SCP, rsync, Rclone, and WebDAV, to name a few.
- I was getting decent transfer speeds in my testing.
- The pricing was reasonable.
- Hetzner has a good reputation.
- The data center is in Germany, close to where I live (UK).
I set up Hetzner Storage Box over WebDAV with Rclone and used it as a backend in Restic to initialize the backup repository. This is a one-time process.
restic -r rclone:hetzner-webdav:restic-repo init
Restic will ask you to set a password to encrypt the repository. Make sure to choose a strong one and store it safely; without it, the backups cannot be decrypted.
After the repository is initialized, backing up is a single CLI command:
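For unattended runs, Restic can read the repository and the password from environment variables instead of prompting. The following is a minimal sketch, assuming the password is kept in a file readable only by your user; the file path is a hypothetical example, not something from my setup.

```shell
# Repository location. With this set, restic commands can omit -r.
export RESTIC_REPOSITORY="rclone:hetzner-webdav:restic-repo"

# Hypothetical path to a file containing only the repository password.
# Restrict access to it with: chmod 600 "$RESTIC_PASSWORD_FILE"
export RESTIC_PASSWORD_FILE="$HOME/.config/restic/password"
```

Anything that can read this file can decrypt the repository, so it deserves the same care as the password itself.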
restic -r rclone:hetzner-webdav:restic-repo backup /path/to/local/directory
The backup will be saved as a snapshot. Since Restic deduplicates data, later snapshots only store what has changed, so their storage and time cost is low.
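A backup is only useful if it can be restored, so it is worth listing the snapshots and trying a restore now and then. The helpers below are a rough sketch of how that could look; the function names are my own invention, and only the `restic snapshots` and `restic restore` commands come from Restic itself.

```shell
# List the snapshots stored in a repository.
# Argument: repository.
list_snapshots() {
    restic -r "$1" snapshots
}

# Restore the latest snapshot into a scratch directory to confirm
# that the backup is actually readable.
# Arguments: repository, target directory.
test_restore() {
    restic -r "$1" restore latest --target "$2"
}
```

For example, `test_restore rclone:hetzner-webdav:restic-repo /tmp/restore-test` would write the most recent snapshot into `/tmp/restore-test`, where the files can be compared with the originals.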
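Restic supports removing old snapshots according to a policy. This is the policy I wrote. It keeps the most recent snapshot for each of the last 10 days, 8 weeks, 12 months, and 5 years that have one, and prunes the rest:
restic -r rclone:hetzner-webdav:restic-repo forget --prune --keep-daily 10 --keep-weekly 8 --keep-monthly 12 --keep-yearly 5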
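Before letting a retention policy delete anything, it can help to preview its effect. Restic's `forget` command accepts `--dry-run` for this. The wrapper below is only a sketch, and the function name is hypothetical.

```shell
# Show which snapshots the policy would keep and remove,
# without deleting anything.
# Argument: repository.
preview_retention() {
    restic -r "$1" forget --dry-run \
        --keep-daily 10 --keep-weekly 8 --keep-monthly 12 --keep-yearly 5
}
```

Running `preview_retention rclone:hetzner-webdav:restic-repo` prints the keep and remove lists, so the numbers can be tuned before `--prune` is added.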
Automation
I use Syncthing to sync data from all my devices to my Raspberry Pi 5. On the Pi, I wrote a simple shell script to back up the data and remove older snapshots as per the policy above. Then I set up a cron job to run the script every day at 4 am.
0 4 * * * /path/to/backup/script.sh >> /path/to/log/file 2>&1
Once this is set up, the entire process is automatic. Syncthing syncs data to the Pi in real time, and the cron job makes sure the data is backed up every day.
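The shell script mentioned above could look roughly like this. It is a sketch under a few assumptions rather than my exact script: the password file path and the `run_backup` function name are hypothetical, and the directory to back up is passed as an argument.

```shell
#!/bin/sh
# Sketch of a daily backup script. REPO matches the repository used above;
# the password file path is a hypothetical example.
REPO="rclone:hetzner-webdav:restic-repo"
export RESTIC_PASSWORD_FILE="${RESTIC_PASSWORD_FILE:-$HOME/.config/restic/password}"

# Back up one directory, then apply the retention policy.
# Stops at the first failing step and returns its exit status.
run_backup() {
    src="$1"
    echo "$(date '+%Y-%m-%d %H:%M:%S') starting backup of $src"
    restic -r "$REPO" backup "$src" || return
    restic -r "$REPO" forget --prune \
        --keep-daily 10 --keep-weekly 8 --keep-monthly 12 --keep-yearly 5 || return
    echo "$(date '+%Y-%m-%d %H:%M:%S') backup finished"
}

# Run only when a directory is given, e.g. script.sh /path/to/local/directory
if [ "$#" -gt 0 ]; then
    run_backup "$1"
fi
```

With this variant, the cron entry would pass the directory after the script path. Cron runs with a minimal PATH, so if `restic` or `rclone` live somewhere unusual, it may be necessary to set PATH at the top of the script.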
Conclusion
I will evolve this system over time as my needs change. So far, I am happy with it. I invite you to suggest improvements and share your backup strategy as well.