Backing up data is one of the most important tasks everyone should perform regularly. This series demonstrates three software tools for backing up your important data.

When planning a backup strategy, consider the “Three Rs of backup”:

  • Redundant: Backups must be redundant. Backup media can fail. The backup storage site can be compromised (fire, theft, flood, etc.). It’s always a good idea to have more than one destination for backup data.
  • Regular: Backups only help if you run them often. Schedule and run them regularly to keep adding new data and pruning off old data.
  • Remote: At least one backup should be kept off-site. In the case of one site being physically compromised (fire, theft, flood, etc.), the remote backup becomes a fail-safe.

duplicity is an advanced command-line backup utility built on top of librsync and GnuPG. Because it produces GPG-encrypted backup volumes in tar format, it offers secure, incremental archives (a huge space saver, especially when backing up to remote services like S3 or an FTP server).

To get started, install duplicity:

sudo dnf install duplicity

Choose a backend

duplicity supports a lot of backend services, which fall into two broad groups: hosted storage providers and local media. Selecting a backend is mostly a personal preference, but select at least two (remember: Redundant). This article uses an Amazon S3 bucket as the example backend service.
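
The backend is given to duplicity as a URL on the command line. A few illustrative forms, covering a local disk, a remote host over SSH, and an S3 bucket (the host, path, and bucket names here are placeholders, not real destinations):

file:///mnt/backup-drive/docs
sftp://backupuser@backup.example.com/docs
s3+http://my-bucket-name-backup-docs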

Set up GnuPG

duplicity encrypts volumes before uploading them to the specified backend using a GnuPG key. If you haven’t already created a GPG key, follow GPG key management, part 1 to create one. Look up the long key ID and keep it nearby:

gpg2 --list-keys --keyid-format long me@mydomain.com
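
The output looks something like the following (the IDs shown here are made up); the 16-character value after the key algorithm is the long key ID to note down:

pub   rsa2048/0123456789ABCDEF 2017-07-06 [SC]
uid                 [ultimate] My Name <me@mydomain.com>
sub   rsa2048/FEDCBA9876543210 2017-07-06 [E]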

Set up Amazon AWS

AWS recommends using individual IAM user accounts to isolate programmatic access to your account. Log into the AWS IAM Console. If you don’t have an AWS account, you’ll be prompted to create one.

Click on Users in the list of sections on the left. Click the blue Add user button. Choose a descriptive user name, and set the Access type to Programmatic access only. There is no need for a backup account to have console access.

Next, attach the AmazonS3FullAccess policy directly to the account. duplicity needs this policy to create the bucket automatically the first time it runs.

After the user is created, save the access key ID and secret access key. They are required by duplicity when connecting to S3.

Choose backup data

When choosing data to back up, a good rule of thumb is to back up data you’ve created that can’t be re-downloaded from the Internet. Good candidates that meet this criterion are ~/Documents and ~/Pictures. Source code and “dot files” are also excellent candidates if they aren’t under version control.
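
To get a rough sense of how much data an initial backup will cover, total up the candidate directories first, for example:

$ du -sh ~/Documents ~/Pictures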

Create a full backup

The general form for running duplicity is:

duplicity [OPTIONS] SRC DEST

To back up ~/Documents while preserving the Documents folder within the backup volume, run duplicity with $HOME as the source, specify the --include option to include only ~/Documents, and exclude everything else with --exclude '**'. The --include and --exclude options can be combined in various ways to create specific file matching patterns. Experiment with these options before creating the initial backup. The --dry-run option simulates running duplicity and is a great way to preview what a particular duplicity invocation will do.
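
As a sketch of how the patterns combine, a dry run that keeps both ~/Documents and ~/Pictures and excludes everything else under $HOME might look like this (the file:// destination is only a placeholder):

$ duplicity --dry-run \
    --include $HOME/Documents \
    --include $HOME/Pictures \
    --exclude '**' \
    $HOME file:///mnt/backup-drive/home-backup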

duplicity automatically determines whether a full or incremental backup is needed. The first time you run it with a given source and destination, duplicity creates a full backup. Be sure to first export the access key ID and secret access key as environment variables. The --name option enables forward compatibility with duply (coming in part 2). Specify the long form GPG key ID that should be used to sign and encrypt the backup volumes with --encrypt-sign-key.

$ export AWS_ACCESS_KEY_ID=********************
$ export AWS_SECRET_ACCESS_KEY=****************************************
$ duplicity --dry-run --name duply_documents --encrypt-sign-key **************** --include $HOME/Documents --exclude '**' $HOME s3+http://**********-backup-docs
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
GnuPG passphrase: 
GnuPG passphrase for signing key: 
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1499399355.05 (Thu Jul 6 20:49:15 2017)
EndTime 1499399355.09 (Thu Jul 6 20:49:15 2017)
ElapsedTime 0.05 (0.05 seconds)
SourceFiles 102
SourceFileSize 40845801 (39.0 MB)
NewFiles 59
NewFileSize 40845801 (39.0 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 59
RawDeltaSize 0 (0 bytes)
TotalDestinationSizeChange 0 (0 bytes)
Errors 0
-------------------------------------------------

When you’re ready, remove the --dry-run option and start the backup. Plan ahead for the initial backup. It can often be a large amount of data and can take hours to upload, depending on your Internet connection.
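
With the access keys still exported, the real run is simply the previous invocation without --dry-run:

$ duplicity --name duply_documents --encrypt-sign-key **************** --include $HOME/Documents --exclude '**' $HOME s3+http://**********-backup-docs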

After the backup is complete, the AWS S3 Console lists the new full backup volume.

Create an incremental backup

Run the same command again to create an incremental backup.

$ export AWS_ACCESS_KEY_ID=********************
$ export AWS_SECRET_ACCESS_KEY=****************************************
$ duplicity --name duply_documents --encrypt-sign-key **************** --include $HOME/Documents --exclude '**' $HOME s3+http://**********-backup-docs
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Thu Jul 6 20:50:20 2017
GnuPG passphrase: 
GnuPG passphrase for signing key: 
--------------[ Backup Statistics ]--------------
StartTime 1499399964.77 (Thu Jul 6 20:59:24 2017)
EndTime 1499399964.79 (Thu Jul 6 20:59:24 2017)
ElapsedTime 0.02 (0.02 seconds)
SourceFiles 60
SourceFileSize 40845801 (39.0 MB)
NewFiles 3
NewFileSize 8192 (8.00 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 3
RawDeltaSize 0 (0 bytes)
TotalDestinationSizeChange 845 (845 bytes)
Errors 0
-------------------------------------------------

Again, the AWS S3 Console lists the new incremental backup volumes.

Restore a file

Backups aren’t useful without the ability to restore from them. duplicity makes restoration straightforward by simply reversing the SRC and DEST in the general form: duplicity [OPTIONS] DEST SRC.

$ export AWS_ACCESS_KEY_ID=********************
$ export AWS_SECRET_ACCESS_KEY=****************************************
$ duplicity --name duply_documents s3+http://**********-backup-docs $HOME/Restore
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Thu Jul 6 21:46:01 2017
GnuPG passphrase:
$ du -sh Restore/
783M Restore/

This restores the entire backup volume. Specific files or directories are restored using the --file-to-restore option, specifying a path relative to the backup root. For example:

$ export AWS_ACCESS_KEY_ID=********************
$ export AWS_SECRET_ACCESS_KEY=****************************************
$ duplicity --name duply_documents --file-to-restore Documents/post_install s3+http://**********-backup-docs $HOME/Restore
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Jul 4 14:16:00 2017
GnuPG passphrase: 
$ tree Restore/
Restore/
├── files
│   ├── 10-doxie-scanner.rules
│   ├── 99-superdrive.rules
│   └── simple-scan.dconf
└── post_install.sh

1 directory, 4 files

Automate with a timer

The example above is clearly a manual process. “Regular” from the Three Rs philosophy requires that this duplicity command run repeatedly. Create a simple shell script that wraps the environment variables and the command invocation.

#!/bin/bash

export AWS_ACCESS_KEY_ID=********************
export AWS_SECRET_ACCESS_KEY=****************************************
export PASSPHRASE=************

duplicity --name duply_documents --encrypt-sign-key **************** --include $HOME/Documents --exclude '**' $HOME s3+http://**********-backup-docs

Notice the addition of the PASSPHRASE variable. This allows duplicity to run without prompting for your GPG passphrase. Save this file somewhere in your home directory. It doesn’t have to be in your $PATH. Make sure the permissions are set to user read/write/execute only to protect the plain text GPG passphrase.
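
For example, if the script was saved as $HOME/backup.sh (the same path the service unit below points to), restrict it to the owner only:

$ chmod 700 $HOME/backup.sh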

Now create a timer and service unit to run it daily.

$ cat $HOME/.config/systemd/user/backup.timer
[Unit]
Description=Run duplicity backup timer

[Timer]
OnCalendar=daily
Unit=backup.service

[Install]
WantedBy=default.target
$ cat $HOME/.config/systemd/user/backup.service
[Unit]
Description=Run duplicity backup

[Service]
Type=oneshot
ExecStart=/home/link/backup.sh
$ systemctl --user enable --now backup.timer
Created symlink /home/link/.config/systemd/user/default.target.wants/backup.timer → /home/link/.config/systemd/user/backup.timer.
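
To confirm the timer is scheduled, and to run a backup immediately as a test, use:

$ systemctl --user list-timers backup.timer
$ systemctl --user start backup.service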

Conclusion

This article has described a manual process. But the flexibility in creating specific, customized backup targets is one of duplicity’s most powerful features. The duplicity man page has a lot more detail about various options. The next article will build on this one by creating backup profiles with duply, a wrapper program that makes the raw duplicity invocations easier.