Incremental backup with Butterfly Backup

Introduction

This article explains how to make incremental or differential backups with Butterfly Backup, which keeps a catalog so that you can restore (or export) your data from the point in time you want.

Requirements

Butterfly Backup is a simple wrapper around rsync written in Python. The first requirement is Python 3.3 or higher (plus the cryptography module for the init action). The other requirements are OpenSSH and rsync (version 2.5 or higher). OK, let’s go!

[Editor’s note: rsync version 3.2.3 is already installed on Fedora 33 systems]

$ sudo dnf install python3 openssh rsync git
$ sudo pip3 install cryptography

Installation

After that, install Butterfly Backup by cloning the repository locally and running its setup script:

$ git clone https://github.com/MatteoGuadrini/Butterfly-Backup.git
$ cd Butterfly-Backup
$ sudo python3 setup.py
$ bb --help
$ man bb

To upgrade, run the same commands again.

Example

Butterfly Backup is a server-to-client tool: it is installed on a server (or workstation) and backs up the clients from there. The restore process restores the files to the specified client, and it shares some of its options with the backup process.

Backups are organized according to a precise catalog structure; this is an example:

$ tree destination/of/backup
.
├── destination
│   ├── hostname or ip of the PC under backup
│   │   ├── timestamp folder
│   │   │   ├── backup folders
│   │   │   ├── backup.log
│   │   │   └── restore.log
│   │   ├── general.log
│   │   └── symlink of last backup
│
├── export.log
├── backup.list
└── .catalog.cfg
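The timestamp folders in this tree are named like 2020_09_19__10_28 in the examples later in this article. The following is a minimal Python sketch, assuming that naming pattern, of how a script could sort those folders and pick the most recent one. The function names are hypothetical and are not part of Butterfly Backup.

```python
from datetime import datetime

# Folder-name pattern inferred from the examples in this article
# (for instance 2020_09_19__10_28); treat it as an assumption.
FOLDER_FORMAT = "%Y_%m_%d__%H_%M"


def parse_backup_folder(name):
    """Convert a timestamp folder name into a datetime."""
    return datetime.strptime(name, FOLDER_FORMAT)


def latest_backup(folder_names):
    """Return the most recent timestamp folder name, or None if empty."""
    return max(folder_names, key=parse_backup_folder, default=None)


if __name__ == "__main__":
    folders = ["2020_09_15__07_26", "2020_09_19__10_50", "2020_09_19__10_28"]
    print(latest_backup(folders))
```

Parsing the names into real dates, rather than comparing strings, makes it easy to compute the age of a backup later on.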

Butterfly Backup has six main operations, referred to as actions. You can get information about them with the --help option.

$ bb --help
usage: bb [-h] [--verbose] [--log] [--dry-run] [--version]
          {config,backup,restore,archive,list,export} ...

Butterfly Backup

optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         Enable verbosity
  --log, -l             Create a log
  --dry-run, -N         Dry run mode
  --version, -V         Print version

action:
  Valid action

  {config,backup,restore,archive,list,export}
                        Available actions
    config              Configuration options
    backup              Backup options
    restore             Restore options
    archive             Archive options
    list                List options
    export              Export options

Configuration

Configuration mode is straightforward. If you are already familiar with exchanging keys in OpenSSH, you probably won’t need it. First, create a configuration (RSA keys), for instance:

$ bb config --new
SUCCESS: New configuration successfully created!

After creating the configuration, install (copy) the keys on the hosts you want to back up:

$ bb config --deploy host1
Copying configuration to host1; write the password:
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/arthur/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
arthur@host1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'arthur@host1'"
and check to make sure that only the key(s) you wanted were added.

SUCCESS: Configuration copied successfully on host1!

Backup

There are two backup modes: single and bulk.
The most relevant features of the two backup modes are parallelism and the retention of old backups. See the --parallel and --retention parameters in the documentation.

Single backup

The backup of a single machine consists of taking the files and folders indicated on the command line and putting them into the catalog structure shown above. In other words, it copies the selected files and folders of a machine into a path.

This is an example:

$ bb backup --computer host1 --destination /mnt/backup --data User Config --type Unix
Start backup on host1
SUCCESS: Command rsync -ah --no-links arthur@host1:/home :/etc /mnt/backup/host1/2020_09_19__10_28
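If you want to drive the single backup from a script, one option is to build the command as an argument list. The sketch below only uses the flags shown in the example above; the function name backup_command is hypothetical, and the script prints the command rather than running it.

```python
import shlex


def backup_command(computer, destination, data, os_type):
    """Build the argument list for a single-machine `bb backup` run.

    Only flags that appear in this article are used.
    """
    return [
        "bb", "backup",
        "--computer", computer,
        "--destination", destination,
        "--data", *data,
        "--type", os_type,
    ]


if __name__ == "__main__":
    cmd = backup_command("host1", "/mnt/backup", ["User", "Config"], "Unix")
    # Print the command instead of running it. On a machine where bb is
    # installed, the list could be passed to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems if a path ever contains spaces.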

Bulk backup

Bulk mode backups share the same options as single mode, with the difference that they accept a file containing a list of hostnames or IP addresses. In this mode, backups are performed in parallel (by default, 5 machines at a time). If you want to run fewer or more machines in parallel, specify the --parallel parameter.

For instance, this bulk run makes an incremental backup of host1 on top of its previous backup:

$ cat /home/arthur/pclist.txt
host1
host2
host3
$ bb backup --list /home/arthur/pclist.txt --destination /mnt/backup --data User Config --type Unix
ERROR: The port 22 on host2 is closed!
ERROR: The port 22 on host3 is closed!
Start backup on host1
SUCCESS: Command rsync -ahu --no-links --link-dest=/mnt/backup/host1/2020_09_19__10_28 arthur@host1:/home :/etc /mnt/backup/host1/2020_09_19__10_50
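The --link-dest option in the rsync command above tells rsync to hard-link files that are unchanged since the previous backup instead of copying them again, which is why an incremental backup takes little extra space. The following sketch is not Butterfly Backup code; it only illustrates, with temporary files and an invented file name, what a hard link between two backup folders means.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file in an 'old' backup folder, hard-link it into a
    'new' one, and return (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "2020_09_19__10_28")
        new = os.path.join(root, "2020_09_19__10_50")
        os.mkdir(old)
        os.mkdir(new)

        # Hypothetical file standing in for an unchanged backed-up file.
        original = os.path.join(old, "hosts")
        with open(original, "w") as handle:
            handle.write("127.0.0.1 localhost\n")

        # The unchanged file is linked, not copied: both paths then
        # point at the same data on disk.
        linked = os.path.join(new, "hosts")
        os.link(original, linked)

        same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
        return same_inode, os.stat(linked).st_nlink


if __name__ == "__main__":
    print(hard_link_demo())
```

Because both names refer to the same data, deleting the older backup folder does not remove the file from the newer one.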

There are four backup modes, which you specify with the --mode flag: Full (back up all files), Mirror (back up all files in mirror mode), Differential (based on the latest Full backup) and Incremental (based on the latest backup).
The default mode is Incremental. When no earlier backup exists to build on, as in the first run of host1 above, a Full backup is made instead.
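The difference between Differential and Incremental is which earlier backup serves as the reference. The sketch below models only that rule as described above; the function name and the history structure are assumptions for illustration and do not reflect how Butterfly Backup stores its catalog.

```python
def reference_backup(history, mode):
    """Pick the earlier backup that a new backup in `mode` builds on.

    `history` is a list of (folder, mode) tuples, oldest first
    (a hypothetical structure, not the real catalog format).
    Returns the folder name, or None when there is nothing to build on.
    """
    if mode == "Incremental":
        # Based on the latest backup, whatever its mode.
        return history[-1][0] if history else None
    if mode == "Differential":
        # Based on the latest Full backup only.
        for folder, past_mode in reversed(history):
            if past_mode == "Full":
                return folder
        return None
    # Full and Mirror copy all files and need no reference.
    return None


if __name__ == "__main__":
    history = [
        ("2020_09_18__17_50", "Full"),
        ("2020_09_19__10_28", "Incremental"),
    ]
    print(reference_backup(history, "Incremental"))
    print(reference_backup(history, "Differential"))
```

With this history, an Incremental backup builds on the 10:28 folder, while a Differential backup goes back to the Full backup from the day before.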

Listing catalog

The first time you run a backup command, the catalog is created. The catalog is used for future backups and for all restores made through Butterfly Backup. To query this catalog, use the list command.
First, let’s query the catalog in our example:

$ bb list --catalog /mnt/backup

BUTTERFLY BACKUP CATALOG

Backup id: aba860b0-9944-11e8-a93f-005056a664e0
Hostname or ip: host1
Timestamp: 2020-09-19 10:28:12

Backup id: dd6de2f2-9a1e-11e8-82b0-005056a664e0
Hostname or ip: host1
Timestamp: 2020-09-19 10:50:59

Press q for exit and select a backup-id:

$ bb list --catalog /mnt/backup --backup-id dd6de2f2-9a1e-11e8-82b0-005056a664e0
Backup id: dd6de2f2-9a1e-11e8-82b0-005056a664e0
Hostname or ip: host1
Type: Incremental
Timestamp: 2020-09-19 10:50:59
Start: 2020-09-19 10:50:59
Finish: 2020-09-19 11:43:51
OS: Unix
ExitCode: 0
Path: /mnt/backup/host1/2020_09_19__10_50
List: backup.log
etc
home

To export the catalog list so that you can use it with an external tool like cat, include the --log flag:

$ bb list --catalog /mnt/backup --log
$ cat /mnt/backup/backup.list
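The details printed for a single backup-id are plain "Key: value" lines, so a script can turn them into a dictionary. This is a minimal sketch that assumes the output looks like the listing shown above; the function name is hypothetical, and lines without a "Key: value" shape (such as the folder names under List) are ignored.

```python
def parse_backup_details(text):
    """Parse 'Key: value' lines as printed for a single backup-id."""
    details = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so timestamps keep their colons.
        key, sep, value = line.partition(": ")
        if sep:
            details[key.strip()] = value.strip()
    return details


# Sample copied from the listing in this article.
SAMPLE = """\
Backup id: dd6de2f2-9a1e-11e8-82b0-005056a664e0
Hostname or ip: host1
Type: Incremental
Timestamp: 2020-09-19 10:50:59
OS: Unix
ExitCode: 0
Path: /mnt/backup/host1/2020_09_19__10_50
"""

if __name__ == "__main__":
    info = parse_backup_details(SAMPLE)
    print(info["Type"], info["Path"])
```

A monitoring script could, for example, check that ExitCode is 0 for the most recent backup of each host.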

Restore

The restore process is the exact opposite of the backup process. It takes the files from a specific backup and pushes them to the destination computer.
For instance, this command performs a restore on the same machine that was backed up:

$ bb restore --catalog /mnt/backup --backup-id dd6de2f2-9a1e-11e8-82b0-005056a664e0 --computer host1 --log
Want to do restore path /mnt/backup/host1/2020_09_19__10_50/etc? To continue [Y/N]? y
Want to do restore path /mnt/backup/host1/2020_09_19__10_50/home? To continue [Y/N]? y
SUCCESS: Command rsync -ahu -vP --log-file=/mnt/backup/host1/2020_09_19__10_50/restore.log /mnt/backup/host1/2020_09_19__10_50/etc arthur@host1:/restore_2020_09_19__10_50
SUCCESS: Command rsync -ahu -vP --log-file=/mnt/backup/host1/2020_09_19__10_50/restore.log /mnt/backup/host1/2020_09_19__10_50/home/* arthur@host1:/home

If you do not specify the --type flag, which indicates the operating system of the machine on which the data are being restored, Butterfly Backup reads it directly from the catalog via the backup-id.

Archive old backup

Archive operations compress old backups into zip files to save disk space, for instance:

$ bb archive --catalog /mnt/backup/ --days 1 --destination /mnt/archive/ --verbose --log
INFO: Check archive this backup f65e5afe-9734-11e8-b0bb-005056a664e0. Folder /mnt/backup/host1/2020_09_18__17_50
INFO: Check archive this backup 4f2b5f6e-9939-11e8-9ab6-005056a664e0. Folder /mnt/backup/host1/2020_09_15__07_26
SUCCESS: Delete /mnt/backup/host1/2020_09_15__07_26 successfully.
SUCCESS: Archive /mnt/backup/host1/2020_09_15__07_26 successfully.
$ ls /mnt/archive
host1
$ ls /mnt/archive/host1
2020_09_15__07_26.zip

After that, look in the catalog and see that the backup was actually archived:

$ bb list --catalog /mnt/backup/ -i 4f2b5f6e-9939-11e8-9ab6-005056a664e0
Backup id: 4f2b5f6e-9939-11e8-9ab6-005056a664e0
Hostname or ip: host1
Type: Incremental
Timestamp: 2020-09-15 07:26:46
Start: 2020-09-15 07:26:46
Finish: 2020-09-15 08:43:45
OS: Unix
ExitCode: 0
Path: /mnt/backup/host1/2020_09_15__07_26
Archived: True
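In the archive run above, --days 1 left the backup from 2020_09_18__17_50 in place and archived the one from 2020_09_15__07_26. The sketch below shows one way such an age filter could work, assuming "older than N days" is measured from the current time; the exact rule Butterfly Backup applies, and the value of now used here, are assumptions.

```python
from datetime import datetime, timedelta


def older_than(timestamps, days, now):
    """Return the timestamps that are more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [stamp for stamp in timestamps if stamp < cutoff]


if __name__ == "__main__":
    # Backup times taken from the folder names in this article.
    backups = [
        datetime(2020, 9, 18, 17, 50),
        datetime(2020, 9, 15, 7, 26),
    ]
    # Assumed moment of the archive run, chosen for illustration only.
    now = datetime(2020, 9, 19, 12, 0)
    for stamp in older_than(backups, 1, now):
        print(stamp)
```

Passing now as a parameter, instead of calling datetime.now() inside the function, keeps the filter easy to test.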

Conclusion

Butterfly Backup was born from a very complex need; this tool gives rsync superpowers and automates the backup and restore process. In addition, the catalog allows you to have a system similar to a “time machine”.

In conclusion, Butterfly Backup is a lightweight, versatile, simple and scriptable backup tool.

One more thing; Easter egg: bb -Vv

Thank you for reading my post.

Full documentation: https://butterfly-backup.readthedocs.io/
Github: https://github.com/MatteoGuadrini/Butterfly-Backup


Photo by Manu M on Unsplash.


20 Comments

  1. I really like Borgbackup (with the Vorta GUI) for backups. It’s much like Time Machine for macOS, allows verifiable, signed and encrypted backups, and deduplicates data blockwise, which saves a lot of space.

    I used rsnapshot in the past, which is also based on rsync, but that one’s not really in the same league as Borgbackup.

    Here’s a list of some other cool backup tools: https://github.com/n1trux/awesome-sysadmin#backups

    • Wot Aho

      Very interesting. Borgbackup however has no bulk mode and no parallelism algorithm to optimize multiple backups at the same time. Furthermore, Butterfly Backup optimizes space thanks to the hard links supported by the various file systems.
      Butterfly Backup boasts of being very simple to use even though it includes many options, complete and very useful.
      Furthermore, to automate it, you only need crontab or some other scheduler.
      I have been using it for at least a year and it has given me great satisfaction.

      • I didn’t say borgbackup or rsnapshot were better or worse than Butterfly backup.

        You don’t really need parallelism when you just read serialised. This is what IO schedulers are for.

        You can recreate bulk mode with using systemd services and timers. The same is true for pull backups. And you can also automate it that way.

        I use borgmatic in addition to borgbackup to handle the retention settings and up/down scripts, which I automated with systemd timers.

  2. Vernon Van Steenkist

    Seems much more complicated than rdiff-backup (https://rdiff-backup.net/) without any additional features. rdiff-backup creates a mirror and reverse differential backups and snapshots with one simple command:

    rdiff-backup foo bar

    More information is here: https://rdiff-backup.net/docs/examples.html

    rdiff-backup has 20 years of usage behind it and is in the Fedora repositories. It can be installed by simply issuing the command:

    dnf install rdiff-backup

    When people write these kinds of articles, it would be helpful if they explained the advantages of the program over similar programs already in the Fedora repositories.

  3. Anon Ymous

    Thanks for the article, i actually will check out BB so one person at least did because of your article 🙂 Of first notice, that is lots of dependencies to install just to be able to back up one’s system, especially since btrfs has its own backup system baked in. (Especially in releases after Fedora 33, they said they will work on updating the kernel to support more and more btrfs capabilities) Also of note, prolly the oldest backup system that works is CloneZilla – it feels like 1980 but the darn thing still works. Anyways, it is nice to learn new things so i will actually follow your guide and use BB and see if i can backup/restore, so thanks!

  4. It's the current year

    “$ sudo dnf install python3 openssh rsync git
    $ sudo pip3 install cryptography

    Installation

    After that, installing Butterfly Backup is very simple by using the following commands to clone the repository locally, and set up Butterfly Backup for use:

    $ git clone https://github.com/MatteoGuadrini/Butterfly-Backup.git
    $ cd Butterfly-Backup
    $ sudo python3 setup.py
    $ bb --help
    $ man bb”

    Easy!? Can we have a flatpak or a snap, please?

    • Helix

      Why use snap when you could also have a package in the official repository? (Which is also not the case)

  5. cstratak

    python3-cryptography is already available in the repos though. Any reason for using the pypi package instead?

  6. @Helix
    I’m on GNU/Linux Ubuntu — which doesn’t have the Vorta GUI in their repos. Is there a repo available for other distros?

  7. Ah sorry for the followup to my own post – Should have searched first before asking.
    Git for Vorta is here: https://github.com/borgbase/vorta

  8. Thanks everyone for the feedback and ideas. I will take note, as the next version is under development, which will not only be an executable but also a python module.

    So, @Vernon Van Steenkist, rdiff-backup is a tool that I used too, very nice and versatile, but it only performs one-to-one backups. If you need simultaneous backups at the same time, you need to run multiple commands or scripts. Furthermore, the only mode present in rdiff-backup is the incremental one. BB supports various modes, including differential (better explained here: https://butterfly-backup.readthedocs.io/en/latest/#backup). Furthermore, the fact that it has been around for at least 20 years does not mean that it can cover all needs. Someone mentioned CloneZilla (which is not a backup tool but a ghosting tool, as it does a raw image of the disk); others have mentioned other very nice and useful tools, such as rsnapshot (BB in Mirror mode). BB does not want to replace or reimplement any of these tools. BB wants to be “a simple wrapper of rsync”. Over time, BB has grown thanks to user requests and today supports many ways to perform backup/restore/export/archive/retention (older than and number).
    Furthermore, rdiff-backup has no catalog other than the structure in the filesystem, which is fine for a daily backup, but if we think of a bulk backup of 100 machines in the cloud, it becomes very complex in the restore case to understand which backup belongs to which machine.

    @It’s the current year, thanks for the clarification. In fact, in the next release I’m going to do an rpm and a deb for the major distributions, and for Fedora, which I’ve been using for 15 years now, I’ll open an official copr of BB. I do not know snap, but it seems to me something additional to the official Fedora tools (like dnf). I don’t think I’ll go in this direction. I have already discarded the flatpak idea because they have a strange way of interacting with the operating system. Flatpak modifies the umask of the home and other folders; for a single user this is no problem, but for a multi-user workstation it is.

    By this, I don’t mean that BB is better than everyone else; it is just a lightweight alternative for having an exhaustive catalog of your backups. If you were to do hourly backups with rdiff-backup or BorgBackup (borg extract == bb export and not bb restore [they do two different things]) and restore after days by saying “two days ago”, it would be really difficult to understand what is being restored.

    These tools are great when you have a backed up machine, but then again, when you have 100?

    @all, thanks for all the ideas and opinions that will come.

  9. Nick

    @cstratak because, having been a pythonist for years, pip is much more up to date on python modules than the official repos. That’s all. It seems like a good choice for a real python developer.

    @Matteo Guadrini I tried BB and I must say that I was surprised. I thought it was another tool like the others, but actually, it is very different. Good performance and the fact that in bulk mode you fork processes (so you can give priority with the nice command).
    In my opinion something needs to be improved at the python code level.

  10. Jasper

    I’m also a big fan of borgbackup. Have restored the full file system of Ubuntu&Fedora several times and it works really well. I create a backup every day and the automatic purge function lets me only keep about 10 different backups (~5 months, 2 months & the last 2 weeks).

    I like this bulk feature, but for me this is not a requirement. I have an rpi that sends a wake-on-lan signal to start up the computers every day (crontab). In the crontab of each computer there’s a bash script that runs and creates the borg-backup on a server.

  11. Alexis Jeandet

    Please avoid telling people to use ‘sudo pip install’; that’s a really bad practice which can lead to messing up the system Python install. Using ‘pip install --user <package|git+url>’ is way safer.

    • Nick

      It’s okay if you need to isolate the module to be installed by user, but in this case, being a command line tool, you don’t know which user or users use this command and so it’s okay to omit the --user.

      • Alexis Jeandet

        Well, it still exposes the user to high risks of breaking his system:
        https://fedoraproject.org/wiki/Changes/Making_sudo_pip_safe
        Most common case is upgrading a dependency with pip that is incompatible with system tools such as dnf.

        • Nick

          This is a warning regarding python modules that you don’t know where they come from. In the case of the cryptography module it comes from the Python Cryptographic Authority, a reputable and secure source. I don’t see any problems. It all depends on the context in which you run the scripts and import the modules. “Each of us is a responsible user”, he knows what is best for his system. And I agree with Matteo Guadrini’s github caption: “Simple is better than complex. Complex is better than complicated.”

  12. ollie john

    Thanks for the article. How do I uninstall this? Do I have to backtrack the setup.py?

    • Nick

      Hi ollie john, in setup.py there are four paths specified:
      /usr/share/man/man1/bb.man
      /usr/share/man/man1/bb.1
      /opt/bb (in Darwin == OSX is /Applications/bb)
      /usr/bin/bb
      so I would say that removing these paths is uninstalling:

      rm -rf /usr/share/man/man1/bb* /opt/bb /usr/bin/bb

      I agree it’s not the correct way, but this is it. I imagine that in the next release that the maintainer was talking about above, it will integrate the uninstall as well.

      Anyway, all the software weighs less than 1MB …
      I see no reason to uninstall it
