We all know that good backups are of great importance, but creating them can be quite a hassle. With Duply, a wrapper for the popular Duplicity, you can back up any Linux server or desktop to Aurora Objects in a fast, efficient and secure manner.
First of all, getting started is easy, and the result is a robust and, more importantly, useful set of backups. One of Duplicity's advantages is incremental backups: because it keeps a local catalogue of earlier backups, it can quickly work out what has changed since the latest backup and upload only those changes, instead of creating a backup from scratch every time.
When following this tutorial we assume that:
You have an active Aurora Objects account.
You're familiar with s3cmd and have it working with your Aurora Objects account.
You created a (private) bucket on the Aurora Objects platform.
You're familiar with the command line.
This tutorial works on both CentOS and Ubuntu; we tested it on CentOS 7 and Ubuntu 14.04. It will likely work on other distributions such as Debian as well, but we haven't tested that.
We begin by installing the duply, duplicity and python-boto packages. The last one provides the boto library that Duplicity's S3 backend uses to connect to Aurora Objects.
Installation on CentOS
If you're using CentOS, the EPEL repository must be enabled; otherwise the package manager can't find these packages. To add the EPEL repository, use this command:
$ yum install epel-release
Now we install the packages using this command:
$ yum install duplicity duply python-boto
Installation on Ubuntu
$ apt-get install duplicity duply python-boto
Creating a backup profile
After installing the packages we'll create the first backup profile with Duply. While it is possible to create multiple profiles, in this tutorial we only use one profile for the entire machine. We named our server hulk, so we're giving our backup profile the same name.
$ duply hulk create
This should return the following output:
Congratulations. You just created the profile 'hulk'.
The initial config file has been created as
You should now adjust this config file to your needs
If you ran the command on CentOS, you'll notice that Duply stores its configuration files in a different folder: Ubuntu stores the profile at the user level, under /root/.duply, whereas CentOS uses /etc/duply.
With the packages installed and the profile created, it's time to edit the profile's configuration. Use the first command below on CentOS and the second on Ubuntu. We use the nano editor in this example, but of course you can use whichever editor you prefer.
$ nano /etc/duply/hulk/conf
$ nano /root/.duply/hulk/conf
For a working backup profile we only need to change a few settings in the configuration file. All backups are encrypted with GPG. You can either use an existing GPG key for the profile or set only a password, in which case the backups are encrypted symmetrically with that password. The latter is what we do in this example.
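In the conf file, passphrase-only encryption comes down to two lines. A minimal sketch, assuming you keep Duply's default variable names; the passphrase shown is a placeholder you must replace:

```shell
# GPG_KEY stays commented out: no key is used, so encryption is symmetric
#GPG_KEY='_KEY_ID_'
# passphrase used to encrypt and decrypt the backups (placeholder, replace it)
GPG_PW='replace-with-a-long-random-passphrase'
```

Keep a copy of this passphrase somewhere other than the server itself: without it, the backups cannot be decrypted.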
Next, Duply needs to be able to connect to Aurora Objects. TARGET consists of the URL, followed by the bucket name and the folder. In this example the bucket is called backup and the backups are stored in the hulk folder.
For the access key and secret key (TARGET_USER and TARGET_PASS in the conf file), fill in the keys of the Aurora Objects user that has access to the bucket.
# optionally the username/password can be defined as extra variables
# setting them here _and_ in TARGET results in an error
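Put together, the connection settings could look like the sketch below. The host is a placeholder, since the exact Aurora Objects endpoint is not given here; backup and hulk are the bucket and folder from this example, and both keys are placeholders:

```shell
# s3://<host>/<bucket>/<folder>; AURORA_OBJECTS_HOST is a placeholder for the endpoint
TARGET='s3://AURORA_OBJECTS_HOST/backup/hulk'
# keys of the Aurora Objects user with access to the bucket (placeholders)
# keep them out of the TARGET URL, as the comment above warns
TARGET_USER='YOUR_ACCESS_KEY'
TARGET_PASS='YOUR_SECRET_KEY'
```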
For SOURCE, enter the highest-level folder that contains everything you want to back up. If you want to back up several folders spread across the system, "/" is probably what you want. If everything you want to back up lives under /home, then /home might be better suited.
# base directory to backup
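For a whole-machine profile like this one, where the exclude file decides which directories end up in the backup, the setting would be:

```shell
# back up from the root; the exclude file selects what is actually included
SOURCE='/'
```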
The last setting we change is how long backups are kept. In this example backups are kept for one month; older ones are deleted when the purge command is run. Setting it to 7D instead would keep them for 7 days. More on the time formats follows later in this tutorial.
# Time frame for old backups to keep, Used for the “purge” command.
# see duplicity man page, chapter TIME_FORMATS)
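Uncommented and set to one month, the line reads as follows (1M uses the duplicity time format; 7D would keep seven days):

```shell
# backups older than one month are removed when "duply hulk purge --force" runs
MAX_AGE=1M
```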
We're almost ready to create our first backup; the only thing left is to tell Duply which files to back up. This is done by editing the exclude file in the profile's configuration folder. If you skip this step, all files under the source (base directory) will be backed up.
Open the exclude file (the first command is for CentOS, the second for Ubuntu):
$ nano /etc/duply/hulk/exclude
$ nano /root/.duply/hulk/exclude
In this file you list the folders you want to back up. In the following example we back up three directories and nothing else.
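The directories from the original example are not reproduced here. As an illustration only, an exclude file that backs up three directories, assumed here to be /etc, /home and /root, and nothing else could look like this:

```
+ /etc
+ /home
+ /root
**
```

Duplicity applies the first rule that matches a path, so the catch-all ** line has to come last; placed first, it would exclude everything.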
As you can see, you add each directory you want to back up with a "+", and the "**" line excludes all other folders. Much more is possible; on this blog you can find a good example of a comprehensive exclude file.
We're now ready to create the first backup. We do this with the backup command.
$ duply hulk backup
This should produce output like the following:
Start duply v1.5.10, time is 2015-03-12 13:42:28.
Using profile '/root/.duply/hulk'.
Using installed duplicity version 0.6.23, python 2.7.6, gpg 1.4.16 (Home: ~/.gnupg), awk 'mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan', bash '4.3.11(1)-release (x86_64-pc-linux-gnu)'.
Signing disabled. Not GPG_KEY entries in config.
Test - Encryption with passphrase (OK)
Test - Decryption with passphrase (OK)
Test - Compare (OK)
Cleanup - Delete '/tmp/duply.19890.1426164148_*'(OK)
--- Start running command PRE at 13:42:28.332 ---
Skipping n/a script '/root/.duply/hulk/pre'.
--- Finished state OK at 13:42:28.339 - Runtime 00:00:00.007 ---
--- Start running command BKP at 13:42:28.345 ---
Reading globbing filelist /root/.duply/hulk/exclude
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1426164148.47 (Thu Mar 12 13:42:28 2015)
EndTime 1426164149.50 (Thu Mar 12 13:42:29 2015)
ElapsedTime 1.02 (1.02 seconds)
SourceFileSize 2258787 (2.15 MB)
NewFileSize 2258787 (2.15 MB)
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
RawDeltaSize 1542708 (1.47 MB)
TotalDestinationSizeChange 556062 (543 KB)
--- Finished state OK at 13:42:29.923 - Runtime 00:00:01.577 ---
--- Start running command POST at 13:42:29.930 ---
Skipping n/a script '/root/.duply/hulk/post'.
--- Finished state OK at 13:42:29.939 - Runtime 00:00:00.008 ---
To check whether everything was successful we use s3cmd to see if the files are stored on Aurora Objects:
$ s3cmd ls s3://backup/hulk/
2015-03-12 12:42 90253 s3://backup/hulk/duplicity-full-signatures.20150312T124228Z.sigtar.gpg
2015-03-12 12:42 215 s3://backup/hulk/duplicity-full.20150312T124228Z.manifest.gpg
2015-03-12 12:42 556062 s3://backup/hulk/duplicity-full.20150312T124228Z.vol1.difftar.gpg
If the files are listed on Aurora Objects, your backup was a success. You can now run the backup command as often as you like: instead of a full backup of all files, each run makes an incremental backup of only the changes since the previous backup. Of course you won't want to run the command by hand every time; automating this task is explained later in this tutorial.
The status command gives an overview of your backup profile, including the number of backups that have been made.
$ duply hulk status
The list command shows all files in the backup (note: this can be a long list).
$ duply hulk list|more
Restoring from backups
Restoring data from a backup is easy with Duply's restore command. For example, the following command restores everything into the /tmp/restore folder.
$ duply hulk restore /tmp/restore
To restore an older backup, for example the one from 2 days ago, add its age:
$ duply hulk restore /tmp/restore 2D
Duply's manpage gives these examples of time formats:
For all time related parameters like age, max_age etc.
Refer to the duplicity manpage for all available formats. Here some examples:
2002-01-25T07:00:00+02:00 (full date time format string)
2002/3/5 (date string YYYY/MM/DD)
12D (interval, 12 days ago)
1h78m (interval, 1 hour 78 minutes ago)
If you'd like to restore a single file, instead of a whole backup, then you can use the fetch command.
$ duply hulk fetch etc/passwd /tmp/restore/passwd
As with restore, you can append an age or date to the fetch command to retrieve an older version of the file.
Creating backups automatically
To automate backups, all we need to do is run the backup command regularly, which cron can do for us:
$ crontab -e
For daily backups, add the line below to the crontab. If you also add a MAILTO line, cron emails you the output of each run, so you can see whether the backup succeeded.
@daily duply hulk backup
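The MAILTO line itself is not shown above. A crontab along these lines would add it; the address is a placeholder, and the weekly purge line is an optional extra, not part of the original setup, that deletes backups older than the MAX_AGE set in the conf file (purge on its own only lists them, --force actually deletes them):

```
# placeholder address; cron mails each job's output here
MAILTO=admin@example.com
@daily duply hulk backup
# optional: remove backups older than MAX_AGE once a week
@weekly duply hulk purge --force
```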
Of course Duply can do more than this tutorial describes; see the Duply manpage. One feature worth checking out is pre and post scripts: commands that are executed before and after the backup. One use case is to create MySQL dumps before the backup runs and remove them once it has finished.
To do this, create two files named pre and post in the profile's configuration directory and put the commands you want executed in them, for example:
/usr/bin/mysqldump --all-databases -u root -p > \
/bin/rm /var/backups/mysql/dump-$(date '+%F').sql
With the example above, every backup contains the latest database dumps. Don't forget to include /var/backups/mysql (or /var) in your exclude file with a "+" line; otherwise the "**" rule excludes the dumps.
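A slightly more robust sketch of the two files, under assumptions that are not part of the original setup: -p without a value makes mysqldump prompt for a password, which cannot work when cron runs the backup, so the MySQL credentials are assumed to live in a [client] section of /root/.my.cnf, which mysqldump reads automatically. The dump is written to the same file name the post command above removes, and post deletes every dump in the directory, so a backup that runs past midnight still cleans up:

```shell
# pre: /etc/duply/hulk/pre (CentOS) or /root/.duply/hulk/pre (Ubuntu)
# assumes credentials in /root/.my.cnf ([client] user=... password=...)
mkdir -p /var/backups/mysql
/usr/bin/mysqldump --all-databases > /var/backups/mysql/dump-$(date '+%F').sql

# post: /etc/duply/hulk/post (CentOS) or /root/.duply/hulk/post (Ubuntu)
# remove all dumps once they are in the backup
/bin/rm -f /var/backups/mysql/dump-*.sql
```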