Implementing a backup plan

Backups are one of those things that some people don’t seem to take seriously until it’s too late. Data loss can be a catastrophic event for an organisation, so it’s imperative that you implement a solid backup plan.

There’s no one best backup solution, since it all depends on what kind of data you need to secure, and what software and hardware resources are available to you. This may mean that you’ll need to make some compromises, such as creating regular snapshots of your database server’s storage volume or regularly dumping a backup of your important databases to an external storage device.

The rsync utility is one of the most valuable tools available to server administrators, and in some cases it can save us quite a bit of money. Online backup services, for example, are useful for storing off-site copies of our important files, but depending on the volume of data they can be quite expensive. With rsync, we can back up our data in much the same way, copying not only our current files to a backup target but also differentials (the earlier versions of files that a run replaces). If we have another server to send the backup to, even better. In its simplest form, rsync copies one directory into another on the same machine:

sudo rsync -a /home /backup

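Here the -a option (archive mode) is shorthand for the following options combined:

-rlptgoD

That is: recurse into directories (-r), copy symlinks as symlinks (-l), and preserve permissions (-p), modification times (-t), group (-g), owner (-o), and device and special files (-D).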

To send the backup to another server rather than to another directory on the local server, give the target as user@host:path:

sudo rsync -av /home/myuser admin@IP_ADDRESS:/backup

By default, rsync copies data between two locations, but it doesn’t remove anything. With the --delete option, you can synchronise two points, telling rsync to make them the same by allowing it to delete files in the target that are no longer in the source.
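
Because --delete removes files from the target, it can be worth previewing a run before committing to it. The following is a minimal sketch of one way to do that from Python, assuming rsync is installed and on the PATH. The helper names build_rsync_command and run_rsync are hypothetical; --dry-run is rsync's own flag for reporting what would change without changing anything.

```python
import subprocess


def build_rsync_command(source, target, delete=False, dry_run=True):
    """Return the rsync argument list for copying source into target."""
    cmd = ["rsync", "-av"]
    if delete:
        # Remove files from the target that no longer exist in the source.
        cmd.append("--delete")
    if dry_run:
        # Report what would be done without touching the target.
        cmd.append("--dry-run")
    cmd += [source, target]
    return cmd


def run_rsync(cmd):
    """Run the command and return rsync's report (assumes rsync is on PATH)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; pass it to run_rsync() to execute it.
    print(" ".join(build_rsync_command("/home", "/backup", delete=True)))
```

Once the dry-run report looks right, the same call with dry_run=False gives the command that performs the real synchronisation.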

Incremental backups

sudo rsync -avb --delete --backup-dir=/backup/incremental /src /target

This command copies files from /src to /target as before, but now sends replaced files to the /backup/incremental directory. When a file on the target is about to be replaced or deleted, the existing copy is first moved to /backup/incremental. This works because we used the -b option (backup) together with the --backup-dir option, which means that the replaced files are not renamed in place; they are simply moved to the designated directory. This allows us to effectively perform incremental backups.

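With a little help from the Bash shell, incremental backups work even better, because each run can store its replaced files in a directory named after the current date: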

CURDATE=$(date +%m-%d-%Y)
sudo rsync -avb --delete --backup-dir=/backup/incremental/$CURDATE /src /target
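
The same dated command can be assembled from Python, which may be convenient if the backup is driven by a script rather than by the shell. This is only a sketch: the function name incremental_command and its parameters are hypothetical, and it builds the argument list without running rsync.

```python
import datetime


def incremental_command(source, target, backup_root, today=None):
    """Build an rsync command whose replaced files go to a dated directory."""
    today = today or datetime.date.today()
    # Same layout as the shell example: month-day-year.
    stamp = today.strftime("%m-%d-%Y")
    return [
        "rsync", "-avb", "--delete",
        "--backup-dir={}/{}".format(backup_root, stamp),
        source, target,
    ]


if __name__ == "__main__":
    print(" ".join(incremental_command("/src", "/target", "/backup/incremental")))
```

If the dated directories should sort chronologically in a listing, the format string "%Y-%m-%d" is an alternative to the month-first layout used above.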

Simple script

""" Simple backup script which just creates the root structure in an other
folder and syncs everything which recursively lies within one of the source
folders. Files bigger than a threshold are gzipped first."""

import argparse
import gzip
import os
import shutil
import sys
import threading

def parse_input():
    parser = argparse.ArgumentParser()
    parser.add_argument('-target', nargs=1, required=True,
                        help='Target Backup folder')
    parser.add_argument('-source', nargs='+', required=True,
                        help='Source Files to be added')
    parser.add_argument('-compress', nargs=1, type=int,
                        help='Gzip threshold in bytes', default=[100000])

    # no input means show me the help
    if len(sys.argv) == 1:
        parser.print_help()
        sys.exit()

    return parser.parse_args()


def size_if_newer(source, target):
    """ Return the source size if it is newer than the target, else None """

    src_stat = os.stat(source)
    try:
        target_ts = os.stat(target).st_mtime
    except FileNotFoundError:
        try:
            # Large files are stored compressed, so check for the .gz copy
            target_ts = os.stat(target + '.gz').st_mtime
        except FileNotFoundError:
            target_ts = 0

    # Allow one second of slack, since the sub-second part of st_mtime
    # can be lost when copy2 copies the timestamp
    return src_stat.st_size if (src_stat.st_mtime - target_ts > 1) else None


def threaded_sync_file(source, target, compress):
    """ Start a transfer thread if needed; return the thread, or None """
    size = size_if_newer(source, target)

    # Compare against None so that empty (0 byte) files are still copied
    if size is None:
        return None

    thread = threading.Thread(target=transfer_file,
                              args=(source, target, size > compress))
    thread.start()
    return thread


def sync_file(source, target, compress):
    """ Single-threaded variant of threaded_sync_file """
    size = size_if_newer(source, target)

    if size is not None:
        transfer_file(source, target, size > compress)


def transfer_file(source, target, compress):
    """ Either copy the file, or compress it into target + '.gz' """

    # Create the target folder up front. exist_ok avoids an error when
    # several threads try to create the same folder at the same time.
    os.makedirs(os.path.dirname(target) or '.', exist_ok=True)

    if compress:
        with gzip.open(target + '.gz', 'wb') as target_fid:
            with open(source, 'rb') as source_fid:
                # Stream in chunks instead of splitting binary data on newlines
                shutil.copyfileobj(source_fid, target_fid)
        print('Compress {}'.format(source))
    else:
        shutil.copy2(source, target)
        print('Copy {}'.format(source))


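def sync_root(root, arg):
    target = arg.target[0]
    compress = arg.compress[0]
    threads = []

    for path, _, files in os.walk(root):
        for name in files:
            source = os.path.join(path, name)
            # Mirror the source path underneath the target folder
            dest = os.path.join(target, source.lstrip(os.sep))
            # For a single-threaded run, call sync_file(source, dest, compress)
            thread = threaded_sync_file(source, dest, compress)
            # No thread is returned when the file is already up to date
            if thread is not None:
                threads.append(thread)
    for thread in threads:
        thread.join()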


if __name__ == '__main__':
    arg = parse_input()
    print('### Start copy ####')
    for root in arg.source:
        sync_root(root, arg)
    print('### Done ###')
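
Since the script stores files above the threshold as .gz copies, getting a file back out of the backup needs the reverse step. The following is a minimal sketch of such a restore helper. The name restore_file is hypothetical and not part of the script above, and it assumes that any backup file ending in .gz was compressed by the script (a source file that already ended in .gz and was copied as-is would be decompressed too).

```python
import gzip
import os
import shutil


def restore_file(backup_path, restore_path):
    """Copy a file out of the backup, decompressing it if it ends in .gz."""
    # Make sure the folder we restore into exists.
    os.makedirs(os.path.dirname(restore_path) or ".", exist_ok=True)

    if backup_path.endswith(".gz"):
        # Assumption: .gz files in the backup were created by the script.
        with gzip.open(backup_path, "rb") as src, open(restore_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    else:
        # Plain copies keep their metadata, as in the backup direction.
        shutil.copy2(backup_path, restore_path)
```

Restoring a file now and then is also a simple way to check that the backups are actually usable.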