The Only Backup You'll Ever Need

1 backup Introduction

This idea for a simple backup function stems from my days at Fidessa, where the question arose:

When using a script to back up a file, what if you successively copy the file down a chain of suffixes, say .001, .002, …, but the chain isn't long enough?

For example, you may have stopped at .004, when it was the non-existent .005 you really needed.

$ cp this.003 this.004
$ cp this.002 this.003
$ cp this.001 this.002
$ cp this.txt this.001
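The rotation above can be sketched as a loop. Here is a minimal sketch; rotate_suffixes is a hypothetical name, not part of any library, and the depth limit is exactly the weakness described here: whatever lives in the deepest suffix is silently overwritten.

```shell
# Sketch of the fixed-suffix rotation; rotate_suffixes is a
# hypothetical name, not part of the backup library.
rotate_suffixes() {
    local file=$1 depth=${2:-4} i lower upper
    for ((i = depth; i > 1; i--)); do
        lower=$(printf '%s.%03d' "$file" $((i - 1)))
        upper=$(printf '%s.%03d' "$file" "$i")
        [[ -f $lower ]] && cp "$lower" "$upper"   # .003 -> .004, etc.
    done
    cp "$file" "$file.001"                        # current file -> .001
}
```

Run it one more time than the depth allows and the oldest version falls off the end, which is the failure the rest of this note sets out to avoid.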

Here are the requirements for this backup, and the related simple version command.

To save unlimited backups, the file system offers a ready solution. Rather than play with suffixes, just push a differing copy of the current file down a directory stack. Do this indefinitely until you're satisfied. In practice, I've seen this grow to thirty-something entries before the need to clean up arose.

$ ...
$ cp .bak/.bak/.bak/this.txt  .bak/.bak/.bak/.bak/this.txt 
$ cp .bak/.bak/this.txt       .bak/.bak/.bak/this.txt 
$ cp .bak/this.txt            .bak/.bak/this.txt 
$ cp this.txt                 .bak/this.txt

Notice the first line, "…". It suggests "one more if needed". With this backup, you'll never run out of backup directories. Also, notice that after copying this.txt to .bak, .bak/this.txt is identical to the current file. That suggests an even simpler versioning system, which is the subject of this note.

This paper is part of my Commonplace Book

2 backup code

In any case, here's the code for backup:

2.1 functions

function backup                      # (backup)
{ 
    : backup file arguments
    case $1 in 
	"")                          # (gracefully)
	    usage file ... recursively backup to .bak/.bak/...;
	    return
	;;
	*) 
	    local fun=backup_$1
	    is_function $fun && {
		shift
		$fun "$@"
		return
	    }
	;;
    esac;
    trace_call "$@";                 # (trace)
    foreach backup_one "$@"
}
function backup_here                 # (backup_here)
{
    : recursively backup one
    report_notcalledby backup_one && return 1   
    set -- "$1" .bak "${2:-$PWD}";   # (set)
    [[ -d $2 ]] || mkdir "$2";       # (mkdir)
    [[ $3 == $PWD ]] && {            # (base directory)
	cmp "$1" "$2/$1" 2> /dev/null 1>&2 && return # (cmp)
    };
    [[ -f $2/$1 ]] && {            # more work to do, so
	cd "$2";                      # recursively descend
	backup_here "$1" "$3"         # and backup this too
    } || {                         # OR ...
	mv "$1" "$2/$1";              # (to the backup)
	[[ $3 == $PWD ]] && {         # (back to home base)
	    cp "$2/$1" "$1";          # (current)
	    timestamp "$2/$1" "$1";   # (sametime)
	    return
	};
	cd ..;                        # (recursively ascend)
	backup_here "$1" "$3"
    }
}
function backup_one                   # (backup_one)
{ 
    : allows non-local file names
    report_notfile $1 && return 1;
    report_notcalledby backup && return 2  # (notcalledby)
    ignore pushd "$(dirname "$1")";   # (ignore)
    backup_here "$(basename "$1")";   # (invoked)
    ignore popd
}
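For readers without the auxiliary library, here is a minimal, self-contained sketch of the same recursive scheme. The names bak_push and bak_shift are hypothetical stand-ins, not the library functions above; the trace_call, foreach, and report_* helpers are omitted.

```shell
# Minimal self-contained sketch of the recursive .bak scheme.
# bak_push and bak_shift are hypothetical stand-ins, not the
# backup library's own functions.
bak_push() {                 # back up one file in the current directory
    local file=$1
    mkdir -p .bak
    cmp -s "$file" ".bak/$file" && return      # unchanged: nothing to do
    [[ -f .bak/$file ]] &&
        ( cd .bak && bak_shift "$file" )       # push older copies down first
    cp -p "$file" ".bak/$file"                 # -p keeps the timestamp
}
bak_shift() {                # move an existing backup one level deeper
    local file=$1
    mkdir -p .bak
    [[ -f .bak/$file ]] &&
        ( cd .bak && bak_shift "$file" )
    mv "$file" ".bak/$file"
}
```

After two edit-and-push rounds, .bak/this.txt holds the latest copy and .bak/.bak/this.txt the one before; an unchanged file pushes no new level.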

2.2 comment

The three backup functions (backup, backup_here, and backup_one) highlight an approach I'm now taking. The function now named backup_here was originally the first backup function I wrote. As a unit test, I exercised it on single files in the current directory, testing the simple features before adding the ability to back up multiple files and files outside the current directory. As the backup function evolved, each new feature was given a new name. While these names are accessible to the user, they are rarely needed, and backing up a single file in the current directory still works through the original interface.

The instances of notcalledby effectively turn a function into a local function. Comment the statement out to unit-test the function on its own.
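The auxiliary library's report_notcalledby is not shown in this note; what follows is a plausible sketch, assuming bash's FUNCNAME call stack, of how such a guard could behave: it succeeds (and complains) when the caller's caller is not the named function.

```shell
# Plausible sketch of report_notcalledby using bash's FUNCNAME stack;
# this is an assumption about its behavior, not the library's code.
report_notcalledby() {
    # FUNCNAME[1] is our caller, FUNCNAME[2] is the caller's caller
    [[ ${FUNCNAME[2]:-} == "$1" ]] && return 1   # legitimate call: fail quietly
    echo "${FUNCNAME[1]}: only called by $1" >&2
}
```

So a line like `report_notcalledby backup && return 2` at the top of backup_one aborts any call that did not come through backup; commenting it out restores direct access for unit testing.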

3 Code reader's guide

3.1 getting started

The first trick is on the first line: a set idiom which, in this case, sets the shell's positional parameters. Here, the first parameter, $1, is unchanged and holds the name of the file to back up; the second is assigned the name of the backup directory, .bak; the third is assigned either the second argument to the function ($2) or defaults to the current directory ($PWD).

param   source        means
  1     $1            backup file
  2     .bak          backup directory
  3     $2 or $PWD    current directory

When invoked the first time, the second argument is empty, so the third positional parameter is set to the current working directory. Now the assurance offered by (backup_one), insulating backup_here from multiple file names, is seen to be useful. The third parameter then holds the starting directory of the backup chain and serves as a means of branching: either we are starting down the chain, or we are done, having returned from the last nested recursive call.
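The idiom can be seen in isolation. The function show_params below is a hypothetical demo, using the safer `set --` form so a file name beginning with a dash can't be mistaken for an option:

```shell
# Hypothetical demo of the set idiom: rebind the positional parameters.
show_params() {
    set -- "$1" .bak "${2:-$PWD}"
    echo "file=$1 backup-dir=$2 base=$3"
}
```

Called with one argument, the base defaults to $PWD; called with two, the second argument becomes the base.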

Then create (mkdir) a missing backup (.bak) directory.

3.2 down the chain

If we are just starting down the chain, the condition for (base directory) is true. Therefore, and only this first time, is it necessary to compare the current file $1 with its most recent backup $2/$1, which, by the way, needn't exist. If they are identical, the comparison cmp succeeds and there is no need to compare further. Return. There is nothing more to do.

Since $2/$1 may not exist, a decision is necessary. If there is a backup file, now known to be different, it too must be backed up. So, go to the backup directory and descend through any nested backup directories.

At some point there is no backup file. Why? We've descended to the bottom of the chain of existing backups.

On the very first pass you may have just created the backup directory; the cmp failed, and the existence of a lower copy is irrelevant (for the moment). So now we are in a directory that has a proper backup file, the directory below does not, and the file is therefore ready to move down into the backup directory.

3.3 and back up

This is worth thinking about for a moment. We have just moved the backup file to a lower directory that was open to receive it. The directory we are now in is in a position to retrieve the file from its parent. If we are back at home base, then we copy the file we just moved into the top backup back out to the base directory. And, as a little flourish, set the timestamp on the file to the sametime as the just-moved file. And return. We're done.

If, on the other hand, we have not yet returned to the base directory, we recursively ascend the backup directory tree. Remember, on the way up we are returning to a directory whose backup copy has just been pulled down. We do this until we return to the base directory.

3.4 general principles

Before leaving, I'll note some general principles that my shell practice has evolved:

  • do nothing (gracefully)

Here, the easy thing to do is give the user a help message. For a command which will do nothing without an argument, take the opportunity: in this case, remind the user the arguments are files, and there may be many of them.

A grace I've learned to use: when there is a main function like backup with a number of related sub-functions, allow a command-line user the option of using a space, rather than an underscore, to reach them; for example, backup here file invokes backup_here.

While I like to trace every function, the exception here reflects the preponderant use of backup from the command line or other scripts, and the much rarer direct use of backup_here or backup_one; only backup carries the trace.

  • use semantic commands, in this case ignore the standard output

This is a big deal with me. Here is the code for ignore and, while we're at it, quietly:

function ignore
{ 
    "$@" > /dev/null
}
function quietly
{ 
    "$@" 2> /dev/null
}
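Composed, the two give exactly the silence claimed below. This block repeats the two one-liners so it runs standalone; note the quoted "$@", so arguments containing spaces survive the pass-through.

```shell
# ignore and quietly, repeated from above, composed on a failing command.
ignore()  { "$@" > /dev/null; }
quietly() { "$@" 2> /dev/null; }

quietly ignore ls /no/such/directory   # no stdout, no stderr
```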

I would just as soon suffer some run-time inefficiency to use a semantic function over the syntactic means. Syntax is for those who need it to feel they understand the medium. Semantics is for those of us who'd like to read what we are doing.

E.g:

ignore   the (standard) output of this command.
quietly  do this command -- I don't need the error messages.

Question: do you think this means anything:

quietly ignore everything   

is a vast improvement over

everything >/dev/null 2>&1          # or
everything 2>/dev/null >/dev/null
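The order of redirections matters here, a common trap worth one example: `2>&1` duplicates whatever stdout is at that moment, so placing it before `>/dev/null` does not silence stderr.

```shell
# Redirection order: only the first form silences both streams.
ls /no/such/file >/dev/null 2>&1   # stdout to null, then stderr follows it
ls /no/such/file 2>&1 >/dev/null   # stderr joins the OLD stdout: message still shows
```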

4 Auxiliary Code

You will have noted a number of commands in the backup code you won't find among the *nix utilities on any system you've seen. So, here is the description (and code) for the Auxiliary Library.

4.1 An application

This challenge, noted in my Software diary on Tuesday, 28th, 2015, is to produce a fully-functioning application of the library functions. There are three pieces: the functions above, the remainder of the functions in the backup library, and any functions in the auxiliary library.

Also needed is the tooling it takes to move the pieces into place and write their separate instances.

  1. backup 0
    @include backup.1
    @include backup.2
    
  2. backup L
    @include backup.0
    
    function backup_init
    { 
        source auxlib;
        backup_copyright
    }
    backup_init 1>&2
    
  3. backup A
    @include backup.0
    @include ../auxlib
    

4.2 make the library, application

As it now stands, here is the sequence to create both the local library and wrap it up as an application. For the reader, I've hidden most of the backup functions in a library not shown in the on-line document, but they are available to an editor of this OrgMode source (and in the run-time code library and application).

  1. "tangle" the files here to their run-time locations
  2. push the .include directory on the run-time stack in a shell window
  3. run two commands, probably perfectly general for other instances, to both build the library and the application

5 reference

Author: Marty McGowan

Created: 2018-07-10 Tue 16:49

Emacs 24.4.1 (Org mode 8.2.10)
