Patrick Truchon's Web Portal

Backup

Posted by Patrick on April 24, 2010

(Photo by natetherobot licensed CC By-Nc-Na.)

The Hard Way

We all know that we should back up our files regularly, but most of us don’t.  Who wants to spend half an hour every few days sifting through folders and copying important files to an external hard drive?  Of course, one way to avoid thinking about this process is to simply copy everything, but that takes a lot of extra disk space and time.  Thinking about backing up this way is like thinking about cleaning your desk: it takes precious brain resources and time.

The Automatic Way

The thing is, though, that backing up is not at all like cleaning a desk.  We’re not good at doing repetitive work, but computers are; all they need are clear instructions: what to back up, and where to back it up.  There are lots of free backup utilities [1] to help give those instructions, but not all of them are that good.  Here are three basic features a good backup utility should have:

  1. The first time you use it, it should help you decide what you want to back up; after that, it should be able to do its job as often as you need it to without bugging you again.
  2. To save time, it should be able to know which files changed since the last backup and which didn’t, and only copy those that changed (instead of all of them).
  3. It should keep different “versions” of your backups so you can travel back in time when needed.  But to save disk space, it shouldn’t duplicate the actual data of different versions when it’s the same.

What I Do

I searched a long time for a utility that could do all of this, but found that most of them added too many unnecessary features or were too “user friendly”: they forced me to use an automatic recovery procedure that made the whole thing kind of cryptic.  If I ever need to recover a lost file, I don’t want to just press a button to recover the whole thing; I want to navigate to my file and copy it back to my computer.  Not wanting an “auto-recovery” feature may seem like a step back, but I have two reasons for this.  First, most of the time, when I need to recover files, it’s because I deleted them by mistake, not because my system crashed.  Second, even in the event of a system failure, I would probably use the “opportunity” to install a new system, so I wouldn’t necessarily want to recover all my files.

In any case, what I wanted was a backup utility, not a recovery program.  So finally, after failing to find what I wanted, I ended up writing my own backup script using rsync [3] and cp -al [4].  Here is what it does.

  1. First, I need to tell it what I want to back up and where.  I do this (only once) by editing the script.
  2. When I’m ready to back up, the first stage of the process copies the files.  The program rsync looks at the folder I want to back up and makes a list of all the files in it (with some details like their size, date of last change, etc.).  Then, it does the same thing with the backup folder.  Finally, it compares the lists, deletes the items from the backup folder that are no longer on my computer, and updates the ones that have changed.
  3. Finally, the program cp -al (on GNU/Linux) or cpio (on OS X) makes what seems to be a copy of the entire backup folder and adds the date to its name.  The important difference is that it doesn’t actually copy the files, but creates hard links [5] instead.  What that means is that even though it looks like there are two folders containing the same data, the files in both folders are actually two names for the same data on disk.  When I make a new backup, the most recent folder is updated (in step 2), and a new hard-linked copy is made.  Each time, the hard-linked copies share the data that hasn’t changed, but keep their own versions of the files that are different.
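
The steps above can be sketched as a short shell function.  This is only a minimal illustration, not the actual script: the function name backup_snapshot, the "current" mirror folder, and the two arguments are assumptions made for the example, and it relies on rsync and GNU cp (on OS X, cp -al would have to be replaced, for example by cpio as mentioned above).

```shell
#!/bin/sh
# Minimal sketch of the two-stage backup described above.
# backup_snapshot, "current", and both arguments are hypothetical names
# chosen for this example; the real script may be organised differently.

backup_snapshot() {
    src=$1      # folder to back up
    dest=$2     # backup drive or folder

    mkdir -p "$dest/current" || return 1

    # Stage 1: mirror src into dest/current.
    # -a preserves permissions and timestamps; --delete removes files
    # from the mirror that no longer exist in src.
    rsync -a --delete "$src"/ "$dest/current"/ || return 1

    # Stage 2: make a dated hard-link "copy" of the mirror (GNU cp).
    # -l creates hard links instead of copying file data.
    stamp=$(date +%Y%m%d_%H%M%S)
    cp -al "$dest/current" "$dest/BackUp_$stamp"
}
```

It could be called as backup_snapshot "$HOME/Documents" /media/HD (both paths are placeholders).  By default, rsync replaces a changed file with a freshly written copy rather than overwriting it in place, so the older dated folders keep their own version of that file.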

The Result

The only thing I have to do is press a button to load the script, enter my password, and watch the whole thing go.  At the end, I get a new version of my backup sitting alongside the previous ones.  If I check the size of each of these folders individually, it looks like they are all about 170 GB in size.  In truth, though, all these folders share most of the data.  When I measure all of them in a single du command, which counts each hard-linked file only once, I see that the first backup has the biggest size, and the newer backups are smaller since they only contain the “difference”:

ptruchon@Home:/media/HD$ sudo du -shc *

93G BackUp_20091113_030205
81G BackUp_20091206_095256
12G BackUp_20100105_103154
4.3G BackUp_20100120_064439
7.3G BackUp_20100203_064727
6.9G BackUp_20100209_202706
795M BackUp_20100301_063811
659M BackUp_20100312_152826
788M BackUp_20100322_232813
9.8G BackUp_20100329_093746
1.3G BackUp_20100401_105604
1.2G BackUp_20100415_093503
3.0G BackUp_20100418_063632
878M BackUp_20100421_224022
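
To see why the sizes come out this way, here is a small throwaway experiment.  It is only a sketch: the names snap1, snap2, and data.bin are made up for the demonstration, everything happens in a temporary folder, and it assumes GNU cp for the -l option.

```shell
#!/bin/sh
# Sketch: why du reports small sizes for later hard-linked snapshots.
# snap1, snap2 and data.bin are throwaway names used only for this demo.

demo=$(mktemp -d)
mkdir "$demo/snap1"

# One 1 MiB file of random data stands in for the backed-up files.
dd if=/dev/urandom of="$demo/snap1/data.bin" bs=1024 count=1024 2>/dev/null

# Hard-link copy of the whole folder, as in step 3 (GNU cp).
cp -al "$demo/snap1" "$demo/snap2"

# Measured on its own, the second snapshot looks full size (in KiB).
alone=$(du -sk "$demo/snap2" | cut -f1)

# Measured in the same du run as the first snapshot, the shared file is
# counted only once, so the second snapshot shows only what is unique to it.
together=$(du -sk "$demo/snap1" "$demo/snap2" | sed -n 2p | cut -f1)

echo "snap2 alone: ${alone}K, listed after snap1: ${together}K"
rm -rf "$demo"
```

The first number should be a little over 1024K and the second only a few kilobytes, which is the same effect as in the listing above: each folder is complete on its own, but du charges shared data to the first folder it meets.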

At School

At our school, we also give all our students 3 GB of server space to store their files or post things online.  I’ve adapted my script so that every student can easily back up their school folder (not their entire computer) to their server space.  Here’s a quick introductory video demonstrating the procedure in QuickTime or Ogg Theora format.

To use these scripts, you need to be using GNU/Linux, or Mac OS X (sorry Windows).  Simply download them from my Webfolder [2] and read my Wiki Notes [6] for more information.  I will probably add more detailed instructions later, but feel free to email me if you have any questions.

Links:

  1. Life Hacker: Backup Utilities, <http://lifehacker.com/tag/backup-utilities>
  2. Patrick Truchon’s Webfolder: Bash Scripts, <http://dl.dropbox.com/u/3896319/WebFolder/Bash_Scripts/index.html>
  3. Wikipedia: Rsync, <http://en.wikipedia.org/wiki/Rsync>
  4. Wikipedia: Cp, <http://en.wikipedia.org/wiki/Cp_(Unix)>
  5. Wikipedia: Hard link, <http://en.wikipedia.org/wiki/Hard_link>
  6. Patrick Truchon’s Wiki Notes: Shell Scripting, <http://ptruchon.byethost32.com/doku.php?id=shell>