Sunday, May 30, 2010

Return With Us Now to the Thrilling Days of Yesteryear

I'm not very good at doing disk backups.

OK, let's be honest: I'm terrible at doing disk backups. Oh, back in the day, every once in a while I'd copy as much of the disk as I could onto CDs or DVDs. But that takes a lot of time, and, as disks become larger while DVDs don't, requires a lot of picking and choosing to figure out what goes where.

To be fair, I haven't needed to. In all the years I've run computers here at home, I've never had a disk crash that lost data, even after our most serious disk mishap. So not having a backup hasn't been that big a problem. (Insert your favorite ominous foreboding music here.) Anyway, backups are a pain in the neck.

My view of backups has changed, though, primarily because they're so much easier now. Easiest, of course, is Apple's Time Machine. Plug a second disk (USB, Firewire, internal, if you've got such a thing) into your Mac, let it format to Apple's HFS format (or not), and start Time Machine. Tell it which directories you want to back up, and it will do that, every few minutes. Then you can actually scroll back through time (Tardis sounds not included) and pick out a file you might have deleted a year ago — assuming you were running Time Machine back then. It's an extremely neat and easy-to-use utility, and takes the pain out of doing backups, as long as you remember to leave the external disk plugged in, if that's what you've got.

Then, last week, Office Depot ran a sale where you could get a Terabyte Verbatim USB disk for $79.99 + tax. I bought three, one for Hal, one each for the college students.

Now the disk comes with Nero backup software for Windows (taking care of student I), and Apple's got Time Machine (taking care of student II). But what about poor Hal, stuck in Linux?

I called up the Synaptic package manager and typed backup in the search box. That brought up, among other things, a package called backintime, which is a front-end which uses the utilities cron (to schedule backups), diff (to find which files are changed), and rsync (to copy files that have changed to the backup disk). Perfect. I installed the program — actually both backintime-common, the guts, and backintime-gnome, the graphical front end, then went to get the disk.

Problem: Verbatim formatted the disk to the lowest common denominator, FAT32. This is not suitable for low-volume backups, because it does not allow multiple hard links to the same file. What backintime does, you see, is create a copy of your chosen disk directories every time you schedule a backup. The trick is that unchanged files are hard-linked from one backup to the next, taking a minimum of space. Only the changed files are stored in multiple copies. Since the FAT format doesn't do multiple hard links, it can't do the trick, and so we need to reformat the disk to a better file system.

So I installed gparted from Synaptic, launched it (using sudo gparted), and formatted the Verbatim disk to ext3 format.

Second problem: If you just plug a USB disk into a Ubuntu box, it auto-mounts, but is usually only accessible to the user who did the mounting (to be fair, I haven't tried this with a USB ext3 disk). That's unacceptable in a backup system that needs to be accessible to everyone. So we have to set up /etc/fstab. To reflect this. Ideally, we'd add a line which looks something like this:

/dev/sdd /backup ext3 errors=remount-ro 0 2

(Note added after the fact: The last entry on this line should indeed be a 2, not a 1 as previously listed. This this tells fsck to check this file system after any file system with a 1 in the last column. The only file system with 1 there should be the boot partition. Everything else can be a 2, unless you don't want fsck to look at the file, in which case you put in a 0. Got that?)

create a directory entry /backup, and then doing

mount /backup

would load up the disk drive.

However, since you usually have multiple USB ports on your system, there's no guarantee that this disk is going to always get /dev/sdd as its port. The solution is to use the ext3's UUID, which was created when gparted did its thing. You find the UUID with the command:

sudo blkid
[sudo] password for hal:************
/dev/sda1: LABEL="PQSERVICE" UUID="0A72323D72322DB7" TYPE="ntfs" 
/dev/sda2: LABEL="SYSTEM RESERVED" UUID="CCA811C2A811AC48" TYPE="ntfs" 
/dev/sda3: LABEL="Gateway" UUID="7E143E7F143E3A8B" TYPE="ntfs" 
/dev/sda5: UUID="2eec3fff-73d2-419c-8a3c-b92733d46da2" TYPE="swap" 
/dev/sda6: UUID="f3781533-d891-46e1-b9f0-839e8a538d35" TYPE="ext4" 
/dev/sdb1: LABEL="Verbatim" UUID="9deee42d-539b-45c5-8683-e95f889c1792" TYPE="ext3"

Verbtim is how I labeled the disk partition of the USB drive, so that's the UUID we want. Fill out the line in /etc/fstab as:

UUID=9deee42d-539b-45c5-8683-e95f889c1792 /backup ext3 errors=remount-ro 0 1

and we're ready to go. Note that this will automount so that everyone can read it every time the disk drive is plugged in and turned on.

That settled, setting up backintime is easy. You can specify the directories you want to back up, and the file types (say, those ending in ~ or .o) and directories you want to exclude. You can then select a backup time interval: minutes, hours, days, weeks ... Once you have a set of backups, you can use the graphical interface to look through your files of any snapshot, and restore a file that was deleted or moved on your main system, or copy it to another location if you like. It's very nice and easy to use.

You can see your backup schedule using crontab:

crontab -l
@daily nice -n 19 /usr/bin/backintime --backup-job >/dev/null 2>&1

Note the @daily there. That means the backup will start every night at midnight, and it's the only choice backintime gives you if you select a daily backup. This is unsatisfactory for me, as someone is frequently using the computer on and after midnight. So I used crontab -e to edit this entry, changing it to
0 3 * * * nice -n 19 /usr/bin/backintime --backup-job >/dev/null 2>&1
which launches the backup at 3am. backintime accepts this, but note that every time you change your setup it will reset the time to @daily. Just watch it.

Anyway, now I have a disk backup. And since I've never had a major loss of disk data from one disk, having two disks copying the data should mean that it can never, ever, ever happen, right? (Insert even more ominous and foreboding musing here.)