Thursday, June 07, 2007

Fixing My Bug

I finally have enough time to write up the resolution to my difficulties with the 2.6.20-16 kernel upgrade:

First of all, there is a bug, or at least an inconsistency, in the way the kernel is handling SATA disks. (Warning: Blogger is about to dive into areas he knows almost nothing about. Please keep your various appendages inside the vehicle while he attempts this dangerous maneuver.)

Apparently, disks (or, at least, SATA disks) can be handled by one of two drivers: piix (named for Intel's PIIX chip, the PCI IDE ISA Xcelerator), which treats the disks as IDE drives, and ata_piix, which treats them as SCSI drives. In the former case, drives are mapped to /dev/hda, /dev/hdb, /dev/hdc, etc. In the latter, the drives are mapped to /dev/sda, /dev/sdb, /dev/sdc, and so on.

In the Dapper release, my drives were mapped to /dev/hda and /dev/hdd, meaning that we were using piix. In the initial install of Feisty, however, the drives were mapped to /dev/sda and /dev/sdb. (Drives “d” and “b” are the same drive: IDE worries about which controller the drive is connected to, SCSI doesn't.)

It doesn't matter, so long as we're consistent. So long as we can construct a proper /etc/fstab file, all will be well.

My problem began with the aforementioned update. From what I learned reading bug reports, sometimes the piix driver gets control of the disks, and sometimes the ata_piix driver does. The kernel should choose one or the other consistently, but different kernels choose different drivers. That's the bug. (I think.)

Now, as to what was happening to me. Let's look at my pre-20-16 /etc/fstab file:

# /etc/fstab: static file system information.
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# /dev/sda1
UUID=38afc33a-e732-45eb-8271-cd23555ce5bc /               ext3    defaults,errors=remount-ro 0       1
/dev/sdb1                                 /home           ext3    defaults        0       2
# /dev/sda6
UUID=efc49726-51d6-411e-9745-a05edad31c21 /opt            ext3    defaults        0       2
# /dev/sda2
UUID=a955cf3a-b1bc-4998-92e1-0073aba94c4c /scratch        ext3    defaults        0       2
# /dev/sda5
UUID=79a7a282-c5b8-416b-a738-89ed35013381 /usr/local      ext3    defaults        0       2
# /dev/sda3
UUID=080af2c2-1bd6-4b4a-bceb-2e7c6c2a8fa2 none            swap    sw   0       0
/dev/scd0       /media/cdrom0   udf,iso9660 user,noauto     0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto      0       0

One of these lines is not like the others, one of these lines just doesn't belong. Spot it? It's the line that starts /dev/sdb1. There is no UUID for this drive, which just happens to be the partition holding all my data. Apparently (Danger! He's thinking again), partitions can be located by their Universally Unique Identifiers (UUIDs) no matter whether piix or ata_piix is driving the disk. The initial Feisty install determined the UUIDs for all of the drives in my system and entered them into /etc/fstab, helpfully putting up a # /dev/sda? comment for each drive, so we know how to refer to it in the traditional way.

Except that I didn't have my data disk hooked up when I installed Feisty — if something went wrong, I didn't want to accidentally erase all my files. I installed the drive later, and manually added the /dev/sdb1 line to /etc/fstab.

Which works fine, so long as the kernel keeps using the ata_piix driver! With the update to 20-16, the kernel started referring to my data drive as /dev/hdd1, which wasn't in the /etc/fstab file.
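For the curious, here is a rough Python sketch of the "spot the odd line" check: it scans fstab-style text and reports any entry that names a disk partition by its kernel device name instead of a UUID. The function name fragile_entries and the SAMPLE text are my own inventions for illustration, and the pattern only assumes the /dev/sdXN and /dev/hdXN naming discussed above.

```python
import re

# Matches kernel-assigned disk partition names such as /dev/sdb1 or /dev/hdd1.
# (Assumption: only the sd*/hd* naming schemes discussed in this post matter.)
DEVICE_NAME = re.compile(r"^/dev/(sd|hd)[a-z]+\d+$")


def fragile_entries(fstab_text):
    """Return (device, mount_point) pairs whose first field is a raw device name."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# /dev/sda1".
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2 and DEVICE_NAME.match(fields[0]):
            found.append((fields[0], fields[1]))
    return found


# A few lines borrowed from the fstab shown earlier in this post.
SAMPLE = """\
# /dev/sda1
UUID=38afc33a-e732-45eb-8271-cd23555ce5bc /               ext3    defaults,errors=remount-ro 0       1
/dev/sdb1                                 /home           ext3    defaults        0       2
/dev/scd0       /media/cdrom0   udf,iso9660 user,noauto     0       0
"""

print(fragile_entries(SAMPLE))  # prints [('/dev/sdb1', '/home')]
```

The CD-ROM and floppy lines are deliberately left alone, since removable media are not the problem here.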

OK, I could just change /dev/sdb1 to /dev/hdd1 and all would work, but suppose the next kernel starts using ata_piix again? A more elegant solution is needed, and that is to refer to my data disk by its UUID.

And how do I do that, you ask? Well, I sure didn't know, so I looked around the Ubuntu Forums and did some general Googling®. There, I discovered the command

$ ls -l /dev/disk/by-uuid/
total 0
lrwxrwxrwx 1 root root 10 2007-06-06 13:06 080af2c2-1bd6-4b4a-bceb-2e7c6c2a8fa2 -> ../../hda3
lrwxrwxrwx 1 root root 10 2007-06-06 13:06 38afc33a-e732-45eb-8271-cd23555ce5bc -> ../../hda1
lrwxrwxrwx 1 root root 10 2007-06-06 13:06 79a7a282-c5b8-416b-a738-89ed35013381 -> ../../hda5
lrwxrwxrwx 1 root root 10 2007-06-06 13:06 a955cf3a-b1bc-4998-92e1-0073aba94c4c -> ../../hda2
lrwxrwxrwx 1 root root 10 2007-06-06 13:06 bdad36ce-b67a-4c39-988e-94f40df90f67 -> ../../hdd1

Which tells me that /dev/hdd1 has the UUID bdad36ce-b67a-4c39-988e-94f40df90f67.
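If you would rather not read the listing by eye, the same lookup can be sketched in Python. This is only an illustration, not something I actually ran to fix my machine: the function name uuid_for_device is made up, and it assumes /dev/disk/by-uuid holds one symlink per UUID, as in the listing above.

```python
import os


def uuid_for_device(device_name, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the UUID whose symlink points at device_name (e.g. "hdd1"), or None."""
    if not os.path.isdir(by_uuid_dir):
        return None
    for uuid in sorted(os.listdir(by_uuid_dir)):
        link = os.path.join(by_uuid_dir, uuid)
        if not os.path.islink(link):
            continue
        # Each entry is a symlink such as ../../hdd1; keep only the last component.
        if os.path.basename(os.readlink(link)) == device_name:
            return uuid
    return None


# Prints the UUID if this machine has an hdd1 partition, otherwise None.
print(uuid_for_device("hdd1"))
```

Passing the directory as a parameter is just a convenience so the function can be tried against a scratch directory of symlinks.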

So now the resolution of the problem (for me, anyway) is simple. I just replace the offending line in my /etc/fstab file with

# /dev/sdb1
UUID=bdad36ce-b67a-4c39-988e-94f40df90f67 /home           ext3    defaults        0       2
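I made the change by hand in a text editor, but the substitution itself is mechanical enough to sketch in Python. The helper name use_uuid is hypothetical, and the sketch works purely on a string; it does not touch /etc/fstab, so anyone adapting it should keep a backup and write the file themselves.

```python
def use_uuid(fstab_text, device, uuid):
    """Swap a device-name entry for a UUID entry, keeping the old name as a comment."""
    result = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == device:
            # Mirror the installer's habit of noting the traditional name.
            result.append("# " + device + "\n")
            # Replace only the first field; the rest of the line is untouched.
            result.append(line.replace(device, "UUID=" + uuid, 1))
        else:
            result.append(line)
    return "".join(result)


# The offending line from my fstab, and the UUID found above.
OLD = "/dev/sdb1                                 /home           ext3    defaults        0       2\n"
print(use_uuid(OLD, "/dev/sdb1", "bdad36ce-b67a-4c39-988e-94f40df90f67"), end="")
```

Existing comment lines such as "# /dev/sdb1" are left as they are, because their first field is "#" rather than the device name.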

Problem solved. I can now use the new kernel and, hopefully, will be immune to this type of problem.