It promptly booted up into recovery mode on the failure of a filesystem to mount, so I panicked and backed up /home while it was still there. However, that only made it harder to understand why sda was missing, because that's where the active MBR, grub, and /boot live, the latter holding the kernel and initramfs image that were now running. If the disk was gone, how did I boot from it?
Long, boring story short: the disk didn't vanish until something triggered a read of the whole structure of the disk. grub only needs the first few sectors to get started booting, so it wasn't until the later search of filesystem UUIDs that the kernel somehow wedged the disk so hard that it needed a cold reboot to re-appear. A warm reboot would hang the POST at drive detection.
It turns out that libata knows how to hard-reset a SATA drive, and this also works, but it never fell back to using it until I was booting with libata.force=nosrst included on the kernel command line. (This also happened to be with a USB stick, so that I could have a functional Linux to examine the damage with.) That let me get it working enough to do a fsck, which restored it to full operation.
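For anyone wanting to confirm that the option actually reached the kernel, here is a minimal sketch in Python that pulls the libata.force values out of a command line string such as the contents of /proc/cmdline. The function names are made up for illustration, and the whitespace split is a simplification that ignores quoted arguments.

```python
def libata_force_options(cmdline: str) -> list[str]:
    """Return every value given to libata.force= on a kernel command line."""
    options = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "libata.force" and sep:
            # Several values may be given at once, separated by commas.
            options.extend(v for v in value.split(",") if v)
    return options


def soft_reset_disabled(cmdline: str) -> bool:
    """True if soft reset is suppressed (nosrst, or norst which disables both)."""
    for opt in libata_force_options(cmdline):
        # A value may carry a port prefix, e.g. "1.00:nosrst".
        if opt.rpartition(":")[2] in ("nosrst", "norst"):
            return True
    return False


def read_running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

On the running system, `soft_reset_disabled(read_running_cmdline())` should come back true after booting with the option in place.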
Now I have a backup of the "more reliable so I don't need to back it up, besides it'd take forever to dribble 15 GB out over usb2" volume. I want to say that's the worst 15-minute savings ever, but OTOH, not having the backup meant it was worth trying to fix the problem instead of writing off the drive and its data as a loss.
Update: this happened again, so I added a drive, fixed the dying drive again, and migrated everything to the new disk. I'm glad I set up lvm ages ago, because pvmove made it 95% easy.
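For the curious, the usual shape of that kind of migration is sketched below in Python, which only builds and prints the commands rather than running them. The volume group and device names are placeholders and not my actual layout, and the exact sequence I used may have differed. It also covers only what lives inside LVM; the MBR, grub, and anything else outside the volume group have to be moved separately.

```python
import shlex


def lvm_migration_plan(vg: str, old_pv: str, new_pv: str) -> list[list[str]]:
    """Build the command sequence for moving a volume group off a dying disk."""
    return [
        ["pvcreate", new_pv],        # initialise the new partition as a physical volume
        ["vgextend", vg, new_pv],    # add it to the existing volume group
        ["pvmove", old_pv, new_pv],  # move all allocated extents off the old disk
        ["vgreduce", vg, old_pv],    # drop the now-empty old volume from the group
        ["pvremove", old_pv],        # wipe the LVM label from the old partition
    ]


def render(plan: list[list[str]]) -> str:
    """Format the plan as shell-safe command lines, one per line."""
    return "\n".join(shlex.join(cmd) for cmd in plan)


if __name__ == "__main__":
    # Hypothetical names: volume group "vg0", old PV /dev/sda2, new PV /dev/sdb1.
    print(render(lvm_migration_plan("vg0", "/dev/sda2", "/dev/sdb1")))
```

Printing the plan first makes it easy to read the device names twice before anything touches a disk.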