2015-02-02

Bringing back an LVM backed volume

What can you do when an LVM backed logical volume goes offline? This happens on my slower netbook, on an LVM logical volume spanning about 20 USB flash drives. Sometimes those PVs go missing and the filesystem stops! Here are the steps I take to fix this problem without a reboot. My volume group is called "usb" and my logical volume is called "osd.2".

Since my volume is part of a ceph cluster, I first make sure that the ceph OSD is stopped: service ceph stop osd.2. You probably don't need to do this, since the OSD has probably exited already once it saw errors on the filesystem.

Next, unmount the filesystem and mark the logical volume as inactive. We use the -f -l switches to force the dismount and lazily deal with the dismount in the background. Without those switches the umount might freeze.
umount -f -l /dev/mapper/usb-osd.2
Marking the logical volume as inactive can be done in two ways. Prefer the first method since it is more specific. The second method deactivates every logical volume that is not in use, in every volume group, and that might be overkill.
lvchange -a n usb/osd.2 -or- vgchange -a n
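
The steps so far can be collected into one small function. This is only a sketch of how I might script it, not a tested recovery tool: the function name take_offline, the run helper and the DRY_RUN variable are made up for illustration, and it assumes the same "usb" and "osd.2" names as above. With DRY_RUN=1 it prints the commands instead of running them, which is a handy way to check what it would do.

```shell
#!/bin/sh
# Sketch only: VG, LV, DRY_RUN, run and take_offline are illustrative names.
VG="${VG:-usb}"
LV="${LV:-osd.2}"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

take_offline() {
    run service ceph stop "$LV" || true      # the OSD may already have exited
    run umount -f -l "/dev/mapper/$VG-$LV"   # force + lazy so umount cannot hang
    run lvchange -a n "$VG/$LV"              # deactivate only this one LV
}
```

Run it as root with: take_offline (or DRY_RUN=1 take_offline to preview). Note that the /dev/mapper name is only "VG-LV" when neither name contains a hyphen.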

At this point I unplug all the USB drives and check the hubs. Plug in the USB keys a few at a time and use pvscan as you go to ensure that each USB key is being recognised. If you have a dead USB key then try again in another port. If that doesn't work then check the hubs have power - even replug the hub. Failing that try a reboot. Failing that... attempt to repair the LVM volume some other way. Since ceph already replicates data I don't bother running the LVM backed logical volumes on RAID - I just overwrite the LV and make a new one from the remaining USB flash drives.
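
While plugging the keys back in, it helps to have a quick count of how many PVs LVM can currently see for the volume group. The sketch below is one way to do that using pvs with the pv_name and vg_name fields. The function names count_pvs and pvs_present are my own, and it assumes that a PV which is really present is listed with a device path beginning with "/".

```shell
#!/bin/sh
# Sketch only: count_pvs and pvs_present are illustrative helper names.

# count_pvs VG: read "pv_name vg_name" lines on stdin and count the
# PVs that have a real device path and belong to the given VG.
count_pvs() {
    awk -v vg="$1" '$1 ~ /^\// && $2 == vg { n++ } END { print n + 0 }'
}

# pvs_present VG: rescan, then report how many PVs of the VG are visible.
pvs_present() {
    pvscan >/dev/null 2>&1
    pvs --noheadings -o pv_name,vg_name 2>/dev/null | count_pvs "$1"
}
```

After each batch of keys, run pvs_present usb as root and compare the number against how many drives should be in the volume group.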

Once all the PVs have come back, run pvscan one last time and then vgscan. You should now see that your volume groups have all their PVs in place, so it's time to reactivate the logical volumes. Both methods will work, but again I prefer the first one since it is more specific.
lvchange -a y usb/osd.2 -or- vgchange -a y
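
To confirm that the activation actually took, you can look at the lv_attr field reported by lvs: its fifth character is "a" when the volume is active. Here is a rough sketch that wraps the rescan, the activation and that check together. The names reactivate and attr_is_active are hypothetical, and the "usb" and "osd.2" defaults are the ones from this post.

```shell
#!/bin/sh
# Sketch only: reactivate and attr_is_active are illustrative names.
VG="${VG:-usb}"
LV="${LV:-osd.2}"

# attr_is_active ATTR: succeed when the 5th lv_attr character is "a".
attr_is_active() {
    case "$1" in
        ????a*) return 0 ;;
        *)      return 1 ;;
    esac
}

reactivate() {
    pvscan && vgscan || return 1
    lvchange -a y "$VG/$LV" || return 1
    # lvs pads its output with spaces, so strip them before testing.
    attr=$(lvs --noheadings -o lv_attr "$VG/$LV" | tr -d ' ')
    attr_is_active "$attr"
}
```

Run as root: reactivate && echo "LV is active".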

All things going well, the logical volume is now active. It's a good idea to do a filesystem consistency check before you remount the drive. Since I use XFS I'll carry on with the steps for that. You should use whatever tools work for your filesystem.
First mount the drive so that the journal can replay, which usually fixes any filesystem inconsistency problems: mount /dev/mapper/usb-osd.2
Then unmount the drive again before checking it: umount /dev/mapper/usb-osd.2
Now check the drive with xfs_check /dev/mapper/usb-osd.2 and, if there are any errors, run xfs_repair /dev/mapper/usb-osd.2. Newer versions of xfsprogs no longer ship xfs_check; there, xfs_repair -n /dev/mapper/usb-osd.2 does the same read-only check.
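
The XFS steps can be sketched as a single function. Note that I've swapped in xfs_repair -n for xfs_check here, since it performs a read-only check and exits non-zero when it finds corruption. The names check_xfs, run and DRY_RUN are made up for this example, and mounting by device name alone assumes the volume has an /etc/fstab entry.

```shell
#!/bin/sh
# Sketch only: DEV, DRY_RUN, run and check_xfs are illustrative names.
DEV="${DEV:-/dev/mapper/usb-osd.2}"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_xfs() {
    run mount "$DEV"  || return 1   # mounting replays the XFS journal
    run umount "$DEV" || return 1   # must be unmounted before checking
    # -n is "no modify": report problems only. Repair only if it fails.
    if ! run xfs_repair -n "$DEV"; then
        run xfs_repair "$DEV"
    fi
}
```

Try DRY_RUN=1 check_xfs first to see the commands it would run.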

Now we're ready to mount the logical volume again: mount /dev/mapper/usb-osd.2

And since I'm running ceph I want to restart the OSD process: service ceph restart osd.2

Done!

Read more about my ceph cluster running on USB drives.