[linux-lvm] LVM RAID repair trouble

Giuliano Procida giuliano.procida at gmail.com
Fri Sep 30 16:22:40 UTC 2016


I have resolved this myself.

I wrote a tool to modify on-disk PV metadata in-place, recalculating checksums.
I had to write a bitmap superblock fixer script as well to get all my
testing done.
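
A note on the checksums, since that is the fiddly part: LVM2 appears to
use its own CRC-32 variant (the usual reflected polynomial, but an
initial value of 0xf597a6cf and no final XOR, going by lib/misc/crc.c),
so plain zlib.crc32 will not reproduce the on-disk values. There are
several checksums to keep consistent, if I have the layout right: the
label header, the mda header and the raw_locn entries covering the
metadata text. A minimal sketch in Python, on those assumptions:

#!/usr/bin/env python3
# Sketch: CRC-32 as LVM2 appears to use it for PV metadata checksums.
# Assumptions: reflected polynomial 0xedb88320, initial value 0xf597a6cf,
# no final XOR; verify against lib/misc/crc.c before trusting the output.
import sys

INITIAL_CRC = 0xF597A6CF
POLY = 0xEDB88320

def lvm_crc32(data, crc=INITIAL_CRC):
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc & 0xFFFFFFFF

if __name__ == "__main__":
    # e.g. run over a region dd'd out of a metadata area
    print("0x%08x" % lvm_crc32(open(sys.argv[1], "rb").read()))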

As I see things, these are the outstanding issues, not all of which
are LVM's fault.

0. The bitmap superblock "bitm" magic is not always written to rmeta subLVs;
reported as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=839189
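
For what it's worth, checking (or stamping) the magic by hand is not
hard once you know where the bitmap superblock lives; the sketch below
assumes dm-raid puts it 4KiB into the rmeta subLV, which is worth
verifying against your kernel before writing anything:

#!/usr/bin/env python3
# Sketch: check (and optionally write) the "bitm" magic in an rmeta subLV.
# Assumption: the MD write-intent bitmap superblock starts 4096 bytes into
# the metadata subLV, as dm-raid seems to place it; verify before --fix.
import sys

BITMAP_OFFSET = 4096
BITMAP_MAGIC = b"bitm"

def check_bitm(dev, fix=False):
    with open(dev, "r+b" if fix else "rb") as f:
        f.seek(BITMAP_OFFSET)
        magic = f.read(4)
        if magic == BITMAP_MAGIC:
            print("%s: bitm magic present" % dev)
            return True
        print("%s: unexpected magic %r" % (dev, magic))
        if fix:
            f.seek(BITMAP_OFFSET)
            f.write(BITMAP_MAGIC)
            print("%s: magic written" % dev)
        return False

if __name__ == "__main__":
    check_bitm(sys.argv[1], fix="--fix" in sys.argv[2:])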

1. An aborted lvconvert --repair (for a RAID LV) can leave __extracted
subLVs behind and leave the new subLVs with their new names.

Might it make more sense to rename the missing subLVs out of the way
first and create new subLVs with the right name?

2. lvconvert --repair (often?) dies, probably due to a bad DM ioctl;
see the logs I gathered previously.

3. A working lvconvert --repair that is interrupted (e.g. by a power
failure), or an aborted one, will leave a sticky REBUILD flag on the new
subLVs; this causes a full rebuild on every deactivated->open transition.

4. vgcfgrestore cannot be used to remove the REBUILD flag until all
LVs with missing devices have been repaired AND the missing PV has
been removed from the metadata.
This really limits its usefulness as an emergency repair tool when
RAID volumes are present.
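
The flag itself appears to be serialised as a "REBUILD" entry in the
affected subLV's status = [...] list in the text metadata; assuming
that is right, a backup can be stripped of it with something like the
sketch below before handing it back to vgcfgrestore, once vgcfgrestore
will accept the VG at all.

#!/usr/bin/env python3
# Sketch: strip the REBUILD status flag from a vgcfgbackup text file.
# Assumption: the flag shows up as a "REBUILD" entry in a subLV's
# status = [...] list; inspect your own backup before using the result.
import re
import sys

def strip_rebuild(text):
    # Drop "REBUILD" plus one adjacent comma, whichever side has one.
    return re.sub(r'"REBUILD"\s*,\s*|\s*,\s*"REBUILD"', "", text)

if __name__ == "__main__":
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        fout.write(strip_rebuild(fin.read()))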

5. Multiple rebuilds run concurrently even when the RAID LVs share the
same PVs, so everything slows down badly.

I'll update the Google Drive folder with the utility source, scripts
and other notes in a day or so.

If someone finds these notes useful, let me know.



