[linux-lvm] LVM RAID5 out-of-sync recovery

Slava Prisivko vprisivko at gmail.com
Tue Oct 4 22:14:33 UTC 2016


Thanks!

On Tue, Oct 4, 2016 at 12:49 PM Giuliano Procida <giuliano.procida at gmail.com>
wrote:

Before anything else, I would have suggested backing up the image and
meta sub LVs, but it looks like you are just testing.

Already did. I'm not testing; I just renamed the LVs to "test_*" because
their previous names don't matter.

There is nothing particularly important there, but I would like to
understand whether I would be able to recover should something like this
happen in the future.


Clear down any odd state with dmsetup remove /dev/vg/... and then run:

vgextend --restoremissing

I didn't have to, because all the PVs are present:

# pvs
  PV         VG Fmt  Attr PSize   PFree
  /dev/sda2  vg lvm2 a--    1.82t   1.10t
  /dev/sdb2  vg lvm2 a--    3.64t   1.42t
  /dev/sdc2  vg lvm2 a--  931.51g 195.18g


Actually, always run LVM commands with -v -t before really running them.

Thanks! I had backed up the rmeta* and rimage* LVs, so I didn't feel the need
to use -t. Am I wrong?


On 4 October 2016 at 00:49, Slava Prisivko <vprisivko at gmail.com> wrote:
> In order to mitigate cross-posting, here's the original question on
> Serverfault.SE: "LVM RAID5 out-of-sync recovery", but feel free to answer
> wherever you deem appropriate.
>
> How can one recover from an LVM RAID5 out-of-sync?

I would expect it to recover mostly automatically.
*If* your array is assembled (or whatever the LVM-equivalent
terminology is) then you can force a given subset of PVs to be
resynced.
http://man7.org/linux/man-pages/man8/lvchange.8.html - look for rebuild
However, this does not seem to be your problem.

Yeah, I tried, but in vain:
# lvchange --rebuild /dev/sda2 vg/test -v
    Archiving volume group "vg" metadata (seqno 518).
Do you really want to rebuild 1 PVs of logical volume vg/test [y/n]: y
    Accepted input: [y]
  vg/test must be active to perform this operation.

> I have an LVM RAID5 configuration (RAID5 using the LVM tools).
>
> However, because of a technical problem the mirrors went out of sync. You can
> reproduce this as explained in this Unix & Linux question:
>
>> Playing with my Jessie VM, I disconnected (virtually) one disk. That
>> worked, the machine stayed running. lvs, though, gave no indication the
>> arrays were degraded.

You should have noticed something in the kernel logs. Also, lvs should
have reported that the array was now (p)artial.

Yes, I noticed it. The problem was a faulty SATA cable (as I learned
later), so when I first switched the computer on, /dev/sda was missing
(under the device naming at the time). I switched the computer off, swapped
the /dev/sda and /dev/sdb SATA cables (without thinking about the
consequences) and switched it on again. This time /dev/sdb was missing. I
replaced the faulty cable with a new one and switched the machine back on.
This time sda, sdb and sdc were all present, but the RAID went out of sync.

I'm pretty sure there were very few (if any) write operations while the
array was degraded, so I should be able to recover by rebuilding the old
mirror (sda) using the more recent ones (sdb and sdc).
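(If I understand the RAID5 parity relation correctly, every 64 KiB chunk
satisfies chunk_0 XOR chunk_1 XOR chunk_2 = 0 across the three images,
whichever of them holds the parity for that stripe, so a stale chunk on one
image can be regenerated as the XOR of the corresponding chunks on the other
two.)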


>> I re-attached the disk, and removed a second. Stayed
>> running (this is raid6). Re-attached, still no indication from lvs. I ran
>> lvconvert --repair on the volume, it told me it was OK. Then I pulled a
>> third disk... and the machine died. Re-inserted it, rebooted, and am now
>> unsure how to fix.

So this is RAID6 rather than RAID5?
And you killed 3 disks in a RAID 6 array?

I have RAID5, not RAID6, but the principle is the same (as I
explained in the previous paragraph).


> If I had been using mdadm, I could have probably recovered the data using
> `mdadm --force --assemble`, but I was not able to achieve the same using the
> LVM tools.

LVM is very different. :-(

> I have tried to concatenate rmeta and rimage for each mirror and put them on
> three linear devices in order to feed them to mdadm (because LVM
> leverages MD), but without success (`mdadm --examine` does not recognize the
> superblock), because it appears that the mdadm superblock format differs
> from the dm_raid superblock format (search for "dm_raid_superblock").

Not only that, but (as far as I can tell), LVM RAID 6 parity (well,
syndrome) is calculated in a different manner to the older mdadm RAID;
it uses an industry-standard layout instead of the (more obvious?) md
layout.
I wrote a utility to parity-check the default LVM RAID6 layout with
the usual stripe size (easily adjusted) here:
https://drive.google.com/open?id=0B8dHrWSoVcaDbkY3WmkxSmpfSVE

You can use this to see to what degree the data in the image LVs are
in fact in/out of sync. I've not attempted to add sync functionality
to this.
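
The idea is roughly the following; this is only a minimal Python sketch of
the same check, not the actual raid5_parity_check.cc, and the 64 KiB chunk
size and the device paths are assumptions you would need to adjust:

import sys

CHUNK = 64 * 1024  # assumed chunk ("stripe") size, as used in this thread
IMAGES = ["/dev/mapper/vg-test_rimage_0",   # assumed sub-LV device paths
          "/dev/mapper/vg-test_rimage_1",
          "/dev/mapper/vg-test_rimage_2"]

def out_of_sync_stripes(paths, chunk=CHUNK):
    bad, stripe = [], 0
    files = [open(p, "rb") for p in paths]
    try:
        while True:
            chunks = [f.read(chunk) for f in files]
            if not chunks[0]:
                break
            # every chunk-triple of a 3-disk RAID5 must XOR to zero
            if any(a ^ b ^ c for a, b, c in zip(*chunks)):
                bad.append(stripe)
            stripe += 1
    finally:
        for f in files:
            f.close()
    return bad

if __name__ == "__main__":
    print("out-of-sync stripes:", out_of_sync_stripes(sys.argv[1:] or IMAGES))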

Thanks, I used your raid5_parity_check.cc utility with the default stripe
size (64 * 1024), but it actually doesn't matter, since you're just
calculating the total XOR and the stripe size only acts as a buffer size for
that.

I get three unsynced stripes out of 512 (32 MiB / 64 KiB), but I would like
to try to reconstruct test_rimage_1 using the other two. Just in case, here
are the bad stripe numbers: 16, 48, 49.
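
Something like the following is what I have in mind for the repair: a
minimal Python sketch that I would run against the backed-up copies of the
rimage sub-LVs rather than the live devices (the file names, the chunk size
and the assumption that the copies line up at the same offsets are mine):

CHUNK = 64 * 1024                  # same chunk size as the parity check
BAD_STRIPES = [16, 48, 49]         # stripes reported out of sync above

GOOD = ["rimage_0.img", "rimage_2.img"]   # backup copies of test_rimage_0/2
TARGET = "rimage_1.img"                   # backup copy of test_rimage_1

with open(GOOD[0], "rb") as f0, open(GOOD[1], "rb") as f2, \
     open(TARGET, "r+b") as f1:
    for stripe in BAD_STRIPES:
        off = stripe * CHUNK
        f0.seek(off); f2.seek(off); f1.seek(off)
        a, b = f0.read(CHUNK), f2.read(CHUNK)
        # in a 3-disk RAID5 the missing member is always the XOR of the
        # other two, whether it held data or parity in that stripe
        f1.write(bytes(x ^ y for x, y in zip(a, b)))
print("rewrote stripes", BAD_STRIPES, "of", TARGET)

I would then re-run the parity check on the result before copying anything
back to the real sub-LVs.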


> I tried to understand how device-mapper RAID leverages MD, but was unable to
> find any documentation, while the kernel code is quite complicated.
>
> I also tried to rebuild the mirror directly by using `dmsetup`, but it can't
> rebuild if the metadata is out of sync.
>
> Overall, almost the only useful information I could find is the "RAIDing with
> LVM vs MDRAID - pros and cons?" question on Unix & Linux SE.

Well, I would read through this as well (versions 6 and 7 also available):
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/Logical_Volume_Manager_Administration/index.html

Thanks, but nothing particularly relevant to my case there.

> The output of various commands is provided below.
>
>     # lvs -a -o +devices
>
>     test                           vg   rwi---r---  64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
>     [test_rimage_0]                vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
>     [test_rimage_1]                vg   Iwi-a-r-r-  32.00m /dev/sda2(238244)
>     [test_rimage_2]                vg   Iwi-a-r-r-  32.00m /dev/sdb2(148612)
>     [test_rmeta_0]                 vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
>     [test_rmeta_1]                 vg   ewi-a-r-r-   4.00m /dev/sda2(238243)
>     [test_rmeta_2]                 vg   ewi-a-r-r-   4.00m /dev/sdb2(148611)
>
> I cannot activate the LV:
>
>     # lvchange -ay vg/test -v
>         Activating logical volume "test" exclusively.
>         activation/volume_list configuration setting not defined: Checking
> only host tags for vg/test.
>         Loading vg-test_rmeta_0 table (253:35)
>         Suppressed vg-test_rmeta_0 (253:35) identical table reload.
>         Loading vg-test_rimage_0 table (253:36)
>         Suppressed vg-test_rimage_0 (253:36) identical table reload.
>         Loading vg-test_rmeta_1 table (253:37)
>         Suppressed vg-test_rmeta_1 (253:37) identical table reload.
>         Loading vg-test_rimage_1 table (253:38)
>         Suppressed vg-test_rimage_1 (253:38) identical table reload.
>         Loading vg-test_rmeta_2 table (253:39)
>         Suppressed vg-test_rmeta_2 (253:39) identical table reload.
>         Loading vg-test_rimage_2 table (253:40)
>         Suppressed vg-test_rimage_2 (253:40) identical table reload.
>         Creating vg-test
>         Loading vg-test table (253:87)
>       device-mapper: reload ioctl on (253:87) failed: Invalid argument
>         Removing vg-test (253:87)
>
> While trying to activate I'm getting the following in the dmesg:
>
>     device-mapper: table: 253:87: raid: Cannot change device positions in
> RAID array
>     device-mapper: ioctl: error adding target to table

That's a new error message to me. I would try clearing out the dm
table (dmsetup remove /dev/vg/test_*) before trying again (-v -t,
first).

After removing the test_* entries from the dmsetup table and trying
lvchange -ay again, I get practically the same:
# lvchange -ay vg/test -v
    Activating logical volume vg/test exclusively.
    activation/volume_list configuration setting not defined: Checking only
host tags for vg/test.
    Creating vg-test_rmeta_0
    Loading vg-test_rmeta_0 table (253:35)
    Resuming vg-test_rmeta_0 (253:35)
    Creating vg-test_rimage_0
    Loading vg-test_rimage_0 table (253:36)
    Resuming vg-test_rimage_0 (253:36)
    Creating vg-test_rmeta_1
    Loading vg-test_rmeta_1 table (253:37)
    Resuming vg-test_rmeta_1 (253:37)
    Creating vg-test_rimage_1
    Loading vg-test_rimage_1 table (253:38)
    Resuming vg-test_rimage_1 (253:38)
    Creating vg-test_rmeta_2
    Loading vg-test_rmeta_2 table (253:39)
    Resuming vg-test_rmeta_2 (253:39)
    Creating vg-test_rimage_2
    Loading vg-test_rimage_2 table (253:40)
    Resuming vg-test_rimage_2 (253:40)
    Creating vg-test
    Loading vg-test table (253:87)
  device-mapper: reload ioctl on (253:87) failed: Invalid argument
    Removing vg-test (253:87)

device-mapper: table: 253:87: raid: Cannot change device positions in RAID array
device-mapper: ioctl: error adding target to table


> lvconvert only works on active LVs:
>     # lvconvert --repair vg/test
>       vg/test must be active to perform this operation.

And it requires new PVs ("replacement drives") to put the subLVs on.
It's probably not what you want.

> I have the following LVM version:
>
>     # lvm version
>       LVM version:     2.02.145(2) (2016-03-04)
>       Library version: 1.02.119 (2016-03-04)
>       Driver version:  4.34.0

I would update LVM to whatever is in Debian testing as there has been
a fair bit of change this year.

I've updated to 2.02.166 (the latest version):

# lvm version
  LVM version:     2.02.166(2) (2016-09-26)
  Library version: 1.02.135 (2016-09-26)
  Driver version:  4.34.0


> And the following kernel version:
>
>     Linux server 4.4.8-hardened-r1-1 #1 SMP

More useful would be the contents of /etc/lvm/backup/vg and the output
of vgs and pvs.

# pvs
  PV         VG Fmt  Attr PSize   PFree
  /dev/sda2  vg lvm2 a--    1.82t   1.10t
  /dev/sdb2  vg lvm2 a--    3.64t   1.42t
  /dev/sdc2  vg lvm2 a--  931.51g 195.18g

# vgs
  VG #PV #LV #SN Attr   VSize VFree
  vg   3  18   0 wz--n- 6.37t 2.71t

Here is the relevant /etc/lvm/archive content (the archive is more recent
than the backup):
test {
    id = "JjiPmi-esfx-vdeF-5zMv-TsJC-6vFf-qNgNnZ"
    status = ["READ", "WRITE", "VISIBLE"]
    flags = []
    creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
    creation_host = "server"
    segment_count = 1

    segment1 {
        start_extent = 0
        extent_count = 16    # 64 Megabytes

        type = "raid5"
        device_count = 3
        stripe_size = 128
        region_size = 1024

        raids = [
            "test_rmeta_0", "test_rimage_0",
            "test_rmeta_1", "test_rimage_1",
            "test_rmeta_2", "test_rimage_2"
        ]
    }
}

        test_rmeta_0 {
            id = "WE3CUg-ayo8-lp1Y-9S2v-zRGi-mV1s-DWYoST"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 0
                ]
            }
        }
		
        test_rmeta_1 {
            id = "Apk3mc-zy4q-c05I-hiIO-1Kae-9yB6-Cl5lfJ"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 238243
                ]
            }
        }
		
        test_rmeta_2 {
            id = "j2Waf3-A77y-pvfd-foGK-Hq7B-rHe8-YKzQY0"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 1    # 4 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 148611
                ]
            }
        }
		
        test_rimage_0 {
            id = "zaGgJx-YSIl-o2oq-UN9l-02Q8-IS5u-sz4RhQ"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 1
                ]
            }
        }

        test_rimage_1 {
            id = "0mD5AL-GKj3-siFz-xQmO-ZtQo-L3MM-Ro2SG2"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv1", 238244
                ]
            }
        }
		
        test_rimage_2 {
            id = "4FxiHV-j637-ENml-Okm3-uL1p-fuZ0-y9dE8Y"
            status = ["READ", "WRITE"]
            flags = []
            creation_time = 18446744073709551615    # 1970-01-01 02:59:59 +0300
            creation_host = "server"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 8    # 32 Megabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv2", 148612
                ]
            }
        }
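
For reference, my own arithmetic on the metadata above (assuming stripe_size
is given in 512-byte sectors, as I understand the format): stripe_size = 128
= 64 KiB, which matches the chunk size used in the parity check, and the
64 MiB LV is backed by three 32 MiB images, i.e. two data chunks plus one
parity chunk per stripe.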

--
Best regards,
Slava Prisivko.



_______________________________________________
linux-lvm mailing list
linux-lvm at redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/