[linux-lvm] LVM RAID5 out-of-sync recovery

Slava Prisivko vprisivko at gmail.com
Sun Oct 9 19:00:44 UTC 2016


Hi!

On Wed, Oct 5, 2016 at 3:53 PM Giuliano Procida <giuliano.procida at gmail.com>
wrote:

> On 4 October 2016 at 23:14, Slava Prisivko <vprisivko at gmail.com> wrote:
> >> vgextend --restoremissing
> >
> > I didn't have to, because all the PVs are present:
> >
> > # pvs
> >   PV         VG Fmt  Attr PSize   PFree
> >   /dev/sda2  vg lvm2 a--    1.82t   1.10t
> >   /dev/sdb2  vg lvm2 a--    3.64t   1.42t
> >   /dev/sdc2  vg lvm2 a--  931.51g 195.18g
>
> Double-check in the metadata for MISSING. This is what I was hoping
> might be in your /etc/lvm/backup file.
>
> >> Actually, always run LVM commands with -v -t before really running them.
> >
> > Thanks! I had backed up the rmeta* and rimage*, so I didn't feel the need
> > for using -t. Am I wrong?
>
> Well, some nasty surprises may be avoidable (particularly if also using
> -f).
>
> > Yes, I've noticed it. The problem was a faulty SATA cable (as I learned
> > later), so when I switched the computer on for the first time, /dev/sda
> > was missing (in the current device allocation). I switched off the
> > computer, swapped the /dev/sda and /dev/sdb SATA cables (without thinking
> > about the consequences) and switched it on. This time /dev/sdb was
> > missing. I replaced the faulty cable with a new one and switched the
> > machine back on. This time sda, sdb and sdc were all present, but the
> > RAID went out-of-sync.
>
> In swapping the cables, you may have changed the sd{a,b,c} enumeration
> but this will have no impact on the UUIDs that LVM uses to identify
> the PVs.
>
That's right, but the images went out of sync because during the first boot
only sdb and sdc were present (so sda's content had to be derived from the
other two), during the second boot only sda and sdc were present (so sdb's
content had to be derived), and when I replaced the cable all three legs
came back with mutually conflicting states.

>
> > I'm pretty sure there were very few (if any) write operations during the
> > degraded operating mode, so I could recover by rebuilding the old mirror
> > (sda) using the more recent ones (sdb and sdc).
>
> Agreed, based on your check below.
>
> > Thanks, I used your raid5_parity_check.cc utility with the default stripe
> > size (64 * 1024), but it actually doesn't matter since you're just
> > calculating the total xor and the stripe size acts as a buffer size for
> > that.
>
> [I was a little surprised to discover that RAID 6 works as a byte erasure
> code.]
>
> The stripe size and layout matter once you want to adapt the code
> to extract or repair the data.
>
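To spell out the total-xor check for the archives: this is roughly what the utility computes, sketched in Python (the paths and chunk size are placeholders; the chunk size is just a read-buffer size here). Any consistent RAID5 stripe XORs to zero across all legs, whatever the parity rotation:

```python
from functools import reduce

def check_parity(paths, chunk=64 * 1024):
    """Yield (stripe_number, ok) for each chunk-sized stripe.

    For RAID5, data XOR parity == 0 for every stripe regardless of
    the parity rotation, so a plain XOR across all legs detects
    out-of-sync stripes without knowing the layout.
    """
    files = [open(p, "rb") for p in paths]
    try:
        stripe = 0
        while True:
            bufs = [f.read(chunk) for f in files]
            if not bufs[0]:
                break
            x = reduce(lambda a, b: bytes(i ^ j for i, j in zip(a, b)),
                       bufs)
            yield stripe, not any(x)
            stripe += 1
    finally:
        for f in files:
            f.close()
```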
> > I get three unsynced stripes out of 512 (32 MiB / 64 KiB), but I would
> > like to try to reconstruct test_rimage_1 using the other two. Just in
> > case, here are the bad stripe numbers: 16, 48, 49.
>
> I've updated the utility (this is for raid5 = raid5_ls). Warning: not
> tested on out-of-sync data.
>
> https://drive.google.com/open?id=0B8dHrWSoVcaDYXlUWXEtZEMwX0E


>
> # Assume the first sub LV has the out-of-date data and dump the
> correct(ed) LV content.
> ./foo stripe $((64*1024)) repair 0 /dev/${lv}_rimage_* | cmp - /dev/${lv}
>
Thanks!

I tried to reassemble the array using the 3 different pairs of correct LV
images, but none of the results work (I am sure of this because I cannot
luksOpen the LUKS volume that lives inside the LV, so the data is almost
surely still corrupt).
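For the record, the reconstruction I attempted amounts to the following Python sketch (a simplification of the utility; paths and chunk size are placeholders). For RAID5, any single leg, data or parity, is the XOR of the remaining ones, independent of the raid5_ls rotation:

```python
from functools import reduce

def rebuild_leg(good_paths, out_path, chunk=64 * 1024):
    """Rewrite out_path as the XOR of the surviving legs.

    Every RAID5 stripe satisfies data ^ ... ^ parity == 0, so a
    single missing leg is the XOR of the others. The rotating
    layout only matters when extracting the linear data.
    """
    files = [open(p, "rb") for p in good_paths]
    try:
        with open(out_path, "wb") as out:
            while True:
                bufs = [f.read(chunk) for f in files]
                if not bufs[0]:
                    break
                out.write(reduce(
                    lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                    bufs))
    finally:
        for f in files:
            f.close()
```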

>
> >> > The output of various commands is provided below.
> >> >
> >> >     # lvs -a -o +devices
> >> >
> >> >     test              vg   rwi---r---  64.00m test_rimage_0(0),test_rimage_1(0),test_rimage_2(0)
> >> >     [test_rimage_0]   vg   Iwi-a-r-r-  32.00m /dev/sdc2(1)
> >> >     [test_rimage_1]   vg   Iwi-a-r-r-  32.00m /dev/sda2(238244)
> >> >     [test_rimage_2]   vg   Iwi-a-r-r-  32.00m /dev/sdb2(148612)
> >> >     [test_rmeta_0]    vg   ewi-a-r-r-   4.00m /dev/sdc2(0)
> >> >     [test_rmeta_1]    vg   ewi-a-r-r-   4.00m /dev/sda2(238243)
> >> >     [test_rmeta_2]    vg   ewi-a-r-r-   4.00m /dev/sdb2(148611)
>
> The extra r(efresh) attributes suggest trying a resync operation, which
> may not be possible on an inactive LV.
> I missed that the RAID device is actually in the list.
>
> > After cleaning the dmsetup table of test_* and trying to lvchange -ay I
> > get practically the same:
> > # lvchange -ay vg/test -v
> [snip]
> >   device-mapper: reload ioctl on (253:87) failed: Invalid argument
> >     Removing vg-test (253:87)
> >
> > device-mapper: table: 253:87: raid: Cannot change device positions in
> > RAID array
> > device-mapper: ioctl: error adding target to table
>
> This error occurs when the sub LV metadata says "I am device X in this
> array" but dmsetup is being asked to put the sub LV at different
> position Y (alas, neither are logged). With lots of -v and -d flags
> you can get lvchange to include the dm table entries in the
> diagnostics.
>
This is as useful as it gets (-vvvv -dddd):
    Loading vg-test_rmeta_0 table (253:35)
        Adding target to (253:35): 0 8192 linear 8:34 2048
        dm table   (253:35) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_0 (253:35) identical table reload.
    Loading vg-test_rimage_0 table (253:36)
        Adding target to (253:36): 0 65536 linear 8:34 10240
        dm table   (253:36) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_0 (253:36) identical table reload.
    Loading vg-test_rmeta_1 table (253:37)
        Adding target to (253:37): 0 8192 linear 8:2 1951688704
        dm table   (253:37) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_1 (253:37) identical table reload.
    Loading vg-test_rimage_1 table (253:38)
        Adding target to (253:38): 0 65536 linear 8:2 1951696896
        dm table   (253:38) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_1 (253:38) identical table reload.
    Loading vg-test_rmeta_2 table (253:39)
        Adding target to (253:39): 0 8192 linear 8:18 1217423360
        dm table   (253:39) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rmeta_2 (253:39) identical table reload.
    Loading vg-test_rimage_2 table (253:40)
        Adding target to (253:40): 0 65536 linear 8:18 1217431552
        dm table   (253:40) [ opencount flush ]   [16384] (*1)
    Suppressed vg-test_rimage_2 (253:40) identical table reload.
    Creating vg-test
        dm create vg-test
LVM-Pgjp5f2PRJipxvoNdsYmq0olg9iWwY5pJjiPmiesfxvdeF5zMvTsJC6vFfqNgNnZ [
noopencount flush ]   [16384] (*1)
    Loading vg-test table (253:84)
        Adding target to (253:84): 0 131072 raid raid5_ls 3 128 region_size
1024 3 253:35 253:36 253:37 253:38 253:39 253:40
        dm table   (253:84) [ opencount flush ]   [16384] (*1)
        dm reload   (253:84) [ noopencount flush ]   [16384] (*1)
  device-mapper: reload ioctl on (253:84) failed: Invalid argument

I don't see any problems here.
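For anyone decoding the failing table line: as I understand the dm-raid target's format, it is `<start> <len> raid <type> <#params> <params...> <#devs> <meta_dev data_dev>...`. A small sketch that splits the line this way (the field layout is my reading of the documentation, so double-check it):

```python
def parse_raid_table(line):
    """Split a dm-raid table line into its logical fields.

    Assumed format: <start> <len> raid <raid_type> <#raid_params>
    <raid_params...> <#raid_devs> <meta_dev data_dev>...
    """
    f = line.split()
    start, length, target = int(f[0]), int(f[1]), f[2]
    raid_type = f[3]
    nparams = int(f[4])
    params = f[5:5 + nparams]          # chunk size + optional key/value pairs
    ndevs = int(f[5 + nparams])
    pairs = f[6 + nparams:]
    devs = [(pairs[i], pairs[i + 1]) for i in range(0, 2 * ndevs, 2)]
    return dict(start=start, length=length, target=target,
                type=raid_type, params=params, devices=devs)
```

Running it over the logged line above shows three (rmeta, rimage) pairs in the expected 0, 1, 2 order, which is why the table itself looks fine to me.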

>
> You can check the rmeta superblocks with
> https://drive.google.com/open?id=0B8dHrWSoVcaDUk0wbHQzSEY3LTg

Thanks, it's very useful!

/dev/mapper/vg-test_rmeta_0
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=0
 events=56
 failed_devices=0
 disk_recovery_offset=18446744073709551615
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 56
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

/dev/mapper/vg-test_rmeta_1
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=4294967295
 events=62
 failed_devices=1
 disk_recovery_offset=0
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 60
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

/dev/mapper/vg-test_rmeta_2
found RAID superblock at offset 0
 magic=1683123524
 features=0
 num_devices=3
 array_position=2
 events=62
 failed_devices=1
 disk_recovery_offset=18446744073709551615
 array_resync_offset=18446744073709551615
 level=5
 layout=2
 stripe_sectors=128
found bitmap file superblock at offset 4096:
         magic: 6d746962
       version: 4
          uuid: 00000000.00000000.00000000.00000000
        events: 62
events cleared: 33
         state: 00000000
     chunksize: 524288 B
  daemon sleep: 5s
     sync size: 32768 KB
max write behind: 0

The problem I see here is that the events counts differ between the three
rmetas (56 for rmeta_0 vs. 62 for the other two), and rmeta_1 reports
array_position=4294967295 (i.e. -1, marked as failed).
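The comparison can be done programmatically. This is a sketch of the superblock header based on my reading of struct dm_raid_superblock in the kernel's drivers/md/dm-raid.c (the field order is an assumption; verify against your kernel sources before trusting it):

```python
import struct

# Assumed leading fields of struct dm_raid_superblock (little-endian):
# __le32 magic, compat_features, num_devices, array_position;
# __le64 events, failed_devices; ...
SB = struct.Struct("<4I2Q")
DM_RAID_MAGIC = 1683123524  # "DmRd" as a little-endian u32

def read_events(path):
    """Return (array_position, events) from an rmeta device."""
    with open(path, "rb") as f:
        magic, _feat, _ndev, pos, events, _failed = SB.unpack(
            f.read(SB.size))
    assert magic == DM_RAID_MAGIC, "not a dm-raid superblock"
    return pos, events

def stale_legs(paths):
    """Paths of rmeta devices whose event counter lags the maximum."""
    ev = {p: read_events(p)[1] for p in paths}
    newest = max(ev.values())
    return [p for p, e in ev.items() if e < newest]
```

Against the dumps above this would flag rmeta_0 (events=56) as the stale leg.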

>
>
> > Here is the relevant /etc/lvm/archive (archive is more recent than backup)
>
> That looks sane, but you omitted the physical volumes section so there
> is no way to cross-check UUIDs and devices or see if there are MISSING
> flags.
>
The IDs are the same and there are no MISSING flags.

>
> If you use
> https://drive.google.com/open?id=0B8dHrWSoVcaDQkU5NG1sLWc5cjg
> directly, you can get metadata that LVM is reading off the PVs and
> double-check for discrepancies.


> _______________________________________________
> linux-lvm mailing list
> linux-lvm at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>