[linux-lvm] Raid 10 - recovery after a disk failure

Pavlik Kirilov pavllik at yahoo.ca
Mon Feb 1 17:06:42 UTC 2016



The method for restoring RAID 10 that I posted in my previous email also works very well for RAID 5 on 4 PVs. I have tried the "--uuid" method from the link you sent me many times, and I always end up with destroyed data. Here is the output of the tests I performed:
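
For reference, the "--uuid" sequence I am testing is essentially the following (placeholders in angle brackets; the exact invocations are in the transcript below):

pvcreate --uuid <old-pv-uuid> --restorefile /etc/lvm/backup/vg_data <new-device>
vgcfgrestore vg_data
lvchange --resync vg_data/lv_r10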

## Ubuntu VM with 4 new disks (qcow files) vda,vdb,vdc,vdd, one physical partition per disk.

vgcreate vg_data /dev/vda1 /dev/vdb1 /dev/vdc1 /dev/vdd1
Volume group "vg_data" successfully created

lvcreate --type raid10 -L3g -i 2 -I 256 -n lv_r10 vg_data /dev/vda1:1-900 /dev/vdb1:1-900 /dev/vdc1:1-900 /dev/vdd1:1-900
Logical volume "lv_r10" created
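
To double-check the layout lvcreate produced, lvs can also report the segment type, stripe count and stripe size (field names vary a bit between LVM versions, so this may need adjusting):

lvs -a -o name,segtype,stripes,stripesize,devices vg_data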

mkfs.ext4 /dev/vg_data/lv_r10

mount /dev/vg_data/lv_r10 /mnt/r10/

mount | grep vg_data
/dev/mapper/vg_data-lv_r10 on /mnt/r10 type ext4 (rw)

echo "some data" > /mnt/r10/testr10.txt

dmesg -T | tail -n 70

------------------

[ 3822.367551] EXT4-fs (dm-8): mounted filesystem with ordered data mode. Opts: (null)
[ 3851.317428] md: mdX: resync done.
[ 3851.440927] RAID10 conf printout:
[ 3851.440935]  --- wd:4 rd:4
[ 3851.440941]  disk 0, wo:0, o:1, dev:dm-1
[ 3851.440945]  disk 1, wo:0, o:1, dev:dm-3
[ 3851.440949]  disk 2, wo:0, o:1, dev:dm-5
[ 3851.440953]  disk 3, wo:0, o:1, dev:dm-7


lvs -a -o +devices
LV                VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert Devices 
lv_r10            vg_data rwi-aor-- 3.00g                             100.00         lv_r10_rimage_0(0),lv_r10_rimage_1(0),lv_r10_rimage_2(0),lv_r10_rimage_3(0)
[lv_r10_rimage_0] vg_data iwi-aor-- 1.50g                                            /dev/vda1(2) 
[lv_r10_rimage_1] vg_data iwi-aor-- 1.50g                                            /dev/vdb1(2) 
[lv_r10_rimage_2] vg_data iwi-aor-- 1.50g                                            /dev/vdc1(2) 
[lv_r10_rimage_3] vg_data iwi-aor-- 1.50g                                            /dev/vdd1(2) 
[lv_r10_rmeta_0]  vg_data ewi-aor-- 4.00m                                            /dev/vda1(1) 
[lv_r10_rmeta_1]  vg_data ewi-aor-- 4.00m                                            /dev/vdb1(1) 
[lv_r10_rmeta_2]  vg_data ewi-aor-- 4.00m                                            /dev/vdc1(1) 
[lv_r10_rmeta_3]  vg_data ewi-aor-- 4.00m                                            /dev/vdd1(1)
###

### Shutting down, replacing vdb with a new disk, starting the system ###

### 
lvs -a -o +devices
Couldn't find device with uuid GjkgzF-18Ls-321G-SaDW-4vp0-d04y-Gd4xRp.
LV                VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert Devices 
lv_r10            vg_data rwi---r-p 3.00g                                            lv_r10_rimage_0(0),lv_r10_rimage_1(0),lv_r10_rimage_2(0),lv_r10_rimage_3(0)
[lv_r10_rimage_0] vg_data Iwi---r-- 1.50g                                            /dev/vda1(2) 
[lv_r10_rimage_1] vg_data Iwi---r-p 1.50g                                            unknown device(2) 
[lv_r10_rimage_2] vg_data Iwi---r-- 1.50g                                            /dev/vdc1(2) 
[lv_r10_rimage_3] vg_data Iwi---r-- 1.50g                                            /dev/vdd1(2) 
[lv_r10_rmeta_0]  vg_data ewi---r-- 4.00m                                            /dev/vda1(1) 
[lv_r10_rmeta_1]  vg_data ewi---r-p 4.00m                                            unknown device(1) 
[lv_r10_rmeta_2]  vg_data ewi---r-- 4.00m                                            /dev/vdc1(1) 
[lv_r10_rmeta_3]  vg_data ewi---r-- 4.00m                                            /dev/vdd1(1)
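
The UUID that lvs reports as missing can be cross-checked against the metadata backup, which lists every PV with its id and the device it was last seen on:

grep -B1 -A4 'GjkgzF-18Ls-321G-SaDW-4vp0-d04y-Gd4xRp' /etc/lvm/backup/vg_data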

grep description /etc/lvm/backup/vg_data
description = "Created *after* executing 'lvcreate --type raid10 -L3g -i 2 -I 256 -n lv_r10 vg_data /dev/vda1:1-900 /dev/vdb1:1-900 /dev/vdc1:1-900 /dev/vdd1:1-900'"

pvcreate --uuid  GjkgzF-18Ls-321G-SaDW-4vp0-d04y-Gd4xRp  --restorefile /etc/lvm/backup/vg_data /dev/vdb1
Couldn't find device with uuid GjkgzF-18Ls-321G-SaDW-4vp0-d04y-Gd4xRp.
Physical volume "/dev/vdb1" successfully created

vgcfgrestore vg_data
Restored volume group vg_data
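
The LVs came back active straight after the restore here; if they had not, activating the volume group before the resync is a plain:

vgchange -ay vg_data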

lvs -a -o +devices
LV                VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert Devices 
lv_r10            vg_data rwi-d-r-- 3.00g                               0.00         lv_r10_rimage_0(0),lv_r10_rimage_1(0),lv_r10_rimage_2(0),lv_r10_rimage_3(0)
[lv_r10_rimage_0] vg_data iwi-a-r-- 1.50g                                            /dev/vda1(2) 
[lv_r10_rimage_1] vg_data iwi-a-r-- 1.50g                                            /dev/vdb1(2) 
[lv_r10_rimage_2] vg_data iwi-a-r-- 1.50g                                            /dev/vdc1(2) 
[lv_r10_rimage_3] vg_data iwi-a-r-- 1.50g                                            /dev/vdd1(2) 
[lv_r10_rmeta_0]  vg_data ewi-a-r-- 4.00m                                            /dev/vda1(1) 
[lv_r10_rmeta_1]  vg_data ewi-a-r-- 4.00m                                            /dev/vdb1(1) 
[lv_r10_rmeta_2]  vg_data ewi-a-r-- 4.00m                                            /dev/vdc1(1) 
[lv_r10_rmeta_3]  vg_data ewi-a-r-- 4.00m                                            /dev/vdd1(1)

lvchange --resync vg_data/lv_r10
Do you really want to deactivate logical volume lv_r10 to resync it? [y/n]: y

lvs -a -o +devices
LV                VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert Devices 
lv_r10            vg_data rwi-a-r-- 3.00g              100.00
---------

dmesg | tail
------------
[  708.691297] md: mdX: resync done.
[  708.765376] RAID10 conf printout:
[  708.765379]  --- wd:4 rd:4
[  708.765381]  disk 0, wo:0, o:1, dev:dm-1
[  708.765382]  disk 1, wo:0, o:1, dev:dm-3
[  708.765383]  disk 2, wo:0, o:1, dev:dm-5
[  708.765384]  disk 3, wo:0, o:1, dev:dm-7

mount /dev/vg_data/lv_r10 /mnt/r10/
cat /mnt/r10/testr10.txt 
some data
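
The data is back at this point. As an extra sanity check (not part of the original test), the file system can be unmounted and checked read-only before continuing:

umount /mnt/r10
fsck.ext4 -n /dev/vg_data/lv_r10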

### Suppose now that vda must be replaced too. 
### Shutting down again, replacing vda with a new disk, starting the system ###

lvs -a -o +devices
Couldn't find device with uuid KGf6QK-1LrJ-JDaA-bJJY-pmLb-l9eV-LEXgT2.
LV                VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert Devices 
lv_r10            vg_data rwi---r-p 3.00g                                            lv_r10_rimage_0(0),lv_r10_rimage_1(0),lv_r10_rimage_2(0),lv_r10_rimage_3(0)
[lv_r10_rimage_0] vg_data Iwi---r-p 1.50g                                            unknown device(2) 
[lv_r10_rimage_1] vg_data Iwi---r-- 1.50g                                            /dev/vdb1(2) 
[lv_r10_rimage_2] vg_data Iwi---r-- 1.50g                                            /dev/vdc1(2) 
[lv_r10_rimage_3] vg_data Iwi---r-- 1.50g                                            /dev/vdd1(2) 
[lv_r10_rmeta_0]  vg_data ewi---r-p 4.00m                                            unknown device(1) 
[lv_r10_rmeta_1]  vg_data ewi---r-- 4.00m                                            /dev/vdb1(1) 
[lv_r10_rmeta_2]  vg_data ewi---r-- 4.00m                                            /dev/vdc1(1) 
[lv_r10_rmeta_3]  vg_data ewi---r-- 4.00m                                            /dev/vdd1(1)


grep description /etc/lvm/backup/vg_data
description = "Created *after* executing 'vgscan'"
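
Note that the backup file has been rewritten since the first repair (by vgscan in this case). vgcfgrestore can list the archived metadata versions in case an older copy is preferable; the file name below is only an example of the archive naming scheme:

vgcfgrestore --list vg_data
vgcfgrestore -f /etc/lvm/archive/vg_data_00005-1234567890.vg vg_data
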
pvcreate --uuid  KGf6QK-1LrJ-JDaA-bJJY-pmLb-l9eV-LEXgT2  --restorefile /etc/lvm/backup/vg_data /dev/vda1
Couldn't find device with uuid KGf6QK-1LrJ-JDaA-bJJY-pmLb-l9eV-LEXgT2.
Physical volume "/dev/vda1" successfully created

vgcfgrestore vg_data
Restored volume group vg_data

lvchange --resync vg_data/lv_r10
Do you really want to deactivate logical volume lv_r10 to resync it? [y/n]: y

lvs -a -o +devices
LV                VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert Devices 
lv_r10            vg_data rwi-a-r-- 3.00g                             100.00         lv_r10_rimage_0(0),lv_r10_rimage_1(0),lv_r10_rimage_2(0),lv_r10_rimage_3(0)
[lv_r10_rimage_0] vg_data iwi-aor-- 1.50g                                            /dev/vda1(2) 
[lv_r10_rimage_1] vg_data iwi-aor-- 1.50g                                            /dev/vdb1(2) 
[lv_r10_rimage_2] vg_data iwi-aor-- 1.50g                                            /dev/vdc1(2) 
[lv_r10_rimage_3] vg_data iwi-aor-- 1.50g                                            /dev/vdd1(2) 
[lv_r10_rmeta_0]  vg_data ewi-aor-- 4.00m                                            /dev/vda1(1) 
[lv_r10_rmeta_1]  vg_data ewi-aor-- 4.00m                                            /dev/vdb1(1) 
[lv_r10_rmeta_2]  vg_data ewi-aor-- 4.00m                                            /dev/vdc1(1) 
[lv_r10_rmeta_3]  vg_data ewi-aor-- 4.00m                                            /dev/vdd1(1)

mount -t ext4 /dev/vg_data/lv_r10 /mnt/r10/
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_data-lv_r10,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

dmesg | tail
-------------
[  715.361985] EXT4-fs (dm-8): VFS: Can't find ext4 filesystem
[  715.362248] EXT4-fs (dm-8): VFS: Can't find ext4 filesystem
[  715.362548] EXT4-fs (dm-8): VFS: Can't find ext4 filesystem
[  715.362846] FAT-fs (dm-8): bogus number of reserved sectors
[  715.362933] FAT-fs (dm-8): Can't find a valid FAT filesystem
[  729.843473] EXT4-fs (dm-8): VFS: Can't find ext4 filesystem
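
I have not dug further yet, but something like the following should show whether any ext4 superblock (primary or one of the backups) survives on the LV:

blkid /dev/vg_data/lv_r10
dumpe2fs /dev/vg_data/lv_r10 | head -n 20
e2fsck -n -b 32768 /dev/vg_data/lv_r10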

As you can see, after more than one disk failure and repair of the RAID, I lost the file system on the raid 10 volume. Please suggest what I am doing wrong. Thanks.

Pavlik 



