[dm-devel] [PATCH 0/3] [RESEND]dm-raid1: several fixes about writing on out of sync mirror device

Heinz Mauelshagen heinzm at redhat.com
Thu Apr 16 09:58:06 UTC 2015


Lidong,

Tests need to happen under heavy load, i.e. worst-case
failure scenarios.

E.g. an fs is mounted and being updated whilst
you're taking mirror legs offline and bringing them
back, causing them to be resynchronized.
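
A minimal sketch of such a load generator, assuming the fs on the mirror
is mounted at some directory you pass in (the function name and the
defaults are made up for illustration). It writes random files with
fsync and later compares them against the digests it recorded. The
re-read may be served from the page cache, so for a real test drop
caches or remount before verifying.

```python
import hashlib
import os


def write_and_verify(directory, rounds=100, size=64 * 1024):
    """Write `rounds` random files with fsync, then re-read and compare.

    Returns the number of files whose content differs from what was written.
    """
    expected = {}
    for i in range(rounds):
        data = os.urandom(size)
        path = os.path.join(directory, f"load-{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the write down to the block device
        expected[path] = hashlib.sha256(data).hexdigest()

    mismatches = 0
    for path, digest in expected.items():
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != digest:
                mismatches += 1
    return mismatches
```

Run it in a loop in one terminal while detaching and reattaching legs
in another; any non-zero return value deserves a closer look.
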

Heinz

On 04/16/2015 05:43 AM, Lidong Zhong wrote:
> Hi List/Heinz,
>
> These three patches are based on the last patch series from the April 8 reply.
> The following are the tests I ran for this feature. My test environment:
> linux-klqg:~ # dmsetup ls --tree
> vg-lv (253:4)
>   ├─vg-lv_mimage_2 (253:3)
>   │  └─ (8:48)
>   ├─vg-lv_mimage_1 (253:2)
>   │  └─ (8:32)
>   ├─vg-lv_mimage_0 (253:1)
>   │  └─ (8:16)
>   └─vg-lv_mlog (253:0)
> 	└─ (8:64)
> linux-klqg:~ # dmsetup status
> vg-lv_mimage_2: 0 614400 linear
> vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
> vg-lv_mimage_1: 0 614400 linear
> vg-lv_mimage_0: 0 614400 linear
> vg-lv_mlog: 0 8192 linear
> linux-klqg:~ # dmsetup table
> vg-lv_mimage_2: 0 614400 linear 8:48 2048
> vg-lv: 0 614400 mirror disk 2 253:0 1024 3 253:1 0 253:2 0 253:3 0 2 handle_errors keep_log
> vg-lv_mimage_1: 0 614400 linear 8:32 2048
> vg-lv_mimage_0: 0 614400 linear 8:16 2048
> vg-lv_mlog: 0 8192 linear 8:64 2048
>
>
> 1, single data device failure
> After making one of the data legs fail, write data to the first three regions:
> linux-klqg:~ # echo "a" |dd of=/dev/vg/lv bs=1K count=1 seek=0
> 0+1 records in
> 0+1 records out
> 2 bytes (2 B) copied, 0.0103211 s, 0.2 kB/s
> linux-klqg:~ # echo "b" |dd of=/dev/vg/lv bs=1K count=1 seek=512
> 0+1 records in
> 0+1 records out
> 2 bytes (2 B) copied, 0.00428962 s, 0.5 kB/s
> linux-klqg:~ # echo "c" |dd of=/dev/vg/lv bs=1K count=1 seek=1024
> 0+1 records in
> 0+1 records out
> 2 bytes (2 B) copied, 0.00282482 s, 0.7 kB/s
> linux-klqg:~ # dmsetup status
> vg-lv_mimage_2: 0 614400 linear
> vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 597/600 1 ADA 3 disk 253:0 A
> vg-lv_mimage_1: 0 614400 linear
> vg-lv_mimage_0: 0 614400 linear
> vg-lv_mlog: 0 8192 linear
>
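
The arithmetic behind 597/600 can be sketched as follows, assuming the
"1024" in the mirror table line is the region size in 512-byte sectors
(that reading gives 614400 / 1024 = 600 regions, which matches the
status output). The helper names are hypothetical.

```python
SECTOR_BYTES = 512


def region_count(length_sectors, region_sectors):
    """Number of regions covering the device, rounding up."""
    return -(-length_sectors // region_sectors)


def region_for_dd_write(seek_blocks, bs_bytes, region_sectors):
    """Region index hit by `dd bs=<bs_bytes> seek=<seek_blocks>`."""
    offset_sectors = (seek_blocks * bs_bytes) // SECTOR_BYTES
    return offset_sectors // region_sectors


# The three writes above: bs=1K, seek=0 / 512 / 1024, region size 1024 sectors
regions = [region_for_dd_write(s, 1024, 1024) for s in (0, 512, 1024)]
```

With these inputs the three writes land in regions 0, 1 and 2, i.e.
three distinct regions out of 600.
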
> Now the failed device comes back. Its major/minor number may have changed, so replace the table as needed.
> (The devices I tested on are iSCSI devices, and the minor number changed after each attach/detach.)
> Then start the recovery:
> linux-klqg:~ # dmsetup suspend vg-lv
> linux-klqg:~ # dmsetup resume vg-lv
> linux-klqg:~ # dmsetup status
> vg-lv_mimage_2: 0 614400 linear
> vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
> vg-lv_mimage_1: 0 614400 linear
> vg-lv_mimage_0: 0 614400 linear
> vg-lv_mlog: 0 8192 linear
>
> We can see that all the regions are in sync now.
>
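
For scripted runs, the mirror status line can be checked automatically
instead of by eye. This is a rough sketch that assumes the field layout
seen in the output above (leg count, legs, in-sync/total regions, "1",
one health character per leg, then the log status as "3 disk <dev>
<health>"); other log types are not handled and the function name is
made up.

```python
def parse_mirror_status(line):
    """Parse one `dmsetup status` line of a mirror target into a dict."""
    fields = line.split()
    if fields and fields[0].endswith(":"):
        fields = fields[1:]  # drop the leading "vg-lv:" device name
    if len(fields) < 4 or fields[2] != "mirror":
        raise ValueError("not a mirror status line")
    n = int(fields[3])                      # number of mirror legs
    legs = fields[4:4 + n]
    in_sync, total = (int(x) for x in fields[4 + n].split("/"))
    health = fields[6 + n]                  # e.g. "ADA", one char per leg
    log_argc = int(fields[7 + n])
    log = fields[8 + n:8 + n + log_argc]    # e.g. ["disk", "253:0", "A"]
    return {
        "legs": legs,
        "in_sync": in_sync,
        "total": total,
        "failed_legs": [d for d, h in zip(legs, health) if h != "A"],
        "log_type": log[0] if log else None,
        "log_failed": len(log) == 3 and log[2] != "A",
    }
```

A test loop can then wait until in_sync equals total and failed_legs is
empty after each suspend/resume.
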
> 2, two or more data device failures
> After detaching the first device (mine is /dev/sdb), write data to the first and second regions:
> linux-klqg:~ # echo "1111111" | dd of=/dev/vg/lv bs=1K count=1 seek=0
> 0+1 records in
> 0+1 records out
> 8 bytes (8 B) copied, 0.00209451 s, 3.8 kB/s
> linux-klqg:~ # echo "222222" | dd of=/dev/vg/lv bs=1K count=1 seek=512
> 0+1 records in
> 0+1 records out
> 7 bytes (7 B) copied, 0.00259999 s, 2.7 kB/s
> linux-klqg:~ # dmsetup status
> vg-lv_mimage_2: 0 614400 linear
> vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 598/600 1 ADA 3 disk 253:0 A
> vg-lv_mimage_1: 0 614400 linear
> vg-lv_mimage_0: 0 614400 linear
> vg-lv_mlog: 0 8192 linear
>
> Now the first and second regions are marked as out of sync. Then detach the second device
> (mine is /dev/sdd) and write data to the third and fourth regions:
> linux-klqg:~ # echo "333333" | dd of=/dev/vg/lv bs=1K count=1 seek=1024
> 0+1 records in
> 0+1 records out
> 7 bytes (7 B) copied, 0.00178031 s, 3.9 kB/s
> linux-klqg:~ # echo "444444" | dd of=/dev/vg/lv bs=1K count=1 seek=1536
> 0+1 records in
> 0+1 records out
> 7 bytes (7 B) copied, 0.00256491 s, 2.7 kB/s
> linux-klqg:~ # dmsetup status
> vg-lv_mimage_2: 0 614400 linear
> vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 596/600 1 DDA 3 disk 253:0 A
> vg-lv_mimage_1: 0 614400 linear
> vg-lv_mimage_0: 0 614400 linear
> vg-lv_mlog: 0 8192 linear
>
> Now 4 regions are marked as out of sync. Then the first failed device comes back, and we try to
> do the recovery:
> linux-klqg:~ # dmsetup suspend vg-lv
> linux-klqg:~ # dmsetup resume vg-lv
> linux-klqg:~ #
> linux-klqg:~ #
> linux-klqg:~ # dmsetup status
> vg-lv_mimage_2: 0 614400 linear
> vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 596/600 1 DDA 3 disk 253:0 A
> vg-lv_mimage_1: 0 614400 linear
> vg-lv_mimage_0: 0 614400 linear
> vg-lv_mlog: 0 8192 linear
>
> It shows that 4 regions are still marked as out of sync, because one device is still
> missing. We keep writing, now to the fifth region:
> linux-klqg:~ # echo "5555555" | dd of=/dev/vg/lv bs=1K count=1 seek=2048
> 0+1 records in
> 0+1 records out
> 8 bytes (8 B) copied, 0.00213449 s, 3.7 kB/s
>
> Now the second missing device comes back. We try to do the recovery:
> linux-klqg:~ # dmsetup suspend vg-lv
> linux-klqg:~ # dmsetup resume  vg-lv
> linux-klqg:~ # dmsetup status
> vg-lv_mimage_2: 0 614400 linear
> vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
> vg-lv_mimage_1: 0 614400 linear
> vg-lv_mimage_0: 0 614400 linear
> vg-lv_mlog: 0 8192 linear
>
> It shows all the legs are in sync now. We read data from each leg and got the
> same result.
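
The per-leg comparison could be automated along these lines. This is
only a sketch: it assumes the legs are readable through paths such as
/dev/mapper/vg-lv_mimage_0 and that comparing the first `length` bytes
is enough for the test at hand. Reads of a block device may come from
the page cache, so drop caches first if that matters.

```python
import hashlib


def leg_digest(path, length, offset=0, chunk=1 << 20):
    """SHA-256 over `length` bytes of `path`, starting at `offset`."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        f.seek(offset)
        while remaining > 0:
            block = f.read(min(chunk, remaining))
            if not block:
                break  # device or file is shorter than requested
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()


def legs_match(paths, length):
    """True if every leg has identical content over the first `length` bytes."""
    return len({leg_digest(p, length) for p in paths}) == 1
```
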
>
> 3, log device failure
> After making the log device fail, we tried to write to this LV:
> linux-klqg:~ # echo "test" |dd of=/dev/vg/lv bs=1K count=1 seek=0
> 0+1 records in
> 0+1 records out
> 21 bytes (21 B) copied, 0.00470523 s, 4.5 kB/s
> linux-klqg:~ # dmsetup status
> vg-lv_mimage_2: 0 614400 linear
> vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 D
> vg-lv_mimage_1: 0 614400 linear
> vg-lv_mimage_0: 0 614400 linear
> vg-lv_mlog: 0 8192 linear
> We can see that the log device is marked as failed.
> And the bio is not written to the data legs, because we can't read the new data out of
> the leg.
>
> Is this testing sufficient, or are there corner cases not covered by the patches?
> Any advice is appreciated.
>
> Regards,
> Lidong
>
> Lidong Zhong (3):
>    dm-raid1: fix the parameter passed into the kernel
>    dm-raid1: remove the error flags in the mirror set when it's in sync
>    dm-raid1: change default mirror when it's not in sync
>
>   drivers/md/dm-raid1.c | 38 +++++++++++++++++++++++++-------------
>   1 file changed, 25 insertions(+), 13 deletions(-)



