[dm-devel] [QUESTION]: multipath device with wrong path lead to metadata err
lixiaokeng
lixiaokeng at huawei.com
Mon Jan 18 11:08:14 UTC 2021
Hi
When we run an I/O stress test on multipath devices, a filesystem
metadata error occurs because a multipath device ends up with a wrong path.
There are three test scripts.
First:
#!/bin/bash
disk_list="/dev/mapper/3600140531f063b3e19349bc82028e0cc
/dev/mapper/36001405ca5165367d67447ea68108e1d
/dev/mapper/3600140584e11eb1818c4afab12c17800
/dev/mapper/36001405b7679bd96b094bccbf971bc90"

# Create a fresh ext4 filesystem on every multipath device.
for disk in ${disk_list}
do
    mkfs.ext4 -F $disk
done

while true
do
    for disk in ${disk_list}
    do
        test_dir=${disk##*/}
        [ -d $test_dir ] && umount $test_dir || mkdir $test_dir
        # Retry until the mount succeeds.
        while true
        do
            mount -o data_err=abort,errors=remount-ro $disk $test_dir && break
            sleep 0.1
        done
        nohup fsstress -d $(pwd)/$test_dir -l 10 -n 1000 -p 10 -X &>/dev/null &
    done
    sleep 5
    # Wait for every fsstress instance to finish before the next round.
    while [ -n "`pidof fsstress`" ]
    do
        sleep 1
    done
done
Second:
#!/bin/bash
while true
do
    sleep 15
    i=0
    # Six logout/login cycles on each iSCSI portal.
    while [ $i -le 5 ]
    do
        iscsiadm -m node -p 100.1.1.1 -u
        iscsiadm -m node -p 100.1.1.1 -l
        sleep 1
        iscsiadm -m node -p 100.1.2.1 -u
        iscsiadm -m node -p 100.1.2.1 -l
        sleep 1
        ((i=i+1))
    done
done
Third:
#!/bin/bash
function iscsi_query()
{
    interval=5
    while true
    do
        iscsiadm -m node -p 100.1.1.1 &> /dev/null
        iscsiadm -m node -p 100.1.2.1 &> /dev/null
        iscsiadm -m session &> /dev/null
        rescan-scsi-bus.sh &> /dev/null
        sleep $interval
    done
}
function multipath_query()
{
    interval=1
    while true
    do
        multipath -F &> /dev/null
        multipath -r &> /dev/null
        multipath -v2 &> /dev/null
        multipath -ll &> /dev/null
        sleep $interval
    done
}
function multipathd_query()
{
    disk_base=63 # sdc
    interval=1
    while true
    do
        multipathd show paths &> /dev/null
        multipathd show status &> /dev/null
        multipathd show daemon &> /dev/null
        multipathd show maps json &> /dev/null
        multipathd show config &> /dev/null
        multipathd show config local &> /dev/null
        multipathd show blacklist &> /dev/null
        multipathd show devices &> /dev/null
        multipathd reset maps stats &> /dev/null
        multipathd disablequeueing maps &> /dev/null
        multipathd restorequeueing maps &> /dev/null
        multipathd forcequeueing daemon &> /dev/null
        multipathd restorequeueing daemon &> /dev/null
        # Pick a random path device. xxd -p -r reads the digits as hex,
        # so 63..69 give sdc..sdi and 70 gives sdp.
        let disk_num=disk_base+RANDOM%8
        disk=sd`echo "$disk_num" | xxd -p -r`
        multipathd show path $disk &> /dev/null
        multipathd del path $disk &> /dev/null
        multipathd add path $disk &> /dev/null
        multipathd fail path $disk &> /dev/null
        multipathd reinstate path $disk &> /dev/null
        multipathd show path $disk &> /dev/null
        # Pick a random map, if any exist.
        map_count=`multipathd show maps | grep -v name | wc -l`
        if [ $map_count -ge 1 ];then
            let map_num=(RANDOM%map_count)+1
            map=`multipathd show maps | grep -v name | awk '{print $1}' | sed -n "$map_num"p`
            multipathd show map $map &> /dev/null
            multipathd suspend map $map &> /dev/null
            multipathd resume map $map &> /dev/null
            multipathd reload map $map &> /dev/null
            multipathd reset map $map &> /dev/null
        fi
        sleep $interval
    done
}
iscsi_query &
iscsi_query &
multipath_query &
multipath_query &
multipathd_query &
multipathd_query &
After the test scripts have run for some time (about 24 hours), a
metadata error occurs. The cause is that a multipath device contains a
wrong path. The details of the first scenario:
ip1:
node       disk    minor
4:0:0:0:   [sdd]   48
4:0:0:1:   [sdm]   192
4:0:0:2:   [sdk]   160
4:0:0:3:   [sdi]   128
ip2:
node       disk    minor
5:0:0:0:   [sdc]   32
5:0:0:1:   [sdj]   144
5:0:0:2:   [sdg]   96
5:0:0:3:   [sde]   64
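The disk names and minor numbers above are related by the usual sd numbering (block major 8, 16 minors per disk), which helps when reading the 8:N entries in the table params. A minimal sketch, assuming whole-disk minors in the range sda to sdp; minor_to_sd is a name made up for this example, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: map a minor number under block major 8 to its sd name.
# Assumes a whole-disk minor (a multiple of 16) in the range 0..255 (sda..sdp).
minor_to_sd() {
    index=$(( $1 / 16 ))
    letter=$(printf 'abcdefghijklmnop' | cut -c "$(( index + 1 ))")
    printf 'sd%s\n' "$letter"
}

minor_to_sd 128   # sdi
minor_to_sd 64    # sde
```

With this mapping, 8:128 in a table string refers to sdi and 8:64 refers to sde.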
Sequence of events:
(1) multipath -r runs while ip1 logs out at the same time
The table params loaded for 36001405ca5165367d67447ea68108e1d are
"0 1 alua 1 1 service-time 0 1 1 8:128 1" (only 8:128 is listed, probably
because ip2 had logged in shortly before and path_discovery had not yet
found sde). However, domap fails because ip1 has logged out. The sdi path
is still in gvecs->pathvec.
(2) multipathd add path sde
The table params loaded for 36001405ca5165367d67447ea68108e1d are
"0 1 alua 2 1 service-time 0 1 1 8:64 1 service-time 0 1 1 8:128 "
and domap succeeds.
At this point 36001405ca5165367d67447ea68108e1d has two paths (sde and sdi),
but sdi is actually a path of 36001405b7679bd96b094bccbf971bc90.
(3) The metadata of 36001405ca5165367d67447ea68108e1d is synced
This overwrites the metadata of 36001405b7679bd96b094bccbf971bc90.
(4) umount 36001405b7679bd96b094bccbf971bc90
36001405b7679bd96b094bccbf971bc90 has no usable path at umount time,
so its correct metadata is not synced.
(5) mount 36001405b7679bd96b094bccbf971bc90
The mount fails because of the corrupted metadata.
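One way to spot the situation in step (2) might be to compare the WWID each path device reports with the WWID of the map it sits in. A minimal sketch, not part of multipath-tools: find_foreign_paths is a hypothetical helper that reads "device wwid" pairs on stdin, and the way the pairs are collected on a live system (the scsi_id location, the dm-3 name) is an assumption shown only in a comment.

```shell
#!/bin/sh
# Hypothetical helper: read "device wwid" pairs on stdin and print every
# device whose WWID differs from the expected map WWID given as $1.
find_foreign_paths() {
    expected=$1
    while read -r dev wwid; do
        [ -n "$dev" ] || continue
        if [ "$wwid" != "$expected" ]; then
            printf '%s reports WWID %s, expected %s\n' "$dev" "$wwid" "$expected"
        fi
    done
    return 0
}

# Example with the two paths from step (2); only sdi is reported.
map=36001405ca5165367d67447ea68108e1d
printf '%s\n' \
    "sde 36001405ca5165367d67447ea68108e1d" \
    "sdi 36001405b7679bd96b094bccbf971bc90" | find_foreign_paths "$map"

# On a live system the pairs might be collected like this (assumed paths):
#   for s in /sys/block/dm-3/slaves/*; do
#       echo "${s##*/} $(/lib/udev/scsi_id -g -u -d /dev/${s##*/})"
#   done | find_foreign_paths "$map"
```

Such a check only detects the mismatch after the table has been loaded; it does not close the race between logout and reload.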
I think there may be other ways to trigger this metadata error as well. I
have no good idea how to deal with it. Could you give some advice? Thanks very much.
Regards,
Lixiaokeng