[dm-devel] [QUESTION]: multipath device with wrong path lead to metadata err

lixiaokeng lixiaokeng at huawei.com
Mon Jan 18 11:08:14 UTC 2021


Hi,
  When we run an IO stress test on multipath devices, a metadata error
occurs because a multipath device ends up with a wrong path.

There are three test scripts.

First:
#!/bin/bash
disk_list="/dev/mapper/3600140531f063b3e19349bc82028e0cc
/dev/mapper/36001405ca5165367d67447ea68108e1d
/dev/mapper/3600140584e11eb1818c4afab12c17800
/dev/mapper/36001405b7679bd96b094bccbf971bc90"


for disk in ${disk_list}
do
        mkfs.ext4 -F $disk
done

while true
do
        for disk in ${disk_list}
        do
                test_dir=${disk##*/}
                [ -d $test_dir ] && umount $test_dir || mkdir $test_dir
                while true
                do
                        mount -o data_err=abort,errors=remount-ro $disk $test_dir && break
                        sleep 0.1
                done
                nohup fsstress -d $(pwd)/$test_dir -l 10 -n 1000 -p 10 -X &>/dev/null &
        done
        sleep 5

        while [ -n "`pidof fsstress`" ]
        do
                sleep 1
        done
done

Second:
#!/bin/bash
while true
do
        sleep 15
        i=0
        while [ $i -le 5 ]
        do
                iscsiadm -m node -p 100.1.1.1 -u
                iscsiadm -m node -p 100.1.1.1 -l
                sleep 1
                iscsiadm -m node -p 100.1.2.1 -u
                iscsiadm -m node -p 100.1.2.1 -l
                sleep 1
                ((i=i+1))
        done
done

Third:
#!/bin/bash
function iscsi_query()
{
        interval=5
        while true
        do
                iscsiadm -m node -p 100.1.1.1 &> /dev/null
                iscsiadm -m node -p 100.1.2.1 &> /dev/null
                iscsiadm -m session &> /dev/null
                rescan-scsi-bus.sh &> /dev/null
                sleep $interval
        done
}


function multipath_query()
{
        interval=1
        while true
        do
                multipath -F &> /dev/null
                multipath -r &> /dev/null
                multipath -v2 &> /dev/null
                multipath -ll &> /dev/null
                sleep $interval
        done
}

function multipathd_query()
{
        disk_base=63 # hex digits of ASCII 'c' (sdc); decoded by xxd -p -r below
        interval=1
        while true
        do
                multipathd show paths &> /dev/null
                multipathd show status &> /dev/null
                multipathd show daemon &> /dev/null
                multipathd show maps json &> /dev/null
                multipathd show config &> /dev/null
                multipathd show config local &> /dev/null
                multipathd show blacklist &> /dev/null
                multipathd show devices &> /dev/null
                multipathd reset maps stats &> /dev/null
                multipathd disablequeueing maps &> /dev/null
                multipathd restorequeueing maps &> /dev/null
                multipathd forcequeueing daemon &> /dev/null
                multipathd restorequeueing daemon &> /dev/null

                let disk_num=disk_base+RANDOM%8
                disk=sd`echo "$disk_num" | xxd -p -r`
                multipathd show path $disk &> /dev/null
                multipathd del path $disk &> /dev/null
                multipathd add path $disk &> /dev/null
                multipathd fail path $disk &> /dev/null
                multipathd reinstate path $disk &> /dev/null
                multipathd show path $disk &> /dev/null

                map_count=`multipathd show maps | grep -v name | wc -l`
                if [ $map_count -ge 1 ];then
                        let map_num=(RANDOM%map_count)+1
                        map=`multipathd show maps | grep -v name | awk '{print $1}' | sed -n "$map_num"p`
                        multipathd show map $map &> /dev/null
                        multipathd suspend map $map &> /dev/null
                        multipathd resume map $map &> /dev/null
                        multipathd reload map $map &> /dev/null
                        multipathd reset map $map &> /dev/null
                fi

                sleep $interval
        done
}
iscsi_query &
iscsi_query &
multipath_query &
multipath_query &
multipathd_query &
multipathd_query &
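
How multipathd_query picks its random path can be sketched as follows. This
is an illustration only: hex_to_char is a hypothetical helper that mimics
what `xxd -p -r` does with two hex digits, using printf instead. Note that
the script adds RANDOM%8 in decimal, so the value 70 decodes to "p", not "j".

```shell
#!/bin/bash
# Illustration only: decode two hex digits into one character, as
# `xxd -p -r` does in multipathd_query. hex_to_char is a hypothetical
# helper, not part of the test scripts.
hex_to_char() {
        # Convert the hex digits to a 3-digit octal escape and print it.
        printf "\\$(printf '%03o' "0x$1")"
}

# disk_base=63 plus RANDOM%8 gives the decimal range 63..70, read as hex.
for n in 63 64 65 66 67 68 69 70
do
        echo "$n -> sd$(hex_to_char "$n")"
done
```

So the paths exercised by this loop are sdc through sdi, plus sdp.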


After the test scripts have run for some time (about 24h), a metadata
error occurs. The reason is that a multipath device has a wrong path. The
details of the first scenario:

ip1:
node      disk  minor
4:0:0:0: [sdd]  48
4:0:0:1: [sdm]  192
4:0:0:2: [sdk]  160
4:0:0:3: [sdi]  128
ip2:
node      disk  minor
5:0:0:0: [sdc]  32
5:0:0:1: [sdj]  144
5:0:0:2: [sdg]  96
5:0:0:3: [sde]  64
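
The minor numbers above follow from the disk names: under SCSI disk major 8
each whole disk owns 16 minors, so the table strings below refer to sde as
8:64 and sdi as 8:128. A minimal sketch of that mapping, where sd_minor is a
hypothetical helper that handles single-letter names only:

```shell
#!/bin/bash
# Sketch: minor number of a single-letter sd disk under major 8.
# Each whole disk owns 16 minors, so sda=0, sdb=16, sdc=32, ...
# sd_minor is a hypothetical helper; names like sdaa are not handled.
sd_minor() {
        local letter=${1#sd}
        # printf '%d' "'x" prints the character code; 'a' is ASCII 97.
        local index=$(( $(printf '%d' "'$letter") - 97 ))
        echo $(( index * 16 ))
}

for d in sdd sdm sdk sdi sdc sdj sdg sde
do
        echo "$d 8:$(sd_minor "$d")"
done
```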

Sequence of events:
(1) multipath -r, ip1 logout at the same time
The table params loaded for 36001405ca5165367d67447ea68108e1d are
"0 1 alua 1 1 service-time 0 1 1 8:128 1" (the table probably contains
only 8:128 because ip2 logged in shortly before and path_discovery has
not found sde yet).
However, domap fails because ip1 has logged out. The path sdi is
still in gvecs->pathvec.

(2) multipathd add path sde
The table params loaded for 36001405ca5165367d67447ea68108e1d are
"0 1 alua 2 1 service-time 0 1 1 8:64 1 service-time 0 1 1 8:128 "
and domap succeeds.
At this point 36001405ca5165367d67447ea68108e1d has two paths (sde, sdi),
but sdi is actually a path of 36001405b7679bd96b094bccbf971bc90.

(3) metadata of 36001405ca5165367d67447ea68108e1d is synced
The metadata of 36001405b7679bd96b094bccbf971bc90 gets overwritten.

(4) umount 36001405b7679bd96b094bccbf971bc90
36001405b7679bd96b094bccbf971bc90 has no usable path at umount time,
so its correct metadata is not synced.

(5) mount 36001405b7679bd96b094bccbf971bc90
Fails because of the corrupted metadata.
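
To spot such a mixed-up map, the path devices can be pulled out of the table
params string. This is only a sketch: table_paths is a hypothetical helper
that pattern-matches major:minor tokens and does not talk to device-mapper.

```shell
#!/bin/bash
# Sketch: list the major:minor path devices named in a multipath table line.
# table_paths is a hypothetical helper; it only pattern-matches the string.
table_paths() {
        echo "$1" | grep -oE '[0-9]+:[0-9]+'
}

# Table params from step (2), copied as reported.
params="0 1 alua 2 1 service-time 0 1 1 8:64 1 service-time 0 1 1 8:128 "
table_paths "$params"
```

On a live system the string could come from `dmsetup table <map>`, and each
listed device could then be compared against the map's WWID, for example
with `scsi_id -g -u -d /dev/sdX` (the location of scsi_id differs between
distributions). That check is a suggestion, not something done in the test
above.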

I think there may be other ways to trigger this metadata error too. I have
no good idea how to deal with it. Could you give some advice? Thanks very much.

Regards,
Lixiaokeng




