[dm-devel] Problems with multipathd

Simon gistolero at gmx.de
Wed Aug 31 15:29:06 UTC 2005


Hi,

I am trying to use multipath to provide a single block device for a
multipathed LUN for failover reasons. After some days of installation,
documentation reading and debugging I have solved a lot of problems but not
all and I need some help. I know it's a lot of text (sorry!!!), but I think
it's necessary to describe my problems.

I have marked my questions/comments with "===>". Please reply to these notes.
Thank you.



1.) *** System Description ***



Storage:

- Storage EVA-3000
- Controller-B connected to fabric-A and fabric-B
- one VDisk presented to host testhalde2 via controller-B to fabric-A and -B


Server (testhalde2):

- 1x HBA Qlogic 2340 connected to fabric-A
- 1x HBA Qlogic 2340 connected to fabric-B
- Kernel 2.6.12.5 (vanilla, gentoo)
- device-mapper-1.01.03, udev-058, multipath-tools-0.4.4


testhalde2 tmp # dmesg | fgrep device-mapper
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel at redhat.com
device-mapper: dm-multipath version 1.0.4 loaded
device-mapper: dm-round-robin version 1.0.0 loaded



testhalde2 tmp # lsmod
Module                  Size  Used by
qla2300               123904  0 
qla2xxx                88208  4 qla2300
scsi_transport_fc      26880  1 qla2xxx


testhalde2 etc # cat multipath.conf
defaults {
        multipath_tool                  "/sbin/multipath -v 0 -S"
        udev_dir                        /dev
        polling_interval                10
        default_selector                "round-robin 0"
        default_path_grouping_policy    failover
        default_getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        default_prio_callout            "/bin/false"
        r_min_io                        100
}
blacklist {
        wwid 26353900f02796769
        devnode "(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "hd[a-z][[0-9]*]"
        devnode "cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
multipaths {
        multipath {
                wwid                    3600508b40010079d0001900000460000
                alias                   150gb
                path_grouping_policy    failover
                path_selector           "round-robin 0"
        }
}
devices {
        device {
                vendor                  "HP      "
                product                 "HSV100          "
                path_grouping_policy    multibus
                path_checker            tur
                prio_callout            "/sbin/pp_balance_units %d"
        }
}


testhalde2 etc # cat /etc/udev/rules.d/20-multipath.rules 
KERNEL="dm-[0-9]*", PROGRAM="/sbin/devmap_name %M %m", NAME="%k", SYMLINK="%c"


testhalde2 ~ # cat /etc/dev.d/block/multipath.dev
#!/bin/sh -e
print()
{
  echo "`date +%H%M%S` - $1" >> /tmp/devd_multipath
}
print "ENV_ACTION:  $ACTION" # debugging
if [ ! "${ACTION}" = add ] ; then
        exit
fi
if [ "${DEVPATH:7:3}" = "dm-" ] ; then
        dev=$(</sys${DEVPATH}/dev)
        map=$(/sbin/devmap_name $dev)
        print "KPARTX $map" # debugging
        /sbin/kpartx -v -a /dev/$map >> /tmp/devd_multipath
else
        print "ENV_DEVNAME: ${DEVNAME}" # debugging
        /sbin/multipath ${DEVNAME}
fi
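
The "${DEVPATH:7:3}" test in the script above is a bash substring expansion;
with DEVPATH=/block/dm-0 it yields "dm-". A minimal sketch of the same check in
POSIX sh, assuming DEVPATH always has the form /block/<name> (the function name
is_dm_devpath is hypothetical and not part of multipath-tools):

```shell
#!/bin/sh
# Hypothetical helper: succeed if DEVPATH names a device-mapper node.
# Assumption: DEVPATH looks like /block/dm-0 or /block/sda.
is_dm_devpath() {
    # Strip everything up to the last slash, then match the node name.
    case "${1##*/}" in
        dm-[0-9]*) return 0 ;;
        *)         return 1 ;;
    esac
}

for p in /block/dm-0 /block/sda /block/ram9; do
    if is_dm_devpath "$p"; then
        echo "$p: dm device"
    else
        echo "$p: other"
    fi
done
```

Only /block/dm-0 is reported as a dm device; sda and ram9 fall through to the
other branch.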



2.) *** Multipath in action ***



After rebooting testhalde2, I see the following:


testhalde2 tmp # ls /sys/block/
dm-0  loop0  loop3  loop6  ram1   ram12  ram15  ram4  ram7  sda
fd0   loop1  loop4  loop7  ram10  ram13  ram2   ram5  ram8  sdb
hda   loop2  loop5  ram0   ram11  ram14  ram3   ram6  ram9


testhalde2 tmp # ls -lF /dev/mapper/
total 0
brw-------  1 root root 254,  0 Aug 31 12:20 150gb
crw-rw----  1 root root  10, 63 Aug 31  2005 control


testhalde2 ~ # fdisk -l /dev/mapper/150gb 
Disk /dev/mapper/150gb: 161.0 GB, 161061273600 bytes
255 heads, 63 sectors/track, 19581 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mapper/150gb doesn't contain a valid partition table


===> Is it possible to _use_ partitions on this device? I know that it is
     possible to create them, but what is the device name (/dev/...) of
     partition 1?


testhalde2 ~ # mkreiserfs /dev/mapper/150gb      
mkreiserfs 3.6.19 (2003 www.namesys.com)
...
ReiserFS is successfully created on /dev/mapper/150gb.
testhalde2 ~ # 


testhalde2 ~ # mount /dev/mapper/150gb /mnt/test/
testhalde2 ~ # touch /mnt/test/file  # ok
testhalde2 ~ # rm /mnt/test/file     # ok


testhalde2 rules.d # udevtest /sys/block/dm-0 block
udevtest.c: looking at device '/block/dm-0' from subsystem 'block'
udevtest.c: opened class_dev->name='dm-0'
udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, added symlink '%c'
udev_rules.c: add symlink '150gb'
udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, 'dm-0' becomes '%k'
udev_rules.c: configured rule in '/etc/udev/rules.d/50-udev.rules[63]' applied, 'dm-0' is ignored


testhalde2 tmp # ls -lF /dev/1*
ls: /dev/1*: No such file or directory


===> udevtest shows that udev reads the 20-multipath.rules rule. Why doesn't
     udev create /dev/150gb?


testhalde2 tmp # multipath -l   
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 0:0:0:1 sda  8:0     [ready ][active]
\_ round-robin 0 [enabled]
  \_ 1:0:0:1 sdb  8:16    [ready ][active]


testhalde2 tmp # dmsetup table
150gb: 0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000
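
To compare the paths in the table with the ones "multipath -l" reports (the
table above lists 8:32 where "multipath -l" lists sda as 8:0), the major:minor
tokens can be pulled out of a table line. A minimal sketch; the function name
table_paths is hypothetical, and it simply treats every token of the form N:N
as a path device:

```shell
#!/bin/sh
# Hypothetical helper: print every major:minor token found in one
# "dmsetup table" line, one per output line.
table_paths() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^[0-9]+:[0-9]+$'
}

# Table line copied from the dmsetup output above.
line='150gb: 0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000'
table_paths "$line"
```

For the line above this prints 8:32 and 8:16, which can then be checked against
the dev_t column of "multipath -l".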


testhalde2 tmp # cat devd_multipath    # multipath.dev debugging output
...
142037 - ENV_DEVPATH: ram
142037 - ENV_DEVNAME: /dev/rd/9
142046 - ENV_ACTION:  add
142046 - ENV_DEVPATH: sda
142046 - ENV_DEVNAME: /dev/sda
122045 - ENV_ACTION:  add
122045 - ENV_DEVPATH: sdb
122045 - ENV_DEVNAME: /dev/sdb


testhalde2 tmp # fgrep dm devd_multipath 
testhalde2 tmp # 


===> My understanding of the sysfs/device-mapper/udev/multipath cooperation is
the following: After loading the HBA module qla2300, the kernel creates
/sys/block/sda and /sys/block/sdb and executes udevsend - udevd - udev. udev
invokes /etc/dev.d/block/multipath.dev (ADD, sda/sdb). multipath.dev executes
multipath, which creates the device-mapper table and the device-mapper device
/sys/block/dm-0. Ok, now we have a new device (dm-0). And again: udevsend -
udevd - udev and multipath.dev (ADD, dm-0). multipath.dev should start kpartx,
but the debug file /tmp/devd_multipath shows nothing! So I think kpartx is
never started. Is this behavior ok? It seems to work without kpartx, so I don't
understand why I need this tool.
     


testhalde2 ~ # multipath -v3
fd0 blacklisted
ram0 blacklisted
ram1 blacklisted
ram2 blacklisted
ram3 blacklisted
ram4 blacklisted
ram5 blacklisted
ram6 blacklisted
ram7 blacklisted
ram8 blacklisted
ram9 blacklisted
ram10 blacklisted
ram11 blacklisted
ram12 blacklisted
ram13 blacklisted
ram14 blacklisted
ram15 blacklisted
loop0 blacklisted
loop1 blacklisted
loop2 blacklisted
loop3 blacklisted
loop4 blacklisted
loop5 blacklisted
loop6 blacklisted
loop7 blacklisted
hda blacklisted
path sda not found in pathvec

===== path sda =====
vendor = HP      
:
product = HSV100          
rev = 3025
dev_t = 8:0
size = 314572800
h:b:t:l = 0:0:0:1
tgt_node_name = 0x50001fe150051d20
serial = P66C5E2AAQI010
path checker = tur (controler setting)
state = 2
getprio = /sbin/pp_balance_units %d (controler setting)
prio = 1
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 3600508b40010079d0001900000460000 (callout)
path sdb not found in pathvec

===== path sdb =====
vendor = HP      
product = HSV100          
rev = 3025
dev_t = 8:16
size = 314572800
h:b:t:l = 1:0:0:1
tgt_node_name = 0x50001fe150051d20
serial = P66C5E2AAQI010
path checker = tur (controler setting)
state = 2
getprio = /sbin/pp_balance_units %d (controler setting)
prio = 1
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 3600508b40010079d0001900000460000 (callout)
dm-0 blacklisted
#
# all paths :
#
3600508b40010079d0001900000460000 0:0:0:1 sda  8:0     [ready ][HSV100          ]
3600508b40010079d0001900000460000 1:0:0:1 sdb  8:16    [ready ][HSV100          ]
params = 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000 
status = 1 0 0 2 1 A 0 1 0 8:0 A 0 E 0 1 0 8:16 A 0 
pgpolicy = failover (LUN setting)
selector = round-robin 0 (LUN setting)
features = 0 (internal default)
hwhandler = 0 (internal default)
0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
action preset to 0
action set to 1
cannot signal daemon, pidfile not found
testhalde2 ~ # 


testhalde2 ~ # ps ax | fgrep multipathd
10870 pts/0    SL     0:00 multipathd
10871 pts/0    SL     0:00 multipathd
10872 pts/0    SL     0:00 multipathd
10875 pts/0    S+     0:00 fgrep multipathd

testhalde2 ~ # ls /var/run/multipathd.pid
ls: /var/run/multipathd.pid: No such file or directory

===> Does the system really need _three_ multipathd daemons and why is
     there no pid file?


testhalde2 ~ # echo 10870 > /var/run/multipathd.pid
testhalde2 ~ # strace -f -p 10870 >/tmp/strace_multipatd 2>&1 &
[1] 11192




Now, I disable HBA-fabric-B port on the san-switch...

testhalde2 ~ # multipath -l
[ sleeping 35 seconds ]
open class /sys/block/sdc failed: No such file or directory
error calling out /sbin/scsi_id -g -u -s /block/sdc
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 1:0:0:1 sdb  8:16    [ready ][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]

testhalde2 ~ # multipath -l       # again
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 1:0:0:1 sdb  8:16    [ready ][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:0      8:32    [undef ][active]

testhalde2 tmp # touch /mnt/test/test    # ok
testhalde2 tmp # rm /mnt/test/test       # ok

testhalde2 tmp # ps ax | fgrep multipathd
10871 pts/0    SL     0:00 multipathd
10872 pts/0    SL     0:00 multipathd
10870 pts/0    SL     0:00 multipathd
11534 pts/0    S+     0:00 fgrep multipathd

testhalde2 tmp # cat strace_multipatd 
Process 10870 attached - interrupt to quit
testhalde2 tmp # 

===> No output in the strace-debug file from multipathd. It seems that
     multipathd doesn't recognize the changes.
 


Enabling HBA-fabric-B port on the san-switch...

testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 1:0:0:1 sdb  8:16    [ready ][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]

testhalde2 tmp # touch /mnt/test/test    # ok
testhalde2 tmp # rm /mnt/test/test       # ok




Disabling HBA-fabric-A port on the other san-switch...

testhalde2 ~ # multipath -l
[ sleeping 35 seconds ]
1:0:0:1: cannot open /tmp/scsi-maj8-min16-11665: No such device or address
error calling out /sbin/scsi_id -g -u -s /block/sdb
error calling out /sbin/pp_balance_units 8:32
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
  \_ 1:0:0:1 sdb  8:16    [faulty][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]

testhalde2 tmp # multipath -l             # again
error calling out /sbin/pp_balance_units 8:32
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
  \_ 0:0:0:0      8:16    [undef ][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]

testhalde2 tmp # touch /mnt/test/test    # ok
testhalde2 tmp # rm /mnt/test/test       # ok
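
To follow the path states across these port toggles, the per-path lines of
"multipath -l" can be reduced to dev_t and checker state. A rough sketch,
assuming the output layout shown above (a "\_" marker, the h:b:t:l tuple, an
optional device name, the dev_t, then the bracketed states); the function name
path_states is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read "multipath -l" output on stdin and print
# "<major:minor> <checker state>" for every path line.
path_states() {
    awk '$1 == "\\_" && $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
        devt = ""; state = ""
        # The dev_t is the first N:N token after h:b:t:l; the checker
        # state is the bracketed field that follows it.
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+:[0-9]+$/) { devt = $i; state = $(i + 1); break }
        sub(/^\[/, "", state)     # drop the opening bracket
        sub(/\].*$/, "", state)   # drop the closing bracket and what follows
        print devt, state
    }'
}

# Sample taken from the output above.
path_states <<'EOF'
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
  \_ 1:0:0:1 sdb  8:16    [faulty][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]
EOF
```

Running this in a loop while a switch port is disabled would show when each
path changes state.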


===> Why do I get the "error calling out..." error only when I disable the
     HBA-port from _fabric-A_?


Enabling HBA-fabric-A port...

testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]
\_ round-robin 0 [enabled]
  \_ 1:0:0:1 sda  8:0     [ready ][active]

testhalde2 tmp # touch /mnt/test/test    # ok
testhalde2 tmp # rm /mnt/test/test       # ok


testhalde2 tmp # ps ax | fgrep multipathd
10871 pts/0    SL     0:00 multipathd
10872 pts/0    SL     0:00 multipathd
10870 pts/0    SL     0:00 multipathd
11534 pts/0    S+     0:00 fgrep multipathd

testhalde2 tmp # cat strace_multipatd 
Process 10870 attached - interrupt to quit
testhalde2 tmp # 

===> Again: No output in the strace-debug file from multipathd.



SUMMARY:
========

The failover mechanism seems to work, but it's very slow (>= 35 sec).
I am sure that the host will die if there is a lot of I/O at that moment.
The documentation says that multipathd "is in charge of checking the paths
in case they come up or down", but multipathd seems to do nothing... I think
that is the problem... What do you think?
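
To put a number on the delay, one option is to log a timestamp each time a
write to the mount succeeds and then look for the largest gap. A minimal
sketch; the function name max_gap and the log file /tmp/probe.log are
hypothetical, and /mnt/test is assumed to be the multipath mount used above:

```shell
#!/bin/sh
# Hypothetical helper: read one epoch timestamp per line on stdin and
# print the largest gap (in seconds) between consecutive lines.
max_gap() {
    awk 'NR > 1 { g = $1 - prev; if (g > max) max = g }
         { prev = $1 }
         END { print max + 0 }'
}

# Producer, run while toggling the switch port (assumption, not run here):
#   while :; do touch /mnt/test/probe && date +%s; sleep 1; done > /tmp/probe.log
# Afterwards:
#   max_gap < /tmp/probe.log

# Sample input with a 36 second stall between the third and fourth write.
printf '100\n101\n102\n138\n139\n' | max_gap
```

The sample prints 36. With real data, the result minus the 1 second sleep is
roughly how long I/O was blocked.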


Thanks a lot for your help
Simon


-- 

Simon
gistolero at gmx.de



