[dm-devel] multipath prio_callout broke from 5.2 to 5.3

John A. Sullivan III jsullivan at opensourcedevel.com
Mon Apr 13 09:00:05 UTC 2009


Thank you.  I'll detail our script and the logic behind it in a separate
email in case it is helpful to others.

In the meantime, we have a critical problem: the script that was
working perfectly in 5.2 is now broken in 5.3.  Is there any way to keep
the 5.3 multipathd from being confused by the callout, or any other
immediate solution? - John

On Sun, 2009-04-12 at 09:13 +0200, christophe.varoqui at free.fr wrote:
> John,
> 
> Red Hat-shipped multipathd populates a private memory-backed filesystem at start-up with the binaries it needs.
> Prio callouts of the form "$SHELL /path/to/myscript" seem to confuse that logic.
> If your prio callout is of general interest, maybe we can port it upstream (as a shared object).
> If you are interested, please describe it and post the source.
> 
> Regards,
> cvaroqui
> 
> ----- Original Message -----
> From: "John A. Sullivan III" <jsullivan at opensourcedevel.com>
> To: "device-mapper development" <dm-devel at redhat.com>
> Sent: Sunday, 12 April 2009 06:07:55 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: [dm-devel] multipath prio_callout broke from 5.2 to 5.3
> 
> On Sat, 2009-04-11 at 23:54 -0400, John A. Sullivan III wrote:
> > Hello, all.  We are facing a serious problem with dm-multipath after our
> > upgrade.  We use a bash script to set priorities for failover.  We
> > understand multipathd cannot use a bash script directly so it has been
> > carefully crafted to use only internal commands and is loaded as:
> > 
> > prio_callout            "/bin/bash /usr/local/sbin/mpath_prio_ssi %n"
> > 
> > This has been working perfectly fine.  We upgraded our test lab to
> > CentOS 5.3, device-mapper-multipath.x86_64 0.4.7-23.el5_3.2, kernel
> > 2.6.29.1 (the 2.6.18 default causes a kernel panic with iSCSI).
> > Suddenly, it is breaking.  /var/log/messages is filled with:
> > 
> > Apr 11 23:17:15 kvm01 multipathd: cannot open /sbin/dasd_id : No such file or directory
> > Apr 11 23:17:15 kvm01 multipathd: cannot open /sbin/gnbd_import : No such file or directory
> > Apr 11 23:17:15 kvm01 multipathd: [copy.c] cannot open /sbin/dasd_id
> > Apr 11 23:17:15 kvm01 multipathd: cannot copy /sbin/dasd_id in ramfs : No such file or directory
> > Apr 11 23:17:15 kvm01 multipathd: [copy.c] cannot open /sbin/gnbd_import
> > Apr 11 23:17:15 kvm01 multipathd: cannot copy /sbin/gnbd_import in ramfs : No such file or directory
> > Apr 11 23:17:15 kvm01 multipathd: /bin/bash exitted with 127
> > Apr 11 23:17:15 kvm01 multipathd: error calling out /bin/bash /usr/local/sbin/mpath_prio_ssi sdc
> > Apr 11 23:17:15 kvm01 multipathd: /bin/bash exitted with 127
> > Apr 11 23:17:15 kvm01 multipathd: error calling out /bin/bash /usr/local/sbin/mpath_prio_ssi sdd
> > Apr 11 23:17:15 kvm01 multipathd: /bin/bash exitted with 127
> > Apr 11 23:17:15 kvm01 multipathd: error calling out /bin/bash /usr/local/sbin/mpath_prio_ssi sde
> > Apr 11 23:17:15 kvm01 multipathd: /bin/bash exitted with 127
> > 
> > The first several messages are expected but not the latter ones.  If we
> > run the call from the command line, e.g.,
> > "/bin/bash /usr/local/sbin/mpath_prio_ssi sdc" it works perfectly fine.
> > 
> > What has changed and how do we fix it? I'll include a sample script
> > below.  The script is dynamically created just before launching
> > multipathd:
> > 
> > #!/bin/bash
> > # if not passed any device name, return a priority of 0
> > if [ -z "${1}" ];then
> >         echo 0
> >         exit
> > fi
> > 
> > DEVS="lrwxrwxrwx 1 root root  9 Apr 11 23:13 ip-172.x.x.30:3260-iscsi-iqn.1986-03.com.sun:02:17f534f0-74af-e61b-a716-b8ac8e219dac-lun-0 -> ../../sdj
> > lrwxrwxrwx 1 root root  9 Apr 11 23:13 ip-172.x.x.30:3260-iscsi-iqn.1986-03.com.sun:02:47c5e722-10d3-66c7-a952-d3d79732da9c-lun-0 -> ../../sdr
> > lrwxrwxrwx 1 root root  9 Apr 11 23:13 ip-172.x.x.30:3260-iscsi-iqn.1986-03.com.sun:02:520e823d-342c-6668-9477-fad130b148d7-lun-0 -> ../../sdn"
> > 
> > LIST="172.x.x.78:3260-iscsi-iqn.1986-03.com.sun:02:adb0cf37-9a23-6fc9-922a-eb4540bee1c9->99
> > 172.x.x.46:3260-iscsi-iqn.1986-03.com.sun:02:adb0cf37-9a23-6fc9-922a-eb4540bee1c9->49
> > 172.x.x.62:3260-iscsi-iqn.1986-03.com.sun:02:adb0cf37-9a23-6fc9-922a-eb4540bee1c9->24"
> > 
> > FOUND=0
> > IFSORIG=${IFS}
> > IFS=$'\n'
> > for LINE in ${DEVS}
> > do
> >         ENTRY=${LINE%/${1}}
> >         if [ ${#ENTRY} -ne ${#LINE} ];then # We found the line
> >                 FOUND=1
> >                 break
> >         fi
> > done
> > if [ "$FOUND" = "0" ];then  # This is not an iSCSI device
> >         echo 0
> >         exit
> > fi
> > DEV="${ENTRY##* ip-}"
> > #DEV="${DEV%% ->*}" # the pattern changed in CentOS 5.3
> > #DEV="$(echo ${DEV} | sed 's/-lun-[0-9][0-9]* ->.*//')"
> > DEV="${DEV%%-lun-[0-9]* ->*}"
> > PRIORITY=0
> > for LINE in ${LIST}
> > do
> >         DISK=${LINE%->*}
> >         if [ "${DEV}" = "${DISK}" ];then
> >                 PRIORITY="${LINE##*->}"
> >                 break
> >         fi
> > done
> > echo ${PRIORITY}
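
To make the string handling in the script above concrete, here is a minimal sketch that runs the same parameter expansions on a single by-path line taken from the DEVS sample.  NAME stands in for the %n argument, and the one-entry LIST is a hypothetical priority entry made up so that the lookup has something to match; it is not one of our real entries.

```shell
#!/bin/bash
# One by-path line copied from the DEVS sample; NAME stands in for the %n argument.
LINE="lrwxrwxrwx 1 root root  9 Apr 11 23:13 ip-172.x.x.30:3260-iscsi-iqn.1986-03.com.sun:02:17f534f0-74af-e61b-a716-b8ac8e219dac-lun-0 -> ../../sdj"
NAME="sdj"

# Hypothetical priority entry (not from the real LIST) so the lookup has a match.
LIST="172.x.x.30:3260-iscsi-iqn.1986-03.com.sun:02:17f534f0-74af-e61b-a716-b8ac8e219dac->99"

ENTRY="${LINE%/${NAME}}"         # drop the trailing "/sdj"; a shorter result means this line matched
DEV="${ENTRY##* ip-}"            # drop everything up to and including " ip-"
DEV="${DEV%%-lun-[0-9]* ->*}"    # drop "-lun-N -> ../.." to leave portal plus IQN

PRIORITY=0
if [ "${DEV}" = "${LIST%->*}" ]; then
        PRIORITY="${LIST##*->}"  # text after "->" is the priority
fi

echo "${DEV}"
echo "${PRIORITY}"
```

With this input the key printed is the portal address and IQN with the "-lun-0" suffix removed, and the priority printed is 99.  Everything is done with bash parameter expansion, so no external binaries are needed.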
> > 
> > I did notice the semantics of /dev/disk/by-path changed and we adapted
> > to that.  We were planning to move this to production on Thursday so
> > this has thrown a huge spanner in the works.  Any help would be greatly
> > appreciated.  Thanks - John
> 
> I've just noticed that my console is filled with:
> 
> /bin/bash: /usr/local/sbin/mpath_prio_ssi: No such file or directory
> 
> but it is indeed there, owned by root, and executable.  I've quintuple
> checked! Has multipathd been changed so it cannot read anything from
> disk even if invoked from within bash? Thanks - John
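
The console message and the "exitted with 127" syslog lines appear to be the same failure seen from two sides: bash exits with status 127 when it cannot find the script file it was asked to run.  A minimal sketch of that behaviour follows; the path is hypothetical and chosen only because it should not exist on any system.

```shell
#!/bin/bash
# Hypothetical path, used only because it should not exist.
missing="/nonexistent/mpath_prio_example"

# Ask bash to run a script file it cannot open, discarding the
# "No such file or directory" message it prints on stderr.
status=0
bash "${missing}" sdc 2>/dev/null || status=$?

echo "bash exit status: ${status}"
```

That the same status shows up under multipathd suggests bash itself starts, but the script is not visible at that path from wherever the callout is executed.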
-- 
John A. Sullivan III
Open Source Development Corporation
+1 207-985-7880
jsullivan at opensourcedevel.com

http://www.spiritualoutreach.com
Making Christianity intelligible to secular society




