[dm-devel] Multipath not re-activating failed paths? [SOLVED] and Multipath on root [SOLVED]
Darryl Dixon
esrever_otua at pythonhacker.is-a-geek.net
Fri Sep 15 09:08:53 UTC 2006
Hi Anbu,
For the benefit of the list, I tracked the problem of paths not
re-activating down to (ironically) the interaction between the
supposedly 'enhanced' HP-supplied GPL'ed QLogic drivers and our SUN
3510 :) What I noticed was that when the link was brought back up, two
of my four LUNs would have their second path re-activated, but the other
two wouldn't. In /var/log/messages whenever a cable was unplugged for
testing, I'd see messages like this:
----------8<----------[cut]
kernel: qla2300 0000:06:01.1: qla2xxx_eh_abort scsi(1:0:1:0): cmd_timeout_in_sec=0x3c.
kernel: qla2300 0000:06:01.1: scsi(1:0:1:0): DEVICE RESET ISSUED.
kernel: qla2300 0000:06:01.1: qla2xxx_eh_device_reset: device reset failed
kernel: qla2300 0000:06:01.1: scsi(1:0:1:0): LOOP RESET ISSUED.
kernel: qla2300 0000:06:01.1: qla2xxx_eh_bus_reset: reset failed
kernel: qla2300 0000:06:01.1: scsi(1:0:1:0): ADAPTER RESET issued.
kernel: qla2300 0000:06:01.1: Performing ISP error recovery - ha= 00000100f54903c8.
kernel: Performing ISP error recovery - ha= 00000100f54903c8.
kernel: qla2300 0000:06:01.1: LIP reset occured (f8f7).
kernel: qla2300 0000:06:01.1: LIP occured (f7f7).
kernel: qla2300 0000:06:01.1: LOOP UP detected (2 Gbps).
kernel: qla2300 0000:06:01.1: qla2xxx_eh_host_reset: reset succeded
kernel: scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 2 lun 0
last message repeated 15 times
kernel: scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 0 lun 0
----------8<----------[cut]
Sure enough, when I rolled back to the standard qla2300.ko and
qla2xxx.ko kernel modules that are supplied with the RHEL distribution,
everything started working as expected, and I no longer saw the above
messages.
In summary, I *was* using the 'enhanced' QLogic drivers available from
HP et al, but the QLogic drivers that Red Hat packages with RHEL 4
work better in this situation.
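For anyone wanting to check whether they are hitting the same thing, here
is a rough sketch of a filter that pulls the offlined devices out of the
log. It assumes each kernel message sits on a single line in the real log
and matches the "Device offlined" wording shown above; the function name
is just something I made up for illustration.

```shell
#!/bin/sh
# Sketch: list SCSI devices that the kernel reported as offlined.
# Reads syslog text on stdin and prints one host:channel:id:lun per
# device, de-duplicated. Assumes the message format from the excerpt.
offlined_devices() {
    sed -n 's/.*Device offlined - not ready after error recovery: host \([0-9][0-9]*\) channel \([0-9][0-9]*\) id \([0-9][0-9]*\) lun \([0-9][0-9]*\).*/\1:\2:\3:\4/p' |
        sort -u
}

# Usage (log path as on my RHEL 4 box):
#   offlined_devices < /var/log/messages
```

The host:channel:id:lun form is the same one that `multipath -l` prints
for each path, so the two outputs can be compared directly.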
To answer your second question (HOW-TO multipath on root)...
In terms of changes to a default RHEL install, I needed to unpack the
standard initrd that is created with `mkinitrd` and then modify it as
follows:
* copy in the following files: bin/dmsetup.static, bin/kpartx.static,
bin/multipath.static, bin/scsi_id.static (these are available
from /sbin/ in a standard RHEL install), and then create symlinks in the
initrd that point the 'normal' name for each to the statically compiled
version, e.g. bin/dmsetup -> bin/dmsetup.static
* copy /etc/multipath.conf (as outlined below in my earlier mail) to
etc/multipath.conf in the initrd
* edit the standard /etc/udev/rules.d/40-multipath.rules to use
different rules (THIS IS CRITICAL) that look like:
----------8<----------[cut]
# multipath wants the devmaps presented as meaningful device names
# so name them after their devmap name
#The Blockdev
ACTION=="add", SUBSYSTEM=="block", KERNEL=="dm-*", \
PROGRAM="/sbin/dmsetup -j %M -m %m --noopencount --noheadings -c -o name info"
#The Partitions
ACTION=="add", SUBSYSTEM=="block", KERNEL=="dm-*", \
RUN+="/sbin/kpartx -a /dev/mapper/%c"
----------8<----------[cut]
* ...and then copy the contents of /etc/udev/rules.d/* into the same
directory in the initrd
* Copy all the dm-* kernel modules and the qla* modules (if using QLogic
HBA) into lib/ in the initrd
* Edit the 'init' script in the initrd. Here's what mine looks like
now. I added the insmod lines for the dm-* modules and the qla* modules.
I also added the two lines beginning with 'multipath' and 'dmsetup'.
These are critical: it won't work without them (although I'm still
not certain ~why~). Also, I seemed to need to load the qla2300 HBA
module *after* all the dm-* modules.
----------8<----------[cut]
#!/bin/nash
mount -t proc /proc /proc
setquiet
echo Mounted /proc filesystem
echo Mounting sysfs
mount -t sysfs none /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs none /dev
mknod /dev/console c 5 1
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mkdir /dev/pts
mkdir /dev/shm
echo Starting udev
/sbin/udevstart
echo -n "/sbin/hotplug" > /proc/sys/kernel/hotplug
echo "Loading scsi_mod.ko module"
insmod /lib/scsi_mod.ko
echo "Loading sd_mod.ko module"
insmod /lib/sd_mod.ko
echo "Loading cciss.ko module"
insmod /lib/cciss.ko
echo "Loading scsi_transport_fc.ko module"
insmod /lib/scsi_transport_fc.ko
echo "Loading qla2xxx.ko module"
insmod /lib/qla2xxx.ko
echo "Loading dm-mod.ko module"
insmod /lib/dm-mod.ko
echo "Loading dm-multipath.ko module"
insmod /lib/dm-multipath.ko
echo "Loading dm-round-robin.ko module"
insmod /lib/dm-round-robin.ko
echo "Loading dm-mirror.ko module"
insmod /lib/dm-mirror.ko
# LOAD THE HBA DRIVER LAST
echo "Loading qla2300.ko module"
insmod /lib/qla2300.ko
/sbin/udevstart
# THE NEXT TWO LINES ARE CRITICAL
multipath
dmsetup ls --target multipath --exec "/sbin/kpartx -a"
echo Creating root device
mkrootdev /dev/root
umount /sys
echo Mounting root filesystem
mount -o defaults --ro -t ext2 /dev/root /sysroot
mount -t tmpfs --bind /dev /sysroot/dev
echo Switching to new root
switchroot /sysroot
umount /initrd/dev
----------8<----------[cut]
* Now re-pack the initrd and copy the image into /boot, then edit the
appropriate entry in your grub.conf so that the root= option points to
the mapper device (e.g. mine is root=/dev/mapper/os2), and change the
initrd line to point at your newly modified initrd image.
* Finally, make sure that you have the appropriate entry in
your /etc/fstab; in my case /dev/mapper/os2 is the device to use for
root, as 'os' was the alias that I set up for the root LUN.
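The unpack, symlink and re-pack steps above can be sketched roughly as
follows. This assumes the initrd is a gzip-compressed cpio archive (run
`file` on the decompressed image first if unsure); the function names and
the example paths in the usage comments are only illustrative, not
anything mkinitrd provides.

```shell
#!/bin/sh
# Sketch of the unpack / symlink / re-pack steps.
# Assumption: the initrd image is a gzip-compressed cpio archive.

# Unpack IMAGE (give an absolute path) into the directory WORKDIR.
unpack_initrd() {
    image=$1 workdir=$2
    mkdir -p "$workdir" &&
        ( cd "$workdir" && gzip -dc "$image" | cpio -idm )
}

# For every bin/NAME.static in the unpacked tree, create the symlink
# bin/NAME -> NAME.static (relative, so it still resolves at boot).
link_static_tools() {
    workdir=$1
    for f in "$workdir"/bin/*.static; do
        [ -e "$f" ] || continue          # no *.static files present
        base=$(basename "$f" .static)
        ln -sf "$base.static" "$workdir/bin/$base"
    done
}

# Re-pack WORKDIR into the gzip-compressed cpio archive IMAGE.
repack_initrd() {
    workdir=$1 image=$2
    ( cd "$workdir" && find . | cpio -o -H newc | gzip -9 ) > "$image"
}

# Usage (example paths only, adjust to your kernel version):
#   unpack_initrd /boot/initrd-2.6.9-42.0.2.ELsmp.img /tmp/initrd-work
#   ... copy in the tools, multipath.conf, udev rules and modules,
#       and edit init as described above ...
#   link_static_tools /tmp/initrd-work
#   repack_initrd /tmp/initrd-work /boot/initrd-2.6.9-42.0.2.ELsmp-mpath.img
```

Writing the result to a new file name rather than over the original
image leaves the old grub entry usable as a fallback.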
Now reboot :)
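Once the box is back up, `df /` should show the mapper device as root,
and `multipath -l` should show every path as active. A small sketch for
checking the latter from a script follows; it assumes path lines of the
form "H:C:T:L sdX MAJ:MIN [state]" as in my `multipath -l` output quoted
below, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: summarise path states from `multipath -l` style output.
# Assumes each path appears as "H:C:T:L sdX MAJ:MIN [state]".

# Print one "DEVICE STATE" line per path found on stdin.
path_states() {
    grep -Eo '[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+ +[0-9]+:[0-9]+ +\[[a-z]+\]' |
        awk '{ print $2, $4 }' | tr -d '[]'
}

# Succeed only if at least one path is listed and all are "active".
all_paths_active() {
    states=$(path_states)
    [ -n "$states" ] && ! printf '%s\n' "$states" | grep -qv ' active$'
}

# Usage:
#   multipath -l | path_states
#   multipath -l | all_paths_active || echo "some paths are not active"
```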
I hope that this helps anyone else trying to do what I have done; it was
the better part of a week's worth of work :)
many regards,
Darryl Dixon
http://www.winterhouseconsulting.com
On Fri, 2006-09-15 at 12:41 +0530, Arumugam, Anburaja (STSD) wrote:
> Hi Darryl,
>
> Not sure if this hint helps, if you haven't tried it already, but
> you may want to check the status of your 'multipathd' daemon, which
> initiates path verification after a path fails. If the 'multipathd'
> daemon is in the "stopped" state for some reason, there is no way for
> the multipath configurator to bring the path back online.
>
> You can check the status of the 'multipathd' daemon by using
> "/etc/init.d/multipathd status" on your host.
>
> Hope this helps!!
>
> We are curious about the fact that you have a working multipath root
> device setup on your side. Could you please give some pointers on how
> to get a working multipath boot setup? What we are looking for is what
> kind of changes you need to make to grub.conf, and what steps you
> should follow to get the multipath/udev/multipath.conf into the
> 'initrd', if that is needed.
>
> Thanks in advance,
> Anbu
>
> -----Original Message-----
> From: dm-devel-bounces at redhat.com [mailto:dm-devel-bounces at redhat.com]
> On Behalf Of Darryl Dixon
> Sent: Friday, September 15, 2006 5:24 AM
> To: dm-devel at redhat.com
> Subject: [dm-devel] Multipath not re-activating failed paths?
>
> Hi All,
>
> I have a working dm-multipath set up with a multipath root device. For
> some reason, while multipath seems to correctly use both paths, and will
> gracefully handle the failing of a path (uninterrupted IO works OK), it
> does not seem to want to detect once the failed path has come back up
> again. In other words, in my two-path setup, it will load balance
> between the paths, continue successfully on one path when one fails, but
> it will then be 'stuck' on that path forever until the next reboot, even
> if the first path is back up and otherwise working fine.
>
> From what I can understand of the multipath.conf settings, the paths
> should be tested every 5 seconds, and should be marked 'active' once
> they come back up.
>
> How can I best go about debugging/investigating this?
>
> My setup details:
> Machine: HP Blade BL25P with QLogic dual-ported HBA
> Storage: Two paths to SUN 3510
> OS: RHEL4 x86_64
> DM package: device-mapper-multipath-0.4.5-16.1.RHEL4
> uname -r: 2.6.9-42.0.2.ELsmp
>
> contents of /etc/multipath.conf:
> ----------8<----------[cut]
> devnode_blacklist {
>         devnode "^cciss!c[0-9]d[0-9]*"
> }
>
> defaults {
>         user_friendly_names yes
>         no_path_retry fail
>         path_grouping_policy multibus
>         failback immediate
> }
>
> multipaths {
>         multipath {
>                 wwid 3500000e01190e340
>                 alias os
>         }
> }
> ----------8<----------[cut]
>
> Output of multipath -l:
> ----------8<----------[cut]
> 3500000e01190e100
> [size=68 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active]
>  \_ 0:0:3:0 sdd 8:48 [active]
>  \_ 1:0:3:0 sdh 8:112 [active]
>
> 3500000e01190e3f0
> [size=68 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active]
>  \_ 0:0:1:0 sdb 8:16 [active]
>  \_ 1:0:0:0 sde 8:64 [active]
>
> os (3500000e01190e340)
> [size=68 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active]
>  \_ 0:0:0:0 sda 8:0 [active]
>  \_ 1:0:2:0 sdg 8:96 [active]
>
> 3500000e01190e310
> [size=68 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active]
>  \_ 0:0:2:0 sdc 8:32 [active]
>  \_ 1:0:1:0 sdf 8:80 [active]
> ----------8<----------[cut]
>
> Contents of /dev/mapper/:
> ----------8<----------[cut]
> brw-rw---- 1 root disk 253, 3 Sep 15 2006 3500000e01190e100
> brw-rw---- 1 root disk 253, 2 Sep 15 2006 3500000e01190e310
> brw-rw---- 1 root disk 253, 1 Sep 15 2006 3500000e01190e3f0
> crw------- 1 root root 10, 63 Sep 15 2006 control
> brw-rw---- 1 root disk 253, 0 Sep 15 2006 os
> brw-rw---- 1 root disk 253, 4 Sep 15 2006 os1
> brw-rw---- 1 root disk 253, 5 Sep 15 2006 os2
> brw-rw---- 1 root disk 253, 6 Sep 15 2006 os3
> ----------8<----------[cut]
>
> Output of df -k:
> ----------8<----------[cut]
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/mapper/os2 50394996 29944792 17890248 63% /
> /dev/mapper/os1 101086 23801 72066 25% /boot
> none 5036176 0 5036176 0% /dev/shm
> ----------8<----------[cut]
>
>
> Any and all pointers or assistance appreciated.
>
> regards,
> Darryl Dixon
> http://www.winterhouseconsulting.com
>
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel