[dm-devel] multipath timing help sought

Thu Sep 27 15:10:11 UTC 2007

Timing help sought, I think.

We have been running an iSCSI multipath setup for about 1.5 years (with no
real failover other than in testing).
Here is the hardware we are dealing with:
- EqualLogic PS disk array with dual controller modules
- QLogic 4052 HBA
- RHEL 4.5

During the testing phase things worked: if I pulled power to one switch,
traffic moved over to the other. But when the first switch came back, no
failback occurred. I was not too concerned about this, as the initial
failover worked and Oracle kept going (if this happened in real life, I
figured we would replace the switch and reboot the boxes once things
were back).

The original switch setup did not include a trunk between the switches,
which the EqualLogic expects (as we now know). This was our error and, I
think, the reason failback did not happen. By then everything was in
production, and we discovered this during a routine firmware update of the
switches (in scheduled maintenance), which requires a reboot.

One week later (during maintenance again) we put the trunk in place between
our iSCSI switches and got spanning tree working on the switches (the iSCSI
SAN looks like a square with two sets of switches, with 1G fiber
connections on one set of parallel lines).

My issue is this: I am now seeing many path 'failures' like the ones below,
but these are not really failures, as the path comes back within about
2 seconds. It seems no real I/O is affected at all.

Is this due to a setting in the defaults section of my multipath.conf? I'm
thinking minimum I/O (rr_min_io) or polling interval (polling_interval).
Links all show good on the switches, with minimal errors (if any).

====== snip from /var/log/messages ===========
Sep 27 09:25:45 host kernel: SCSI error : <2 0 3 0> return code = 0x20000
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 161085656
Sep 27 09:25:45 host kernel: device-mapper: dm-multipath: Failing path 8:64.
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 161085664
Sep 27 09:25:45 host kernel: SCSI error : <2 0 3 0> return code = 0x20000
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 119577336
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 119577344
Sep 27 09:25:45 host kernel: SCSI error : <2 0 3 0> return code = 0x20000
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 233247600
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 233247608
Sep 27 09:25:45 host multipathd: 8:64: mark as failed
Sep 27 09:25:45 host multipathd: host.datafiles.prod: remaining active paths: 1
Sep 27 09:25:47 host multipathd: 8:64: readsector0 checker reports path is up
Sep 27 09:25:47 host multipathd: 8:64: reinstated
Sep 27 09:25:47 host multipathd: host.datafiles.prod: remaining active paths: 2
Sep 27 09:25:47 host multipathd: host.datafiles.prod: switch to path group #1
Sep 27 09:25:47 host multipathd: host.datafiles.prod: switch to path group #1
========= end snip =========================
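
To check how long each path actually stays down, the multipathd events in
an excerpt like the one above can be paired up. What follows is only a
minimal Python sketch, not part of the setup described here: the function
names outage_durations and host_byte are hypothetical, the year is an
assumption (syslog timestamps omit it), and the regular expressions are
written only against the line formats shown in this post. It also extracts
the host byte of the SCSI return code; in the Linux SCSI headers a host
byte of 0x02 is DID_BUS_BUSY, though what that means for this particular
HBA driver is not something the log alone settles.

```python
import re
from datetime import datetime

# Sample lines taken from the excerpt above, one event per line.
SAMPLE_LOG = """\
Sep 27 09:25:45 host kernel: SCSI error : <2 0 3 0> return code = 0x20000
Sep 27 09:25:45 host kernel: device-mapper: dm-multipath: Failing path 8:64.
Sep 27 09:25:45 host multipathd: 8:64: mark as failed
Sep 27 09:25:47 host multipathd: 8:64: readsector0 checker reports path is up
Sep 27 09:25:47 host multipathd: 8:64: reinstated
"""

# Matches only the two multipathd events we want to pair up.
EVENT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ multipathd: "
    r"(\d+:\d+): (mark as failed|reinstated)$"
)
RETURN_CODE_RE = re.compile(r"return code = (0x[0-9a-fA-F]+)")


def host_byte(result):
    """Return bits 16-23 of a Linux SCSI result word (the host byte)."""
    return (result >> 16) & 0xFF


def outage_durations(log_text, year=2007):
    """Return (path, seconds) for each 'mark as failed' -> 'reinstated' pair.

    The year is an assumption; syslog lines do not carry one.
    """
    failed_at = {}
    outages = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        stamp, path, event = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if event == "mark as failed":
            failed_at[path] = when
        elif path in failed_at:
            delta = when - failed_at.pop(path)
            outages.append((path, delta.total_seconds()))
    return outages


if __name__ == "__main__":
    for path, seconds in outage_durations(SAMPLE_LOG):
        print(f"path {path} was down for about {seconds:.0f} s")
    for code in RETURN_CODE_RE.findall(SAMPLE_LOG):
        print(f"return code {code}: host byte = {host_byte(int(code, 16)):#04x}")
```

Syslog timestamps have one-second resolution, so a reported gap of 2 s
only bounds the real outage to somewhere between roughly 1 and 3 seconds.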

========= /etc/multipath.conf ================
defaults {
        multipath_tool          "/sbin/multipath -v0"
        udev_dir                /dev
        polling_interval        2
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        path_checker            readsector0
        prio_callout            "/bin/true"
        features                "0"
        rr_min_io               2
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_name      yes
}
## everything is friendly names and ignore devices below
=========== end ======================
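
Since the question is about polling_interval and rr_min_io, here is a
rough Python sketch for pulling those values out of a defaults section so
they can be compared across hosts. It is an illustration under stated
assumptions, not a full multipath.conf parser: parse_section is a
hypothetical helper, SAMPLE_CONF is a subset of the config above, nested
sections are not handled, and '#' is naively treated as a comment start
even inside quotes. Per the multipath.conf documentation,
polling_interval is the interval between path checks in seconds, and
rr_min_io is the number of I/Os routed to a path before switching to the
next one in the same path group.

```python
# Subset of the defaults section shown above.
SAMPLE_CONF = """\
defaults {
        polling_interval        2
        selector                "round-robin 0"
        path_checker            readsector0
        rr_min_io               2
        failback                immediate
        no_path_retry           fail
}
"""


def parse_section(conf_text, section="defaults"):
    """Return the key/value pairs of one flat (non-nested) section."""
    settings = {}
    inside = False
    for raw in conf_text.splitlines():
        # Naive comment stripping: assumes '#' never appears inside a value.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not inside:
            # Look for the opening line, e.g. "defaults {".
            inside = line.split() == [section, "{"]
            continue
        if line == "}":
            break
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip().strip('"') if len(parts) > 1 else ""
        settings[key] = value
    return settings


if __name__ == "__main__":
    defaults = parse_section(SAMPLE_CONF)
    for key in ("polling_interval", "rr_min_io", "path_checker"):
        print(f"{key} = {defaults.get(key)}")
```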

-- 
:wq!
kevin.foote