[dm-devel] failure in path between fc switch and storage: info request
Gianluca Cecchi
gianluca.cecchi at gmail.com
Thu Oct 21 10:01:42 UTC 2010
Hello,
I have some servers, each connected through two QLogic HBAs to two different FC switches.
Each FC switch is then connected to the two controllers of the storage
array (IBM DS6800), one port per controller.
So my servers have 4 paths to each LUN.
They are RHEL 5.5 x86_64 with slightly different minor versions of the
device-mapper packages (see below).
I had a problem with the GBIC of one of the FC switches, on the port
connected to a controller of the storage array, so in that case the
servers lost one path.
After the GBIC replacement, I observe different behaviours.
1) A cluster of two servers that both have access to the storage.
The same disk is seen with a different path-group layout by the two servers:
servera
mpath3 (3600507630efe0b0c0000000000000804) dm-5 IBM,1750500
[size=1.0G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
\_ 1:0:1:8 sdaf 65:240 [active][undef]
\_ 0:0:1:8 sdp 8:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 0:0:0:8 sdh 8:112 [active][undef]
\_ 1:0:0:8 sdx 65:112 [active][undef]
serverb
mpath3 (3600507630efe0b0c0000000000000804) dm-5 IBM,1750500
[size=1.0G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
\_ 0:0:1:8 sdp 8:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 0:0:0:8 sdh 8:112 [active][undef]
\_ 1:0:0:8 sdx 65:112 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:8 sdaf 65:240 [active][undef]
Here I have:
device-mapper-1.02.39-1.el5_5.2
device-mapper-event-1.02.39-1.el5_5.2
device-mapper-1.02.39-1.el5_5.2
device-mapper-multipath-0.4.7-34.el5_5.5
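To compare the layouts between nodes quickly, the number of path groups per map can be counted from saved `multipath -ll` output; a small sketch (the sample is serverb's mpath3 from above, and the awk filter simply assumes the 0.4.7 output format shown here):

```shell
# Sketch: count path groups per map in saved `multipath -ll` output,
# to compare the group layout between cluster nodes.
# The sample below is serverb's mpath3 as shown above.
sample='mpath3 (3600507630efe0b0c0000000000000804) dm-5 IBM,1750500
[size=1.0G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:1:8 sdp 8:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:8 sdh 8:112 [active][undef]
 \_ 1:0:0:8 sdx 65:112 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:1:8 sdaf 65:240 [active][undef]'
printf '%s\n' "$sample" |
  awk '/^mpath/ {map=$1} /round-robin/ {g[map]++} END {for (m in g) print m, g[m]}'
# prints: mpath3 3
```

Running the same filter on servera's output for the same WWID gives 2 groups, which makes the divergence obvious at a glance.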
2) A standalone system with exclusive access to its LUNs:
mpath22 (3600507630efe0b0c0000000000000400) dm-14 IBM,1750500
[size=60G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
\_ 0:0:1:7 sdag 66:0 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 0:0:0:7 sdac 65:192 [active][undef]
\_ 1:0:0:7 sdak 66:64 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:7 sdao 66:128 [active][undef]
Here I have:
device-mapper-1.02.39-1.el5_5.2
device-mapper-1.02.39-1.el5_5.2
device-mapper-event-1.02.39-1.el5_5.2
device-mapper-multipath-0.4.7-34.el5_5.6
3) Another cluster of two servers. One of them seems to be in an OK
state for some LUNs, while for others it keeps registering failed
paths:
servera
mpath22 (3600507630efe0b0c0000000000000606) dm-17 IBM,1750500
[size=120G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
\_ 2:0:3:11 sdbb 67:80 [active][undef]
\_ 1:0:3:11 sdbo 68:32 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:2:11 sdo 8:224 [active][undef]
\_ 2:0:2:11 sdz 65:144 [active][undef]
...
mpath1 (3600507630efe0b0c0000000000000601) dm-8 IBM,1750500
[size=15G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
\_ 1:0:3:2 sdao 66:128 [active][undef]
\_ 2:0:3:2 sdaq 66:160 [failed][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:2:2 sdd 8:48 [active][undef]
\_ 2:0:2:2 sdp 8:240 [active][undef]
serverb
mpath22 (3600507630efe0b0c0000000000000606) dm-11 IBM,1750500
[size=120G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
\_ 1:0:1:11 sdar 66:176 [active][undef]
\_ 2:0:1:11 sdbo 68:32 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:11 sdao 66:128 [active][undef]
\_ 1:0:0:11 sdm 8:192 [active][undef]
mpath1 (3600507630efe0b0c0000000000000601) dm-4 IBM,1750500
[size=15G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
\_ 1:0:1:2 sdae 65:224 [active][undef]
\_ 2:0:1:2 sdbb 67:80 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:2 sdd 8:48 [active][undef]
\_ 2:0:0:2 sdu 65:64 [active][undef]
Here I have:
device-mapper-multipath-0.4.7-34.el5_5.4
device-mapper-1.02.39-1.el5_5.2
device-mapper-1.02.39-1.el5_5.2
device-mapper-event-1.02.39-1.el5_5.2
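The failed paths can likewise be filtered out of saved `multipath -ll` output; a sketch (the sample is servera's mpath1 from above, again assuming the 0.4.7 output format):

```shell
# Sketch: list paths that `multipath -ll` reports as [failed],
# prefixed with their map name, from saved output.
# The sample below is servera's mpath1 as shown above.
sample='mpath1 (3600507630efe0b0c0000000000000601) dm-8 IBM,1750500
[size=15G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:3:2 sdao 66:128 [active][undef]
 \_ 2:0:3:2 sdaq 66:160 [failed][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:2:2 sdd 8:48 [active][undef]
 \_ 2:0:2:2 sdp 8:240 [active][undef]'
printf '%s\n' "$sample" |
  awk '/^mpath/ {map=$1} /\[failed\]/ {print map, $3}'
# prints: mpath1 sdaq
```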
My relevant configuration in multipath.conf for all the systems is:
devices {
        device {
                vendor                  "IBM"
                product                 "1750500"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_alua %d"
                features                "0"
                hardware_handler        "0"
                path_grouping_policy    group_by_prio
                failback                immediate
                rr_weight               uniform
                path_checker            tur
        }
}
Also, on the cluster nodes I have binding stanzas such as:
multipath {
        wwid    3600507630efe0b0c0000000000000601
        alias   mpath1
}
so that both nodes see the storage LUNs under the same names.
Any suggestions?
It seems that case 3) returned to an OK state after about half an hour,
but case 2) still shows anomalies in the path-group composition...
how can I restore the original layout?
Thanks in advance,
Gianluca