[dm-devel] dm-multipath (multipathd) not removing/adding channels back on one device in a multipath array

Eli Stair estair at ilm.com
Mon Oct 9 22:35:41 UTC 2006


All,

I'm seeing a repeatable problem where multipathd fails to add and/or 
remove paths to a single device on a dual-loop FC disk tray.  Kernel 
path detection and manual runs of 'multipath' are not affected.  If I 
stop multipathd from running, the kernel sees the paths as unreachable 
and marks them as 'failed' in the multipath -l output.  If I run 
'multipath' manually, it _always_ picks up or removes the appropriate 
channels for all devices.

The failure only shows up when multipathd is left to auto-correct for 
path failures.  Only a /single/ device (the first FC drive in the 
array) is affected, and it is affected reliably.

When running multipathd, the drive that is enumerated as both /dev/sdb 
and /dev/sdp (14-drive enclosure: sdb-sdo, with drive naming restarting 
at /dev/sdp for the second path) gets skipped upon removal or addition 
of the path at least 50% of the time.  However long I wait, multipathd 
never makes another attempt at fixing the path.  Running 'multipath' by 
hand, however, immediately cleans up the straggler and prints output to 
that effect.  Of note: if I leave multipathd off and do not manually run 
multipath before reconnecting the FC channel, then on disconnecting it 
again the system OOPSes repeatedly and hard-crashes.


// Example of multipathd leaving the drive (mpath3) with only one path 
"up", while the others have both paths present:

mpath10 (32000000c50e8df4b)
[size=136 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 2:0:8:0  sdj  8:144  [active][undef]
  \_ 3:0:8:0  sdx  65:112 [active][undef]
mpath3 (320000011c6bdfbd5)
[size=136 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 2:0:0:0  sdb  8:16   [active][undef]
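
A minimal sketch, assuming the 'multipath -l' layout shown above, of 
how a map left with too few paths could be spotted automatically.  The 
function names and the expected path count of 2 are illustrative 
assumptions, not part of multipath-tools:

```python
import re

# A map header looks like "mpath3 (320000011c6bdfbd5)".
MAP_RE = re.compile(r"^(\S+) \(([0-9a-fA-F]+)\)")
# A path line looks like "  \_ 2:0:0:0  sdb  8:16   [active][undef]".
# Path-group lines ("\_ round-robin 0 ...") do not match, since they
# carry no H:C:T:L address.
PATH_RE = re.compile(r"^\s*\\_ (\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)")

# Sample text copied from the listing above.
SAMPLE = r"""
mpath10 (32000000c50e8df4b)
[size=136 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 2:0:8:0  sdj  8:144  [active][undef]
  \_ 3:0:8:0  sdx  65:112 [active][undef]
mpath3 (320000011c6bdfbd5)
[size=136 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 2:0:0:0  sdb  8:16   [active][undef]
"""


def paths_per_map(text):
    """Return {map_name: [device, ...]} parsed from 'multipath -l' text."""
    maps = {}
    current = None
    for line in text.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = header.group(1)
            maps[current] = []
            continue
        path = PATH_RE.match(line)
        if path and current is not None:
            maps[current].append(path.group(2))
    return maps


def degraded_maps(text, expected_paths=2):
    """Return sorted names of maps with fewer paths than expected."""
    found = paths_per_map(text)
    return sorted(n for n, devs in found.items() if len(devs) < expected_paths)


if __name__ == "__main__":
    print(degraded_maps(SAMPLE))
```

Fed the listing above, the sketch reports mpath3 as the only map short 
of a path.  A check like this could decide when a manual 'multipath' 
run is needed, but it only covers the output format shown here.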


// Example output (snippet) of 'multipath -v4' after 'multipathd' fails 
to fix it:

mpath3: set ACT_RELOAD (path group topology change)
reload: mpath3 (320000011c6bdfbd5)
[size=136 GB][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][undef]
  \_ 2:0:0:0  sdb  8:16   [active][ready]
  \_ 3:0:0:0  sdp  8:240  [undef][ready]



The ACT_RELOAD line is what differs at this point; all the other 
fully-populated multipath devices show, for instance, "mpath0: set 
ACT_NOTHING (map unchanged)".  It seems that whatever criteria 
multipathd uses to test a device and adjust its settings fail on this 
first-enumerated disk when it starts looking at the drives through the 
second FC loop.
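
As a rough illustration of the comparison described above, the "set 
ACT_..." lines in 'multipath -v4' output can be filtered so that only 
the maps multipath intends to change remain.  This is a sketch that 
assumes the line format quoted in this message; the function name is 
hypothetical:

```python
import re

# Matches lines such as "mpath3: set ACT_RELOAD (path group topology change)".
ACTION_RE = re.compile(r"^(\S+): set (ACT_\w+) \((.*)\)\s*$")

# Sample lines taken from the output quoted in this message.
SAMPLE = """\
mpath0: set ACT_NOTHING (map unchanged)
mpath3: set ACT_RELOAD (path group topology change)
"""


def pending_actions(text):
    """Return {map: (action, reason)} for maps whose action is not ACT_NOTHING."""
    pending = {}
    for line in text.splitlines():
        match = ACTION_RE.match(line.strip())
        if match and match.group(2) != "ACT_NOTHING":
            pending[match.group(1)] = (match.group(2), match.group(3))
    return pending


if __name__ == "__main__":
    for name, (action, reason) in pending_actions(SAMPLE).items():
        print(name, action, reason)
```

Any map listed by this filter is one that a manual 'multipath' run 
would still modify, which is to say one that multipathd left stale.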

I've attached a typescript of "multipathd -d" in one file, and the 
multipath -l and multipath -v output in a second file.  Together they 
show in detail the sequence of events on both loop addition and removal 
from the system.  dmesg output is also attached.

I'd love to be of as much assistance as I can, as I have eight systems 
currently with this problem and can't do much with them yet.  I also 
have a set of QLogic qla2300 controllers and different disk trays that 
I'll be testing, to see if this is a controller- or enclosure-specific 
issue.

Please let me know what more I can do/provide/try to make myself useful.


Cheers,


/eli





/////////////

Some info:

2x Opteron 248, 8GB RAM

Tyan S2882 and Arima HDAMA boards tested.

kernel 2.6.18 (64-bit, SMP, NUMA)

dm-multipath v0.4.7 (03/12, 2006)

2 LSI FC adapters single-port 2G

14-drive LSI FC JBOD tray (model 2600/0834) dual-controller 2G


#multipath.conf:
defaults {
         polling_interval        5
         path_grouping_policy    multibus
         rr_min_io               100
         failback                15
         no_path_retry           2
}
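
For reference, a small sketch that reads the defaults block above and 
estimates how long I/O may stay queued once every path to a map has 
failed.  The estimate assumes multipathd counts one retry per 
polling_interval, so the window is roughly no_path_retry * 
polling_interval seconds; that relationship, and the helper names, are 
assumptions to check against the multipath-tools version in use:

```python
# Configuration text copied from the message.
CONF = """\
defaults {
         polling_interval        5
         path_grouping_policy    multibus
         rr_min_io               100
         failback                15
         no_path_retry           2
}
"""


def parse_defaults(conf_text):
    """Return the key/value pairs found inside the 'defaults { ... }' block."""
    settings = {}
    in_defaults = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if line.startswith("defaults") and line.endswith("{"):
            in_defaults = True
        elif line == "}":
            in_defaults = False
        elif in_defaults:
            key, _, value = line.partition(" ")
            settings[key] = value.strip()
    return settings


def approx_queue_seconds(settings):
    """Estimate the queueing window in seconds, or None if not numeric.

    Assumption: one retry per polling interval.
    """
    retries = settings.get("no_path_retry", "")
    interval = settings.get("polling_interval", "")
    if retries.isdigit() and interval.isdigit():
        return int(retries) * int(interval)
    return None


if __name__ == "__main__":
    print(approx_queue_seconds(parse_defaults(CONF)))
```

Under that assumption the settings above give a window of about 10 
seconds before queued I/O starts to fail.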
-------------- next part --------------
A non-text attachment was scrubbed...
Name: multipathd-debug.out.bz2
Type: application/octet-stream
Size: 1565 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20061009/ef4acf68/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: multipath-debug.out.bz2
Type: application/octet-stream
Size: 8134 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20061009/ef4acf68/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.2006-10-09.bz2
Type: application/octet-stream
Size: 8973 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20061009/ef4acf68/attachment-0002.obj>

