[dm-devel] Problems with multipathing

James Smart James.Smart at Emulex.Com
Tue Apr 18 19:38:44 UTC 2006


> Roger Håkansson wrote:
>> Also, I've noticed that it's not only when a controller fails that this
>> happens, when a failed controller is "revived" the same thing might 
>> happen.
>>
>> As far as I've been able to tell, the more I/O-transactions at the time
>> of the failure, the more likely that the (SCSI) device will be marked as
>> "dead".

Hmmm.. I'm wondering if he's hitting the scenario in which the midlayer
marks the sdev offline - which could be the "dead" state.
This occurs if an i/o hits the LLDD while the device is disconnected and
error recovery fails. If so, at a later time when the LLDD has connectivity
and can access the device, the scsi layer would still likely bounce i/o.
It requires manual intervention to change the sdev back to a running state;
until then, any i/o requests by dm would be failed back by the midlayer.
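
As a rough illustration of that manual step: an offline sdev is normally
brought back by writing "running" to its sysfs state attribute. The sketch
below assumes the kernel exposes a writable state file for the device; the
helper name set_sdev_state and the example path in the comment are
illustrative only, and writing to sysfs requires root.

```python
from pathlib import Path


def set_sdev_state(state_file, new_state="running"):
    """Write new_state to a SCSI device's sysfs 'state' file.

    state_file is the path to the attribute, for example (hypothetical
    device address) /sys/class/scsi_device/1:0:0:0/device/state.
    Returns the state read back afterwards so the caller can confirm
    that the change took effect.
    """
    path = Path(state_file)
    path.write_text(new_state + "\n")
    return path.read_text().strip()
```

If the value read back is still "offline", the change did not take and the
device is probably still unreachable from the LLDD.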

What doesn't jibe is the rescan re-enabling the device. As I stated, restoring
the device is usually a manual action. If the rescans happen just
prior to the transition to the offline state, they may be making dm change
its path mappings to avoid i/o to the failed path, thus averting the
sdev transition.  Can you report the contents of
/sys/class/scsi_device/1:0:*/device/state  at the following points, in both
the working and non-working cases:
   working; right after failover but before dm fails it; after failure/success
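
One possible way to capture those values at each point is a small script
that globs the state files and prints one line per device. This is only a
sketch: the function name read_sdev_states is made up for illustration, and
the default pattern is simply the path quoted above.

```python
import glob
import os


def read_sdev_states(pattern="/sys/class/scsi_device/1:0:*/device/state"):
    """Return {device_id: state} for every state file matching pattern."""
    states = {}
    for state_file in sorted(glob.glob(pattern)):
        # Path layout is .../<H:C:T:L>/device/state, so the device id
        # is the third component from the end.
        device_id = state_file.split(os.sep)[-3]
        try:
            with open(state_file) as fh:
                states[device_id] = fh.read().strip()
        except OSError as exc:
            # Record the failure instead of aborting the whole snapshot.
            states[device_id] = "unreadable: %s" % exc
    return states
```

Calling read_sdev_states() and printing the result once at each of the
points listed above, for both cases, would give the comparison being
asked for.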

-- james



