[dm-devel] Re: why command of multipath send reinstate message to all dm's paths

Douglas Gilbert dgilbert at interlog.com
Fri Jun 29 14:28:52 UTC 2018


On 2018-06-28 09:38 AM, Martin Wilck wrote:
> [I've added Hannes, Ben and Douglas to the recipient list to fill in
> knowledge from the past that I may lack].
> 
> tl;dr summary: We've got 3 issues:
> 
>   1) Why does multipath, in reinstate_paths(), try to reinstate paths
> which are known to be down?
>   2) rescan-scsi-bus.sh can call "multipath" even if "-m" switch is not
> used (that looks like a bug to me).
>   3) In Jiaojianbing's environment, dead paths that have been removed on
> the target and were already marked "offline" may appear as "running"
> after rescan-scsi-bus.sh invocation.
> 
> Furthermore,
>   4) perhaps rescan-scsi-bus.sh should replace suboptimal "multipath"
> calls with multipathd CLI commands (or, even better, we multipath-tools
> people should eventually finish the "delegate to multipathd" work).
> 
> 
> On Thu, 2018-06-28 at 06:35 +0000, Jiaojianbing wrote:
>>>> Dear Christophe,
>>>> when dm-105 is in one of the states below, the paths of dm-105
>>>> change to active if we run the multipath command.
>>>
>>> Could you be more specific please? What multipath command did you
>>> run?
>>> Which version of multipath-tools are you running?
>>
>> The command is plain "multipath", run from the shell as below:
>> # multipath
> 
> ... and if I understand correctly, the problem originally occurred while
> running rescan-scsi-bus.sh. Please also state the version of sg3_utils
> you are using.
> 
>>
>> And the version:  multipath-tools v0.4.9 (05/33, 2016)
> 
> Well, that's ancient. But latest multipath-tools still has the same
> code.
> 
>>
>>>>   I checked the multipath code. In reinstate_paths(), it sends the
>>>> message "reinstate_path pathname" to the kernel when the path group
>>>> status is PGSTATE_ENABLED/PGSTATE_UNDEF and the path's state is
>>>> PSTATE_FAILED. Why does the multipath command do this for all dm
>>>> devices? Actually, some of these paths are already offline or failed
>>>> and can't be recovered. Maybe we could first check the status of
>>>> these devices by sending IO to the sd devices, and then, depending on
>>>> the IO result, have multipath send the reinstate only to running
>>>> devices and do nothing to failed devices?
>>>
>>> I see this code in reinstate_paths():
>>>
>>> 		vector_foreach_slot (pgp->paths, pp, j) {
>>> 			if (pp->state != PATH_UP &&
>>> 			    (pgp->status == PGSTATE_DISABLED ||
>>> 			     pgp->status == PGSTATE_ACTIVE))
>>> 				continue;
>>>
>>> 			if (pp->dmstate == PSTATE_FAILED) {
>>> 				if (dm_reinstate_path(mpp->alias, pp->dev_t))
>>> 					condlog(0, "%s: error reinstating",
>>> 						pp->dev);
>>> 			}
>>> 		}
>>>
>>> The reinstate command is only sent for paths which are either in
>>> PATH_UP state, or belong to a PGSTATE_ENABLED (or PGSTATE_UNDEF) path
>>> group. I admit I'm unsure why we try to reinstate paths that we know
>>> are down. This is 13-year-old code.
>>>
>>> Interestingly, the state of your paths changes from "faulty offline"
>>> to "ready running". So it appears that these paths are actually _not_
>>> down; only the reinstate seems to have failed on them.
>>>
>>> multipathd -v3 logs and possibly kernel logs would be helpful to
>>> understand what was going on in that situation.
>>
>>      Sorry, my two multipath status samples may have confused you; they
>> are just samples. What I actually do is run "rescan-scsi-bus" on the
>> host to clear all SCSI devices mapped by iscsid after all of the LUNs
>> on the remote IPSAN have been removed.
>> While rescan-scsi-bus is running, if the "multipath" command runs, the
>> state of the dm device's path changes from failed to active at some
>> point, as shown below. If IO is then sent to dm-105, the process that
>> sends the IO hangs in D state.
>> # multipath -ll
>> 36d0d04b100b8cba665a187f0000000f9 dm-105 HUAWEI  ,XSG1
>> size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
>> `-+- policy='service-time 0' prio=1 status=active
>>    `- 18:0:0:101 sdku 67:288  active faulty running
> 
> The strange part here is that the device is considered "running". This
> is the state of the kernel device. If the LUNs are actually _removed_
> as you say, the device should be gone, or at least marked "offline".
> 
> Apparently the SCSI bus scan via iSCSI still showed the LUN in a
> workable state. For multipath this translates to PATH_UP. Thus even if
> the above code didn't have the (pgp->status == PGSTATE_DISABLED ||
> pgp->status == PGSTATE_ACTIVE) clause, the reinstate would have been
> attempted by multipath. This looks like a low-level problem in your
> SCSI or iSCSI layer to me.
> 
> This looks like the actual problem to me. multipath aside, if the path
> appears to be "running", any Linux process could try to send IO down to
> it and be stuck, as you say.
> 
>>
>>    I would like to know whether what the "multipath" command does in
>> reinstate_paths() is reasonable.
> 
> 
>> And maybe we should not call "multipath" while rescan-scsi-bus is
>> running?
> 
> Normally rescan-scsi-bus.sh should call "multipath" only if the
> "-m|--multipath" switch was used. I quickly scanned through the code
> and didn't find a call to "multipath" (with no options) which wasn't
> guarded by the [ -n "$mp_enable" ] condition. (FTR: there is a call to
> "multipath -f" from main->flushmpaths if "-f|--flush" is set).
> 
> Again, please double-check your version of sg3_utils, and perhaps run
> "bash -x rescan-scsi-bus.sh" to figure out the call chain which runs
> the "multipath" command.
> 
> Thanks,
> Martin
> 

Hi,
My upstream version of rescan-scsi-bus.sh is attached. The last change was
the --ignore-rev option from Gris Ge <fge at redhat.com>. He has sent several
cleanups in the last year, usually via Hannes' github site for sg3_utils.

My ChangeLog entry to that script (since sg3_utils 1.42) is:

   - rescan-scsi-bus.sh: harden code
     - fixes from Suse; bump version
     - bump version to 20180615
     - add to install list in Makefile, hope it does
       not clash with other package providing it
     - add --ignore-rev to ignore revision change

If there are no further changes it will be like that in sg3_utils-1.43
revision 780.

Doug Gilbert

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rescan-scsi-bus.sh
Type: application/x-shellscript
Size: 39068 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20180629/8f3e4fc0/attachment.bin>