[dm-devel] Re: why command of multipath send reinstate message to all dm's paths

Martin Wilck mwilck at suse.com
Thu Jun 28 07:38:49 UTC 2018


[I've added Hannes, Ben and Douglas to the recipient list to fill in
knowledge from the past that I may lack].

tl;dr summary: We've got 3 issues:

 1) Why does multipath, in reinstate_paths(), try to reinstate paths
which are known to be down?
 2) rescan-scsi-bus.sh can call "multipath" even if "-m" switch is not
used (that looks like a bug to me).
 3) In Jiaojianbing's environment, dead paths that have been removed on
the target and were already marked "offline" may appear as "running"
after rescan-scsi-bus.sh invocation.

Furthermore,
 4) perhaps rescan-scsi-bus.sh should replace suboptimal "multipath"
calls with multipathd cli commands (or better even, we multipath-tools
people should eventually finish the "delegate to multipathd" work).


On Thu, 2018-06-28 at 06:35 +0000, Jiaojianbing wrote:
> > > Dear Christophe,
> > > when dm-105 is in one state of below, paths of dm-105 will change
> > > to
> > > active if we run command of multipath.
> > 
> > Could you be more specific please? What multipath command did you
> > run?
> > Which version of multipath-tools are you running?
> 
> command is "multipath", which can run in shell as below:
> #multipath

... and if I understand correctly, originally the problem occurred while
running rescan-scsi-bus.sh. Please also state the version of sg3_utils
you are using.

> 
> And the version:  multipath-tools v0.4.9 (05/33, 2016)

Well, that's ancient. But latest multipath-tools still has the same
code.

> 
> > >  I check code of multipath, it sends message "reinstate_path
> > > pathname"
> > > to kernel in routine reinstate_paths when status of pathgroup =
> > > "PGSTATE_ENABLED/PGSTATE_UNDEF" and path's state =
> > > "PSTATE_FAILED".
> > > why command of multipath do above action to all dm devices?
> > > actually,
> > > parts of these paths are already offline or failed which can't be
> > > recovered. Maybe we can check these devices's status by sending
> > > io to
> > > these sd device at first. according to return of io, multipath
> > > send
> > > reinstate to running devices and do nothing to failed devices?
> > 
> > I see this code in reinstate_paths():
> > 
> > 		vector_foreach_slot (pgp->paths, pp, j) {
> > 			if (pp->state != PATH_UP &&
> > 			    (pgp->status == PGSTATE_DISABLED ||
> > 			     pgp->status == PGSTATE_ACTIVE))
> > 				continue;
> > 
> > 			if (pp->dmstate == PSTATE_FAILED) {
> > 				if (dm_reinstate_path(mpp->alias, pp->dev_t))
> > 					condlog(0, "%s: error reinstating",
> > 						pp->dev);
> > 			}
> > 		}
> > 
> > The reinstate command is only sent for paths which are either in
> > PATH_UP state, or belong to a PGSTATE_ENABLED path group. I admit I'm
> > unsure why we try to reinstate paths that we know are down. This is
> > 13-year-old code.
> > 
> > Interestingly, the state of your paths changes from "faulty offline"
> > to "ready running". So it appears that these paths are actually _not_
> > down. Just the reinstate seems to have failed on them.
> > 
> > multipathd -v3 logs and possibly kernel logs would be helpful to
> > understand what was going on in that situation.
> 
>     Sorry, maybe my two multipath status samples confused you. They
> are just samples. Actually, I ran "rescan-scsi-bus" to clear all
> SCSI devices mapped by iscsid on the host after all of the LUNs in the
> remote IPSAN had been removed.
> While rescan-scsi-bus is running, if the "multipath" command is
> run, the status of the dm device's path changes from
> failed to active at some moment, as below. If IO is sent to dm-105,
> the process that sends the IO ends up in D state.
> # multipath -ll
> 36d0d04b100b8cba665a187f0000000f9 dm-105 HUAWEI  ,XSG1
> size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=active
>   `- 18:0:0:101 sdku 67:288  active faulty running

The strange part here is that the device is considered "running". This
is the state of the kernel device. If the LUNs are actually _removed_
as you say, the device should be gone, or at least marked "offline". 
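For anyone who wants to check the kernel-side state directly rather than
through "multipath -ll": a minimal sketch in C, assuming the SCSI device
state is exposed as a one-line sysfs attribute such as
/sys/block/sdku/device/state. The helper name read_sdev_state() is made
up for illustration and is not part of multipath-tools.

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative helper (hypothetical name): read a one-line sysfs
 * attribute, e.g. /sys/block/sdku/device/state, into buf and strip
 * the trailing newline. Returns 0 on success, -1 on any error.
 */
static int read_sdev_state(const char *attr_path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;

	f = fopen(attr_path, "r");
	if (!f)
		return -1;	/* attribute missing: device is gone */

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* sysfs attributes end with '\n'; cut the string there */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

For a LUN that has really been removed on the target, I would expect
this to either fail (device gone) or return "offline", not "running".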

Apparently the SCSI bus scan via iSCSI still showed the LUN in a
workable state. For multipath this translates to PATH_UP. Thus even if
the above code didn't have the
(pgp->status == PGSTATE_DISABLED || pgp->status == PGSTATE_ACTIVE)
clause, the reinstate would have been attempted by multipath. This looks
like a low-level problem in your SCSI or iSCSI layer to me.

The stale "running" state is what I consider the actual problem.
multipath aside, if the path appears to be "running", any Linux process
could try to send IO down to it and get stuck, as you say.

> 
>   I want to know whether command "multipath" is reasonable in
> reinstate_paths().


> And maybe we should not call "multipath" in the process of running
> rescan-scsi-bus?

Normally rescan-scsi-bus.sh should call "multipath" only if the 
"-m|--multipath" switch was used. I quickly scanned through the code
and didn't find a call to "multipath" (with no options) which wasn't
guarded by the [ -n "$mp_enable" ] condition. (FTR: there is a call to
"multipath -f" from main->flushmpaths if "-f|--flush" is set).

Again, please double-check your version of sg3_utils, and perhaps run
"bash -x rescan-scsi-bus.sh" to figure out the call chain which runs
the "multipath" command.

Thanks,
Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



