[dm-devel] Re: Re: why command of multipath send reinstate message to all dm's paths

Jiaojianbing jiaojianbing at huawei.com
Mon Jul 2 01:06:00 UTC 2018


> [I've added Hannes, Ben and Douglas to the recipient list to fill in knowledge
> from the past that I may lack].
> 
> tl;dr summary: We've got 3 issues:
> 
>  1) Why does multipath, in reinstate_paths(), try to reinstate paths which are
> known to be down?
>  2) rescan-scsi-bus.sh can call "multipath" even if "-m" switch is not used (that
> looks like a bug to me).
>  3) In Jiaojianbing's environment, dead paths that have been removed on the
> target and were already marked "offline" may appear as "running"
> after rescan-scsi-bus.sh invocation.
> 
> Furthermore,
>  4) perhaps rescan-scsi-bus.sh should replace suboptimal "multipath"
> calls with multipathd cli commands (or better even, we multipath-tools people
> should eventually finish the "delegate to multipathd" work).

My mistake: the "multipath" command is not called from rescan-scsi-bus.sh, but from a separate script, "test.sh", which runs it every five minutes.
So there are two processes: rescan-scsi-bus.sh, and test.sh, which calls multipath every five minutes.

In this scenario, rescan-scsi-bus.sh takes much longer than when "test.sh" is not running. The reason is that all "systemd-udevd" processes
that send I/O to device-mapper devices, such as dm-105, are stuck in D state (uninterruptible sleep).

So there are two issues:
 1) Why does multipath, in reinstate_paths(), try to reinstate paths which are known to be down?
 2) While rescan-scsi-bus.sh is running, another process calling "multipath" may cause problems.
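To make issue 1 concrete, the skip test in reinstate_paths() (quoted further down in this thread) can be reduced to a small predicate. This is a minimal sketch, not multipath-tools code: the enums are simplified, hypothetical stand-ins for the real PATH_*/PSTATE_*/PGSTATE_* constants, and would_reinstate() is an illustrative name. It shows that a path whose checker state is not PATH_UP is still reinstated when its group is PGSTATE_ENABLED or PGSTATE_UNDEF, as long as the kernel marks it PSTATE_FAILED.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for multipath-tools' checker,
 * dm path and path-group states (not the real definitions/values). */
enum path_state { PATH_UP, PATH_DOWN };
enum dm_path_state { PSTATE_ACTIVE, PSTATE_FAILED };
enum pg_state { PGSTATE_UNDEF, PGSTATE_ENABLED, PGSTATE_DISABLED, PGSTATE_ACTIVE };

/* Mirrors the loop body of reinstate_paths(): a path that is not
 * PATH_UP is skipped only if its group is DISABLED or ACTIVE;
 * otherwise a reinstate is sent whenever dm reports it FAILED. */
static bool would_reinstate(enum path_state state,
			    enum dm_path_state dmstate,
			    enum pg_state pg_status)
{
	if (state != PATH_UP &&
	    (pg_status == PGSTATE_DISABLED || pg_status == PGSTATE_ACTIVE))
		return false;	/* corresponds to "continue" */

	return dmstate == PSTATE_FAILED;	/* dm_reinstate_path() would be called */
}
```

For example, would_reinstate(PATH_DOWN, PSTATE_FAILED, PGSTATE_ENABLED) is true, while the same path in a PGSTATE_ACTIVE group is skipped.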

> On Thu, 2018-06-28 at 06:35 +0000, Jiaojianbing wrote:
> > > > Dear Christophe,
> > > > when dm-105 is in one of the states below, its paths change to
> > > > active if we run the multipath command.
> > >
> > > Could you be more specific please? What multipath command did you
> > > run?
> > > Which version of multipath-tools are you running?
> >
> > The command is plain "multipath", run from the shell as below:
> > #multipath
> 
> ... and if I understand correctly, originally the problem occurred while running
> rescan-scsi-bus.sh. Please also state the version of sg3_utils you are using.

The sg3_utils version is sg3_utils-libs-1.37-14.x86_64.
Given the description above, the problem may be caused by the additional process that calls "multipath" periodically.

> >
> > And the version:  multipath-tools v0.4.9 (05/33, 2016)
> 
> Well, that's ancient. But latest multipath-tools still has the same code.
> 
> >
> > > >  I checked the multipath code: in reinstate_paths() it sends the
> > > > message "reinstate_path pathname" to the kernel when the path
> > > > group status is "PGSTATE_ENABLED/PGSTATE_UNDEF" and the path's
> > > > dm state is "PSTATE_FAILED".
> > > > Why does the multipath command do this for all dm devices? Some
> > > > of these paths are already offline or failed and can't be
> > > > recovered. Could we first check these devices' status by sending
> > > > I/O to the sd devices, and then, according to the I/O result,
> > > > have multipath send reinstate only to running devices and do
> > > > nothing for failed ones?
> > >
> > > I see this code in reinstate_paths():
> > >
> > > 		vector_foreach_slot (pgp->paths, pp, j) {
> > > 			if (pp->state != PATH_UP &&
> > > 			    (pgp->status == PGSTATE_DISABLED ||
> > > 			     pgp->status == PGSTATE_ACTIVE))
> > > 				continue;
> > >
> > > 			if (pp->dmstate == PSTATE_FAILED) {
> > > 				if (dm_reinstate_path(mpp->alias, pp->dev_t))
> > > 					condlog(0, "%s: error reinstating",
> > > 						pp->dev);
> > > 			}
> > > 		}
> > >
> > > The reinstate command is only sent for paths which are either in
> > > PATH_UP state, or belong to a PGSTATE_ENABLED path group. I admit
> > > I'm unsure why we try to reinstate paths that we know are down.
> > > This is 13-year-old code.
> > >
> > > Interestingly, the state of your paths changes from "faulty offline"
> > > to "ready running". So it appears that these paths are actually
> > > _not_ down; it just seems the reinstate has failed on them.
> > >
> > > multipathd -v3 logs and possibly kernel logs would be helpful to
> > > understand what was going on in that situation.
> >
> >     Sorry, my two multipath status samples may have confused you; they
> > are just samples. What I actually do is run "rescan-scsi-bus" to remove
> > all SCSI devices mapped by iscsid on the host after all LUNs on the
> > remote IP SAN have been removed.
> > While rescan-scsi-bus is running, if the "multipath" command runs, the
> > status of the dm device's path changes from failed to active at some
> > point, as below. If I/O is then sent to dm-105, the process sending it
> > ends up in D state.
> > # multipath -ll
> > 36d0d04b100b8cba665a187f0000000f9 dm-105 HUAWEI  ,XSG1
> > size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
> > `-+- policy='service-time 0' prio=1 status=active
> >   `- 18:0:0:101 sdku 67:288  active faulty running
> 
> The strange part here is that the device is considered "running". This is the
> state of the kernel device. If the LUNs are actually _removed_ as you say, the
> device should be gone, or at least marked "offline".
> 
> Apparently the SCSI bus scan via iSCSI still showed the LUN in a workable state.
> For multipath this translates to PATH_UP. Thus even if the above code didn't
> have the (pgp->status == PGSTATE_DISABLED || pgp->status == PGSTATE_ACTIVE)
> clause, the reinstate would have been attempted by multipath. This looks like
> a low-level problem in your SCSI or iSCSI layer to me.
> 
> This looks like the actual problem to me. multipath aside, if the path appears to
> be "running", any Linux process could try to send IO down to it and be stuck, as
> you say.
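The pre-check proposed earlier in the thread (look at the sd device before reinstating) could be sketched in C as below, assuming the SCSI device state is read from /sys/block/<dev>/device/state. The helper names are hypothetical and this is not how multipath-tools decides. As Martin notes, in this case the kernel reported the device as "running", so such a check alone would not have prevented the reinstate.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True only if the sysfs state string is exactly "running"
 * (a trailing newline, as sysfs emits, is ignored). */
static bool scsi_state_is_running(const char *state)
{
	size_t len = strcspn(state, "\n");

	return len == strlen("running") && strncmp(state, "running", len) == 0;
}

/* Read /sys/block/<dev>/device/state into buf.
 * Returns 0 on success, -1 if the path is too long or unreadable. */
static int read_scsi_state(const char *dev, char *buf, size_t buflen)
{
	char path[256];
	FILE *f;
	int n = snprintf(path, sizeof(path), "/sys/block/%s/device/state", dev);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)buflen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}

/* Hypothetical pre-check before sending "reinstate_path" for dev
 * (e.g. "sdku"): refuse if the device is gone or not "running". */
static bool path_looks_reinstatable(const char *dev)
{
	char state[64];

	if (read_scsi_state(dev, state, sizeof(state)) != 0)
		return false;	/* device removed or state unreadable */
	return scsi_state_is_running(state);
}
```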


> >
> >   I want to know whether what the "multipath" command does in
> > reinstate_paths() is reasonable.
> 
> 
> > And maybe we should not call "multipath" while rescan-scsi-bus is
> > running?
> 
> Normally rescan-scsi-bus.sh should call "multipath" only if the "-m|--multipath"
> switch was used. I quickly scanned through the code and didn't find a call to
> "multipath" (with no options) which wasn't guarded by the [ -n "$mp_enable" ]
> condition. (FTR: there is a call to "multipath -f" from main->flushmpaths if
> "-f|--flush" is set).
> 
> Again, please double-check your version of sg3_utils, and perhaps run "bash -x
> rescan-scsi-bus.sh" to figure out the call chain which runs the "multipath"
> command.
> 
> Thanks,
> Martin
> 
> >
> > > Regards
> > > Martin
> > >
> > > --
> > > Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107 SUSE
> > > Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB
> > > 21284 (AG
> > > Nürnberg)
> >
> >
> 
