[dm-devel] Re: Re: why command of multipath send reinstate message to all dm's paths
Jiaojianbing
jiaojianbing at huawei.com
Mon Jul 2 01:06:00 UTC 2018
> [I've added Hannes, Ben and Douglas to the recipient list to fill in knowledge
> from the past that I may lack].
>
> tl;dr summary: We've got 3 issues:
>
> 1) Why does multipath, in reinstate_paths(), try to reinstate paths which are
> known to be down?
> 2) rescan-scsi-bus.sh can call "multipath" even if "-m" switch is not used (that
> looks like a bug to me).
> 3) In Jiaojianbing's environment, dead paths that have been removed on the
> target and were already marked "offline" may appear as "running"
> after rescan-scsi-bus.sh invocation.
>
> Furthermore,
> 4) perhaps rescan-scsi-bus.sh should replace suboptimal "multipath"
> calls with multipathd cli commands (or better even, we multipath-tools people
> should eventually finish the "delegate to multipathd" work).
My mistake: the "multipath" command was not invoked by rescan-scsi-bus.sh, but by a separate script, "test.sh", which calls it every five minutes.
So there are two processes involved: rescan-scsi-bus.sh, and test.sh, which calls multipath every five minutes.
In this scenario, rescan-scsi-bus.sh takes much longer than it does when "test.sh" is not running. The reason is that all "systemd-udevd" processes
that send IO to device-mapper devices, such as dm-105, are stuck in D state.
So this reduces to 2 issues:
1) Why does multipath, in reinstate_paths(), try to reinstate paths which are known to be down?
2) While the "rescan-scsi-bus.sh" script is running, a "multipath" command issued by another process may cause problems.
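To make issue 1 concrete, here is a minimal sketch of the decision that the loop in reinstate_paths() (quoted further down) appears to make for each path. The enums are simplified stand-ins written for illustration only (they are not the real multipath-tools definitions, and their values are arbitrary), and would_reinstate() is a hypothetical helper, not a function in multipath-tools.

```c
#include <stdbool.h>

/* Illustrative stand-ins: only the names come from this thread,
 * the numeric values are arbitrary assumptions. */
enum path_state { PATH_DOWN, PATH_UP };
enum pg_state { PGSTATE_UNDEF, PGSTATE_ENABLED, PGSTATE_DISABLED, PGSTATE_ACTIVE };
enum dm_state { PSTATE_ACTIVE, PSTATE_FAILED };

/* Hypothetical helper mirroring the two tests in the quoted loop:
 * returns true when a reinstate message would be sent for the path. */
static bool would_reinstate(enum path_state state, enum pg_state pg,
                            enum dm_state dm)
{
    /* Corresponds to the "continue" branch: the path is skipped only
     * if it is not up AND its group is disabled or active. */
    if (state != PATH_UP &&
        (pg == PGSTATE_DISABLED || pg == PGSTATE_ACTIVE))
        return false;

    /* Otherwise a reinstate is sent if device-mapper marks it failed. */
    return dm == PSTATE_FAILED;
}
```

With these stand-ins, would_reinstate(PATH_DOWN, PGSTATE_ENABLED, PSTATE_FAILED) is true. That is the case I am asking about: the path is known to be down, yet a reinstate message is still sent.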
> On Thu, 2018-06-28 at 06:35 +0000, Jiaojianbing wrote:
> > > > Dear Christophe,
> > > > when dm-105 is in one of the states below, its paths change to
> > > > active if we run the multipath command.
> > >
> > > Could you be more specific please? What multipath command did you
> > > run?
> > > Which version of multipath-tools are you running?
> >
> > command is "multipath", which can run in shell as below:
> > #multipath
>
> ... and if I understand correctly, originally the problem occurred while running
> rescan-scsi-bus.sh. Please also state the version of sg3_utils you are using.
The sg3_utils version is sg3_utils-libs-1.37-14.x86_64.
As described above, the problem appears to be caused by the additional process that calls "multipath" periodically.
> >
> > And the version: multipath-tools v0.4.9 (05/33, 2016)
>
> Well, that's ancient. But latest multipath-tools still has the same code.
>
> >
> > > > I checked the multipath code: in the reinstate_paths() routine it
> > > > sends the message "reinstate_path pathname" to the kernel when the
> > > > path group's status is PGSTATE_ENABLED or PGSTATE_UNDEF and the
> > > > path's state is PSTATE_FAILED.
> > > > Why does the multipath command do this for all dm devices? Some of
> > > > these paths are already offline or failed and cannot be recovered.
> > > > Maybe we could first check the status of these devices by sending IO
> > > > to the sd devices and then, according to the result, send reinstate
> > > > only for running devices and do nothing for failed devices?
> > >
> > > I see this code in reinstate_paths():
> > >
> > > vector_foreach_slot (pgp->paths, pp, j) {
> > >         if (pp->state != PATH_UP &&
> > >             (pgp->status == PGSTATE_DISABLED ||
> > >              pgp->status == PGSTATE_ACTIVE))
> > >                 continue;
> > >
> > >         if (pp->dmstate == PSTATE_FAILED) {
> > >                 if (dm_reinstate_path(mpp->alias, pp->dev_t))
> > >                         condlog(0, "%s: error reinstating",
> > >                                 pp->dev);
> > >         }
> > > }
> > >
> > > The reinstate command is only sent for paths which are either in
> > > PATH_UP state, or belong to a PGSTATE_ENABLED path group. I admit
> > > I'm unsure why we try to reinstate paths that we know are down.
> > > This is 13-year-old code.
> > >
> > > Interestingly, the state of your paths changes from "faulty offline"
> > > to "ready running". So it appears that these paths are actually _not_
> > > down. It just seems that the reinstate has failed on them.
> > >
> > > multipathd -v3 logs and possibly kernel logs would be helpful to
> > > understand what was going on in that situation.
> >
> > Sorry, my two multipath status samples may have confused you; they were
> > only examples. What I actually do is run "rescan-scsi-bus" to clear all
> > SCSI devices mapped by iscsid on the host after all LUNs on the remote
> > IP SAN have been removed.
> > If the "multipath" command runs while rescan-scsi-bus is in progress, the
> > status of the dm device's path changes from failed to active at some
> > point, as shown below. If IO is then sent to dm-105, the process that
> > sends it ends up in D state.
> > # multipath -ll
> > 36d0d04b100b8cba665a187f0000000f9 dm-105 HUAWEI ,XSG1
> > size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
> > `-+- policy='service-time 0' prio=1 status=active
> >   `- 18:0:0:101 sdku 67:288 active faulty running
>
> The strange part here is that the device is considered "running". This is the
> state of the kernel device. If the LUNs are actually _removed_ as you say, the
> device should be gone, or at least marked "offline".
>
> Apparently the SCSI bus SCAN via iSCSI still showed the LUN in a workable state.
> For multipath this translates to PATH_UP. Thus even if the above code didn't
> have the (pgp->status == PGSTATE_DISABLED || pgp->status == PGSTATE_ACTIVE)
> clause, the reinstate would have been attempted by multipath. This looks like
> a low-level problem in your SCSI or iSCSI layer to me.
>
> This looks like the actual problem to me. multipath aside, if the path appears to
> be "running", any Linux process could try to send IO down to it and be stuck, as
> you say.
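For reference, the kernel device state mentioned above (the last column, "running", in the multipath -ll output) can be read from the SCSI device's "state" attribute in sysfs, e.g. /sys/block/sdku/device/state. A minimal sketch of such a check follows. scsi_state_is_running() is a hypothetical helper, not part of multipath-tools or sg3_utils, and it takes the file path as a parameter so that it can be tried against any file.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if the file at state_path holds
 * exactly the word "running" (optionally followed by a newline).
 * A missing or unreadable file is treated as "not running". */
static bool scsi_state_is_running(const char *state_path)
{
    char buf[32] = "";
    FILE *f = fopen(state_path, "r");

    if (!f)
        return false;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);

    /* Strip the trailing newline that sysfs attributes normally carry. */
    buf[strcspn(buf, "\n")] = '\0';
    return strcmp(buf, "running") == 0;
}
```

A call such as scsi_state_is_running("/sys/block/sdku/device/state") would only report what the kernel believes. In my case the device was still shown as "running", so such a check alone would not have prevented the reinstate.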
> >
> > I want to know whether what the "multipath" command does in
> > reinstate_paths() is reasonable.
>
>
> > And maybe we should not call "multipath" while rescan-scsi-bus is
> > running?
>
> Normally rescan-scsi-bus.sh should call "multipath" only if the "-m|--multipath"
> switch was used. I quickly scanned through the code and didn't find a call to
> "multipath" (with no options) which wasn't guarded by the [ -n "$mp_enable" ]
> condition. (FTR: there is a call to "multipath -f" from main->flushmpaths if
> "-f|--flush" is set).
>
> Again, please double-check your version of sg3_utils, and perhaps run "bash -x
> rescan-scsi-bus.sh" to figure out the call chain which runs the "multipath"
> command.
>
> Thanks,
> Martin
>
> >
> > > Regards
> > > Martin
> > >
> > > --
> > > Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107 SUSE
> > > Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB
> > > 21284 (AG
> > > Nürnberg)
> >
> >