[dm-devel] [PATCH v3 18/19] libmultipath: Don't blank intialized paths

Martin Wilck mwilck at suse.com
Tue Oct 2 22:37:17 UTC 2018


Hi Ben,

On Tue, 2018-10-02 at 00:00 +0200, Martin Wilck wrote:
> On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > When pathinfo fails for some likely transient reason, it clears the
> > path
> > wwid, but otherwise returns successfully, to keep the path around
> > but
> > not usable until it gets fully initialized. However, if the path
> > has
> > already been initialized, and pathinfo hits a transient error, it
> > shouldn't clear the wwid.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>
> > ---
> >  libmultipath/discovery.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index 3e0db7f..33815dc 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -1991,9 +1991,9 @@ blank:
> >  	/*
> >  	 * Recoverable error, for example faulty or offline path
> >  	 */
> > -	memset(pp->wwid, 0, WWID_SIZE);
> >  	pp->chkrstate = pp->state = PATH_DOWN;
> > -	pp->initialized = INIT_FAILED;
> > +	if (pp->initialized == INIT_FAILED)
> > +		memset(pp->wwid, 0, WWID_SIZE);
> >  
> >  	return PATHINFO_OK;
> >  }
> 
> I am uncertain about this one. The old code sets pp->initialized to
> INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> INIT_REQUESTED_UDEV before, this patch might change how the code
> behaves later in check_path(), where these conditions are checked.
> 
> Likewise, tests for strlen(pp->wwid) are used in various places
> around
> the code. These tests would now yield different results for paths in
> "recoverable error" state.
> 
> Have you considered these possible side effects?

I've pondered this a lot, and the dust is starting to settle.

1. With your patch in place, INIT_FAILED is never set except in
alloc_path() (we might rename it to INIT_NEW or the like, but see
below).

2. I don't understand how you handle repeated failure to retrieve the
WWID. I see that get_uid() (actually, scsi_uid_fallback()) would
retrieve the WWID from sysfs after retriggers are exhausted. But I
don't see how pathinfo(DI_WWID) would ever be called in this situation:

In the last invocation, pathinfo() had failed to retrieve the WWID and
set pp->initialized = INIT_MISSING_UDEV. There it will remain because
check_path() won't set it to INIT_REQUESTED_UDEV any more after retries
are exhausted. And now, check_path() won't call pathinfo(DI_ALL) any
more from the "add missing path" code, because of the (pp->initialized
!= INIT_MISSING_UDEV) condition.
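
To make the concern concrete, here is a toy model of the sequence as I
understand it. It is not the multipathd code: every toy_* name, the
retry limit of 3, and the assumption that the retriggered uevent is
handled immediately are made up for illustration, and WWID retrieval is
hard-wired to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_WWID_SIZE 128
#define TOY_RETRIGGER_TRIES 3	/* assumed retry limit, illustration only */

enum toy_init {
	TOY_INIT_FAILED,
	TOY_INIT_MISSING_UDEV,
	TOY_INIT_REQUESTED_UDEV,
	TOY_INIT_OK,
};

struct toy_path {
	char wwid[TOY_WWID_SIZE];
	enum toy_init initialized;
	int retriggers;
	int pathinfo_calls;
};

/* Stand-in for pathinfo(): WWID retrieval always fails in this model. */
static void toy_pathinfo(struct toy_path *pp)
{
	assert(pp != NULL);
	pp->pathinfo_calls++;
	pp->wwid[0] = '\0';
	pp->initialized = TOY_INIT_MISSING_UDEV;
}

/* One checker tick. Returns true if pathinfo was run again. */
static bool toy_check_path(struct toy_path *pp)
{
	assert(pp != NULL);
	if (pp->initialized == TOY_INIT_MISSING_UDEV &&
	    pp->retriggers < TOY_RETRIGGER_TRIES) {
		pp->retriggers++;
		pp->initialized = TOY_INIT_REQUESTED_UDEV;
		/* model: the retriggered uevent is handled right away */
		toy_pathinfo(pp);
		return true;
	}
	/* "add missing path" branch, skipped for INIT_MISSING_UDEV */
	if (strlen(pp->wwid) == 0 &&
	    pp->initialized != TOY_INIT_MISSING_UDEV) {
		toy_pathinfo(pp);
		return true;
	}
	return false;
}

/* Run several ticks and count how often pathinfo was retried. */
static int toy_run_ticks(struct toy_path *pp, int ticks)
{
	int reprobes = 0;

	for (int i = 0; i < ticks; i++)
		if (toy_check_path(pp))
			reprobes++;
	return reprobes;
}
```

In this model, a path that starts out in TOY_INIT_MISSING_UDEV is
reprobed only while retriggers remain; afterwards every tick is a
no-op and the path stays where it is.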

Am I overlooking something?

3. If "blank" state means that important device information couldn't be
retrieved because of presumably transient failure conditions, we should
retry retrieving this information by calling pathinfo() again later. But
unless the WWID is (reset to) the empty string, check_path() won't call
pathinfo(DI_ALL) any more.
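
A small sketch of the difference, under simplifying assumptions: the
toy_* names and the "toy-wwid" string are hypothetical, and the reprobe
condition is reduced to the empty-WWID test alone (the real check in
check_path() has further conditions). toy_blank_old() mirrors the lines
removed by the patch, toy_blank_new() the lines it adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_WWID_SIZE 128

enum toy_init { TOY_INIT_FAILED, TOY_INIT_OK };

struct toy_path {
	char wwid[TOY_WWID_SIZE];
	enum toy_init initialized;
};

/* Helper for the example: set up a path with a given WWID and state. */
static void toy_init_path(struct toy_path *pp, const char *wwid,
			  enum toy_init state)
{
	assert(pp != NULL && wwid != NULL);
	memset(pp, 0, sizeof(*pp));
	snprintf(pp->wwid, sizeof(pp->wwid), "%s", wwid);
	pp->initialized = state;
}

/* "blank:" before the patch: always clear the WWID and mark as failed. */
static void toy_blank_old(struct toy_path *pp)
{
	memset(pp->wwid, 0, TOY_WWID_SIZE);
	pp->initialized = TOY_INIT_FAILED;
}

/* "blank:" with the patch: clear the WWID only if never initialized. */
static void toy_blank_new(struct toy_path *pp)
{
	if (pp->initialized == TOY_INIT_FAILED)
		memset(pp->wwid, 0, TOY_WWID_SIZE);
}

/* Reduced reprobe condition: only an empty WWID triggers pathinfo again. */
static bool toy_would_reprobe(const struct toy_path *pp)
{
	return strlen(pp->wwid) == 0;
}
```

For a path that was already initialized, toy_blank_old() leaves it
eligible for a reprobe, while toy_blank_new() keeps the WWID and so the
reduced condition no longer fires. For a never-initialized path both
variants behave the same.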

4. The "blank" logic in pathinfo() combines several very different
cases.
  a) PATH_REMOVED status from path_offline(). This means that
elementary sysfs attributes were missing. This is almost the same as
failure in sysfs_pathinfo(), which results in PATHINFO_FAILED return
status; but for PATH_REMOVED we return PATHINFO_OK and keep the path
around.
  b) Failure in checker_check(). If the path is offline in the first
place, the checker isn't called, and WWID determination is attempted.
But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto "blank"
state.
  c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo(). Neither
function ever fails, so this can't happen. I have patches here to fix
that.
  d) Failure to open pp->fd. 

d) is the only case in which the "blank" logic really makes sense to
me. It can happen only at the first pathinfo() invocation, meaning
pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your patch
would change nothing for this case.

a) and b) can happen for paths that have been initialized already. I
think in case a) the WWID should be reset, pp->initialized should
probably be set to INIT_FAILED, and PATHINFO_FAILED should be returned.
In case b) we should IMO proceed normally rather than goto "blank".
Resetting the WWID in case b) is nonsense, agreed.

Altogether, if my analysis is correct, your patch (not blanking the
WWID) should be applied to case b) only.
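
Roughly, the per-case handling I have in mind could look like the
sketch below. It is an illustration of the proposal only, not a patch:
the toy_* types, the cause enum and the TOY_PROCEED result are invented
here, and case c) is left out because it cannot currently occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_WWID_SIZE 128

enum toy_init { TOY_INIT_FAILED, TOY_INIT_OK };

/* Why pathinfo would have jumped to "blank" (cases a, b, d above). */
enum toy_cause {
	TOY_CAUSE_REMOVED,	/* a) PATH_REMOVED from path_offline() */
	TOY_CAUSE_CHECKER,	/* b) checker gave no usable state */
	TOY_CAUSE_OPEN_FAILED,	/* d) could not open pp->fd */
};

enum toy_result {
	TOY_PATHINFO_OK,
	TOY_PATHINFO_FAILED,
	TOY_PROCEED,		/* carry on with the rest of pathinfo */
};

struct toy_path {
	char wwid[TOY_WWID_SIZE];
	enum toy_init initialized;
	bool down;
};

static enum toy_result toy_handle_failure(struct toy_path *pp,
					  enum toy_cause cause)
{
	assert(pp != NULL);
	switch (cause) {
	case TOY_CAUSE_REMOVED:
		/* a) treat like a sysfs failure: reset and report failure */
		memset(pp->wwid, 0, TOY_WWID_SIZE);
		pp->initialized = TOY_INIT_FAILED;
		return TOY_PATHINFO_FAILED;
	case TOY_CAUSE_CHECKER:
		/* b) keep the WWID and continue normally */
		return TOY_PROCEED;
	case TOY_CAUSE_OPEN_FAILED:
		/* d) blank as before: keep the path, unusable for now */
		memset(pp->wwid, 0, TOY_WWID_SIZE);
		pp->down = true;
		return TOY_PATHINFO_OK;
	}
	return TOY_PATHINFO_FAILED;
}
```

The point of splitting the cases is that only b) leaves an already
known WWID untouched, which is the behavior the patch is after.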

Please comment - I still feel a bit confused and may have overlooked
something essential.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)




