[dm-devel] [PATCH v2 23/24] domap(): never return DOMAP_RETRY in daemon mode

Martin Wilck mwilck at suse.com
Sun Dec 9 21:06:05 UTC 2018


On Mon, 2018-12-03 at 17:45 -0600,  Benjamin Marzinski  wrote:
> On Mon, Dec 03, 2018 at 08:36:23PM +0100, Martin Wilck wrote:
> > DOMAP_RETRY is only used by the multipath tool. multipathd always
> > treats
> > it exactly like DOMAP_FAIL. Simplify logic by only returning
> > DOMAP_RETRY in non-daemon mode from domap().
> > 
> > Signed-off-by: Martin Wilck <mwilck at suse.com>
> > ---
> >  libmultipath/configure.c | 28 +++++++++++++---------------
> >  multipathd/main.c        |  9 +--------
> >  2 files changed, 14 insertions(+), 23 deletions(-)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index d0e7107c..1bf3c481 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -998,15 +998,8 @@ rescan:
> >  	/*
> >  	 * reload the map for the multipath mapped device
> >  	 */
> > -retry:
> >  	ret = domap(mpp, params, 1);
> > -	if (ret == DOMAP_FAIL || ret == DOMAP_RETRY) {
> > -		if (ret < 0 && retries-- > 0) {
> > -			condlog(0, "%s: retry domap for addition of new
> > "
> > -				"path %s", mpp->alias, pp->dev);
> > -			sleep(1);
> > -			goto retry;
> > -		}
> > +	if (ret == DOMAP_FAIL) {
> 
> I'm kind of conflicted about this change.

Right, I shot over the top here. I was irritated by the fact that 
"ret < 0" translates to "ret == DOMAP_RETRY" and that therefore
DOMAP_RETRY is tested twice.

>  According to the commit
> message (commit 1c50cd32), Hannes put this code in to deal with lock
> contention on the paths (back when we used an exclusive lock in
> lock_multipath(), and actually saw contention). I don't know of any
> purpose for lock_multipath(), so removing this code probably won't
> hurt
> anything. But if we believe that we need lock_multipath(), then we
> should probably retry here (unless you have some understanding of
> lock_multipath()'s purpose where retrying won't help).
>
> [...]
> Probably the most important question I have is "Does anyone know why
> we > Probably the most important question I have is "Does anyone know
> why we lock the paths?

TL;DR: I'm 99.7% sure we don't need lock_multipath() any more.

The historic reason is 4d7a270:

    'Multiple multipath(8) execs can race with udev storm.
    
    We can simulate this with the following :
    "multipath -F; /sbin/multipath 8:16 & /sbin/multipath 8:32"
    
    Problem arise when two runs are about to create the same map.
    One will fail, leaving us with a choice : abord or retry.'

This commit was made at a time (October 2005) when multipath was called
directly from udev rules to set up maps. Earlier versions of multipath
had a general locking that would not allow multiple multipath commands
to run in parallel, but that has been removed later. This was an
attempt to lock only (would-be) members of one specific map.

Obviously, the goal of this patch wouldn't be achieved any more since
the lock has been change to non-exclusive (1c50cd32). Multiple
multipath instances run happily on members of the same set now. I
haven't tested it, but I believe the historic race "/sbin/multipath
8:16 & /sbin/multipath 8:32" still exists; just we don't run multipath
this way from udev rules any more. 

lock_multipath() doesn't help us void this race, as we can't go back to
exclusive locking. If we want to avoid it, we could create a lock file
from the WWID before calling domap(), /dev/shm/multipath/$WWID.lock or
so. Or we could use a SYSV semaphore set.

> I believe that the code originally existed to keep
> multipath and multipathd from trying to load a device at the same
> time.
> It ended up causing problems with udev, because that tries to grab a
> shared lock on the path while working on it.  When multipath switched
> to
> using a shared lock (commit 5ec07b3a), I agreed based on the rational
> that udev must be protecting against something, and we could need
> protecting against the same thing. The discussion is at:
> 
> https://www.redhat.com/archives/dm-devel/2016-May/msg00009.html
> 

The message of udev commit 3ebdb81, which introduced "device ownership"
in udev, says just "udev: serialize/synchronize block device event
handling with file locks". A comment says "this establishes a concept
of device "ownership" to serialize device access. External processes
holding a "write lock" will cause udev to skip the event handling; in
the case udev acquired the lock, the external process will block until
udev has finished its event handling".

Because multipathd is an "external process" from udev's point of view,
this would mean that our old way of using a write lock was correct (at
least, it was what the udev author intended). But udev gets this really
wrong, it should itself block or retry if the device is locked, rather
than aborting the event processing.

Note that both dm and md devices are excluded from udev's locking
anyway, because the udev people encountered problems with this scheme.

Martin


-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)





More information about the dm-devel mailing list