[dm-devel] [PATCH v2 17/20] multipath.rules: find_multipaths "smart" logic

Martin Wilck mwilck at suse.com
Tue Mar 27 21:34:00 UTC 2018


On Tue, 2018-03-27 at 16:03 -0500, Benjamin Marzinski wrote:
> On Mon, Mar 19, 2018 at 04:01:52PM +0100, Martin Wilck wrote:
> > When the first path to a device appears, we don't know if more paths
> > are going to follow. find_multipaths "smart" logic attempts to solve
> > this dilemma by waiting for additional paths for a configurable time
> > before giving up and releasing single paths to upper layers.
> > 
> > These rules apply only if find_multipaths is set to "smart" in
> > multipath.conf. In this mode, multipath -u sets
> > DM_MULTIPATH_DEVICE_PATH=2 if there's no clear evidence whether a
> > given device should be a multipath member (not blacklisted, not listed
> > as "failed", not in WWIDs file, not member of an existing map, only
> > one path seen yet).
> > 
> > In this case, pretend that the path is a multipath member, disallow
> > further processing by systemd (allowing multipathd some time to grab
> > the path), and check again after some time. If the path is still not
> > multipathed by then, pass it on to systemd for further processing.
> > 
> > The timeout is controlled by the "find_multipaths_timeout" config
> > option. Note that delays caused by waiting don't "add up" during boot,
> > because the timers run concurrently.
> > 
> > Implementation note: This logic requires obtaining the current time.
> > It's not trivial to do this in udev rules in a portable way, because
> > "/bin/date" is often not available in restricted environments such as
> > the initrd. I chose the sysfs method, because /sys/class/rtc/rtc0
> > seems to be quite universally available. I'm open to better
> > suggestions if there are any.
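
For illustration, the sysfs method from the implementation note can be
sketched outside of udev. This is a minimal Python sketch, not the rules
code from the patch. It assumes the RTC class device exposes a
"since_epoch" attribute; the function name seconds_since_epoch and the
fallback to the system clock are hypothetical additions so that the
sketch also runs where rtc0 is absent.

```python
import time
from pathlib import Path


def seconds_since_epoch(rtc_dir="/sys/class/rtc/rtc0"):
    """Return the current time in whole seconds since the epoch.

    Reads the "since_epoch" attribute below rtc_dir (assumed to exist on
    the RTC class device). If the file is missing or unparsable, fall
    back to the system clock; that fallback is not part of the patch.
    """
    try:
        return int(Path(rtc_dir, "since_epoch").read_text().strip())
    except (OSError, ValueError):
        return int(time.time())


if __name__ == "__main__":
    print(seconds_since_epoch())
```

The directory is a parameter only to make the function easy to exercise
against a scratch directory instead of the real sysfs tree.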
> 
> I have a couple of code issues, that I'll point out below, but I have an
> overall question.  If multipath exists in the initramfs, and a device is
> not claimed there, then after the pivot, multipath will not temporarily
> claim it, correct?

Incorrect, it will do the temporary claim.

>  I'm pretty sure, but not totally certain, that udev
> database persists between the udev running in the initramfs and the
> regular system.

That's only true for devices that set OPTIONS+="db_persist", and dracut
sets this only for dm and md devices. For other devices,
/usr/lib/systemd/system/initrd-udevadm-cleanup-db.service cleans up the
udev database, and devices are seen as "new" during coldplug. So, if
there's still only one path and no other information (e.g. wwids file)
after pivot, we'll wait.

>  On the other hand, if multipath isn't in the initramfs
> but it is in the regular system, then AFAICS, once the system pivots to
> the regular fs, there is nothing to warn multipath that these devices
> could already be in use, correct?

Correct.

>  So, even if you don't need to
> multipath any devices in your initramfs, you will need multipath in your
> initramfs, or it could go setting devices to not ready. Right?

The following happens: multipath -u temporarily claims the device. When
multipathd starts, it fails to set up the map, sets the "failed"
marker, and retriggers udev. The second time, multipath -u unclaims the
device because it recognizes it as failed.
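
The sequence above can be sketched as a small simulation. This is only a
model of the behaviour described in this thread, not the multipath code:
the function names are hypothetical, and "0" stands in for "any value
other than 1 or 2", which is an assumption about what multipath -u
reports for a non-multipath device.

```python
def multipath_u(blacklisted, marked_failed, in_wwids_file,
                in_existing_map, paths_seen):
    """Hypothetical model of the 'multipath -u' verdict in "smart" mode.

    Returns the value of DM_MULTIPATH_DEVICE_PATH:
    "1" = definitely multipath, "2" = maybe, "0" = not multipath
    ("0" is an assumed placeholder for "anything else").
    """
    if blacklisted or marked_failed:
        return "0"
    if in_wwids_file or in_existing_map or paths_seen > 1:
        return "1"
    return "2"


def claimed(verdict):
    """A path is claimed for "1", and temporarily claimed for "2"
    (as long as the wait period has not finished)."""
    return verdict in ("1", "2")


# First uevent after pivot: a single path, no other information.
state = dict(blacklisted=False, marked_failed=False, in_wwids_file=False,
             in_existing_map=False, paths_seen=1)
first = multipath_u(**state)

# multipathd fails to set up the map, sets the "failed" marker and
# retriggers udev, so multipath -u runs a second time.
state["marked_failed"] = True
second = multipath_u(**state)

print(first, claimed(first))
print(second, claimed(second))
```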

I admit I haven't tested the default Red Hat setup with a very
restrictive multipath.conf in the initrd. But I'm pretty certain that
in that case, the same thing happens.
I'd be grateful if you could give it a try :-)

> 
> > 
> > Signed-off-by: Martin Wilck <mwilck at suse.com>
> > ---
> >  multipath/multipath.rules | 80 +++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 78 insertions(+), 2 deletions(-)
> > 
> > diff --git a/multipath/multipath.rules b/multipath/multipath.rules
> > index aab64dc7182c..32d33991db3d 100644
> > --- a/multipath/multipath.rules
> > +++ b/multipath/multipath.rules
> > @@ -21,7 +21,83 @@ TEST!="$env{MPATH_SBIN_PATH}/multipath", ENV{MPATH_SBIN_PATH}="/usr/sbin"
> >  
> >  # multipath -u sets DM_MULTIPATH_DEVICE_PATH
> >  ENV{DM_MULTIPATH_DEVICE_PATH}!="1", IMPORT{program}="$env{MPATH_SBIN_PATH}/multipath -u %k"
> > -ENV{DM_MULTIPATH_DEVICE_PATH}=="1", ENV{ID_FS_TYPE}="mpath_member", \
> > -	ENV{SYSTEMD_READY}="0"
> > +
> > +# case 1: this is definitely multipath
> > +ENV{DM_MULTIPATH_DEVICE_PATH}=="1", \
> > +	ENV{ID_FS_TYPE}="mpath_member", ENV{SYSTEMD_READY}="0", \
> > +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> > +	GOTO="end_mpath"
> > +
> > +# case 2: this is definitely not multipath
> > +ENV{DM_MULTIPATH_DEVICE_PATH}!="2", \
> > +	ENV{FIND_MULTIPATHS_WAIT_UNTIL}="finished", \
> > +	GOTO="end_mpath"
> > +
> > +# All code below here is only run in "smart" mode.
> > +
> > +# FIND_MULTIPATHS_WAIT_UNTIL is the timeout (in seconds after the
> > +# epoch). If waiting ends for any reason, it is set to "finished".
> > +IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
> > +
> > +# At this point we know DM_MULTIPATH_DEVICE_PATH==2.
> > +# (multipath -u indicates this is "maybe" multipath)
> > +
> > +# case 3: waiting has already finished. Treat as non-multipath.
> > +ENV{FIND_MULTIPATHS_WAIT_UNTIL}=="finished", \
> > +	ENV{DM_MULTIPATH_DEVICE_PATH}="", GOTO="end_mpath"
> > +
> > +# The timeout should have been set by the multipath -u call above, set a default
> > +# value if that didn't happen for whatever reason
> > +ENV{FIND_MULTIPATHS_PATH_TMO}!="?*", ENV{FIND_MULTIPATHS_PATH_TMO}="5"
> > +
> 
> This code adds three more callouts.  I know that the udev people dislike
> these, and they do eat up time that can cause udev to time out on busy
> systems.  To avoid the overhead of these execs, as well as to make the
> rules simpler, what do you think about moving the
> 
> IMPORT{db}="FIND_MULTIPATHS_WAIT_UNTIL"
> 
> line before the "multipath -u" call, and passing that as a parameter if
> present?  Then multipath could check the current time and compare it.
> It could also return an updated FIND_MULTIPATHS_WAIT_UNTIL as a udev
> environment variable, instead of returning FIND_MULTIPATHS_PATH_TMO, and
> forcing udev to calculate the new timeout. That would remove the need
> for the other PROGRAM calls.

That's a nice idea. Why didn't I have it?
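
A rough sketch of how the deadline handling could look if it moved into
multipath -u as suggested. This is a Python model for discussion only,
not a proposal for the actual C code: the function name
update_wait_until and its signature are hypothetical, and it only covers
the "maybe" case (DM_MULTIPATH_DEVICE_PATH==2). The default of 5 seconds
mirrors the FIND_MULTIPATHS_PATH_TMO default in the rules above.

```python
import time

FINISHED = "finished"


def update_wait_until(prev_wait_until, timeout_s=5, now=None):
    """Return (still_waiting, new FIND_MULTIPATHS_WAIT_UNTIL).

    prev_wait_until is the value imported from the udev db: empty on the
    first event, a deadline in seconds since the epoch while waiting, or
    "finished" once waiting has ended for any reason.
    """
    if now is None:
        now = int(time.time())
    if prev_wait_until == FINISHED:
        # Waiting already ended earlier: treat as non-multipath.
        return False, FINISHED
    if not (prev_wait_until and prev_wait_until.isdigit()):
        # First event for this path (or unusable value): start the timer.
        return True, str(now + timeout_s)
    if now >= int(prev_wait_until):
        # Deadline passed without the path being multipathed.
        return False, FINISHED
    # Still inside the wait period: keep the existing deadline.
    return True, prev_wait_until


if __name__ == "__main__":
    print(update_wait_until("", now=1000))
    print(update_wait_until("1005", now=1003))
    print(update_wait_until("1005", now=1005))
```

With this shape the rules would only need to import the previous value
from the db, pass it along, and act on the returned variable.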

Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



