[dm-devel] [RFC PATCH 14/16] multipath.rules: find_multipaths+ignore_wwids logic

Fri Jan 26 17:29:04 UTC 2018

On Thu, 2018-01-25 at 07:40 -0600, Benjamin Marzinski wrote:
> On Mon, Jan 22, 2018 at 10:56:19PM +0100, Martin Wilck wrote:

> > I'd like to *simplify the configuration*, and exclude
> > configurations
> > that make no sense. Before my commits 64e27e and ffbb88 last year,
> > there were 3 settings related to path detection: find_multipaths,
> > -i
> > for multipath (ignore_wwids), and -n for multipathd
> > (ignore_new_devs).
> > This adds up to 8 combinations, which I denote "fin", "FiN", etc.
> > in
> > the following, using upper case for "on" and lower case for "off".
> > 
> > The SUSE default setup is "fIn", and the Red Hat / Ubuntu one is
> > "Fin".
> > In initramfs, Red Hat is effectively using "FiN" ("multipathd -n"
> > isn't
> > used, but strict blacklisting is used to the same effect).
> > 
> > My patches 64e27e and ffbb88 forced F=>N and F=>i, thus "FiN"
> > became
> > the only combination with find_multipaths, leaving 5 valid
> > combinations. My recent RFC series allows only "xiN" and "xIn"
> > combinations for consistency reasons. But I can see this doesn't
> > fit
> > the way Red Hat and others are setting up multipath, thus we need
> > something different.
> > 
> > I wonder if we can agree that the combinations "fIN", "FIN", and
> > "fin"
> > are useless. "IN" combinations are really dangerous and can lead to
> > the
> > fatal outcomes 4A.2, 4B.2, 4C.2 from your analysis; they shouldn't
> > be
> > allowed. "fin" is similar to "Fin" at first sight, but without the
> > protection of "find_multipaths", it becomes much more likely that a
> > device that multipath hasn't claimed is claimed by multipathd
> > later. I
> > think we should disallow it as well, although it's the current
> > upstream
> > default. Moreover, "fiN" and "FiN" are equivalent: if new devices
> > are
> > completely ignored, "find_multipaths yes" has no effect.
> 
> I'd be o.k with removing "fin" from upstream in favor of "fIn". My
> analysis ignores class 3 devices, on the assumption that "fIn" is
> correct. If we do this, I will change "fIn" to "fin" in redhat's
> local
> patches. Here's why. Outcome 2 is really bad. Imagine a
> non-find-multipaths equivalent of 4C Outcome 2 (3C Outcome 2?). The
> "I"
> makes sure that multipath claims the device when it first appears,
> but
> for some reason multipathd simply can't create a device on it.  This
> can
> happen if multipath is not running in your initramfs, and something
> else
> sets itself up on a path device.  When you switch-root, multipath
> will
> claim the device, and mess everything up. But basically, this can
> happen
> any time that multipathd fails to be able to set up on a device it
> has
> claimed.  One way you can deal with most of these possibilities is to
> require that multipathd has to set itself up on a path at least once
> before we claim it. That's exactly what "i" gives us.

I agree that this makes perfect sense in the "Fin" case, but for "fin",
if find_multipaths is off, it seems odd. "fXn" means that multipathd
claims everything it comes by, which pairs better with "I" than "i".
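
For readers following the three-letter notation, here is a minimal sketch
that enumerates the 8 combinations and labels them along the lines of the
argument quoted above. The function names and the category labels are
illustrative assumptions, not anything taken from multipath-tools.

```python
from itertools import product

# Notation from this thread: upper case = "on", lower case = "off".
#   f/F = find_multipaths
#   i/I = ignore_wwids    (multipath -i)
#   n/N = ignore_new_devs (multipathd -n)


def all_combinations():
    """Return the 8 three-letter combination names, e.g. 'fin', 'FiN'."""
    names = []
    for f, i, n in product((False, True), repeat=3):
        names.append(
            ("F" if f else "f") + ("I" if i else "i") + ("N" if n else "n")
        )
    return names


def classify(name):
    """Label a combination following the quoted argument (illustrative only)."""
    f, i, n = (c.isupper() for c in name)
    if i and n:
        return "dangerous"     # the "IN" combinations
    if name == "fin":
        return "questionable"  # proposed for removal, yet the upstream default
    if n:
        return "strict"        # "fiN" and "FiN" behave the same
    return "valid"


if __name__ == "__main__":
    for combo in all_combinations():
        print(combo, classify(combo))
```

Under this labelling, "fIn", "Fin" and "FIn" come out as "valid", and
"fiN"/"FiN" collapse into one "strict" behavior.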

> Possibly another option is to check if something else is using the
> device
> and to not claim the device in this case. That will solve the
> initramfs
> case, which is the really bad one.

I agree it's bad, but AFAICS "i" only eliminates a part of the problem.

> It will still leave the case where
> multipath will never be able to create the device for some other
> reason,
> but that is almost always a configuration issue, where for instance
> multipath should be blacklisting a whole class of devices that it
> isn't.

Hannes and I have recently discussed an idea how to deal with this
situation. It's quite high on my todo list. The basic idea is that
if multipathd fails to set up a device, it leaves a flag somewhere and
retriggers a uevent. When multipath sees the flag, it doesn't claim the
device next time.
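
A minimal sketch of that flag idea, purely to illustrate the control flow.
Everything here is hypothetical: the flag directory, the one-file-per-WWID
layout, and the function names are assumptions, and the uevent retrigger is
only indicated in a comment, not performed.

```python
import os
import tempfile


def flag_path(flag_dir, wwid):
    """Hypothetical layout: one empty marker file per WWID."""
    return os.path.join(flag_dir, wwid)


def mark_failed(flag_dir, wwid):
    """multipathd side: remember that map setup failed for this WWID.

    A real implementation would retrigger a uevent for the path device
    afterwards, so that the udev rules get to re-evaluate it.
    """
    os.makedirs(flag_dir, exist_ok=True)
    with open(flag_path(flag_dir, wwid), "w"):
        pass


def should_claim(flag_dir, wwid, would_claim):
    """multipath (udev) side: never claim a path whose WWID is flagged."""
    if os.path.exists(flag_path(flag_dir, wwid)):
        return False
    return would_claim


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wwid = "3600a0b80001234"             # made-up example WWID
        print(should_claim(d, wwid, True))   # claimed on first sight
        mark_failed(d, wwid)                 # multipathd could not set it up
        print(should_claim(d, wwid, True))   # not claimed next time
```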

> 
> > If we agree on that, I'd like to propose a new configuration scheme.
> > As
> > in my RFC series, I'd like to replace the command line options with
> > config file options (**). For backward compatibility reasons, I
> > propose
> > to use the "find_multipaths" option, but with 4 rather than 2
> > possible
> > values:
> > 
> >  - find_multipaths "no": fIn, current SUSE default
> >  - find_multipaths "yes": Fin, current Red Hat / Ubuntu default
> >  - find_multipaths "strict": fiN/FiN, use only known WWIDs 
> >  - find_multipaths "auto": FIn, try to be smart; this is what we've
> > been discussing.
> 
> I would still like to put forward the code for my idea, but if we
> don't
> find agreement on anything else, I would definitely accept this as
> the upstream version (with the caveat that I really don't like ending
> up
> in Outcome 2, and will make "no" be "fin" on RedHat if I can't
> convince
> you to do that upstream).

If you insist, we'll just let "fin" survive as, say, 'find_multipaths
"conservative"'. Or we call "fIn" "greedy" and "fin" "no".

> > A common case is that users install without multipath, and convert
> > the
> > system to using multipath later. That means dracut is run in a non-
> > multipathed system, where the wwids file doesn't contain the
> > entries
> > for the root FS yet. That's a case which may lead to a fatal
> > variant of
> > 4C.3 later on. 
> 
> How? This outcome only happens in a "Fin" or "FiN" setup. You never
> claim the device, because you never multipath the device. In this
> situation, multipath never changes the path device at all. If dracut
> is
> run without multipath running, it will create an initramfs where
> multipathd won't grab the devices (which I agree is what Outcome 1 is
> all about).  This will mean that the other users grab the devices,
> which
> means that Outcome 3 is pretty impossible, because the device is
> already
> in use by something else. 

We had a case where the customer ran "dracut --add multipath", but
forgot to run "systemctl enable multipath.service". Because we use
"fIn", multipathd grabbed the device in the initrd, and afterwards the
root device's subvolumes couldn't be mounted, because systemd would try
to mount the members, which were locked by the multipath device.

>  The only thing that can grab a path device
> and have multipath grab it later is LVM on a whole device.
> This
> is the specific case that reassign_maps is designed to handle. Even
> without it, if multipathd created a device on top of (and later
> claimed)
> the same paths that a LVM device is using, it would set the path
> devices
> to not ready, not the LVM device.
> 
> > Along similar lines, it's essential for the Red Hat "multipath-
> > hostonly" approach that indeed no service in the initrd grabs
> > devices
> > which might be multipathed later. If that happens, a fatal form of
> > 4C.3
> > can occur. We see this often with BTRFS + subvolumes.
> 
> Again, I don't understand how the case here works.  If something in
> the
> initrd grabs the device, that will keep multipathd from assembling on
> it.

Exactly. But if then after switch root, multipath claims the device,
and multipathd fails to grab it, we're hosed. The root device is
accessible but other file systems on the same device (e.g. subvolumes)
can't be mounted. Actually this is rather 4C.2, sorry for mixing it up.

>  If LVM is already assembled, it shouldn't be hard to make multipath
> notice this and not assemble even if LVM is on the whole device.

Maybe not hard, but currently unimplemented :-/

>  As an
> aside, I am personally very wary about reassign_maps. Multipath
> doesn't
> own the other devices it is reloading. There is nothing to guarantee
> that someone else isn't trying to modify the LVM device at the same
> time. I don't know of a specific bug with this (we never turn it on),
> but it seems very risky to start changing devices we don't own with
> no
> coordination.

I've no experience with reassign_maps. It's tempting to think that it
could solve all these nasty issues, but my gut feeling, like yours,
says that it might be dangerous. We don't turn it on, either.

> If I had to make a guess, I can definitely see how you could get into
> a
> problem with the SUSE policy of "fIn".  In this case, multipathd
> doesn't
> claim or grab the device in the initrd, so something else does. Then
> after the switch-root, multipath will claim the device and multipathd
> won't be able to assemble on it.  This is the dreaded Outcome 2, and
> this is the reason I never use "I", even when find_multipaths is not
> set.

Ah, right, the situation that I described above. I think we're mostly in
the same boat here now.

> [...]
> > 
> > Finally, as you said yourself, multipathd is likely to "lose the
> > race"
> > anyway. With your patch you just make its chance even smaller. In a
> > way, d7188fc "multipathd: start daemon after udev trigger" already
> > implements your idea, because by the time multipathd starts,
> > essential
> > device detection will be finished (with the exception of extremely
> > slow
> > device detection where the udev queue runs empty).
> 
> I don't worry about 4C.3 happening in our current RedHat setup. There
> isn't a hard barrier that is keeping this from happening, but the
> timing
> makes it very unlikely.  If we assume that it won't happen, then
> RedHat's current implementation guarantees 4A.1, 4B.3, and 4C.1.  I'm
> fine with those guarantees.

I agree your approach can't hurt, although I still think it will just
make a very unlikely outcome even more unlikely (the 4C.3 case
described above is obviously very special, and your patch wouldn't
avoid it). I hope that our mutual ideas can go together.

> 
>   Problems like you mention above, which can
> cause 4C.2 if you use "I", even in the non-find-multipaths case, make
> me
> leery about using "I" in any setup. But I'm willing to switch the
> non-find-multipaths case to "i" in a RedHat patch, if I am alone in
> this
> concern.

That won't be necessary, see above.

> > > The advantage of your method is that, as long as the timeout is
> > > long
> > > enough, you always do the correct thing with multipath devices.
> > > The
> > > disadvantage is that the timeout slows down the common case, to
> > > make
> > > the
> > > rare case correct.
> > 
> > Would the idea with variable timeouts improve my approach in your
> > eyes?
> 
> Yes. It still will cause slowdowns on single-pathed SAN storage, but
> it
> should fix the most common case.

Great.

> If nobody is worried about multipathd winning the race against other
> device users, then 4C.3 is basically an impossible state, and there
> is
> no point in adding an additional timeout to make an impossible state
> less likely. In this case, there is no point in my solution. As for
> limiting the number of possible configurations: if we could agree
> that
> "I" isn't safe when checking if multipath should claim a device in
> udev,
> then there would be only 3 cases: fin, Fin, and FiN/fiN.  Like I
> said,
> there are two classes of problem where "I" causes problems: if the device
> is
> already in use, and if multipathd simply can't set itself up on the
> device.  If we check the path device is not being used before
> claiming
> it, then FIn with being smart is also a safe case since it will solve
> both of these. fIn with being smart is also safe. I simply don't
> believe
> that fIn is safe without doing these extra steps to protect against
> claiming devices that we shouldn't.

You don't have to use it, but please let's keep the "fIn" option
around. Our customers are used to this behavior. I don't deny that it
has caused some problems (usually related to mishandling one way or the
other), but we're not ready to give it up. We'll be working on
improving it. Dropping it upstream would hurt us.

> 
> This would still allow 5 states, that would probably need 3 config
> parameters
> 
> - (f)ind_multipaths
> - (i)gnore_wwids (or "smart" or something else. I originally called
> this
>   mode "greedy")
> - (n)o_new_devs

I'd really like to pursue the idea to hide all of this in a single
option with multiple possible values, rather than providing several
options but disallowing certain combinations thereof. That was the main
point I was trying to make in my previous email.

> 
> In this case, N would ignore f/F and i/I. Because we are protecting
> against problems with "I", any of the other four states are valid.
> 
> > Btw, it just occurred to me that your approach could be implemented
> > in
> > exactly the way as mine. Basically, all we need to change is what
> > udev
> > properties get set on the "maybe" uevents. Take my code, but don't
> > set
> > SYSTEMD_READY=0 and DM_MULTIPATH_DEVICE_PATH=1 in the "maybe"
> > case...
> > Should work, no? 
> 
> No. This would let nobody use the device. lvm won't scan devices in
> SYSTEMD_READY=0 state, and they can't be mounted.  These are exactly
> the things I am trying to allow.

That's why I said *don't* set SYSTEMD_READY=0 :-) ... but never mind,
you were thinking of a different solution anyway.
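
For the record, the variation I had in mind can be sketched as follows. This
is a toy model, not the actual rules file: the verdict names and the
hold_back_maybe switch are assumptions made for illustration. Only the two
property names come from the discussion above.

```python
def path_properties(verdict, hold_back_maybe=True):
    """Return the udev properties to set on a path device's uevent.

    verdict is "yes" (multipath claims the path), "no", or "maybe".
    hold_back_maybe=True  models my RFC: "maybe" paths are held back.
    hold_back_maybe=False models the variation: "maybe" paths stay usable.
    """
    claimed = {"DM_MULTIPATH_DEVICE_PATH": "1", "SYSTEMD_READY": "0"}
    if verdict == "yes":
        return dict(claimed)
    if verdict == "maybe":
        return dict(claimed) if hold_back_maybe else {}
    if verdict == "no":
        return {}
    raise ValueError(f"unknown verdict: {verdict!r}")


if __name__ == "__main__":
    print(path_properties("maybe"))                         # held back
    print(path_properties("maybe", hold_back_maybe=False))  # left alone
```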

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)