[dm-devel] multipath-tools 0.7.4 failure to remove device

Wed Jan 17 00:43:47 UTC 2018

On Tue, 2018-01-16 at 14:30 -0600, Benjamin Marzinski wrote:
> On Mon, Jan 15, 2018 at 05:46:24PM +0100, Martin Wilck wrote:
> > On Mon, 2018-01-15 at 17:26 +0100, Julian Andres Klode wrote:
> > > On Mon, Jan 15, 2018 at 05:12:10PM +0100, Martin Wilck wrote:
> > > > On Mon, 2018-01-15 at 16:44 +0100, Julian Andres Klode wrote:
> > > > > 
> > > > > It was your commit that made it imply -n that caused the test
> > > > > failure
> > > > > :)
> > > > > I understand your position on that, but reverted it for
> > > > > Ubuntu,
> > > > > because,
> > > > > while it might be somewhat broken, it's at least the same
> > > > > level
> > > > > of
> > > > > broken
> > > > > as before.
> > > > 
> > > > I'd like to understand that better. Why would this break boot?
> > > 
> > > Ah sorry for the confusion. It did not break boot. Having finding
> > > disabled would have "broken" boot. The problem I was talking
> > > about
> > > was that with finding enabled, no dm devices were generated,
> > > which
> > > it turned out, was not entirely true - you had to run multipath
> > > manually (due to your find implies -n commit). 
> > > 
> > > It seems to me that this would cause more issues in practice than
> > > just keeping the slightly race-y combination enabled. It's
> > > something
> > > to revisit after the ubuntu lts when then people have some time
> > > to
> > > adjust to that change.
> > 
> > I wouldn't call it "slightly race-y". With find_multipaths "on" and
> > "ignore_wwids" "off", "multipath -u" classifies every device as a
> > multipath device. So SYSTEMD_READY=0 will be set on each device.
> > But
> > multipathd, which is responsible for actually setting up the map,
> > will
> > honor "find_multipaths", and will not set up a map for a device it
> > finds only a single path for. So the device never appears in a
> > state in
> > which systemd would be able to use it. Unless you've put in some
> > other
> > magic, this is almost guaranteed to make your system unbootable in
> > the
> > quite common case where people using multipath are running off a
> > non-
> > multipathed local root disk.
> 
> RedHat also removes your patches to force ignore_wwids off and imply
> -n
> 
> 64e27ec066a001012f44550f095c93443e91d845
> ffbb886a8a16cb063d669cd76a1e656fd3ec8c4b
> 
> I don't see any why that this could claim all the devices and render
> a
> system unbootable. First off, I assume you mean that there is a
> problem
> with find_multipaths "on" and ignore_wwids "on", since your patch
> forces
> ignore_wwids "off", and with that setting, mutipath will obviously
> only
> claim devices in the wwids file.

OK, you're right. I was confused by the double negation, once more :-/
So my statement above was incorrect, sorry everyone.

> [...]
> 
> So you only classify a device as a multipath device, if either:
> 
> 1. you find multiple devices with the same WWID. That's what the
>    (VECTOR_SIZE(pathvec) > 1) check after the filter_pathvec() call
>    does.
> 
> 2. You find that the device is already part of a multipath device.
>    That's what the (VECTOR_SIZE(curmp) != 0) after the get_dm_mpvec()
>    call does.

My argument from ffbb886 still applies. You classify the first path as
non-multipath, and subsequent paths as multipath. If systemd is in the
game, it seems highly likely to me that this will cause havoc, because
I'd expect it to grab the first path immediately after it's processed. 

In general, I strongly dislike the fact that subsequent events for the
same device get opposite results. It just doesn't feel right.

On practical terms, for users who have root on non-multipath and don't
care about blacklisting, this logic results in desired behavior. That's
good.

But with root on multipath, I expect this to cause trouble. What are
you doing to avoid the root FS being mounted before it can be
multipathed? I guess you can get away with it with root on xfs or ext4,
because the only problem you have is a non-multipathed root FS. With
root on BTRFS and subvolumes for /usr, /var, etc (as is the default on
SUSE), you'll run into emergency mode because the latter can't be
mounted.

Note that since d7188fcd, multipathd is started after "udev settle"
during boot. Thus it isn't available for picking up events while the
first devices are processed. That makes it almost certain that systemd
will grab the first path (without SYSTEMD_READY=0) for mounting root.

> This code is actually more stingy with allowing devices to be claimed
> as
> multipath devices than it could safely be.  Even though ignore_wwids
> is
> set, it should probably also allow devices that are in the wwids
> file,
> since they are multipath-able. The idea of ignore_wwids is to be more
> generous in accepting devices, but by disallowing devices even if
> they
> are in the wwids file, it is being less generous in some cases. 

I didn't realize this so far. I agree that it might be better to allow
devices from the wwids file, too.

> As for implying -n with find_multipaths, my personal opinion is that
> it
> breaks the main point of find_multipaths, which is to make stuff
> "just
> work". Since RedHat has been running with find_multipaths as the
> default
> for years now, I am well aware of the race-y issue. But in practice
> it's not very bad.
> 
> Here is how I see it. The first time a multipathable device appears,
> the
> udev rules will not claim the first path of it. Multipathd will also
> not
> immediately create a multipath device on top of it.  This means that
> something else could auto set-up on it.  The most likely way this can
> happen is if you are adding new storage that already has LVM/MD
> metadata
> on it.  
> lvm will auto-activate on that new path, making multipath unable
> to create the multipath device.
> 
> If you think that sounds like a big problem, you should consider that
> if
> you use the default multipath udev rules, the exact same problem
> happens
> if you don't use find_multipaths as well. That's because when we
> check
> if multipath can claim a device, we don't call multipath -u with -i.

At SUSE, we apply a patch that adds the "-i" flag to the "multipath -u
call" in the udev rules. By default, we treat all non-blacklisted
devices as multipath devices. That's why ffbb886 "ignore -i if
find_multipaths is set" is handy for SUSE - it avoids the non-
deterministic behavior with both "find_multpaths" and "ignore_wwids".

Unless I'm mistaken, the configuration with "find_multipaths=on" and
"ignore_wwids=on" would only matter in practice if "-i" was added to
the rules (as SUSE does) *and* ffbb886 is backed out (as Red Hat does,
and apparently Ubuntu, too). No big distro seems to do it that way by
default, apparently. But OTOH, no big distro is using the current
upstream conventions, either.

> This actually can save thoughtless users some headaches.  If they
> aren't
> using find_multipaths, and they don't setup their blacklisting
> correctly, multipath still won't claim devices that are already in
> use
> when it starts up, because it will never create a multipath device on
> them, so it will never add their wwid to the wwids file.

Am I mistaken, or did you just present an argument for not using (aka
"ignoring") "-i" with find_multipath? :-)

> So in short, the problem (which currently effects multipath
> regardless
> of whether find_multipaths is set) is that when new storage is added,
> if
> that new storage already has metadata on it that the system will use
> to
> autoassemble something else on top of the device (in practice this
> means
> lvm/md metadata), you may autoassemble the wrong thing on the device.
> It's a race, and yes, find-multipaths is more likely to lose the race
> because it can't start assembling until the second path appears.
> The solution is to simply remove the lvm/dm device, and run
> multipath,
> and from then on, you will never see the problem again.

If this was the only issue, I wouldn't see any reason to argue. I'm
fairly sure that I made the two patches in question in order to
fix some really nasty boot issues. That was almost a year ago, so I
need to double-check. If neither Red Hat nor Ubuntu are seeing severe
problems, maybe those hangs in the past were caused by something else.

> The only bug I have ever received about this issue was because
> anaconda
> (the redhat system installer) knew that multipath was supposed to be
> running on a device, but couldn't disassemble the lvm device that got
> automatically created, because of the old meta-data. The solution is
> to
> make anaconda smarter.
> 
> As a side note, RedHat also adds code to automatically fire off a
> change
> uevent on all the path devices on the first time a multipath device
> is
> created, so that all the path devices get correctly claimed by
> multipath
> after the fact, on the first create. It's currently part of the same
> patch that reverts the two commits listed above, but I have no
> problem
> with posting it as a seperate patch. 

That sounds interesting, although I don't see how it would help once
systemd has grabbed the first path.

> I obviously have no problem with
> reverting those two commits upstream either. But I don't feel
> horribly
> burdened with carrying them as RedHat patches.

No problem for me to carry 64e27ec and ffbb886 as SUSE-patches, either.
It's not optimal that the current upstream code represents a mixture of
various distro patches that no distro is actually using. We should
agree on one approach for upstream.

So either we should revert 64e27ec (and maybe ffbb886, too, but I'm not
yet convinced), and add your "fire off change uevent" patch (take the
"Red Hat / Ubuntu approach"), or we should add "-i" to the multipath
call in the udev rules ("SUSE approach"). I'm fine with both.

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)