[dm-devel] [PATCH] udev-md-raid-assembly.rules: skip if DM_UDEV_DISABLE_OTHER_RULES_FLAG

Thu Feb 17 13:09:34 UTC 2022

On Thu 17 Feb 2022 11:58, Martin Wilck wrote:
> On Thu, 2022-02-17 at 09:09 +1100, NeilBrown wrote:
> > On Thu, 17 Feb 2022, mwilck at suse.com wrote:
> > > From: Martin Wilck <mwilck at suse.com>
> > > 
> > > device-mapper sets the flag DM_UDEV_DISABLE_OTHER_RULES_FLAG to 1
> > > for
> > > devices which are unusable. They may be no set up yet, suspended,
> > > or
> > > otherwise unusable (e.g. multipath maps without usable path). This
> > > flag does not necessarily imply SYSTEMD_READY=0 and must therefore
> > > be tested separately.
> > 
> > I really don't like this - looks like a hack.  A Kludge.
> 
> These are strong words. You didn't go into detail, so I'm assuming that
> your reasoning is that DM_UDEV_DISABLE_OTHER_RULES_FLAG is an internal
> flag of the device-mapper subsystem. Still, you can see that is's used
> both internally by dm, and by other subsystems:
> 
> https://github.com/lvmteam/lvm2/blob/8dccc2314e2482370bc6e5cf007eb210994abdef/udev/13-dm-disk.rules.in#L15
> https://github.com/g2p/bcache-tools/blob/a73679b22c333763597d39c72112ef5a53f55419/69-bcache.rules#L6
> https://github.com/opensvc/multipath-tools/blob/d9d7ae9e2125116b465b4ff4d98ce65fe0eac3cc/kpartx/kpartx.rules#L10
> 

The flags that DM use for udev were introduced before systemd project
even existed. We needed to introduce the DM_UDEV_DISABLE_OTHER_RULES_FLAG
to have a possibility for all the "other" (non-dm) udev rules to check
for if there's another subsystem stacking its own devices on top of DM ones.

The flag is used to communicate the other rules a condition when a DM
device underneath is either not yet set up completely or it's not ready
to be read/scanned for a reason (e.g. the device is suspended, not yet
loaded with a table...).

The reason we needed to introduce such a flag is simple - there's
limited amount of event types and DM devices are not ready on the usual
ADD event. It's after the CHANGE event that originates from the DM
device having a table loaded and resumed. At the same time, a CHANGE
event can be generated for various different reasons. So checking a
single flag that we set based in out own udev rules based on our best
knowledge and let other other rules to check for this single flag
seemed to be the best option to solve this.

> Would you call these others "hacks", too?
> 
> > Can you provide a reference to a detailed discussion that explains
> > why
> > SYSTEMD_READY=0 cannot be used?
> 
> The main reason is that SYSTEMD_READY=0 is set too late, in 99-systemd-
> rules, and only on "add" events:
> https://github.com/systemd/systemd/blob/bfae960e53f6986f1c4d234ea82681d0acad57df/rules.d/99-systemd.rules.in#L18
> 
> I guess the device-mapper rules themselves could be setting
> SYSTEMD_READY="0". @Peter Rajnoha, do you want to comment on that? My
> concern wrt such a change would be possible side effects. Setting
> SYSTEMD_READY=0 on "change" events could actually be wrong, see below.
> 

First of all, as already mentioned, DM udev rules with all the flags
pre-date the systemd project itself.

When systemd was introduced, we communicated the flag use with systemd
right away and so this line was added to 99-systemd.rules:

  SUBSYSTEM=="block", ACTION=="add", ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}=="1", ENV{SYSTEMD_READY}="0"

At that early time, the SYSTEMD_READY flag was used solely for systemd
purpose of setting its own device units properly. Just later, other subsystems
started (mis)using this flag for notifying about device readiness and so
the very original intention of the SYSTEMD_READY flag has diverted this
way a little bit.

Last but not the least, systemd is just one of the init systems/service
managers around so it's not any standard for block devices to set the
SYSTEMD_READY flag to notify about device readiness. Yes, it's true
that systemd is widespread now, but still not a single standard...

> I the case I was observing, there was a multipath device without valid
> paths, which had switched to queueing mode [*]. If this happens for
> whatever reason (and it could be something else, like a suspended DM
> device), IO on such a device hangs. IO that may hang must not be
> attempted from an udev rule. Therefore it makes sense that layers
> stacked on top of DM try to avoid it, and checking udev properties set
> by DM is a reasonable way to do that.
> 
> The core of the problem is that there is no well-defined "API"
> specifying how different udev rule sets can communicate, iow which udev
> properties are well-defined enough to be consumed outside of the
> subsystem that defines them. SYSTEMD_READY is about the only "global"
> property. IMO it's somewhat overloaded: The actual semantics of
> SYSTEMD_READY=0 is "systemd shouldn't activate the associated device
> unit". Various udev rule sets use it with similar but not 100%
> identical semantics like "don't touch this" or "don't probe this". 
> 

Exactly!

The SID - Storage Instantiation Daemon, which is still in development,
is trying to cover exactly this part, among other things.

> In the case I was looking at, the device had already been activated by
> systemd. Later, the device had lost all active paths and thus became
> unusable. We can't easily set SYSTEMD_READY=0 on such a device. Doing
> so would actually be dangerous, because systemd might remove the
> device. Moreover, while processing the udev rule, we just don't know if
> the problem is temporary or permanent. 
> 
> Other properties, like those set by the DM subsystem, are less well-
> defined. There's no official spec defining their names and semantics,
> and there are multiple flags which aren't easly differentiated
> (DM_UDEV_DISABLE_DISK_RULES_FLAG, DM_UDEV_DISABLE_OTHER_RULES_FLAG,
> DM_NOSCAN, DM_SUSPENDED, MPATH_DEVICE_READY). OTOH, most of these flags
> have been around for many years without changing, and thus have
> acquired the status of a semi-official API, which is actually used in
> other rule sets. In particular DM_UDEV_DISABLE_OTHER_RULES_FLAG has a
> few users, see above. I believe this is for good reason, and therefore
> I don't consider my patch a "hack".
> 

Maybe we (DM) should have documented this better, more clearly, but the
DM_UDEV_DISABLE_OTHER_RULES_FLAG is really designed to be checked by
"other" foreign subsystems to notify them whether they can process their
udev rules on such a DM device.

Full documentation for the generic DM udev flags is here:

https://sourceware.org/git/?p=lvm2.git;a=blob;f=libdm/libdevmapper.h;=e9412da7d33fc7534cd1eccd88c21b75c6c221b1;hb=HEAD#l3644

In summary, the meaning of the flags:

  DM_UDEV_DISABLE_DISK_RULES_FLAG is controlling 13-dm-disk.rules (where
blkid is called for DM devices and /dev/disk/by-* symlinks are set)

  DM_UDEV_DISABLE_DM_RULES_FLAG is controlling 10-dm.rules

  DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG is controlling DM subsystem (LVM,
multipath, crypt, ...)

  DM_NOSCAN is just a helper DM-internal flag in udev to help inside
DM's own rules and/or its subsystem rules

  DM_SUSPENDED is something that is set and can be checked, but foreign
(non-DM) udev rules don't need to bother about this at all. DM udev
rules already set DM_UDEV_DISABLE_OTHER_RULES_FLAG to notify other rules
if the DM device becomes unreadable.

  DM_NAME, DM_UUID - normally, other rules don't need to bother about
DM name or UUID - they're set mainly to hook custom permission rules on
(for which DM has a template 12-dm-permissions.rules).

So the only flag a non-DM rule should be concerned about is exactly the
single DM_UDEV_DISABLE_OTHER_RULES_FLAG. That's its exact purpose it
was designed for within DM block devices and uevent processing.

Definitely not a hack!

(I'm just a bit surprised that we haven't sent a patch to MD yet.
Wasn't there a check for this flag anytime before? I thought all
major block subsystems have already been covered.)

-- 
Peter