[linux-lvm] udev/10-dm.rules.in: unexpectedly skipping device?

Fri Apr 17 12:57:16 UTC 2020

Hi,

On 4/17/20 9:42 AM, Michael Stapelberg wrote:
> Hey,
> 
> I’m starting to use LVM (+LUKS) on a computer of mine, but ran into
> trouble getting it to work.
> 
> The issue I’m running into is that systemd boot hangs until the
> default unit timeout elapses. This is because the cryptroot device is
> not found, which in turn is because udev doesn’t create the symlinks
> (e.g. in /dev/disk/by-uuid). udevadm info shows:
> 
> # udevadm info -p /sys/block/dm-0
> P: /devices/virtual/block/dm-0
> N: dm-0
> L: 0
> E: DEVPATH=/devices/virtual/block/dm-0
> E: DEVNAME=/dev/dm-0
> E: DEVTYPE=disk
> E: MAJOR=254
> E: MINOR=0
> E: SUBSYSTEM=block
> E: USEC_INITIALIZED=6522555
> E: DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG=1
> E: DM_UDEV_DISABLE_DISK_RULES_FLAG=1
> E: DM_UDEV_DISABLE_OTHER_RULES_FLAG=1
> E: SYSTEMD_READY=0
> E: TAGS=:systemd:
> 
> I pinpointed this result to udev rule
> https://sourceware.org/git/?p=lvm2.git;a=blob;f=udev/10-dm.rules.in;hb=ecae76c713bd4fa6c9d8f2a2c990625e4f38b504#l87,
> i.e.:
> ENV{DM_UDEV_RULES_VSN}!="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}!="1",
> GOTO="dm_disable"
> 
> I assume I’m running into this rule because I’m using a custom initrd
> which does not run systemd nor udev. Instead, my initrd is directly
> calling vgchange -ay and vgmknodes.
> 
> I understand that this is not a common setup, but booting without
> systemd/udev in the initrd should be supported, no?
> 

You hit the painful spot here!

Unfortunately, we don't support this case with the existing rules. It's not
that we wouldn't like to see this case supported, but the issue lies in
recognizing the uevents.

To explain why in a way that makes sense, I need to be a little bit wordy
here, sorry for that in advance...

Device-mapper device activation consists of three steps, each generating a
different uevent (or none at all):

  - DM device creation (ADD uevent)
  - DM table load (no uevent)
  - DM device resume which also activates the mapping as described by the
table (CHANGE uevent)

Right after the first step (with the ADD uevent), the device is not usable
yet, obviously, because it has no table loaded yet. So we need to make sure
that no udev rule causes this device to be accessed at this point in time.

One of the elementary udev rules is a call to "blkid", which scans the device
and extracts metadata from which the /dev/disk/by-* content is created and on
which other udev rules can act further. That's why we need to postpone this
device access within udev rule processing until we're sure the device is
ready, that is, after the CHANGE uevent when the table is made active.
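To make the ordering problem concrete, here is a toy Python model of the three
activation steps above. It is only a sketch, not libdevmapper code: the step
names and the `uevents_with_readiness` helper are hypothetical. It records
which step emits which uevent and whether the table is active at that moment.

```python
# Toy model (hypothetical names) of DM device activation as described above:
# each step may emit a uevent; the device is usable only once the table has
# been made active by the resume step.
ACTIVATION_STEPS = [
    ("create", "add"),       # DM device creation -> ADD uevent, no table yet
    ("load_table", None),    # DM table load -> no uevent
    ("resume", "change"),    # resume activates the table -> CHANGE uevent
]


def uevents_with_readiness(steps=ACTIVATION_STEPS):
    """Return (uevent_action, device_usable) for every uevent the steps emit."""
    table_active = False
    result = []
    for step, uevent in steps:
        if step == "resume":
            table_active = True  # the loaded table becomes the live mapping
        if uevent is not None:
            result.append((uevent, table_active))
    return result


if __name__ == "__main__":
    for action, usable in uevents_with_readiness():
        # Only a uevent seen while the device is usable is safe for blkid.
        print(f"{action.upper()}: device usable={usable}")
```

Only the CHANGE uevent arrives with a usable device, which is why the blkid
scan has to wait for it.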

On the other hand, we have coldplugging (calling "udevadm trigger --action=add").
At boot, coldplugging is used to make up for all the devices that were
activated before udevd is started from the root fs (to make udevd aware of
the devices which were handled inside the initrd). These "coldplug uevents"
are in essence indistinguishable from other ADD uevents: there's no mark or
flag saying a uevent comes from the coldplug. And that is exactly the
problematic part: we don't know whether this is the coldplug's ADD uevent
arriving AFTER the proper activation sequence, or a spurious ADD uevent that
comes before the device is properly activated. We simply don't know.

To alleviate this problem, when a DM device is being activated, that is, when
libdevmapper in userspace calls the create + table load + device resume
sequence, it also provides DM_UDEV_PRIMARY_SOURCE_FLAG=1 and attaches it to
the "resume device" call (the flag then appears in the uevent that the "resume
device" call causes inside the kernel). Once udevd processes a uevent with
this flag set, the flag is stored in the udev database. When we're processing
any subsequent uevent, we know the device has already passed this activation
sequence correctly. This also applies to processing any "coldplug uevents": we
simply look at the udev database content and, if it has that flag set (that's
exactly the IMPORT{db}="DM_UDEV_PRIMARY_SOURCE_FLAG" call that you can also
see in 10-dm.rules), we know we can just rerun the udev rules for such
uevents, as the device has already gone through the activation sequence
properly.

Now, if the initrd runs completely without udev and we then switch over to
the root fs where udevd is running, we get into the problem you are hitting
here:

  - device is activated in initrd without udev (so we have no udev db record
about this device)

  - switching over to root fs

  - running udevd

  - running coldplug (udevadm trigger --action=add)

  - udev rules reacting to coldplug uevents

  - 10-dm.rules trying to import DM_UDEV_PRIMARY_SOURCE_FLAG, but since there
was no udevd to record this information inside the initrd, we conclude that
the device has not yet passed the activation sequence correctly and that this
is just a spurious uevent, hence we ignore it; that's exactly what you see.
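The decision described above can be sketched as a small Python function. This
is a simplified model under stated assumptions, not the real rules: the udev
database is a plain dict, udevd is assumed to rewrite a device's record with
the properties of each processed uevent, only ADD uevents are subject to the
check (the case discussed here), and `process_uevent` is a hypothetical name.
The condition mirrors the GOTO="dm_disable" rule quoted in the question, with
the flag falling back to the stored value the way IMPORT{db} does.

```python
# Simplified, hypothetical model of the 10-dm.rules decision discussed above.
# The "udev database" is a dict mapping device name -> stored properties.
FLAG = "DM_UDEV_PRIMARY_SOURCE_FLAG"


def process_uevent(db, device, action, env=None):
    """Return True if the rules continue, False if they jump to dm_disable."""
    props = dict(env or {})
    stored = db.get(device, {})
    # Like IMPORT{db}: reuse the flag recorded while handling an earlier uevent.
    if FLAG not in props and FLAG in stored:
        props[FLAG] = stored[FLAG]
    # Mirrors: ENV{DM_UDEV_RULES_VSN}!="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}!="1",
    #          GOTO="dm_disable"   (assumed here to apply to ADD uevents only)
    disabled = (action == "add"
                and props.get("DM_UDEV_RULES_VSN") != "1"
                and props.get(FLAG) != "1")
    if disabled:
        # dm_disable path: the flags seen in the `udevadm info` output above.
        for name in ("SUBSYSTEM", "DISK", "OTHER"):
            props[f"DM_UDEV_DISABLE_{name}_RULES_FLAG"] = "1"
    # Assumption: the record is rewritten with this event's properties.
    db[device] = props
    return not disabled


if __name__ == "__main__":
    # Activation with udevd running: the early ADD is ignored, the CHANGE from
    # resume carries the flag and gets recorded, a later coldplug ADD passes.
    db = {}
    print(process_uevent(db, "dm-0", "add"))                  # False
    print(process_uevent(db, "dm-0", "change", {FLAG: "1"}))  # True
    print(process_uevent(db, "dm-0", "add"))                  # True (coldplug)

    # Activated in an initrd without udev: nothing in the db on the root fs,
    # so the coldplug ADD is treated as spurious.
    print(process_uevent({}, "dm-0", "add"))                  # False
```

The last call is the situation from the question: the same coldplug ADD that
succeeds after a udev-managed activation is ignored when the database is empty.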

You can also simulate this problem by executing:

  - udevadm info --cleanup-db
  - udevadm trigger --action=add

...which gets you into exactly the same situation (do that only on a test
system :) ).

However...

When it comes to improving uevent recognition, there's a kernel patch I did
back in 2017 which adds SYNTH_UUID (and other possible SYNTH_* variables) to
synthetic/coldplug uevents:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f36776fafbaa0094390dd4e7e3e29805e0b82730

There are also userspace patches for systemd/udevd (which still need some
more care before the systemd folks take them):

https://github.com/systemd/systemd/pull/13881

With these in place, we could be in a better position to fix the udev rules
too.

> I’m not sure where DM_UDEV_PRIMARY_SOURCE_FLAG is supposed to be set,
> or why it isn’t set in my scenario. Do you have any ideas regarding
> what I could check?
>

As described above, it's set by libdevmapper, which passes it through the DM
ioctl to the kernel; the kernel then generates the uevent with this flag, and
udevd receives the uevent with the flag set. Any subsequent uevents reimport
this flag from the existing udev database record.

> Thanks in advance,
> Best regards,
> Michael
> 
> PS: As a workaround, I’m just commenting out that rule. Does that have
> any negative consequences?
> 

Yes, there's a race because of the three-step sequence used to activate a DM
device. By commenting out that rule, you make it possible to access a DM
device whose table is not yet loaded and made active (hence an unusable
device). If you're lucky, the "load table + resume" part may already have
executed by the time the ADD uevent is processed, because it takes some time
for udevd to react to uevents, but that is not always the case. If you're not
lucky, you can get non-deterministic behavior (the blkid scan will fail,
various other udev records may be set incorrectly based on that, etc.).
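To illustrate the race that commenting out the rule exposes, here is a toy
timing model in Python. It is purely illustrative: the timings are made-up
numbers, and `blkid_result` is a hypothetical stand-in for the blkid scan run
from the ADD uevent, not real udev or libdevmapper behavior. The outcome
depends only on whether udevd gets to the ADD uevent before or after the
table load + resume has finished.

```python
import random


def blkid_result(udev_reaction_ms, load_and_resume_ms):
    """Toy model: blkid runs when udevd handles the ADD uevent; it only sees
    valid data if the table has already been loaded and resumed by then."""
    if udev_reaction_ms >= load_and_resume_ms:
        return "ok"       # lucky: the table was already active
    return "failed"       # unlucky: no active table yet, the scan fails


def simulate(runs=10, seed=None):
    """Repeat the activation with random (made-up) timings, collect results."""
    rng = random.Random(seed)
    return [
        blkid_result(rng.uniform(0, 10), rng.uniform(0, 10))
        for _ in range(runs)
    ]


if __name__ == "__main__":
    print(blkid_result(5, 2))  # ok
    print(blkid_result(1, 2))  # failed
    # The outcome of each run depends on the random timings.
    print(simulate(runs=10, seed=1))
```

The point is not the numbers but that nothing in the sequence orders udevd's
reaction relative to the resume; the rule in 10-dm.rules is what provides that
ordering.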

-- 
Peter