[linux-lvm] udev/10-dm.rules.in: unexpectedly skipping device?

Michael Stapelberg michael+lvm at stapelberg.ch
Sat Apr 18 17:46:13 UTC 2020

Hi Peter,

thank you very much for the detailed response, I learnt a lot from it!

Answers inline:

On Fri, Apr 17, 2020 at 2:57 PM Peter Rajnoha <prajnoha at redhat.com> wrote:
> Hi,
> On 4/17/20 9:42 AM, Michael Stapelberg wrote:
> > Hey,
> >
> > I’m starting to use LVM (+LUKS) on a computer of mine, but ran into
> > trouble getting it to work.
> >
> > The issue I’m running into is that systemd boot hangs until the
> > default unit timeout elapses. This is because the cryptroot device is
> > not found, which in turn is because udev doesn’t create the symlinks
> > (e.g. in /dev/disk/by-uuid). udevadm info shows:
> >
> > # udevadm info -p /sys/block/dm-0
> > P: /devices/virtual/block/dm-0
> > N: dm-0
> > L: 0
> > E: DEVPATH=/devices/virtual/block/dm-0
> > E: DEVNAME=/dev/dm-0
> > E: DEVTYPE=disk
> > E: MAJOR=254
> > E: MINOR=0
> > E: SUBSYSTEM=block
> > E: TAGS=:systemd:
> >
> > I pinpointed this result to udev rule
> > https://sourceware.org/git/?p=lvm2.git;a=blob;f=udev/10-dm.rules.in;hb=ecae76c713bd4fa6c9d8f2a2c990625e4f38b504#l87,
> > i.e.:
> > GOTO="dm_disable"
> >
> > I assume I’m running into this rule because I’m using a custom initrd
> > which does not run systemd nor udev. Instead, my initrd is directly
> > calling vgchange -ay and vgmknodes.
> >
> > I understand that this is not a common setup, but booting without
> > systemd/udev in the initrd should be supported, no?
> >
> You hit the painful spot here!
> Unfortunately, we don't support this case with existing rules. It's not that
> we wouldn't like to see this case supported, but the issue is in recognition
> of the uevents.
> To answer why in a way it makes sense, I need to be a little bit wordy here,
> sorry for that in advance...
> Device-mapper device activation consists of three steps for which different
> uevents are generated:
>   - DM device creation (ADD uevent)
>   - DM table load (no uevent)
>   - DM device resume which also activates the mapping as described by the
> table (CHANGE uevent)
> Right after the first step (with the ADD uevent), the device is not usable
> yet, obviously, because it has no table loaded yet. So we need to make sure
> that no udev rule causes this device to be accessed at this point in time.
> One of the elementary udev rules is a call to "blkid" which scans the device
> and extracts metadata information based on which the /dev/disk/by-* content is
> created and other udev rules can act further based on the information. That's
> why we need to postpone this device access within udev rule processing up
> until we're sure the device is ready, that is, after the CHANGE uevent when
> the table is made active.
> On the other hand, we have coldplugging (calling "udevadm trigger --action=add").

To save others some unnecessary confusion: I had originally looked for
mentions of cold-plugging (in various spellings) in systemd/src/udev,
but couldn’t find anything. Starting systemd-udevd did not result in
any uevent messages as reported by “udevadm monitor”.

I eventually figured out that the systemd unit
systemd-udev-trigger.service literally calls e.g. “/usr/bin/udevadm
trigger --type=devices --action=add” at boot time on my system.
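
A minimal Python sketch of what such a trigger boils down to: writing the
action name into each device's "uevent" file in sysfs, which makes the kernel
emit a synthetic uevent. The function name coldplug_block_devices and its
parameters are made up for illustration, and the sketch only walks
<sys_root>/block instead of enumerating all devices like the real
"udevadm trigger --type=devices" does.

```python
from pathlib import Path


def coldplug_block_devices(sys_root="/sys", action="add"):
    """Ask the kernel to re-emit a uevent for every block device.

    Simplified, hypothetical stand-in for "udevadm trigger": it writes the
    action name into <sys_root>/block/<dev>/uevent for each device found.
    Returns the names of the devices that were triggered.
    """
    triggered = []
    for uevent in sorted(Path(sys_root, "block").glob("*/uevent")):
        # The kernel reacts to this write by generating a uevent with the
        # given action; nothing marks it as different from a "real" one.
        uevent.write_text(action)
        triggered.append(uevent.parent.name)
    return triggered
```

Calling coldplug_block_devices() with the default sys_root needs root
privileges and really does generate uevents, so point sys_root at a scratch
directory when just experimenting.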

> At boot, coldplugging is used to make up for all the devices that have been
> activated before udevd is started from root fs (to make udevd aware of
> those devices which were handled inside initrd). These "coldplug uevents" are
> in essence indistinguishable from other ADD uevents - there's no mark or flag
> saying this uevent is coming from the coldplug. And that is exactly the
> problematic part - we don't know whether this is the coldplug's ADD uevent
> AFTER we did the proper activation sequence or if this is spurious ADD uevent
> that comes before the device is properly activated. We simply don't know.

Another approach that comes to mind is plumbing DM_COOKIE from
libdevmapper via the DM_DEV_CREATE ioctl to the resulting action=add
uevent, and then in the udev rules only skip action=add events when a
flag is set.

> To alleviate this problem, when a DM device is being activated, that is,
> libdevmapper in userspace calls create + table load + device resume sequence,
> it also provides the DM_UDEV_PRIMARY_SOURCE_FLAG=1 so that it is attached to
> the "resume device" call (...then this flag appears in the uevent the "resume
> device" call causes inside kernel). Once we have uevents with this flag set,

Ah, thanks for the explanation! This was the missing puzzle piece to
programmatically skip hidden subLVs in my initrd implementation.

> it is stored in udev database. When we're processing any other subsequent
> uevent, we know we have already passed this activation sequence correctly.
> This also applies for processing any "coldplug uevents" - we simply look at
> the udev database content and if it has that flag set (that's exactly the
> IMPORT{db}=DM_UDEV_PRIMARY_SOURCE_FLAG call that you can also see in
> 10-dm.rules), we know we can just rerun udev rules for such uevents as the
> device has already gone through the activation sequence properly.
> Now, if we have initrd completely without udev and then switching over to root
> fs where we have udevd running, we're getting into the problem you are hitting
> here:
>   - device is activated in initrd without udev (so we have no udev db record
> about this device)
>   - switching over to root fs
>   - running udevd
>   - running coldplug (udevadm trigger --action=add)
>   - udev rules reacting to coldplug uevents
>   - 10-dm.rules trying to import the DM_UDEV_PRIMARY_SOURCE_FLAG, but since
> there was no udevd to record this information inside initrd, we conclude the
> device has not yet passed activation sequence correctly and this is just a
> spurious uevent, hence ignoring it - and that's exactly what you see.
> You can also simulate this problem by executing:
>   - udevadm info --cleanup-db
>   - udevadm trigger --action=add
> ...which gets you into exactly the same situation (do that only on a test
> system :) ).
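
The flag handling described above can be modelled in a few lines of Python.
This is a simplified sketch and not the actual logic of 10-dm.rules: the
names rules_would_scan and boot, and the dict standing in for the udev
database, are made up for illustration. Only DM_UDEV_PRIMARY_SOURCE_FLAG and
the dm_disable label come from the rules file.

```python
FLAG = "DM_UDEV_PRIMARY_SOURCE_FLAG"


def rules_would_scan(env, db):
    """Return True if the device would be scanned (blkid, symlinks)."""
    flag = env.get(FLAG)
    if flag is None:
        # Models IMPORT{db}: reuse what an earlier uevent recorded.
        flag = db.get(FLAG)
    if flag != "1":
        # Models GOTO="dm_disable": the device is treated as not yet active.
        return False
    # Record the flag so that later uevents (e.g. coldplug) can import it.
    db[FLAG] = "1"
    return True


def boot(initrd_has_udev):
    """Replay the uevents of one boot; return whether coldplug is honoured."""
    db = {}
    if initrd_has_udev:
        # Activation is seen by udevd: ADD on create, CHANGE on resume.
        rules_would_scan({"ACTION": "add"}, db)
        rules_would_scan({"ACTION": "change", FLAG: "1"}, db)
    # After switching to the root fs, coldplug sends a plain ADD uevent.
    return rules_would_scan({"ACTION": "add"}, db)
```

In this model boot(True) ends with the coldplug uevent being processed,
while boot(False) ends with it being skipped, which matches the
udev-less initrd case.
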
> However...
> When it comes to improving uevent recognition, there's a kernel patch I did
> back in 2017 which adds SYNTH_UUID (and other possible SYNTH_* variables) to
> synthetic/coldplug uevents:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f36776fafbaa0094390dd4e7e3e29805e0b82730
> There are also userspace patches for systemd/udevd (which still need some
> cherishing before systemd guys take that):
> https://github.com/systemd/systemd/pull/13881
> With this in, we could be in a better position to fix udev rules too.

Thanks, that’s a great pointer! I have applied a minimal version of
the required changes and it does seem to work AFAICT!

> > I’m not sure where DM_UDEV_PRIMARY_SOURCE_FLAG is supposed to be set,
> > or why it isn’t set in my scenario. Do you have any ideas regarding
> > what I could check?
> >
> As described above, it's set by libdevmapper, then libdevmapper passing that
> through DM ioctl to kernel, then kernel generating uevent with this flag, then
> udevd receiving the uevent with this flag set. Any subsequent uevents reimport
> this flag from existing udev database records.
> > Thanks in advance,
> > Best regards,
> > Michael
> >
> > PS: As a workaround, I’m just commenting out that rule. Does that have
> > any negative consequences?
> >
> Yes, there's a race because of the 3 step sequence to activate a DM device.
> With commenting out that rule, you make it possible to access a DM device
> where the table is not yet loaded and made active (hence unusable device). If
> you're lucky, when the ADD event is being processed, the "load table + resume"
> part could have already executed because it takes some time for udevd to react
> to uevents, but that is not guaranteed. If you're not lucky,
> you can get non-deterministic behavior (the blkid scan will fail, various
> other records in udev may be set based on that incorrectly etc.).
> --
> Peter
