[dm-devel] [PATCH 4/7] libmultipath: force map reload if udev incomplete

Martin Wilck mwilck at suse.com
Fri Dec 18 15:06:57 UTC 2020


On Thu, 2020-12-17 at 23:48 -0600, Benjamin Marzinski wrote:
> On Thu, Dec 17, 2020 at 12:00:15PM +0100, mwilck at suse.com wrote:
> > From: Martin Wilck <mwilck at suse.com>
> > 
> > We've recently observed various cases of incompletely processed
> > uevents during initrd processing. Typically, this would leave a dm
> > device in the state it had after the initial "add" uevent, which is
> > basically unusable, because udevd had been killed by systemd before
> > processing the subsequent "change" event. After switching root, the
> > coldplug event would re-read the db file, which would be in an
> > unusable state, and would not do anything. In such cases, a RELOAD
> > action with force_udev_reload=1 is in order to make udev re-process
> > the device completely (DM_UDEV_PRIMARY_SOURCE_FLAG=1 and
> > DM_SUBSYSTEM_UDEV_FLAG0=0).
> > 
> > The previous commits
> > 
> > 2b25a9e libmultipath: select_action(): force udev reload for
> > uninitialized maps
> > cb10d38 multipathd: uev_trigger(): handle incomplete ADD events
> > 
> > addressed the same issue, but incompletely. They would miss cases
> > where the map was configured correctly but none of the RELOAD
> > criteria were met. This patch partially reverts 2b25a9e by
> > converting select_reload_action() into a trivial helper. Instead,
> > we now check for an incompletely initialized udev state before
> > checking any of the other reload criteria.
> 
> I'll review this patch tomorrow, but are you able to reproduce this?

Not me, but multiple customers of ours :-/ Most of them were running
PowerPC, for reasons I can only speculate about. The user-visible
phenomenon is that some upper layers on some maps (kpartx-created
partitions, LVM, ...) are not present after boot, and "multipathd
reload" fixes the situation.

I suppose it should be reproducible if one has multiple multipath maps
with partitions, devices are discovered somewhat slowly / with delays
during initrd processing, and the root device is discovered early on.
Then systemd has enough time to mount the root FS and stop services
before all maps are completely set up. The exact behavior would still
depend on timing (but in the last case I worked on, it was 100%
reproducible by the customer).

> I've seen something similar, except in the case I saw, multipathd
> took too long during the initial configuration, and systemd shut
> things down for the switch-root before multipath could finish
> creating the devices. I was thinking of trying to solve cases like
> this by forcing some ordering on multipathd stopping in the
> initramfs, with something like
> 
> Before=initrd-cleanup.service
> Conflicts=initrd-cleanup.service
> 
> for the multipathd.service file for the initramfs. The goal is to
> make sure that things don't get shut down until multipathd has
> stopped. This would keep multipath from creating devices when udev
> isn't around to deal with them. Did you try something like this?

No, I didn't think of that. It's an interesting idea, although it might
slow down booting. IMO it's actually a good thing that the services in
the initrd are stopped quickly when the root FS becomes available. It
can just have some side effects our current code doesn't deal well
with.
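
For concreteness, I read your suggestion as a drop-in for the initrd
copy of multipathd.service, roughly like this (untested, and the file
name is made up):

# multipathd.service.d/initrd-ordering.conf (hypothetical name)
[Unit]
# Conflicts= queues a stop job for multipathd when initrd-cleanup
# starts; the ordering dependency makes that stop job complete
# before initrd-cleanup actually runs.
Before=initrd-cleanup.service
Conflicts=initrd-cleanup.service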

AFAICS, multipathd is not the problem, udev is. I can see that
multipathd has cleanly set up the maps, but udev has been stopped
before processing the respective events (in particular, the change
events).

If this happens, the udev db for the affected maps looks more or less
like this:

DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG="1"
DM_UDEV_DISABLE_DISK_RULES_FLAG="1"
DM_UDEV_DISABLE_OTHER_RULES_FLAG="1"

The device-mapper hotplug mechanism doesn't help here, because it tries
to import properties from the db. Triggering uevents in other ways
helps even less. Only a "genuine" change event without the "reload"
flag (DM_SUBSYSTEM_UDEV_FLAG0) set will do the trick.
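
Just to illustrate (this is not libmultipath code, and the helper
name is made up): with libudev, this broken state can be detected
roughly like this:

#include <stdbool.h>
#include <string.h>
#include <libudev.h>

/* Hypothetical helper: return true if the udev db entry for a dm
 * device (kernel name, e.g. "dm-3") still looks like the
 * "add"-only state shown above. */
static bool dm_udev_db_incomplete(struct udev *udev, const char *kname)
{
	struct udev_device *dev;
	const char *val;
	bool incomplete = false;

	dev = udev_device_new_from_subsystem_sysname(udev, "block", kname);
	if (!dev)
		return true;	/* no udev db entry at all */

	val = udev_device_get_property_value(dev,
			"DM_UDEV_DISABLE_OTHER_RULES_FLAG");
	if (val && !strcmp(val, "1"))
		incomplete = true;

	udev_device_unref(dev);
	return incomplete;
}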

When multipathd starts after switching root, it sees the maps in
perfect state as far as multipath properties are concerned, and thus
will not set the force_udev_reload flag. This patch changes that.
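
In pseudo-C, the change boils down to something like the sketch below
(simplified, made-up types; not the actual patch):

#include <stdbool.h>

/* Simplified stand-ins for the real libmultipath types. */
enum act { ACT_NOTHING, ACT_RELOAD };

struct mpp_state {
	bool udev_incomplete;	/* e.g. result of a check like above */
	bool force_udev_reload;
	enum act action;
};

/* The udev check runs before any of the other reload criteria, so
 * even a map whose multipath state is perfect gets a RELOAD with
 * force_udev_reload=1, i.e. a change event without
 * DM_SUBSYSTEM_UDEV_FLAG0, making udev re-run the full rule set. */
static void select_action_sketch(struct mpp_state *mpp)
{
	if (mpp->udev_incomplete) {
		mpp->force_udev_reload = true;
		mpp->action = ACT_RELOAD;
		return;
	}
	/* ... the usual rename/reload/nothing checks would follow ... */
	mpp->action = ACT_NOTHING;
}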

(My breakthrough step when I attempted to understand the issue was to
tell customers to tar up /run/udev/data before starting coldplug. That
way I could see the state in which the udev db was left when the
initrd finished, and I saw the half-completed entries like the ones
above.)

I suppose this works differently on Red Hat where you use mpathconf and
set up only a very limited set of maps during initrd processing.
Therefore my guess was you'd not see this at all. I'm still wondering
why we have been seeing it only very recently. *Perhaps* my recent
changes to make multipathd shut down more quickly are part of the
equation; I'm not sure about that. I am pretty positive that this
patch is effective, though.

Regards,
Martin
