[linux-lvm] Discussion: performance issue on event activation mode
martin.wilck at suse.com
Tue Sep 28 15:16:08 UTC 2021
On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> On Tue, Sep 28, 2021 at 06:34:06AM +0000, Martin Wilck wrote:
> > Hello David and Peter,
> > On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> > > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > > - We could use the new lvm-activate-* services to replace the
> > > > > activation generator when lvm.conf event_activation=0. This would
> > > > > be done by simply not creating the event-activation-on file when
> > > > > event_activation=0.
> > > >
> > > > ...the issue I see here is around the systemd-udev-settle:
> > >
> > > Thanks, I have a couple questions about the udev-settle to
> > > understand that better, although it seems we may not need it.
> > >
> > > > - the setup where lvm-activate-vgs*.service are always there (not
> > > > generated only on event_activation=0 as it was before with the
> > > > original lvm2-activation-*.service) practically means we always
> > > > make a dependency on systemd-udev-settle.service, which we shouldn't
> > > > do in case we have event_activation=1.
> > >
> > > Why wouldn't the event_activation=1 case want a dependency on
> > > udev-settle?
> > You said it should wait for multipathd, which in turn waits for udev
> > settle. And indeed it makes some sense. After all: the idea was to
> > avoid locking issues or general resource starvation during uevent
> > storms, which typically occur in the coldplug phase, and for which the
> > completion of "udev settle" is the best available indicator.
> Hi Martin, thanks, you have some interesting details here.
> Right, the idea is for lvm-activate-vgs-last to wait for other services
> like multipath (or anything else that a PV would typically sit on), so
> that it will be able to activate as many VGs as it can that are present
> at startup. And we avoid responding to individual coldplug events,
> saving time/effort/etc.
> > I'm arguing against it (perhaps you want to join in :-), but odds are
> > that it'll disappear sooner or later. For the time being, I don't see
> > a good alternative.
> multipath has more complex udev dependencies; I'll be interested to see
> how you manage to reduce those, since I've been reducing/isolating udev
> usage also.
I have pondered this quite a bit, but I can't say I have a concrete
To avoid depending on "udev settle", multipathd needs to partially
revert to udev-independent device detection. At least during initial
startup, we may encounter multipath maps with members that don't exist
in the udev db, and we need to deal with this situation gracefully. We
currently don't, and it's a tough problem to solve cleanly. Not relying
on udev opens up a Pandora's box wrt WWID determination, for example.
Any such change would without doubt carry a large risk of regressions
in some scenarios, which we wouldn't want to happen in our large
customers' data centers.
I also looked into Lennart's "storage daemon" concept where multipathd
would continue running over the initramfs/rootfs switch, but that would
be yet another step with even higher risk.
> > The dependency type you have to use depends on what you need. Do you
> > really only depend on udev settle because of multipathd? I don't
> > think so; even without multipath, thousands of PVs being probed
> > simultaneously can bring the performance of parallel pvscans down.
> > That was the original motivation for this discussion, after all. If
> > this is so, you should use both "Wants" and "After". Otherwise, using
> > only "After" might be sufficient.
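The Wants=/After= distinction behind this advice can be illustrated with a toy Python model. It is hypothetical and heavily simplified (only Wants= and After= are modelled; Requires=, Before=, failures and conflicts are ignored), and the unit name lvm-activate-vgs-last.service is used illustratively. The point it shows: Wants= pulls a unit into the boot transaction, while After= only orders units that are already in it.

```python
# Toy model of systemd Wants= vs After= (hypothetical, simplified).
from graphlib import TopologicalSorter  # Python 3.9+


def transaction(units, roots):
    """Units that get started: the roots plus everything reachable via Wants=."""
    started, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name in started:
            continue
        started.add(name)
        stack.extend(units.get(name, {}).get("wants", []))
    return started


def start_order(units, roots):
    """Order the started units; After= edges count only if both ends are started."""
    started = transaction(units, roots)
    graph = {
        # TopologicalSorter expects node -> predecessors; "after" lists predecessors.
        name: [dep for dep in units.get(name, {}).get("after", []) if dep in started]
        for name in started
    }
    return list(TopologicalSorter(graph).static_order())
```

With only After=, the settle unit is not started at all unless something else (for example a multipathd unit that itself wants it) pulls it in; adding Wants= makes the LVM unit pull it in on its own.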
> I don't think we really need the settle. If device nodes for PVs are
> present, then vgchange -aay from lvm-activate-vgs* will see them and
> activate VGs from them, regardless of what udev has or hasn't done with
> them yet.
Hm. This would mean that the switch to event-based PV detection could
happen before "udev settle" ends. A coldplug storm of uevents could
create 1000s of PVs in a blink after event-based detection was enabled.
Wouldn't that resurrect the performance issues that you are trying to
fix with this patch set?
> > > - Reading the udev db: with the default
> > > external_device_info_source=none we no longer ask the udev db for
> > > any info about devs. (We now follow that setting strictly, and only
> > > ask udev when source=udev.)
> > This is a different discussion, but if you don't ask udev, how do you
> > determine (reliably, and consistently with other services) whether a
> > given device will be part of a multipath device or an MD RAID member?
> Firstly, with the new devices file, only the actual md/mpath device will
> be in the devices file; the components will not be, so lvm will never
> attempt to look at an md or mpath component device.
I have to look more closely into the devices file and how it's created
> Otherwise, when the devices file is not used,
> md: from reading the md headers from the disk
> mpath: from reading sysfs links and /etc/multipath/wwids
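The wwids-file check David mentions can be sketched in Python. This is a hypothetical illustration, not lvm2's code (which is C). It assumes the file consists of `#` comment lines plus one `/<wwid>/` entry per line, and it takes the device's WWID as an input: how that WWID is determined is left out, and that is the hard part without udev.

```python
from pathlib import Path

# Default location only; the path is configurable, so callers should pass
# the one actually in use.
DEFAULT_WWIDS_PATH = "/etc/multipath/wwids"


def parse_wwids(text):
    """Return the set of WWIDs listed in wwids-file text (assumed '/<wwid>/' lines)."""
    wwids = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if len(line) > 2 and line.startswith("/") and line.endswith("/"):
            wwids.add(line[1:-1])  # strip the surrounding slashes
    return wwids


def listed_in_wwids_file(wwid, path=DEFAULT_WWIDS_PATH):
    """True if `wwid` is listed; False if it is not or the file does not exist."""
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        return False
    return wwid in parse_wwids(text)
```

As Martin points out in his reply, a WWID's absence from this file is only a reliable "not multipath" signal when find_multipaths is set to strict.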
Ugh. Reading sysfs links means that you're indirectly depending on
udev, because udev creates those. It's *more* fragile than calling into
libudev directly, IMO. Using /etc/multipath/wwids is plain wrong in
general. It works only on distros that use "find_multipaths strict",
like RHEL. Not to mention that the path can be customized in
> > In the past, there were issues with either pvscan or blkid (or
> > multipath) failing to open a device while another process had opened
> > it exclusively. I've never understood all the subtleties. See systemd
> > commit 3ebdb81 ("udev: serialize/synchronize block device event
> > handling with file locks").
> Those locks look like a fine solution if a problem comes up like
> I suspect the old issues may have been caused by a program using an
> exclusive open when it shouldn't.
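As we understand the locking scheme introduced by the systemd commit cited above, a tool that modifies a block device is expected to hold an exclusive BSD `flock(2)` lock on the whole-disk device node, and udev holds off its own event processing while such a lock is held. A minimal Python sketch of taking that lock follows; the helper name `block_device_lock` is hypothetical, and the device path is whatever the caller passes in.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def block_device_lock(path, exclusive=True, blocking=True):
    """Hold a BSD flock on `path` for the duration of the with-block (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)  # a read-only fd suffices for flock
    try:
        op = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
        if not blocking:
            op |= fcntl.LOCK_NB  # raise BlockingIOError instead of waiting
        fcntl.flock(fd, op)
        yield fd
    finally:
        os.close(fd)  # closing our only fd for this open file releases the lock
```

Because flock locks belong to the open file description, two separate `os.open` calls in the same process contend with each other, which makes the behaviour easy to try out on an ordinary file.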
Possible. I haven't seen many of these issues recently. Very rarely, I
see reports of a mount command mysteriously, sporadically failing
during boot. It's very hard to figure out why that happens if it does.
I suspect some transient effect of this kind.
> > After=udev-settle will make sure that you're past a coldplug uevent
> > storm during boot. IMO this is the most important part of the
> > equation. I'd be happy to find a solution for this that doesn't rely
> > on udev settle, but I don't see any.
> I don't think multipathd is listening to uevents directly? If it were,
> you might use a heuristic to detect a change in uevents (e.g. the [...])
> and conclude coldplug is finished.
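One possible form of such a heuristic is a quiet-period rule: treat coldplug as finished once no uevent has arrived for `quiet` seconds. The Python sketch below is hypothetical; neither multipathd nor lvm2 is known to implement it, and the function name and fixed threshold are assumptions.

```python
def coldplug_settled_at(timestamps, quiet, now):
    """Time at which coldplug is deemed finished, or None if it cannot be decided yet.

    Hypothetical heuristic: the storm ends at the first event that is followed
    by a gap of at least `quiet` seconds; coldplug is declared finished
    `quiet` seconds after that event.
    """
    events = sorted(t for t in timestamps if t <= now)  # only events seen so far
    if not events:
        return None  # nothing seen yet: no basis for a decision
    for prev, nxt in zip(events, events[1:]):
        if nxt - prev >= quiet:
            return prev + quiet  # an earlier quiet gap already ended the storm
    if now - events[-1] >= quiet:
        return events[-1] + quiet  # quiet since the last event
    return None  # still inside the storm
```

A single quiet gap may just be a lull in the middle of a storm, which is one reason such heuristics are fragile; Martin's reply explains why it would not solve multipathd's startup problem in any case.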
multipathd does listen to uevents (only "udev" events, not "kernel").
But that doesn't help us on startup. Currently we try hard to start up
after coldplug is finished. multipathd doesn't have a concurrency issue
like LVM2 (at least I hope so; it handles events with just two threads,
a producer and a consumer). The problem is rather that dm devices
survive the initramfs->rootfs switch, while member devices don't (see [...]).
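The two-thread event handling Martin describes for multipathd (one producer, one consumer) can be sketched in Python to show why it avoids LVM-style concurrency: a single consumer handles events one at a time, in arrival order. This is a hypothetical illustration; multipathd itself is C, and the names here are made up.

```python
import queue
import threading


def run_event_loop(events, handle):
    """Hypothetical producer/consumer pipeline: handlers never run concurrently."""
    q = queue.Queue()
    done = object()  # sentinel telling the consumer to stop
    handled = []

    def producer():
        for ev in events:  # stands in for reading uevents from a socket
            q.put(ev)
        q.put(done)

    def consumer():
        while True:
            ev = q.get()
            if ev is done:
                break
            handled.append(handle(ev))  # one event at a time, FIFO order

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

For example, `run_event_loop(["add sda", "add sdb"], str.upper)` returns the handled events in the order they were produced.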