[linux-lvm] Discussion: performance issue on event activation mode
heming.zhao at suse.com
Sun Jun 6 06:15:23 UTC 2021
Hello David & Zdenek,
I am sending this mail about a well-known performance issue:
when a system has a huge number of devices attached (e.g. 1000+ disks),
lvm2-pvscan@.service takes too much time, systemd easily times out,
and the system ends up in the emergency shell.
This performance topic has been discussed here several times, and the issue has
persisted for many years. Even the latest lvm2 code cannot fix it
completely. The latest code adds a new function, _pvscan_aa_quick(), which
greatly reduces boot time but still does not solve the issue entirely.
In my test environment (x86 qemu-kvm machine, 6 vCPUs, 22GB memory, 1015 PV/VG/LV),
comparing the code with and without _pvscan_aa_quick(), the boot time drops from
"9min 51s" to "2min 6s". But after switching to direct activation, the boot time
is 8.7s (for the longest-running lvm2 service: lvm2-activation-early.service).
The hot spot of event activation is dev_cache_scan, whose time complexity is
O(n^2). At the same time, the systemd-udev workers generate and run
lvm2-pvscan@.service for every detected disk. So the overall complexity is O(n^3).
dev_cache_scan                              //order: O(n^2)
 + _insert_dirs                             //O(n)
 |   if obtain_device_list_from_udev() true
 |     _insert_udev_dir                     //O(n)
 + dev_cache_index_devs                     //O(n)
There are 'n' lvm2-pvscan@.service instances running: O(n)
Overall: O(n) * O(n^2) => O(n^3)
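
To make the estimate above concrete, here is a rough cost model in Python. It is only a sketch: the function names are made up for illustration, the unit is an abstract "operation" rather than time, and it assumes that direct activation needs only a constant number of scans (modelled here as one).

```python
# Rough cost model for the complexity estimate above (illustrative only).
# All function names are hypothetical; none of this is lvm2 code.

def dev_cache_scan_cost(n: int) -> int:
    """Assumed cost of one dev_cache_scan over n devices: O(n^2)."""
    return n * n


def event_activation_cost(n: int) -> int:
    """n pvscan instances, each paying for one full dev_cache_scan: O(n^3)."""
    return n * dev_cache_scan_cost(n)


def direct_activation_cost(n: int) -> int:
    """Assumption: a constant number of scans, modelled as a single scan."""
    return dev_cache_scan_cost(n)


if __name__ == "__main__":
    for n in (10, 100, 1015):
        event = event_activation_cost(n)
        direct = direct_activation_cost(n)
        print(f"n={n}: event={event} direct={direct} ratio={event // direct}")
```

In this model the gap between the two modes grows linearly with the number of devices. The measured times quoted earlier differ by a much smaller factor, so the model only shows the trend, not the real numbers.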
Could we find a final solution that performs well and scales well under
event activation mode?
Here are two possible solutions (which Martin and I discussed):
1. During the boot phase, lvm2 automatically switches to direct activation mode
("event_activation = 0"). After boot has finished, it switches back to event
activation mode.
The boot phase is a special stage. *During boot*, we could "pretend" that direct
activation (event_activation=0) is set, and rely on lvm2-activation-*.service
for PV detection. Once lvm2-activation-net.service has finished, we could
"switch on" event activation.
More precisely: pvscan --cache would look at some file under /run,
e.g. /run/lvm2/boot-finished, and quit immediately if the file doesn't exist
(as if event_activation=0 was set). In lvm2-activation-net.service, we would add
... so that, from this point in time onward, "pvscan --cache" would _not_ quit
immediately any more, but run normally (assuming that the global
event_activation setting is 1). This way we'd get the benefit of using the
static activation services during boot (good performance) while still being able
to react to udev events after booting has finished.
This idea could be implemented with very few code changes.
The result would be a huge improvement in boot time.
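
The check described in solution 1 could look roughly like the following Python sketch. It is not lvm2 code: pvscan_should_run() and mark_boot_finished() are hypothetical names, and mark_boot_finished() is only a stand-in for whatever step would create the marker file once the static activation services are done.

```python
import os
import tempfile

# Sketch of the "boot finished" gate from solution 1 (hypothetical helpers).


def pvscan_should_run(event_activation: int, marker: str) -> bool:
    """Decide whether "pvscan --cache" should do any work.

    It quits immediately (returns False) if event activation is disabled,
    or if the boot-finished marker does not exist yet.
    """
    if event_activation == 0:
        return False
    return os.path.exists(marker)


def mark_boot_finished(marker: str) -> None:
    """Stand-in for the step that creates the marker after boot activation."""
    os.makedirs(os.path.dirname(marker), exist_ok=True)
    with open(marker, "w"):
        pass  # an empty file is enough, only its existence matters


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        # Stands in for /run/lvm2/boot-finished from the text above.
        marker = os.path.join(run_dir, "lvm2", "boot-finished")
        print(pvscan_should_run(1, marker))  # during boot: gate is closed
        mark_boot_finished(marker)
        print(pvscan_should_run(1, marker))  # after boot: gate is open
```

Because the marker would live under /run (tmpfs), it disappears on reboot, so every boot starts again with the gate closed.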
2. Change the lvm2-pvscan@.service running mode from parallel to serial.
This idea looks a little weird, since it goes against today's trend in
programming technologies: parallel programming on multiple cores.
The way lvm2 scans "/dev" is hard to change, but the outer
lvm2-pvscan@.service invocations could be changed from parallel to serial.
For example, a running pvscan instance could set a "running" flag in tmpfs (e.g.
/run/lvm/) indicating that no other pvscan process should be called in parallel.
If another pvscan is invoked and sees "running", it would create a "pending"
flag, and quit. Any other pvscan process seeing the "pending" flag would
just quit. If the first instance sees the "pending" flag, it would
atomically remove "pending" and restart itself, in order to catch any device
that might have appeared since the previous sysfs scan.
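
A minimal sketch of this flag protocol, written in Python for illustration only. The helper names and the flag directory are assumptions, scan() stands in for the real device scan, and the loop replaces "restart itself" with simply scanning again.

```python
import os
import tempfile

# Sketch of the "running"/"pending" flag protocol (hypothetical helpers).


def try_create(path: str) -> bool:
    """Atomically create a flag file. Returns False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def try_remove(path: str) -> bool:
    """Remove a flag file. Returns False if it was not there."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        return False
    return True


def pvscan_serialized(flag_dir: str, scan) -> int:
    """Run scan() unless another instance holds the "running" flag.

    Returns how many scans this call performed (0 if it quit early).
    """
    running = os.path.join(flag_dir, "running")
    pending = os.path.join(flag_dir, "pending")
    if not try_create(running):
        # Someone else is scanning: leave a "pending" note and quit.
        # If "pending" already exists this is a no-op, and we just quit.
        try_create(pending)
        return 0
    scans = 0
    try:
        while True:
            scan()
            scans += 1
            # Rescan only if somebody asked for it while we were busy.
            if not try_remove(pending):
                break
    finally:
        os.unlink(running)
    return scans


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as flag_dir:
        calls = []

        def scan():
            calls.append(1)
            if len(calls) == 1:
                # Two more instances show up during the first scan.
                pvscan_serialized(flag_dir, lambda: None)
                pvscan_serialized(flag_dir, lambda: None)

        print(pvscan_serialized(flag_dir, scan))
```

The sketch leaves a small window open: an instance that arrives after the last "pending" check but before "running" is removed would set "pending" and quit without anyone rescanning. A stale "running" flag left behind by a crashed process is not handled either. A real implementation would need to deal with both cases.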
In most cases, the devices will already have been found by a single pvscan run,
so the next pvscan run should work in O(n), because the
target device has already been inserted into the internal cache tree. Overall,
only a single pvscan process would be running at any given time.
We could then create a list of pending to-be-scanned devices (these might be
directory entries in some tmpfs directory). On exit, pvscan could check this
directory and restart if it is non-empty.
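
The pending list could be sketched as follows, again in Python and only as an illustration. The directory layout and all function names are assumptions, and scan() stands in for the real scan of a batch of devices.

```python
import os
import tempfile

# Sketch of a pending-device list kept as directory entries (hypothetical).


def enqueue_device(pending_dir: str, devname: str) -> None:
    """Record a device as pending by creating an empty entry named after it."""
    os.makedirs(pending_dir, exist_ok=True)
    entry = os.path.join(pending_dir, devname.replace("/", "_"))
    with open(entry, "a"):
        pass


def drain_pending(pending_dir: str) -> list:
    """Remove and return all entries that are currently pending."""
    try:
        names = sorted(os.listdir(pending_dir))
    except FileNotFoundError:
        return []
    taken = []
    for name in names:
        try:
            os.unlink(os.path.join(pending_dir, name))
            taken.append(name)
        except FileNotFoundError:
            pass  # somebody else already took this entry
    return taken


def run_until_empty(pending_dir: str, scan) -> int:
    """Scan pending devices in batches until the directory stays empty.

    Returns the number of scan rounds performed.
    """
    rounds = 0
    while True:
        batch = drain_pending(pending_dir)
        if not batch:
            return rounds
        scan(batch)
        rounds += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        pending_dir = os.path.join(run_dir, "pending")
        enqueue_device(pending_dir, "sda")
        enqueue_device(pending_dir, "sdb")
        rounds = run_until_empty(pending_dir, lambda batch: print("scan", batch))
        print("rounds:", rounds)
```

Entries are removed before the batch is scanned, so a device that shows up during a scan stays in the directory and is picked up by the next round.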