[linux-lvm] Discussion: performance issue on event activation mode
heming.zhao at suse.com
Wed Jun 9 04:01:40 UTC 2021
On 6/9/21 12:18 AM, heming.zhao at suse.com wrote:
> On 6/8/21 5:30 AM, David Teigland wrote:
>> On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
>>> Most importantly, this was about LVM2 scanning of physical volumes. The
>>> number of udev workers has very little influence on PV scanning,
>>> because the udev rules only activate systemd service. The actual
>>> scanning takes place in lvm2-pvscan at .service. And unlike udev, there's
>>> no limit for the number of instances of a given systemd service
>>> template that can run at any given time.
>> Excessive device scanning has been the historical problem in this area,
>> but Heming mentioned dev_cache_scan() specifically as a problem. That was
>> surprising to me since it doesn't scan/read devices, it just creates a
>> list of device names on the system (either readdir in /dev or udev
>> listing.) If there are still problems with excessive scannning/reading,
>> we'll need some more diagnosis of what's happening, there could be some
>> cases we've missed.
> dev_cache_scan() doesn't issue direct disk IOs, but libudev scans/reads the
> udev db (under /run/udev/data), which does issue real disk IOs.
> We can see that the combination "obtain_device_list_from_udev=0 &
> event_activation=1" largely reduces the booting time, from 2min 6s to 40s.
> The key is that with this setting dev_cache_scan() scans the devices by
> itself (scanning "/dev").
> I am not very familiar with systemd-udev; below is a little more info
> about the libudev path. The top function is _insert_udev_dir, which:
> 1. scans/reads /sys/class/block/. O(n)
> 2. scans/reads the udev db (/run/udev/data). maybe O(n)
>    udev calls device_read_db => handle_db_line to handle every
>    line of a db file.
> 3. does qsort & deduplication of the device list. O(n) + O(n)
> 4. does lots of "memory alloc" & "string copy" work along the way.
> It takes too much memory; on the host side, 'top' shows:
> - direct activation used only ~2G memory during boot
> - event activation cost ~20G memory.
> I didn't test the related udev code; my guess is that <2> takes too much time.
> There are thousands of scanning jobs reading /run/udev/data in parallel, and
> at the same time many devices need to generate their udev db files in the
> same dir. I am not sure the filesystem can handle this scenario perfectly.
> On the other code path, obtain_device_list_from_udev=0 triggers a
> scan/read of "/dev", and this dir has far fewer write IOs than /run/udev/data.
I made a minor mistake above: the qsort in <3> is O(n log n), not O(n).
More info about my analysis:
I set a filter in lvm.conf: filter = [ "a|/dev/vda2|", "r|.*|" ]
This reduced the booting time a little, from 2min 6s to 1min 42s.
The vm vda2 layout:
# lsblk | egrep -A 4 "^vd"
vda             253:0    0   40G  0 disk
├─vda1          253:1    0    8M  0 part
└─vda2          253:2    0   40G  0 part
  ├─system-swap 254:0    0    2G  0 lvm  [SWAP]
  └─system-root 254:1    0   35G  0 lvm  /
The filter rule denies all LVs except the rootfs LVs.
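The filter semantics this relies on are first-match-wins: patterns are tried in order, "a" accepts and "r" rejects. A simplified Python sketch of that behavior (illustration only; lvm2's real matcher supports more pattern syntax than plain regexes):

```python
import re

# lvm.conf-style filter: first pattern that matches the device path
# decides; 'a' accepts, 'r' rejects. Mirrors the rule used above.
FILTER = [("a", r"/dev/vda2"), ("r", r".*")]

def device_accepted(path, rules=FILTER):
    for action, pattern in rules:
        if re.search(pattern, path):
            return action == "a"
    return True  # no rule matched: accepted by default
```

With this rule list, /dev/vda2 hits the accept pattern first, while every other path falls through to the final reject-all rule.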
The rule makes _pvscan_cache_args() remove devs from devl->list via the "nodata" filters.
The hot spot narrows down to setup_devices() (which calls dev_cache_scan()).
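For reference, the settings discussed in this thread sit in lvm.conf roughly like this (a sketch of the test configuration; check your version's defaults with lvmconfig):

```
devices {
        # 0: build the device list by reading /dev directly instead of
        #    querying the udev db under /run/udev/data
        obtain_device_list_from_udev = 0
        # accept only the rootfs PV, reject everything else
        filter = [ "a|/dev/vda2|", "r|.*|" ]
}
global {
        # 1: activate VGs from uevents (lvm2-pvscan at .service per device)
        event_activation = 1
}
```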