[linux-lvm] lvmpolld causes IO performance issue
Heming Zhao
heming.zhao at suse.com
Tue Aug 16 09:28:20 UTC 2022
Hello maintainers & list,
I would like to share a story:
One SUSE customer hit an lvmpolld issue which caused a dramatic drop in IO
performance.
How to trigger:
When a machine is connected to a large number of LUNs (e.g. 80~200) and a pvmove
is running (e.g. moving a single disk to a new one, with a command like:
pvmove disk1 disk2), the system suffers a high cpu load. But when the system is
connected to only ~10 LUNs, the performance is fine.
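For illustration, a rough reproduction sketch (the device names mpatha/mpathb
are only placeholders; the real setups have 80~200 multipath LUNs in the VG):

  # start a pvmove between two multipath PVs of the same VG, in background
  pvmove -b /dev/mapper/mpatha /dev/mapper/mpathb
  # while it runs, watch the cpu usage of systemd-udevd and lvmpolld
  top -b -n 1 | grep -E 'udevd|lvmpolld'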
We found two workarounds:
1. Set lvm.conf 'activation/polling_interval=120' (see the lvm.conf snippet
   after this list).
2. Write a special udev rule which makes udev ignore watch events for mpath
   devices:
   echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
     /etc/udev/rules.d/90-dm-watch.rules
Applying either one of the two makes the performance issue disappear.
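For completeness, workaround 1 only changes a stock lvm.conf setting (120 is the
value we tested; the shipped default is 15 seconds):

  # /etc/lvm/lvm.conf
  activation {
      # seconds between lvmpolld polls of the pvmove progress; a larger
      # value means fewer metadata updates and fewer udev watch events
      polling_interval = 120
  }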
** the root cause **
lvmpolld periodically requests status information to update the pvmove progress.
On every polling_interval, lvm2 updates the VG metadata. The update job calls
sys_close, which triggers a systemd-udevd IN_CLOSE_WRITE inotify event, e.g.:
2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
(8 is IN_CLOSE_WRITE.)
The VGs' underlying devices are multipath devices. So whenever lvm2 updates the
metadata, even though pvmove writes only a small amount of data, the sys_close
action triggers udev's "watch" mechanism, which gets notified every time a
process has written to the device and closed it. This causes frequent, pointless
re-evaluation of the udev rules for these devices.
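Anyone can observe the event storm with standard tools while a pvmove is running
(dm-179 below is just the device from the log above; substitute any dm device
that backs the VG):

  # show udev events as they are processed; during pvmove, each
  # polling_interval produces a burst of events for the multipath devices
  udevadm monitor --udev --subsystem-match=block
  # or watch the raw IN_CLOSE_WRITE inotify events on a single device
  # (inotifywait comes from the inotify-tools package)
  inotifywait -m -e close_write /dev/dm-179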
My question: do the LVM2 maintainers have any idea how to fix this bug?
In my view, could lvm2 defer closing the VG devices' fds until pvmove finishes?
Thanks,
Heming