[linux-lvm] lvmpolld causes high cpu load issue

Zdenek Kabelac zdenek.kabelac at gmail.com
Wed Aug 17 12:54:46 UTC 2022


On 17. 08. 22 at 14:39, Martin Wilck wrote:
> On Wed, 2022-08-17 at 18:47 +0800, Heming Zhao wrote:
>> On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:
>>
>>
>>>
>>> ATM I'm not even sure if you are complaining about high CPU usage of
>>> lvmpolld or just huge udev rules processing overhead.
>>
>> The load is generated by multipath. lvmpolld does the IN_CLOSE_WRITE
>> action
>> which is the trigger.
> 
> Let's be clear here: every close-after-write operation triggers udev's
> "watch" mechanism for block devices, which causes the udev rules to be
> executed for the device. That is not a cheap operation. In the case at
> hand, the customer was observing a lot of "multipath -U" commands. So
> apparently a significant part of the udev rule processing was spent in
> "multipath -U". Running "multipath -U" is important, because the rule
> could have been triggered by a change in the number of available path
> devices, and later commands run from udev rules might hang indefinitely
> if the multipath device had no usable paths any more. "multipath -U" is
> already quite well optimized, but it needs to do some I/O to complete
> its work, thus it takes a few milliseconds to run.
> 
> IOW, it would be misleading to point at multipath. close-after-write
> operations on block devices should be avoided if possible. As you
> probably know, the purpose of udev's "watch" operation is to be able to
> determine changes on layered devices, e.g. newly created LVs or the
> like. "pvmove" is special, because by definition it will usually not
> cause any changes in higher layers. Therefore it might make sense to
> disable the udev watch on the affected PVs while pvmove is running, and
> trigger a single change event (re-enabling the watch) after the pvmove
> has finished. If that is impossible, lvmpolld and other lvm tools that
> are involved in the pvmove operation should avoid calling close() on
> the PVs, IOW keep the fds open until the operation is finished.

Hi

Let's make it clear that we are very well aware of all the constraints
associated with the udev rule logic (and we tried quite hard to minimize the
impact - however, the udevd developers kind of 'misunderstood' how badly the
existing watch rule logic would impact system performance - and the story
kind of 'continues' with systemd & D-Bus services, unfortunately...).
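
To make the trigger Heming mentions concrete: udevd's watch is an inotify
IN_CLOSE_WRITE watch on the device node, so every close-after-write re-runs
the block rules. The following is only a small illustration sketch (not an
lvm2 or multipath tool; /dev/sdX is a placeholder for one of the affected
PVs) that listens for the same event:

  #!/usr/bin/env python3
  # Watch a device node for IN_CLOSE_WRITE - the inotify event behind
  # udevd's "watch" option. Placeholder device path; run as root.
  import ctypes, os, struct, sys

  IN_CLOSE_WRITE = 0x00000008
  libc = ctypes.CDLL("libc.so.6", use_errno=True)

  fd = libc.inotify_init()
  if fd < 0:
      sys.exit("inotify_init: " + os.strerror(ctypes.get_errno()))

  dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdX"
  if libc.inotify_add_watch(fd, dev.encode(), IN_CLOSE_WRITE) < 0:
      sys.exit("inotify_add_watch: " + os.strerror(ctypes.get_errno()))

  print("waiting for close-after-write on", dev)
  while True:
      buf = os.read(fd, 4096)
      off = 0
      while off < len(buf):
          # struct inotify_event: int wd; u32 mask; u32 cookie; u32 len; name[]
          _wd, mask, _cookie, name_len = struct.unpack_from("iIII", buf, off)
          if mask & IN_CLOSE_WRITE:
              print("IN_CLOSE_WRITE -> udev re-runs rules for", dev)
          off += 16 + name_len

Each line it prints during a pvmove roughly corresponds to one pass of the
udev block rules (including "multipath -U") for that device.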

However, let's focus on 'pvmove', as it is a potentially very lengthy
operation - so it's not feasible to keep the VG locked/blocked across an
operation which might take days with slower storage and large moved sizes
(for example, moving ~10 TiB at ~100 MiB/s already takes more than a day,
and the write access/lock disables all readers...).

So lvm2 does try to minimize locking time. We will re-validate that only the
necessary 'VG updating' operations use 'write' access - occasionally, due to
some unrelated code changes, an unwanted 'write' VG open might slip in - but
we can't keep the operation blocking a whole VG just because of slow udev
rule processing.
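
For diagnosing the reported load, one simple thing that can be done on the
affected system is to count how many udev 'change' events each block device
receives while the pvmove runs - just a quick diagnostic sketch, not an lvm2
tool:

  #!/usr/bin/env python3
  # Count udev "change" uevents per block device (Ctrl-C to stop and print).
  import collections, subprocess

  counts = collections.Counter()
  proc = subprocess.Popen(
      ["udevadm", "monitor", "--udev", "--subsystem-match=block"],
      stdout=subprocess.PIPE, text=True)
  try:
      for line in proc.stdout:
          # event lines look like:
          # UDEV  [1234.567890] change   /devices/.../block/sdb (block)
          parts = line.split()
          if len(parts) >= 4 and parts[0] == "UDEV" and parts[2] == "change":
              counts[parts[3]] += 1
  except KeyboardInterrupt:
      pass
  finally:
      proc.terminate()

  for devpath, n in counts.most_common(15):
      print(n, devpath)

That would quickly show whether the events really cluster on the pvmove PVs
or whether something else on the system keeps closing devices for write.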

Under normal circumstances a udev rule should be processed very fast -
unless something is mis-designed and causes CPU overload.

But as mentioned already a few times - without more knowledge about the case
we can hardly guess the exact reason. We have already provided a useful
suggestion for reducing the number of devices 'processed' by udev: reduce the
number of 'lvm2 metadata PVs'. The big reason for frequent metadata updates
would be heavy segmentation of the moved LV - but we will not know this
without seeing the user's VG 'metadata' in this case...
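
As a generic illustration of what to look at (field names as in current
pvs/lvs, not based on this user's actual metadata), something like this
shows how many PVs carry an in-use metadata area and how segmented each LV
is:

  #!/usr/bin/env python3
  # Report lvm2 metadata areas per PV and segment counts per LV.
  # PVs with an in-use MDA get rewritten on every metadata commit during a
  # pvmove; a heavily segmented LV means many more metadata commits.
  import subprocess

  def report(cmd):
      out = subprocess.run(cmd, capture_output=True, text=True,
                           check=True).stdout
      return [[f.strip() for f in line.split(",")]
              for line in out.strip().splitlines() if line.strip()]

  pvs_cmd = ["pvs", "--noheadings", "--separator", ",",
             "-o", "pv_name,vg_name,pv_mda_count,pv_mda_used_count"]
  for pv, vg, mdas, used in report(pvs_cmd):
      print(f"{pv} ({vg}): mdas={mdas} in-use={used}")

  lvs_cmd = ["lvs", "--noheadings", "--separator", ",",
             "-o", "lv_name,vg_name,seg_count"]
  for lv, vg, segs in report(lvs_cmd):
      print(f"{vg}/{lv}: segments={segs}")

Fewer PVs holding an in-use metadata copy (e.g. marking some MDAs as ignored
with 'pvchange --metadataignore y') means fewer devices touched - and fewer
watch events - on every metadata update during the move.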


Zdenek


