[linux-lvm] lvmpolld causes high cpu load issue

Zdenek Kabelac zdenek.kabelac at gmail.com
Wed Aug 17 11:13:27 UTC 2022


On 17. 08. 22 at 12:47, Heming Zhao wrote:
> On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:
>> On 17. 08. 22 at 10:43, Heming Zhao wrote:
>>> On Wed, Aug 17, 2022 at 10:06:35AM +0200, Zdenek Kabelac wrote:
>>>> On 17. 08. 22 at 4:03, Heming Zhao wrote:
>>>>> On Tue, Aug 16, 2022 at 12:26:51PM +0200, Zdenek Kabelac wrote:
>>>>>> On 16. 08. 22 at 12:08, Heming Zhao wrote:
>>>>>>> Ooh, very sorry, the subject is wrong - it is not IO performance but high
>>>>>>> cpu load that is triggered by pvmove.
>>>>>>>
> The machine has more than 250 connected disks. The VG has 103 PVs & 79 LVs.
> 
> # /sbin/vgs
>    VG           #PV #LV #SN Attr   VSize   VFree
>    <vgname>     103  79   0 wz--n-  52t    17t

Ok - so the main issue could be too many PVs with relatively high latency of the
mpath devices (which could all actually be simulated easily in the lvm2 test suite).
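
Just to illustrate - an untested sketch of how a slow 'mpath-like' PV could be
mimicked with the dm 'delay' target; /dev/sdX and the 50ms delay are only
placeholder examples:

   # SZ=$(blockdev --getsz /dev/sdX)
   # dmsetup create slowpv1 --table "0 $SZ delay /dev/sdX 0 50"
   # pvcreate /dev/mapper/slowpv1

Repeated for a larger number of devices this should give roughly comparable
metadata-update latencies without any real multipath hardware.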

> The load is generated by multipath. lvmpolld does the IN_CLOSE_WRITE action
> which is the trigger.
> 

I'll check whether lvmpolld is using correct locking while checking for the
operational state - you may possibly extend the polling interval
(although that's the area where the mentioned patchset has been enhancing a couple of things).
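
For reference - the two usual knobs, with values that are only examples to be
tuned per setup: the per-command interval

   # pvmove -i 60 <pv>

or persistently in /etc/lvm/lvm.conf:

   activation {
       polling_interval = 60
   }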


>>
>> If you have too many disks in the VG (again it's unclear how many paths
>> and how many distinct PVs there are) - the user may *significantly* reduce the
>> burden associated with metadata updating by reducing the number of 'actively'
>> maintained metadata areas in the VG - i.e. if you have 100 PVs in a VG - you may
>> keep metadata on only 5-10 PVs to have 'enough' duplicate copies of lvm2
>> metadata within the VG (vgchange --metadatacopies X) - clearly it depends on
>> the use case and how many PVs are added/removed from the VG over its
>> lifetime....
> 
> Thanks for the important info. I also found the related VG config in
> /etc/lvm/backup/<vgname>; this file shows 'metadata_copies = 0'.
> 
> This should be another solution. But why doesn't lvm2 take this behavior by
> default, or give a notification when the PV number goes beyond a threshold
> while the user is executing pvs/vgs/lvs or pvmove?
> There are too many magic switches; users don't know how to adjust them for
> better performance.

The problem is always the same - selecting the right 'default' :) What suits user
A is sometimes a 'no go' for user B. So ATM it's more 'secure/safe' to keep
metadata with each PV - so when a PV is discovered, it's known what the VG using
such a PV looks like.  When only a fraction of the PVs have the info - the VG is
way more fragile to damage when disks are lost, i.e. there is no 'smart'
mechanism to pick disks in different racks....

So this option is there for administrators who are 'clever' enough to deal
with the new set of problems it may create for them.
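
If you decide to go that way, a minimal sketch (the count 5 and <vgname> are
just examples):

   # vgchange --metadatacopies 5 <vgname>
   # vgs -o +vg_mda_count,vg_mda_copies <vgname>
   # pvs -o +mda_count

The mda columns of vgs/pvs let you verify which PVs still carry an active
metadata area after the change.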

Yes - lvm2 has a lot of options - but that's what is usually necessary when we
want to be able to provide an optimal solution for a really wide variety of
setups - so I think spending a couple of minutes on reading the man pages pays
off - especially if you had to spend 'days' on building your disk racks ;)

And yes, we may add a few more hints - but then we are asked by the 'second'
group of users ('skilled admins') why we print so many dumb messages every time
they do some simple operation :)

> I'm busy with many bugs and still can't find a time slot to set up an env.
> This performance issue relates to mpath, and I can't find an easy
> way to set up an env. (I suspect this issue might be triggered by setting up
> 300 fake PVs without mpath, then doing the pvmove cmd.)
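
For a rough reproduction without real mpath hardware something along these
lines might already be enough (an untested sketch - sizes and paths are only
examples, and the lvm.conf filter/devices file may need to admit the loop
devices):

   # truncate -s 1G /var/tmp/pv{1..300}.img
   # for f in /var/tmp/pv*.img; do losetup -f "$f"; done
   # pvcreate /dev/loop[0-9]*
   # vgcreate testvg /dev/loop[0-9]*

Stacking a dm 'delay' mapping on top of each loop device - as sketched
earlier - would also bring back some of the mpath-like latency.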


'Fragmented' LVs with small segment sizes may significantly raise the amount of
metadata updates needed during a pvmove operation, as each single LV segment
will be mirrored by an individual mirror.
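
A quick way to see how fragmented the LVs are (just example invocations):

   # lvs -o +seg_count <vgname>
   # lvs --segments <vgname>

A high segment count per LV is where the extra metadata updates during pvmove
come from.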


Zdenek


