[linux-lvm] Discussion: performance issue on event activation mode

heming.zhao at suse.com heming.zhao at suse.com
Sat Jul 3 11:49:21 UTC 2021


On 7/3/21 6:02 AM, David Teigland wrote:
> On Fri, Jul 02, 2021 at 09:22:03PM +0000, Martin Wilck wrote:
>> On Fr, 2021-07-02 at 16:09 -0500, David Teigland wrote:
>>> On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.zhao at suse.com wrote:
>>>> dev_cache_scan //order: O(n^2)
>>>>   + _insert_dirs //O(n)
>>>>   | if obtain_device_list_from_udev() true
>>>>   |   _insert_udev_dir //O(n)
>>>>   |
>>>>   + dev_cache_index_devs //O(n)
>>>
>>> I've been running some experiments and trying some patches to improve
>>> this.  By setting obtain_device_list_from_udev=0, and using the
>>> attached
>>> patch to disable dev_cache_index_devs, the pvscan is much better.
>>>
>>> systemctl status lvm2-pvscan appears to show that the pvscan command
>>> itself runs for only 2-4 seconds, while the service as a whole takes
>>> around 15 seconds.  See the 16 sec gap below from the end of pvscan
>>> to the systemd Started message.  If that's accurate, the remaining
>>> delay
>>> would lie outside lvm.
>>>
>>> Jul 02 15:27:57 localhost.localdomain systemd[1]: Starting LVM event
>>> activation on device 253:1710...
>>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] PV
>>> /dev/mapper/mpathalz online, VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9
>>> is complete.
>>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] VG
>>> 1ed02c7d-0019-43c4-91b5-f220f3521ba9 run autoactivation.
>>> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   1 logical
>>> volume(s) in volume group "1ed02c7d-0019-43c4-91b5-f220f3521ba9" now
>>> active
>>
>> Printing this message is really the last thing that pvscan does?
> 
> I've not seen anything measurable after that message.  However, digging
> through the command init/exit paths I did find libudev setup/destroy calls
> that can also be skipped when the command is not accessing libudev info.
> A quick check seemed to show some further improvement from dropping those.
> (That will be part of the larger patch isolating libudev usage.)
> 
> I'm still seeing significant delay between the apparent command exit and
> the systemd "Started" message, but this will require a little more work to
> prove.
> 

I applied the patch and got the very similar result.


```with patch
# systemd-analyze blame
          27.679s lvm2-pvscan at 134:896.service
          27.664s lvm2-pvscan at 134:800.service
          27.648s lvm2-pvscan at 134:960.service
          27.645s lvm2-pvscan at 132:816.service
          27.589s lvm2-pvscan at 134:816.service
          27.582s lvm2-pvscan at 133:992.service
          ... ...

# systemctl status lvm2-pvscan at 134:896.service
... ...
Jul 03 19:43:02 sle15sp2-base40g systemd[1]: Starting LVM event activation on device 134:896...
Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   pvscan[24817] PV /dev/sdalm online, VG vg_sdalm is complet>
Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   pvscan[24817] VG vg_sdalm run autoactivation.
Jul 03 19:43:03 sle15sp2-base40g lvm[24817]:   1 logical volume(s) in volume group "vg_sdalm" now active
Jul 03 19:43:30 sle15sp2-base40g systemd[1]: Started LVM event activation on device 134:896.
```

the 27.679s get from 19:43:30 minus 19:43:02.

and 27s is 10s quick than the lvm without patch (by today's test result)
```without patch
# systemd-analyze blame
          37.650s lvm2-pvscan at 133:992.service
          37.650s lvm2-pvscan at 133:1008.service
          37.649s lvm2-pvscan at 133:896.service
          37.649s lvm2-pvscan at 134:960.service
          37.612s lvm2-pvscan at 133:880.service
          37.612s lvm2-pvscan at 133:864.servic
          ... ...

# systemctl status lvm2-pvscan at 133:992.service
... ...
Jul 03 19:31:28 sle15sp2-base40g systemd[1]: Starting LVM event activation on device 133:992...
Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   pvscan[24243] PV /dev/sdalc online, VG vg_sdalc is complet>
Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   pvscan[24243] VG vg_sdalc run autoactivation.
Jul 03 19:31:30 sle15sp2-base40g lvm[24243]:   1 logical volume(s) in volume group "vg_sdalc" now active
Jul 03 19:32:05 sle15sp2-base40g systemd[1]: Started LVM event activation on device 133:992.
```

I added log in lvm2 code, the time of log "1 logical volume(s) in volume
group "vg_sdalm" now active" is the real pvscan finished time. Martin
comment in previous mail is may right, systemd is may busy and delayed to
detect/see the pvscan service completion.

Thanks,
heming





More information about the linux-lvm mailing list