[linux-lvm] Discussion: performance issue on event activation mode

Fri Oct 1 08:08:25 UTC 2021

On Fri 01 Oct 2021 07:41, Martin Wilck wrote:
> On Thu, 2021-09-30 at 23:32 +0800, heming.zhao at suse.com wrote:
> > > I just want to say that some of the issues might simply be
> > > regressions/issues with systemd/udev that could be fixed. We as
> > > providers of block device abstractions where we need to handle,
> > > sometimes, thousands of devices, might be the first ones to hit these
> > > issues.
> > > 
> > 
> > The rhel8 callgrind picture
> > (https://prajnoha.fedorapeople.org/bz1986158/rhel8_libudev_critical_cost.png
> > )
> > corresponds to my analysis:
> > https://listman.redhat.com/archives/linux-lvm/2021-June/msg00022.html
> > handle_db_line took too much time and became the hotspot.
> 
> I missed that post. You wrote
> 
> > the dev_cache_scan doesn't have direct disk IOs, but libudev will
> > scan/read udev db which issue real disk IOs (location is
> > /run/udev/data).
> > ...
> > 2. scans/reads udev db (/run/udev/data). may O(n)
> >  udev will call device_read_db => handle_db_line to handle every
> >    line of a db file.
> > ...
> > I didn't test the related udev code, and guess the <2> takes too much
> > time.
> 
> ... but note that /run/udev is on tmpfs, not on a real disk. So the
> accesses should be very fast unless there's some locking happening.

Yes, indeed! I think this is a regression.

The results/graphs show that a lot of time is spent in internal
hashmap handling. I don't see this in older versions of udev, like the
one bundled with systemd v219 (I compared RHEL7 and RHEL8, but haven't
done a detailed bisection yet). My suspicion is that udev now shares
more code with native systemd, such as that hash usage, so this might
be the clue, but someone from systemd/udev should take a closer look
at this.
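
To check the tmpfs point above independently of libudev, something like
the following could be used. This is only a rough sketch and not code
from this thread: the function name time_db_reads is made up for
illustration, and it measures nothing but raw file reads and line
counting, with no udev parsing at all. If it finishes quickly on a
machine where the scan is slow, that would suggest the cost is in the
parsing/hashmap handling rather than in the reads themselves.

```python
import os
import time


def time_db_reads(db_dir="/run/udev/data"):
    """Read every regular file in db_dir; return (files, lines, seconds).

    Hypothetical helper: it only times plain reads, it does not go
    through libudev and does not parse the db lines.
    """
    files = 0
    lines = 0
    start = time.perf_counter()
    try:
        entries = list(os.scandir(db_dir))
    except OSError:
        entries = []  # directory missing or unreadable
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue
        try:
            with open(entry.path, "rb") as fh:
                data = fh.read()
        except OSError:
            continue  # entry vanished or is unreadable
        files += 1
        lines += len(data.splitlines())
    return files, lines, time.perf_counter() - start


if __name__ == "__main__":
    n_files, n_lines, elapsed = time_db_reads()
    print(f"{n_files} files, {n_lines} lines read in {elapsed:.6f}s")
```

Comparing that number with the time spent in handle_db_line in the
callgrind output should show how much of the cost is actual file access.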

-- 
Peter