[libvirt] Symptoms of main loop slowing down in libvirtd

Tue May 2 10:57:12 UTC 2017

On Tue, May 02, 2017 at 16:16:39 +0530, Prerna wrote:
> On Tue, May 2, 2017 at 4:07 PM, Peter Krempa <pkrempa at redhat.com> wrote:
> 
> > On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:
> >
> > [please don't top-post on technical lists]
> >
> > > Thanks for the quick response, Peter!
> > > This ratifies the basic approach I had in mind.
> > > It needs some (not-so-small) cleanup of the qemu driver code, and I have
> > > already started cleaning up some of it. I am planning to have a constant
> > > number of event handler threads to start with. I'll try adding this as a
> > > configurable parameter in qemu.conf once basic functionality is
> > > completed.
> >
> > That is wrong, since you can't guarantee that it will not lock up. Since
> > the workers handling monitor events tend to call monitor commands
> > themselves it's possible that it will get stuck due to unresponsive
> > qemu. Without having a worst-case-scenario of a thread per VM you can't
> > guarantee that the pool won't be depleted.
> >
> 
> Once a worker thread "picks" an event, it will contend on the per-VM lock
> for that VM. Consequently, the handling for that event will be delayed
> until an existing RPC call for that VM completes.
> 
> 
> >
> > If you want to fix this properly, you'll need a dynamic pool.
> >
> 
> To improve the efficiency of the thread pool, we can try contending for a
> VM's lock for a specific time, say, N seconds, and then relinquish the
> lock. The same thread in the pool can then move on to process events of the
> next VM.

This would unnecessarily delay events for VMs that are not locked.

> Note that this needs all VMs to be hashed to a constant number of threads
> in the pool, say 5. This ensures that each worker thread has a unique,
> non-overlapping set of VMs to work with.

How would this help?

> As an example, {VM_ID: 1, 6, 11, 16, 21 ..} are handled by the same worker
> thread. If this particular worker thread cannot acquire the requisite VM's
> lock, it will move on to the event list for the next VM and so on. The use
> of pthread_mutex_trylock() ensures that the worker thread will never be
> stuck forever.
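
To make the mapping in the example concrete, here is a minimal Python
sketch (a model only, not libvirt code). It assumes the assignment is
vm_id % 5, which is only inferred from the {1, 6, 11, 16, 21} example
above; worker_for and load_per_worker are hypothetical names. It counts
how many busy VMs land on each worker of the fixed pool.

```python
from collections import Counter

NUM_WORKERS = 5  # pool size taken from the "say 5" in the proposal


def worker_for(vm_id: int) -> int:
    """Static VM-to-worker assignment (assumed to be a plain modulo)."""
    return vm_id % NUM_WORKERS


def load_per_worker(busy_vm_ids):
    """Return how many of the busy VMs each worker is responsible for."""
    counts = Counter(worker_for(vm_id) for vm_id in busy_vm_ids)
    return [counts.get(worker, 0) for worker in range(NUM_WORKERS)]
```

With the VM IDs from the example all busy at once,
load_per_worker([1, 6, 11, 16, 21]) gives [0, 5, 0, 0, 0]: one worker is
responsible for all five VMs while the other four have nothing to do.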

No, I think this isn't the right approach at all. You could end up
having all VMs handled by one thread, with the others being idle. I think
the right approach will be to have a dynamic pool, which will handle
incoming events. When two events for a single VM would otherwise be
handled in parallel, the same thread should pick them up in the order
they arrived. That way, you will have at most one thread per VM, while
normally you will have only one thread in total.
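
A rough sketch of that dispatch rule, written in Python as a model and
not as libvirt code. PerVMDispatcher, submit, and join are hypothetical
names, and spawning a fresh thread per busy VM stands in for whatever a
real dynamic pool would do. Each VM gets a FIFO queue; a thread is
created only when a VM has pending events and no thread is already
draining them, so events for one VM are never handled in parallel.

```python
import threading
from collections import defaultdict, deque


class PerVMDispatcher:
    """Model of a dynamic pool: at most one thread per VM, FIFO per VM."""

    def __init__(self, handler):
        self._handler = handler            # callable(vm_id, event)
        self._lock = threading.Lock()      # guards _queues and _workers
        self._queues = defaultdict(deque)  # vm_id -> pending events
        self._workers = {}                 # vm_id -> thread draining that VM

    def submit(self, vm_id, event):
        with self._lock:
            self._queues[vm_id].append(event)
            if vm_id in self._workers:
                return  # the existing thread will pick it up, in order
            thread = threading.Thread(
                target=self._drain, args=(vm_id,), daemon=True)
            self._workers[vm_id] = thread
            thread.start()

    def _drain(self, vm_id):
        while True:
            with self._lock:
                queue = self._queues[vm_id]
                if not queue:
                    # Nothing left for this VM: retire the thread.
                    del self._queues[vm_id]
                    del self._workers[vm_id]
                    return
                event = queue.popleft()
            # Runs without the dispatcher lock. If this blocks on an
            # unresponsive VM, only this VM's thread is stuck.
            # Error handling is omitted in this sketch.
            self._handler(vm_id, event)

    def join(self):
        """Wait until every queued event has been handled."""
        while True:
            with self._lock:
                threads = list(self._workers.values())
            if not threads:
                return
            for thread in threads:
                thread.join()
```

A caller would construct it with a handler, call submit(vm_id, event)
from the event loop, and never block there. The thread count tracks the
number of VMs that currently have pending events, so it cannot exceed
one per VM.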