[libvirt] Symptoms of main loop slowing down in libvirtd

Tue May 2 11:12:11 UTC 2017

On Tue, May 2, 2017 at 4:27 PM, Peter Krempa <pkrempa at redhat.com> wrote:

> On Tue, May 02, 2017 at 16:16:39 +0530, Prerna wrote:
> > On Tue, May 2, 2017 at 4:07 PM, Peter Krempa <pkrempa at redhat.com> wrote:
> >
> > > On Tue, May 02, 2017 at 16:01:40 +0530, Prerna wrote:
> > >
> > > [please don't top-post on technical lists]
> > >
> > > > Thanks for the quick response, Peter!
> > > > This confirms the basic approach I had in mind.
> > > > It needs some (not-so-small) cleanup of the qemu driver code, and I have
> > > > already started cleaning up some of it. I am planning to have a constant
> > > > number of event handler threads to start with. I'll try adding this as a
> > > > configurable parameter in qemu.conf once basic functionality is completed.
> > >
> > > That is wrong, since you can't guarantee that it will not lock up. Since
> > > the workers handling monitor events tend to call monitor commands
> > > themselves, it's possible that they will get stuck due to an unresponsive
> > > qemu. Without allowing for the worst-case scenario of a thread per VM, you
> > > can't guarantee that the pool won't be depleted.
> > >
> >
> > Once a worker thread "picks" an event, it will contend on the per-VM lock
> > for that VM. Consequently, the handling for that event will be delayed
> > until an existing RPC call for that VM completes.
> >
> >
> > >
> > > If you want to fix this properly, you'll need a dynamic pool.
> > >
> >
> > To improve the efficiency of the thread pool, we can try contending for a
> > VM's lock for a specific time, say, N seconds, and then give up on that
> > lock. The same thread in the pool can then move on to process events of the
> > next VM.
>
> This would unnecessarily delay events for VMs which are not locked.
>
> > Note that this needs all VMs to be hashed to a constant number of threads
> > in the pool, say 5. This ensures that each worker thread has a unique,
> > non-overlapping set of VMs to work with.
>
> How would this help?
>
> > As an example, {VM_ID: 1, 6, 11, 16, 21, ...} are handled by the same worker
> > thread. If this particular worker thread cannot acquire the requisite VM's
> > lock, it will move on to the event list for the next VM and so on. The use
> > of pthread_mutex_trylock() ensures that the worker thread will never be
> > stuck forever.
>
> No, I think this isn't the right approach at all. You could end up
> having all VMs handled by one thread, with the others being idle. I think
> the right approach will be to have a dynamic pool, which will handle
> incoming events. In case two events for a single VM would otherwise be
> handled in parallel, the same thread should pick them up in the order they
> arrived. That way, you will have at most one thread per VM, while
> normally you will have only one.
>
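For reference, the dynamic-pool scheme described in the quoted text above could look roughly like the sketch below. It is written in Python only for brevity (libvirt itself is C), and every name in it (PerVMDispatcher, submit, wait_idle) is hypothetical rather than an existing libvirt interface. The assumption is that a worker thread is started on demand for a VM, drains that VM's events in arrival order, and retires when the queue is empty, so there is never more than one thread per VM.

```python
import threading
from collections import defaultdict, deque


class PerVMDispatcher:
    """Hypothetical dispatcher: at most one worker thread per VM at a time."""

    def __init__(self, handler):
        self._handler = handler            # called as handler(vm_id, event)
        self._lock = threading.Lock()      # guards the two structures below
        self._queues = defaultdict(deque)  # vm_id -> pending events, FIFO
        self._workers = {}                 # vm_id -> thread draining that queue

    def submit(self, vm_id, event):
        """Queue an event; start a worker only if the VM has none."""
        with self._lock:
            self._queues[vm_id].append(event)
            if vm_id in self._workers:
                return                     # the existing worker will pick it up
            worker = threading.Thread(target=self._drain, args=(vm_id,))
            self._workers[vm_id] = worker
            worker.start()

    def _drain(self, vm_id):
        """Handle this VM's events in arrival order, then retire."""
        while True:
            with self._lock:
                queue = self._queues[vm_id]
                if not queue:
                    # Nothing left: retire so the pool shrinks again.
                    del self._workers[vm_id]
                    del self._queues[vm_id]
                    return
                event = queue.popleft()
            # Run the handler outside the dispatcher lock; it may block on a
            # stuck VM without delaying events that belong to other VMs.
            self._handler(vm_id, event)

    def wait_idle(self):
        """Block until every worker has retired (mainly useful in tests)."""
        while True:
            with self._lock:
                workers = list(self._workers.values())
            if not workers:
                return
            for worker in workers:
                worker.join()
```

With this shape, an unresponsive VM ties up only its own worker, and the number of threads is bounded by the number of VMs that currently have pending events.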

I agree that a dynamic thread pool is helpful when there are events from
distinct VMs that need to be processed at the same time.
But I am also concerned about using the threads in this pool efficiently.
If a few threads do nothing but contend on per-VM locks until the RPCs for
those VMs complete, that is not a very efficient use of resources. I would
rather have such a thread drop handling of that VM's events and do something
useful while it is unable to grab the VM's lock.
This is the reason I wanted to load-balance incoming events by VM ID and
hash them onto distinct threads. The idea was that a thread always has
something else to take up if the current VM's lock is not available. Would
you have some suggestions on improving the efficacy of the thread pool as a
whole?
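
To make the hashing idea concrete, here is a minimal sketch of one sweep by a single worker. It is in Python for brevity and is not libvirt code: worker_for, run_one_pass and the plain dictionaries are all hypothetical names, and the non-blocking acquire stands in for what POSIX calls pthread_mutex_trylock(). The pool size of 5 and the modulo hash are the values from the example earlier in the thread.

```python
import threading
from collections import deque

NUM_WORKERS = 5  # fixed pool size, the "say 5" from the discussion


def worker_for(vm_id, num_workers=NUM_WORKERS):
    """Map a VM ID onto one worker, so each worker owns a disjoint set."""
    return vm_id % num_workers


def run_one_pass(worker_index, vm_locks, vm_events, handle):
    """One sweep by a worker over the VMs hashed to it.

    vm_locks:  dict vm_id -> threading.Lock (stand-in for the per-VM lock)
    vm_events: dict vm_id -> deque of pending events
    Returns the list of VM IDs skipped because their lock was busy.
    """
    skipped = []
    for vm_id in sorted(vm_events):
        if worker_for(vm_id) != worker_index:
            continue                        # belongs to another worker
        lock = vm_locks[vm_id]
        # Non-blocking attempt, analogous to pthread_mutex_trylock().
        if not lock.acquire(blocking=False):
            skipped.append(vm_id)           # busy (e.g. RPC in flight): move on
            continue
        try:
            while vm_events[vm_id]:
                handle(vm_id, vm_events[vm_id].popleft())
        finally:
            lock.release()
    return skipped
```

Note that a skipped VM's events simply wait for a later sweep, and if every VM hashed to one worker is busy, that worker has nothing useful to do even though other workers may be overloaded.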