[libvirt] Symptoms of main loop slowing down in libvirtd

Tue May 2 10:03:47 UTC 2017

Hi all,
On my host, I have been seeing keepalive responses slow down
intermittently when issuing bulk power-offs.
With some tips from Danpb on the channel, I was able to trace via systemtap
that the main event loop would not run for about 6-9 seconds. This would
stall keepalives and kill client connections.

I traced this to qemuProcessHandleEvent(), which is called from the main
loop and needs the vm lock. My hook scripts slightly lengthened the time
the power-off RPC took to complete, and since the RPC holds the vm lock
while the hooks run, the resulting keepalive delays were noticeable.

I agree that the easiest solution is to release the per-VM lock before hook
scripts are run.
However, I was wondering why we contend on the per-VM lock directly from
the main loop at all. Could we do this instead: have the main loop "park"
events on a separate event queue, and have a dedicated thread pool in
the qemu driver pick up these raw events and then grab the per-VM lock
for that VM?
That way, we can be sure that the main event loop is _never_ delayed,
no matter how long an RPC drags on.
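
To make the idea concrete, here is a rough toy model of the proposed flow,
written in Python for brevity. It is not libvirt code: EventDispatcher,
park() and the event strings are hypothetical names used only for
illustration, and the real change would live in the qemu driver in C.

```python
import queue
import threading


class EventDispatcher:
    """Toy model: the main loop parks events, workers take the per-VM lock."""

    def __init__(self, vm_names, num_workers=2):
        self.events = queue.Queue()      # unbounded, so put() does not block
        self.vm_locks = {name: threading.Lock() for name in vm_names}
        self.handled = []                # (vm_name, event) pairs, for inspection
        self._handled_lock = threading.Lock()
        self.workers = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_workers)
        ]
        for w in self.workers:
            w.start()

    def park(self, vm_name, event):
        """Called from the main loop. Never touches a per-VM lock."""
        self.events.put((vm_name, event))

    def _worker(self):
        while True:
            item = self.events.get()
            if item is None:             # sentinel: stop this worker
                self.events.task_done()
                return
            vm_name, event = item
            # Only the worker waits here if an RPC is holding the VM lock.
            with self.vm_locks[vm_name]:
                with self._handled_lock:
                    self.handled.append((vm_name, event))
            self.events.task_done()

    def shutdown(self):
        """Stop all workers after they drain the events queued so far."""
        for _ in self.workers:
            self.events.put(None)
        for w in self.workers:
            w.join()


if __name__ == "__main__":
    d = EventDispatcher(["vm1", "vm2"])
    with d.vm_locks["vm1"]:              # simulate a long-running RPC on vm1
        d.park("vm1", "SHUTDOWN")        # returns immediately
        d.park("vm2", "SHUTDOWN")
    d.events.join()                      # wait until both events are handled
    d.shutdown()
    print(sorted(d.handled))
```

One caveat with this sketch: with more than one worker, two events for the
same VM can be handled out of order, so a real implementation would need
some way to keep per-VM ordering.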

If this sounds reasonable, I will be happy to post the driver rewrite
patches to that end.

Regards,
Prerna