[libvirt] [[RFC] 0/8] Implement async QEMU event handling in libvirtd.

Thu Oct 26 04:51:17 UTC 2017

On Wed, Oct 25, 2017 at 4:12 PM, Jiri Denemark <jdenemar at redhat.com> wrote:

> On Tue, Oct 24, 2017 at 10:34:53 -0700, Prerna Saxena wrote:
> >
> > As noted in
> > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html
> > libvirt-QEMU driver handles all async events from the main loop.
> > Each event handling needs the per-VM lock to make forward progress. In
> > the case where an async event is received for the same VM which has an
> > RPC running, the main loop is held up contending for the same lock.
> >
> > This impacts scalability, and should be addressed as a priority.
> >
> > Note that libvirt does have a 2-step deferred handling for a few event
> > categories, but (1) that is insufficient, since blocking happens before
> > the handler can disambiguate which event needs to be posted to this
> > other queue, and
> > (2) there needs to be homogeneity.
> >
> > The current series builds a framework for recording and handling VM
> > events.
> > It initializes a per-VM event queue, and a global event queue pointing to
> > events from all the VMs. Event handling is staggered in 2 stages:
> > - When an event is received, it is enqueued in the per-VM queue as well
> >   as the global queue.
> > - The global queue is built into the QEMU Driver as a threadpool
> >   (currently with a single thread).
> > - Enqueuing of a new event triggers the global event worker thread, which
> >   then attempts to take a lock for this event's VM.
> >     - If the lock is available, the event worker runs the function
> >       handling this event type. Once done, it dequeues this event from
> >       the global as well as per-VM queues.
> >     - If the lock is unavailable (i.e. taken by an RPC thread), the event
> >       worker thread leaves this as-is and picks up the next event.
>
> If I get it right, the event is either processed immediately when its VM
> object is unlocked or it has to wait until the current job running on
> the VM object finishes even though the lock may be released before that.
> Correct? If so, this needs to be addressed.
>

In most cases, the lock is released just before we end the API. However,
processing pending events whenever the lock is released, rather than only
when the job finishes, is a small change that can be made.
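
To make the flow concrete, here is a minimal Python sketch of the two-stage
handling described in the cover letter. It is an illustration only, not the
code in the series: the names VM, Driver, worker_pass, rpc_begin and rpc_end
are hypothetical, the event names are just sample strings, and everything
runs in one thread so the behaviour is deterministic. A real implementation
would also need its own lock around the queues.

```python
import threading
from collections import deque


class VM:
    """Hypothetical stand-in for a VM object with its lock and event queue."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # per-VM lock, also taken by RPC threads
        self.events = deque()         # per-VM event queue


class Driver:
    """Hypothetical stand-in for the driver owning the global event queue."""

    def __init__(self):
        self.global_queue = deque()   # (vm, event) pairs from all VMs
        self.handled = []             # (vm name, event, who handled it)

    def enqueue(self, vm, event):
        # Stage 1: record the event in both the per-VM and the global queue.
        vm.events.append(event)
        self.global_queue.append((vm, event))

    def worker_pass(self):
        # Stage 2: the global worker walks pending events and try-locks each VM.
        for vm, event in list(self.global_queue):
            if not vm.lock.acquire(blocking=False):
                continue  # lock held by an RPC: leave the event queued
            try:
                vm.events.remove(event)
                self.global_queue.remove((vm, event))
                self.handled.append((vm.name, event, "worker"))
            finally:
                vm.lock.release()

    def rpc_begin(self, vm):
        vm.lock.acquire()

    def rpc_end(self, vm):
        # Before giving up the lock, drain this VM's events serially.
        while vm.events:
            event = vm.events.popleft()
            self.global_queue.remove((vm, event))
            self.handled.append((vm.name, event, "rpc"))
        vm.lock.release()
```

With an RPC active on one VM, a worker_pass() call handles events for the
other VMs and skips the busy one; the skipped events are then handled by
rpc_end() just before the lock is relinquished.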

>
> > - Once the RPC thread completes, it looks for events pertaining to the
> >   VM in the per-VM event queue. It then processes the events serially
> >   (holding the VM lock) until there are no more events remaining for
> >   this VM. At this point, the per-VM lock is relinquished.
> >
> > Patch Series status:
> > Strictly RFC only. No compilation issues. I have not had a chance to
> > (stress) test it after rebase to latest master.
> > Note that documentation and test coverage is TBD, since a few open
> > points remain.
> >
> > Known issues/ caveats:
> > - RPC handling time will become non-deterministic.
> > - An event will only be "notified" to a client once the RPC for the same
> >   VM completes.
> > - Needs careful consideration in all cases where a QMP event is used to
> >   "signal" an RPC thread, else will deadlock.
>
> This last issue is actually a show stopper here. We need to make sure
> QMP events are processed while a job is still active on the same domain.
> Otherwise things like block jobs and migration, which are long running
> jobs driven by events, will break.
>
> Jirka
>
Completely agree, which is why I have explicitly mentioned this.
However, I do not completely follow why it needs to be this way. Can the
block job APIs between QEMU <--> libvirt be fixed so that such behaviour is
avoided?
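
For reference, the dependency in question can be sketched as below. This is
a toy model under stated assumptions, not libvirt code: run_job and
on_qmp_event are hypothetical names, the event name is only a sample string,
and a timeout stands in for what would be an indefinite wait, so that the
sketch terminates.

```python
import threading
from collections import deque


def run_job(process_events_during_job, timeout=0.2):
    """Return True if the job saw its completion event before timing out."""
    completed = threading.Event()  # set once the completion event is handled
    deferred = deque()             # events parked until the job has ended

    def on_qmp_event(name):
        if process_events_during_job:
            completed.set()        # handled right away: the job is woken up
        else:
            deferred.append(name)  # parked: nobody wakes the waiting job

    # Hypothetical hypervisor side: emits the event shortly after job start.
    emitter = threading.Timer(0.01, on_qmp_event, args=("BLOCK_JOB_COMPLETED",))
    emitter.start()

    # The job blocks here, waiting for an event that signals its completion.
    ok = completed.wait(timeout)
    emitter.join()

    # Deferred events would only be drained at this point, after the job has
    # already given up waiting.
    return ok
```

run_job(True) returns once the event arrives, while run_job(False) only
returns because of the timeout; without one, the job would wait forever for
an event that is queued behind the job itself.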

Regards,
Prerna