[libvirt] [PATCH] create a thread to handle MigrationParamResetto avoid deadlock

Daniel P. Berrangé berrange at redhat.com
Fri Jan 3 10:11:22 UTC 2020


On Fri, Dec 27, 2019 at 01:59:51PM +0800, wang.yi59 at zte.com.cn wrote:
> Hi Daniel,
> 
> Thanks a lot for your review and reply!
> 
> > On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote:
> > > On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
> > > > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> > > >> From: Li XueLei <li.xuelei1 at zte.com.cn>
> > > >>
> > > >> Libvirtd no longer receives external requests, after I do the following.
> > > >>
> 
> .....
> 
> > > >
> > > > Libvirt allows multiple connections concurrently and changes made by one
> > > > connection are supposed to apply globally. No per-VM state should be tied
> > > > to a particular connection.
> > >
> > > This is generally very true, except for migration.
> >
> > Hmm, so in that case we do need to make sure this runs from a non-main
> > event thread as this patch does. I'm surprised we've not seen this
> > before though - i'd think it should be a guaranteed deadlock anytime
> > someone tests this scenario.
> >
> > >
> > > >
> > > > IOW, I don't think we need a thread. We should just stop calling
> > > > qemuMigrationSrcCleanup from the connection close callback entirely
> > >
> > > But I agree that something fishy is going on and this doesn't look like
> > > proper solution. Yi, can you please share the backtrace of other threads
> > > too? Why aren't we getting any reply on the monitor?
> >
> > qemuMonitorSend() just puts date onto the TX queue & waits for the
> > main event loop to send it.  We're invoking qemuMonitorSend from
> > the main event loop in this backtrace, hence it'll wait forever
> > for the thread to send it.
> >
> > qemuMonitorSend must never be invoked from the main event thread.
> 
> So do you think how to imporve this patch? Any sugesstion is appreciated :-)

I'm facing a related problem in my patches for providing an "embedded"
QEMU driver - any libvirt API calls that the application makes from
its main event loop will deadlock if they result in a QEMU monitor
command.

We've got another long standing scalability bug too when dealing
with shutdown takes too long, the main event loop is blocked for an
unreasonably long time.

I think the solution to all of these problems is to stop using the main
event loop for the QEMU monitor and agent.

Instead, we should create a new thread for each QEMU VM, and that thread
should run an event loop, handling just the monitor and agent work for
that one VM.

We can do this quite easily now if we use GLib's  GMainContext as the
event loop.

We'll create a GMainContext and store it in qemuDomainObjPrivatePtr.
In qemuProcessLaunch and qemuProcessReconnect we'll need to spawn a
thread running this GMainContext as an event loop.

At the end of qemuProcess we'll need to tell this event loop to quit
and stop the thread.

Then the qemu_agent.c and qemu_monitor.c code need changing to use the
g_source_add_unix_fd () / g_source_remove_unix_fd () APIs instead of
virEventAddHandle/virEventRemoveHandle.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list