[libvirt] [PATCH] create a thread to handle MigrationParamResetto avoid deadlock

Daniel P. Berrangé berrange at redhat.com
Tue Jan 21 16:48:02 UTC 2020


On Fri, Jan 03, 2020 at 10:11:22AM +0000, Daniel P. Berrangé wrote:
> On Fri, Dec 27, 2019 at 01:59:51PM +0800, wang.yi59 at zte.com.cn wrote:
> > Hi Daniel,
> > 
> > Thanks a lot for your review and reply!
> > 
> > > On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote:
> > > > On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
> > > > > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> > > > >> From: Li XueLei <li.xuelei1 at zte.com.cn>
> > > > >>
> > > > >> Libvirtd no longer receives external requests, after I do the following.
> > > > >>
> > 
> > .....
> > 
> > > > >
> > > > > Libvirt allows multiple connections concurrently and changes made by one
> > > > > connection are supposed to apply globally. No per-VM state should be tied
> > > > > to a particular connection.
> > > >
> > > > This is generally very true, except for migration.
> > >
> > > Hmm, so in that case we do need to make sure this runs from a non-main
> > > event thread as this patch does. I'm surprised we've not seen this
> > > before though - i'd think it should be a guaranteed deadlock anytime
> > > someone tests this scenario.
> > >
> > > >
> > > > >
> > > > > IOW, I don't think we need a thread. We should just stop calling
> > > > > qemuMigrationSrcCleanup from the connection close callback entirely
> > > >
> > > > But I agree that something fishy is going on and this doesn't look like
> > > > proper solution. Yi, can you please share the backtrace of other threads
> > > > too? Why aren't we getting any reply on the monitor?
> > >
> > > qemuMonitorSend() just puts date onto the TX queue & waits for the
> > > main event loop to send it.  We're invoking qemuMonitorSend from
> > > the main event loop in this backtrace, hence it'll wait forever
> > > for the thread to send it.
> > >
> > > qemuMonitorSend must never be invoked from the main event thread.
> > 
> > So do you think how to imporve this patch? Any sugesstion is appreciated :-)
> 
> I'm facing a related problem in my patches for providing an "embedded"
> QEMU driver - any libvirt API calls that the application makes from
> its main event loop will deadlock if they result in a QEMU monitor
> command.
> 
> We've got another long standing scalability bug too when dealing
> with shutdown takes too long, the main event loop is blocked for an
> unreasonably long time.
> 
> I think the solution to all of these problems is to stop using the main
> event loop for the QEMU monitor and agent.
> 
> Instead, we should create a new thread for each QEMU VM, and that thread
> should run an event loop, handling just the monitor and agent work for
> that one VM.
> 
> We can do this quite easily now if we use GLib's  GMainContext as the
> event loop.
> 
> We'll create a GMainContext and store it in qemuDomainObjPrivatePtr.
> In qemuProcessLaunch and qemuProcessReconnect we'll need to spawn a
> thread running this GMainContext as an event loop.
> 
> At the end of qemuProcess we'll need to tell this event loop to quit
> and stop the thread.
> 
> Then the qemu_agent.c and qemu_monitor.c code need changing to use the
> g_source_add_unix_fd () / g_source_remove_unix_fd () APIs instead of
> virEventAddHandle/virEventRemoveHandle.

Is this approach something you would like to work on implementing,
or have already started ?  If not, I'll attempt to implement it
myself in the next week or so.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list