[libvirt] [PATCH] Fix race condition reconnecting to vms & loading configs

Daniel P. Berrange berrange at redhat.com
Thu Oct 31 14:26:51 UTC 2013


On Thu, Oct 31, 2013 at 10:23:25AM -0400, Cole Robinson wrote:
> On 10/29/2013 11:22 AM, Cole Robinson wrote:
> > On 10/29/2013 10:25 AM, Daniel P. Berrange wrote:
> >> On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
> >>> On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
> >>>> On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
> >>>>> On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
> >>>>>> On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
> >>>>>>> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
> >>>>>>>> From: "Daniel P. Berrange" <berrange at redhat.com>
> >>>>>>>>
> >>>>>>>> The following sequence
> >>>>>>>>
> >>>>>>>>  1. Define a persistent QEMU guest
> >>>>>>>>  2. Start the QEMU guest
> >>>>>>>>  3. Stop libvirtd
> >>>>>>>>  4. Kill the QEMU process
> >>>>>>>>  5. Start libvirtd
> >>>>>>>>  6. List persistent guests
> >>>>>>>>
> >>>>>>>> At the last step, the previously running persistent guest
> >>>>>>>> will be missing. This is because of a race condition in the
> >>>>>>>> QEMU driver startup code. It does
> >>>>>>>>
> >>>>>>>>  1. Load all VM state files
> >>>>>>>>  2. Spawn thread to reconnect to each VM
> >>>>>>>>  3. Load all VM config files
> >>>>>>>>
> >>>>>>>> Only at the end of step 3 does the 'virDomainObjPtr' get
> >>>>>>>> marked as "persistent". There is therefore a window where
> >>>>>>>> the thread reconnecting to the VM will remove the persistent
> >>>>>>>> VM from the list.
> >>>>>>>>
> >>>>>>>> The easy fix is to simply switch the order of steps 2 & 3.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> >>>>>>>> ---
> >>>>>>>>  src/qemu/qemu_driver.c | 3 +--
> >>>>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> >>>>>>>> index c613967..9c3daad 100644
> >>>>>>>> --- a/src/qemu/qemu_driver.c
> >>>>>>>> +++ b/src/qemu/qemu_driver.c
> >>>>>>>> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
> >>>>>>>>  
> >>>>>>>>      conn = virConnectOpen(cfg->uri);
> >>>>>>>>  
> >>>>>>>> -    qemuProcessReconnectAll(conn, qemu_driver);
> >>>>>>>> -
> >>>>>>>>      /* Then inactive persistent configs */
> >>>>>>>>      if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
> >>>>>>>>                                         cfg->configDir,
> >>>>>>>> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
> >>>>>>>>                                         NULL, NULL) < 0)
> >>>>>>>>          goto error;
> >>>>>>>>  
> >>>>>>>> +    qemuProcessReconnectAll(conn, qemu_driver);
> >>>>>>>>  
> >>>>>>>>      virDomainObjListForEach(qemu_driver->domains,
> >>>>>>>>                              qemuDomainSnapshotLoad,
> >>>>>>>>
> >>>>>>>
> >>>>>>> I tried testing this patch to see if it would fix:
> >>>>>>>
> >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1015246
> >>>>>>>
> >>>>>>> from current master I did:
> >>>>>>>
> >>>>>>> git revert a924d9d083c215df6044387057c501d9aa338b96
> >>>>>>> reproduce the bug
> >>>>>>> git am <your-patch>
> >>>>>>>
> >>>>>>> But the daemon won't even start up after your patch is built:
> >>>>>>>
> >>>>>>> (gdb) bt
> >>>>>>> #0  qemuMonitorOpen (vm=vm at entry=0x7fffd4211090, config=0x0, json=false,
> >>>>>>>     cb=cb at entry=0x7fffddcae720 <monitorCallbacks>,
> >>>>>>>     opaque=opaque at entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
> >>>>
> >>>>> Sorry for not being clear: The daemon crashes, that's the backtrace.
> >>>>
> >>>> Hmm, config is NULL - do the state XML files not include the
> >>>> monitor info, perhaps?
> >>>>
> >>>
> >>> I see:
> >>>
> >>> pidfile for busted VM in /var/run/libvirt/qemu
> >>> nothing in /var/cache/libvirt/qemu
> >>> no state that I can see in /var/lib/libvirt/qemu
> >>>
> >>> But I'm not sure where it's supposed to be stored.
> >>>
> >>> FWIW reproducing this state was pretty simple: revert
> >>> a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove
> >>> all <video> and <graphics> devices, start the guest, libvirtd crashes.
> >>
> >> Ok, I believe you probably have SELinux disabled on your machine or in
> >> libvirtd. With SELinux enabled you hit another bug first
> >>
> >> 2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to set security context for monitor for rhel6x86_64
> >>
> >>
> >> which prevents hitting the crash you report. The fix is the same in both
> >> cases - we must skip VMs with PID of zero. I've sent a v2 patch.
> >>
> > 
> > Hmm, selinux is permissive here but not disabled. But I'll try your patches
> > and report back.
> > 
> 
> Applied both patches, the original bug report and the crash I reported here
> are both fixed. Thanks Dan!

Cool, thanks for confirming.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



