[libvirt] [PATCH] Fix race condition reconnecting to vms & loading configs

Cole Robinson crobinso at redhat.com
Tue Oct 29 15:22:54 UTC 2013


On 10/29/2013 10:25 AM, Daniel P. Berrange wrote:
> On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
>> On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
>>> On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
>>>> On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
>>>>> On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
>>>>>> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
>>>>>>> From: "Daniel P. Berrange" <berrange at redhat.com>
>>>>>>>
>>>>>>> The following sequence
>>>>>>>
>>>>>>>  1. Define a persistent QEMU guest
>>>>>>>  2. Start the QEMU guest
>>>>>>>  3. Stop libvirtd
>>>>>>>  4. Kill the QEMU process
>>>>>>>  5. Start libvirtd
>>>>>>>  6. List persistent guests
>>>>>>>
>>>>>>> At the last step, the previously running persistent guest
>>>>>>> will be missing. This is because of a race condition in the
>>>>>>> QEMU driver startup code. It does
>>>>>>>
>>>>>>>  1. Load all VM state files
>>>>>>>  2. Spawn thread to reconnect to each VM
>>>>>>>  3. Load all VM config files
>>>>>>>
>>>>>>> Only at the end of step 3 does the 'virDomainObjPtr' get
>>>>>>> marked as "persistent". There is therefore a window where
>>>>>>> the thread reconnecting to the VM will remove the persistent
>>>>>>> VM from the list.
>>>>>>>
>>>>>>> The easy fix is to simply switch the order of steps 2 & 3.
>>>>>>>
>>>>>>> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
>>>>>>> ---
>>>>>>>  src/qemu/qemu_driver.c | 3 +--
>>>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>>>> index c613967..9c3daad 100644
>>>>>>> --- a/src/qemu/qemu_driver.c
>>>>>>> +++ b/src/qemu/qemu_driver.c
>>>>>>> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
>>>>>>>  
>>>>>>>      conn = virConnectOpen(cfg->uri);
>>>>>>>  
>>>>>>> -    qemuProcessReconnectAll(conn, qemu_driver);
>>>>>>> -
>>>>>>>      /* Then inactive persistent configs */
>>>>>>>      if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
>>>>>>>                                         cfg->configDir,
>>>>>>> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
>>>>>>>                                         NULL, NULL) < 0)
>>>>>>>          goto error;
>>>>>>>  
>>>>>>> +    qemuProcessReconnectAll(conn, qemu_driver);
>>>>>>>  
>>>>>>>      virDomainObjListForEach(qemu_driver->domains,
>>>>>>>                              qemuDomainSnapshotLoad,
>>>>>>>
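The ordering problem can be illustrated with a deliberately simplified, single-threaded C model. Everything in it (`struct toy_vm`, `load_state`, `load_config`, `run_startup`) is hypothetical and is not libvirt code. The main thread and the reconnect thread are serialized by hand to show one possible way of hitting the window described above, assuming the reconnect thread decides whether to drop a guest whose QEMU process is gone by looking at its persistent flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a domain object. */
struct toy_vm {
    const char *name;
    bool persistent;   /* only set once the config file is loaded */
    bool in_list;      /* still present in the driver's domain list */
    int pid;           /* 0 means no QEMU process */
};

/* Step 1: a state file yields an object that looks running and transient. */
static void load_state(struct toy_vm *vm, int pid)
{
    vm->persistent = false;
    vm->in_list = true;
    vm->pid = pid;
}

/* Step 3: loading the config marks the existing object persistent. */
static void load_config(struct toy_vm *vm)
{
    vm->persistent = true;
}

/*
 * Simulate startup for a guest whose QEMU process was killed while
 * libvirtd was down. configs_first selects the patched ordering.
 * Returns true if the guest is still listed afterwards.
 */
static bool run_startup(bool configs_first)
{
    struct toy_vm vm = { "guest", false, false, 0 };
    bool drop;

    load_state(&vm, 1234);
    assert(vm.in_list && !vm.persistent);

    if (configs_first)
        load_config(&vm);         /* patched order: configs before reconnect */

    /* Reconnect thread: the process is gone, so decide the guest's fate. */
    vm.pid = 0;
    drop = !vm.persistent;        /* transient guests vanish once stopped */

    if (!configs_first)
        load_config(&vm);         /* old order: main thread loads config meanwhile */

    if (drop)
        vm.in_list = false;       /* acts on the now-stale decision */
    return vm.in_list;
}
```

In this model `run_startup(false)` returns false (the persistent guest disappears from the list) and `run_startup(true)` returns true, which is the effect the reordering in the patch is meant to achieve: the persistent flag is already set before any reconnect decision is made.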
>>>>>>
>>>>>> I tried testing this patch to see if it would fix:
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1015246
>>>>>>
>>>>>> from current master I did:
>>>>>>
>>>>>> git revert a924d9d083c215df6044387057c501d9aa338b96
>>>>>> reproduce the bug
>>>>>> git am <your-patch>
>>>>>>
>>>>>> But the daemon won't even start up after your patch is built:
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
>>>>>>     cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
>>>>>>     opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
>>>
>>>> Sorry for not being clear: the daemon crashes, and that's the backtrace.
>>>
>>> Hmm, config is NULL. Do the state XML files perhaps not include the
>>> monitor info?
>>>
>>
>> I see:
>>
>> pidfile for busted VM in /var/run/libvirt/qemu
>> nothing in /var/cache/libvirt/qemu
>> no state that I can see in /var/lib/libvirt/qemu
>>
>> But I'm not sure where it's supposed to be stored.
>>
>> FWIW reproducing this state was pretty simple: revert
>> a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove
>> all <video> and <graphics> devices, start the guest, and libvirtd crashes.
> 
> Ok, I believe you probably have SELinux disabled on your machine or in
> libvirtd. With SELinux enabled, you hit another bug first:
> 
> 2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to set security context for monitor for rhel6x86_64
> 
> 
> which prevents hitting the crash you report. The fix is the same in both
> cases: we must skip VMs with a PID of zero. I've sent a v2 patch.
> 
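A minimal sketch of the "skip VMs with a PID of zero" rule, written against hypothetical types: `struct toy_dom`, `reconnect_one` and `reconnect_all` are illustrative names, not libvirt's. It assumes, as the thread suggests, that once configs are loaded before reconnecting, inactive persistent guests also reach the reconnect loop with a PID of zero and no monitor configuration, which would match the config=0x0 argument in the backtrace above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal domain record, not libvirt's virDomainObj. */
struct toy_dom {
    const char *name;
    int pid;                  /* 0: guest not running */
    const void *monitor_cfg;  /* NULL for an inactive, config-only guest */
};

/* Stand-in for per-guest reconnect work, which needs the saved monitor
 * configuration; a NULL here corresponds to config=0x0 in the backtrace. */
static void reconnect_one(const struct toy_dom *dom)
{
    assert(dom->monitor_cfg != NULL);
}

/* Reconnect to every running guest, skipping those with a PID of zero.
 * Returns how many reconnects were attempted. */
static size_t reconnect_all(const struct toy_dom *doms, size_t n)
{
    size_t attempted = 0;

    for (size_t i = 0; i < n; i++) {
        if (doms[i].pid == 0)
            continue;         /* inactive guest: no QEMU process or monitor */
        reconnect_one(&doms[i]);
        attempted++;
    }
    return attempted;
}
```

For a list holding one running guest and one inactive guest, `reconnect_all` attempts exactly one reconnect; without the `pid == 0` check the inactive entry would reach `reconnect_one` with a NULL monitor configuration.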

Hmm, SELinux is permissive here but not disabled. But I'll try your patches
and report back.

Thanks,
Cole




More information about the libvir-list mailing list