[libvirt] [BUG] qemu quited unexpectedly will cause libvirtd crashed
Eric Blake
eblake at redhat.com
Thu Mar 31 18:54:02 UTC 2011
On 03/31/2011 12:42 AM, Wen Congyang wrote:
>>> So I try to use gdb and add sleep() to trigger this bug. I have posted two patches
>>> to fix 2 bugs. But there is still another bug, and I have no good way to fix it.
>>
>>> Steps to reproduce this bug:
>>> 1. use gdb to attach libvirtd, and set a breakpoint in the function
>>> qemuConnectMonitor()
>>> 2. start a vm
>>> 3. let the libvirtd to run until qemuMonitorSetCapabilities() returns.
>>> 4. kill the qemu process
>>> 5. step into qemuDomainObjExitMonitorWithDriver(), and set debug to 1
>>>
>>> Now, qemuDomainObjExitMonitorWithDriver() will sleep 100s to make sure
>>> qemuProcessHandleMonitorEOF() is done before qemuProcessHandleMonitorEOF()
>>> returns.
>>>
>>> priv->mon will be null after qemuDomainObjExitMonitorWithDriver() returns.
>>> So we must not use it. Unfortunately we still use it, and it will cause
>>> libvirtd crashed.
>>
>> Sounds like qemuConnectMonitor needs an extra reference around priv->mon
>> for the duration of the connect attempt, so that
>> qemuProcessHandleMonitorEOF will not free the monitor.
>
> No, qemuConnectMonitor() calls qemuDomainObjEnterMonitorWithDriver() to hold
> an extra reference around priv->mon, and release it in qemuDomainObjExitMonitorWithDriver().
> But qemuDomainObjExitMonitorWithDriver() does not tell the caller whether
> priv->mon can be used.
>
> If we will call qemuDomainObjEnterMonitorWithDriver()/qemuDomainObjEnterMonitor(),
> I think we must check whether the domain is active. If we call them more than once,
> we must check it every time. And we should not do other things between checking whether
> the domain is active and calling them(If we do as this, the code can be maintained easily)
I think I hit on the same problem earlier, of a guest modifying vm state
on assumption that a successful monitor command even if the vm went down
in the window after the monitor command completed but before lock was
regained:
https://www.redhat.com/archives/libvir-list/2011-March/msg00636.html
>
> But some codes does not respect this rule.
>
> Here is the list of the functions that may have the similar problem:
> 1. qemu_migration.c
> doNativeMigrate()
> qemuMigrationToFile()
My current thoughts are that maybe qemuDomainObjExitMonitor should be
made to return the value of vm active after regaining lock, and marked
ATTRIBUTE_RETURN_CHECK, to force all other callers to detect the case of
a monitor command completing successfully but then the VM disappearing
in the window between command completion and regaining the vm lock.
--
Eric Blake eblake at redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 619 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20110331/d471dc40/attachment-0001.sig>
More information about the libvir-list
mailing list