[libvirt] [PATCH 1/1] qemu: Tidy up job handling during live migration
Ján Tomko
jtomko at redhat.com
Fri Aug 1 14:04:53 UTC 2014
On 08/01/2014 03:12 AM, Sam Bobroff wrote:
> On 30/07/14 01:52, Ján Tomko wrote:
>> On 07/29/2014 02:41 AM, Sam Bobroff wrote:
>>> During a QEMU live migration several warning messages about job
>>> handling could be written to syslog on the destination host:
>>>
>>> "entering monitor without asking for a nested job is dangerous"
>>>
>>> The messages are written because the job handling during migration
>>> uses hard coded asyncJob values in several places that are incorrect.
>>>
>>> This patch passes the required asyncJob value around and prevents
>>> the warnings as well as any issues that the warnings may be referring
>>> to.
>>>
>>> Signed-off-by: Sam Bobroff <sam.bobroff at au1.ibm.com>
>>> --
>>
>> This patch seems to fix the deadlock that can happen if the migrated domain is
>> destroyed at the destination reported here:
>> https://www.redhat.com/archives/libvir-list/2014-May/msg00236.html
>>
>> It looks good to me, but it seems there are more functions calling
>> qemuDomainObjEnterMonitor that can be called from qemuProcessStart,
>> for example qemuDomainChangeGraphicsPasswords.
>
> Yes, I was fairly sure there would be other cases; I just fixed all the
> ones that actually occurred during my tests.
>
> In fact it seems like for the cases I'm looking at here, where it's the
> async job owner thread that's using the EnterMonitor functions, passing
> asyncJob around is a waste of time anyway because we know the correct
> value of asyncJob to use: it's stored in priv->job.asyncJob.
>
> Why not have qemuDomainObjEnterMonitorInternal() automatically switch to
> creating a nested job in this case?
>
> It seems easy to do and would simplify some code as well; what do you think?
We've had it that way before - it didn't work that well:
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=193cd0f3
>
>>> @@ -2505,7 +2506,7 @@ qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
>>> if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT))
>>> return 0;
>>>
>>> - qemuDomainObjEnterMonitor(driver, vm);
>>> + ignore_value(qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob));
>>
>> Also, the return value of this call could be dangerous to ignore if asyncJob
>> != NONE.
>
> True, but the patch hasn't introduced this, and the full story is even
> worse ;-)
>
> void qemuDomainObjEnterMonitor(virQEMUDriverPtr driver,
> virDomainObjPtr obj)
> {
> ignore_value(qemuDomainObjEnterMonitorInternal(driver, obj,
> QEMU_ASYNC_JOB_NONE));
> }
>
qemuDomainObjEnterMonitorInternal is called with QEMU_ASYNC_JOB_NONE here. It
always returns 0 in that case and it's safe to ignore. The problem is when you
use other async jobs:
static int
qemuDomainObjEnterMonitorInternal(virQEMUDriverPtr driver,
virDomainObjPtr obj,
qemuDomainAsyncJob asyncJob)
{
qemuDomainObjPrivatePtr priv = obj->privateData;
if (asyncJob != QEMU_ASYNC_JOB_NONE) {
int ret;
if ((ret = qemuDomainObjBeginNestedJob(driver, obj, asyncJob)) < 0)
return ret;
if (!virDomainObjIsActive(obj)) {
virReportError(VIR_ERR_OPERATION_FAILED, "%s",
_("domain is no longer running"));
/* Still referenced by the containing async job. */
ignore_value(qemuDomainObjEndJob(driver, obj));
return -1;
}
}
...
Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20140801/40b26771/attachment-0001.sig>
More information about the libvir-list
mailing list