[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH 1/1] qemu: Tidy up job handling during live migration



On 08/01/2014 03:12 AM, Sam Bobroff wrote:
> On 30/07/14 01:52, Ján Tomko wrote:
>> On 07/29/2014 02:41 AM, Sam Bobroff wrote:
>>> During a QEMU live migration several warning messages about job
>>> handling could be written to syslog on the destination host:
>>>
>>> "entering monitor without asking for a nested job is dangerous"
>>>
>>> The messages are written because the job handling during migration
>>> uses hard coded asyncJob values in several places that are incorrect.
>>>
>>> This patch passes the required asyncJob value around and prevents
>>> the warnings as well as any issues that the warnings may be referring
>>> to.
>>>
>>> Signed-off-by: Sam Bobroff <sam bobroff au1 ibm com>
>>> --
>>
>> This patch seems to fix the deadlock that can happen if the migrated domain is
>> destroyed at the destination reported here:
>> https://www.redhat.com/archives/libvir-list/2014-May/msg00236.html
>>
>> It looks good to me, but it seems there are more functions calling
>> qemuDomainObjEnterMonitor that can be called from qemuProcessStart,
>> for example qemuDomainChangeGraphicsPasswords.
> 
> Yes, I was fairly sure there would be other cases; I just fixed all the
> ones that actually occurred during my tests.
> 
> In fact it seems like for the cases I'm looking at here, where it's the
> async job owner thread that's using the EnterMonitor functions, passing
> asyncJob around is a waste of time anyway because we know the correct
> value of asyncJob to use: it's stored in priv->job.asyncJob.
> 
> Why not have qemuDomainObjEnterMonitorInternal() automatically switch to
> creating a nested job in this case?
> 
> It seems easy to do and would simplify some code as well; what do you think?

We've had it that way before - it didn't work that well:
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=193cd0f3

> 
>>> @@ -2505,7 +2506,7 @@ qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
>>>      if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT))
>>>          return 0;
>>>  
>>> -    qemuDomainObjEnterMonitor(driver, vm);
>>> +    ignore_value(qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob));
>>
>> Also, the return value of this call could be dangerous to ignore if asyncJob
>> != NONE.
> 
> True, but the patch hasn't introduced this, and the full story is even
> worse ;-)
> 
>  void qemuDomainObjEnterMonitor(virQEMUDriverPtr driver,
>                                virDomainObjPtr obj)
>  {
>     ignore_value(qemuDomainObjEnterMonitorInternal(driver, obj,
>                                                    QEMU_ASYNC_JOB_NONE));
>  }
> 

qemuDomainObjEnterMonitorInternal is called with QEMU_ASYNC_JOB_NONE here. It
always returns 0 in that case and it's safe to ignore. The problem is when you
use other async jobs:

static int
qemuDomainObjEnterMonitorInternal(virQEMUDriverPtr driver,
                                  virDomainObjPtr obj,
                                  qemuDomainAsyncJob asyncJob)
{
    qemuDomainObjPrivatePtr priv = obj->privateData;

    if (asyncJob != QEMU_ASYNC_JOB_NONE) {
        int ret;
        if ((ret = qemuDomainObjBeginNestedJob(driver, obj, asyncJob)) < 0)
            return ret;
        if (!virDomainObjIsActive(obj)) {
            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                           _("domain is no longer running"));
            /* Still referenced by the containing async job.  */
            ignore_value(qemuDomainObjEndJob(driver, obj));
            return -1;
        }
    }
...

Jan

Attachment: signature.asc
Description: OpenPGP digital signature


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]