[libvirt] [PATCH] qemu: Resolve agent deadlock

Mon Oct 26 21:28:57 UTC 2015

On 10/26/2015 11:37 AM, Peter Krempa wrote:
> On Mon, Oct 26, 2015 at 11:06:21 -0400, John Ferlan wrote:
>> If we have a shutdown of a VM by a qemu agent while an agent EOF occurs
>> at relatively the same time, it's possible that a deadlock occurs depending
>> on what happens first.
>>
>> Assume qemuProcessHandleAgentEOF runs first, with the vm->lock it clears
>> priv->agent, then clears the vm->lock.
> 
> Couldn't we make sure, that qemuProcessHandleAgentEOF clears priv->agent
> if and only if it removed the last reference?

reference to?  agent?  vm?  via?

qemuConnectAgent/qemuAgentOpen takes out a reference to the agent. That
is un-referenced by qemuAgentClose. The EnterAgent takes out a reference
which is un-referenced during ExitAgent. Adding a new reference just for
EOF processing doesn't solve the problem.

EOF doesn't do an agent-unlock, it does a vm-lock/unlock, and calls
qemuAgentClose which will attempt an agent-lock. The agent may be locked
by some other thread and it needs to wait until that reference is cleared.

Adding some sort of unref-agent check logic in EOF similar to ExitAgent
feels like it'll cause other issues. It seems "backwards" to remove the
last reference and avoid everything else that qemuCloseAgent does.

I think logically if some sort of unref-agent check logic was added,
then qemuCloseAgent could not be called from either EOF or ExitAgent.
>From EOF, if the current thread was the last one, then the agent is
free'd so qemuCloseAgent shouldn't be called. If the current thread
wasn't the last one, then we'd have to wait for the last reference check
in ExitAgent, but once that is done, the agent is freed and thus
qemuCloseAgent shouldn't be called.

> [...]
> 
>> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
>> index 890d8ed..a908df8 100644
>> --- a/src/qemu/qemu_domain.c
>> +++ b/src/qemu/qemu_domain.c
>> @@ -1868,19 +1868,20 @@ qemuDomainObjEnterAgent(virDomainObjPtr obj)
>>   * Should be paired with an earlier qemuDomainObjEnterAgent() call
>>   */
>>  void
>> -qemuDomainObjExitAgent(virDomainObjPtr obj)
>> +qemuDomainObjExitAgent(virDomainObjPtr obj,
>> +                       qemuAgentPtr agent)
> 
> You would not need to modify this prototype ...
> 
>>  {
>>      qemuDomainObjPrivatePtr priv = obj->privateData;
>>      bool hasRefs;
>>  
>> -    hasRefs = virObjectUnref(priv->agent);
>> +    hasRefs = virObjectUnref(agent);
>>  
>>      if (hasRefs)
>> -        virObjectUnlock(priv->agent);
>> +        virObjectUnlock(agent);
>>  
>>      virObjectLock(obj);
>> -    VIR_DEBUG("Exited agent (agent=%p vm=%p name=%s)",
>> -              priv->agent, obj, obj->def->name);
>> +    VIR_DEBUG("Exited agent (agent=%p vm=%p name=%s hasRefs=%d)",
>> +              agent, obj, obj->def->name, hasRefs);
>>  
>>      priv->agentStart = 0;
>>      if (!hasRefs)
> 
> 
>>  void qemuDomainObjEnterRemote(virDomainObjPtr obj)
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index a2cc002..b8c9ff7 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -1994,9 +1994,10 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags)
>>      qemuDomainSetFakeReboot(driver, vm, isReboot);
>>  
>>      if (useAgent) {
>> +        qemuAgentPtr agent = priv->agent;
>>          qemuDomainObjEnterAgent(vm);
>> -        ret = qemuAgentShutdown(priv->agent, agentFlag);
>> -        qemuDomainObjExitAgent(vm);
>> +        ret = qemuAgentShutdown(agent, agentFlag);
>> +        qemuDomainObjExitAgent(vm, agent);
> 
> ... and could avoid this rather ugly code here. The result would be IMO
> the same.
> 

So IYO is it reasonable to access priv->agent even though we don't own
the vm-lock once EnterAgent returns?

John