[libvirt] [PATCH 04/10] qemu: agent: handle agent connection errors in one place

John Ferlan jferlan at redhat.com
Thu Oct 27 15:51:23 UTC 2016



On 10/27/2016 07:34 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 26.10.2016 22:57, John Ferlan wrote:
>>
>> There's no commit message...
> 
> This is simple refactoring.
> 

Not really...

A simple refactor would have kept the -2 logic.

What's been done here is to let qemuConnectAgent make the decision on
"hard" or "soft" error and only use < 0 as failure

>>
>> You're altering qemuConnectAgent to return -1 on the only error and
>> perform the VIR_WARN plus agentError = true on other "soft" failures.
>>
>> Not exactly nothing going on!
> 
> This is what already present in code, I've just moved soft failures
> in qemuConnectAgent.
> 
>>
>> On 10/04/2016 02:56 AM, Nikolay Shirokovskiy wrote:
>>> ---
>>>  src/qemu/qemu_driver.c    |  8 +------
>>>  src/qemu/qemu_migration.c | 13 ++---------
>>>  src/qemu/qemu_process.c   | 58 ++++++++++++-----------------------------------
>>>  3 files changed, 17 insertions(+), 62 deletions(-)
>>>
>>
>> This idea seems to have merit too - something that I've thought now that
>> I've read the first 3 patches...
>>
>> I think you should have started with this patch, it probably would have
>> made the others easier to think about. In fact, I'm curious if with just
>> this change and the suggestion in patch 3 for clearing agentError
>> whether you can reproduced the bug you're trying to fix/resolve without
>> any of the other patches.
>>
>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>> index 12ddbc0..edff973 100644
>>> --- a/src/qemu/qemu_driver.c
>>> +++ b/src/qemu/qemu_driver.c
>>> @@ -4397,7 +4397,6 @@ processSerialChangedEvent(virQEMUDriverPtr driver,
>>>      virObjectEventPtr event = NULL;
>>>      virDomainDeviceDef dev;
>>>      qemuDomainObjPrivatePtr priv = vm->privateData;
>>> -    int rc;
>>>  
>>>      if (connected)
>>>          newstate = VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED;
>>> @@ -4450,13 +4449,8 @@ processSerialChangedEvent(virQEMUDriverPtr driver,
>>>  
>>>      if (STREQ_NULLABLE(dev.data.chr->target.name, "org.qemu.guest_agent.0")) {
>>>          if (newstate == VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED) {
>>> -            if (!priv->agent) {
>>> -                if ((rc = qemuConnectAgent(driver, vm)) == -2)
>>> +            if (!priv->agent && qemuConnectAgent(driver, vm) < 0)
>>
>> FWIW:
>> qemuConnectAgent returns 0 "if (priv->agent)", so there's one less check
>> here too.  Could just be "if (qemuConnectAgent(driver, vm) < 0)"
> 
> It depends on patch 3. If we clear the error early in qemuConnectAgent then
> we'd better check before calling this function.
> 

Right my mind was already there or more succinctly said - I was trying
to consider this as the first patch as I think it'll make the reasoning
a bit more clear.

Since the "issue" has been described now as a pure shutdown and restart
after error problem without needing to consider the migration,
reconnect, and attach separately - it's easier to focus on.

So my feeling now after reading and thinking a bit about everything so
far is that this patch could go in (mostly) as is.  That way there are
then 2 places to set agentError=true and two places to set
agentError=false (discluding original allocation and reconnect paths).

I don't think we can "combine" the Error and EOF paths. They're
separate. What happens if some day someone uses the VSERPORT logic to
somehow restart an agent after an error, after a close, and before a
shutdown. IOW: Some kind of error recovery logic.

In order to handle the issue where agentError is set, but we cannot
start again (e.g. the have error, do shutdown, do start path) - add an
unconditional priv->agentError = false before in qemuConnectAgent before
the virSecurityManagerSetDaemonSocketLabel call.  The other agentError =
false settings can then be removed.

So we're left with 2 places to set error and one place to clear. I'll
continue to consider this, bug figured I'd get out a response to all
this sooner than later.

I'm less convinced mucking with combining EOF and Error is a good idea
in patches 7-10, but I haven't thought too much about it either.

John
>>
>>>                      goto endjob;
>>
>> The indention for this is off (remove leading 4 spaces)
>>
>>> -
>>> -                if (rc < 0)
>>> -                    priv->agentError = true;
>>> -            }
>>>          } else {
>>>              if (priv->agent) {
>>>                  qemuAgentClose(priv->agent);
>>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>>> index e2ca330..0a02236 100644
>>> --- a/src/qemu/qemu_migration.c
>>> +++ b/src/qemu/qemu_migration.c
>>> @@ -6171,7 +6171,6 @@ qemuMigrationFinish(virQEMUDriverPtr driver,
>>>      unsigned short port;
>>>      unsigned long long timeReceived = 0;
>>>      virObjectEventPtr event;
>>> -    int rc;
>>>      qemuDomainJobInfoPtr jobInfo = NULL;
>>>      bool inPostCopy = false;
>>>      bool doKill = true;
>>> @@ -6244,16 +6243,8 @@ qemuMigrationFinish(virQEMUDriverPtr driver,
>>>                                        QEMU_ASYNC_JOB_MIGRATION_IN) < 0)
>>>          goto endjob;
>>>  
>>> -    if ((rc = qemuConnectAgent(driver, vm)) < 0) {
>>> -        if (rc == -2)
>>> -            goto endjob;
>>> -
>>> -        VIR_WARN("Cannot connect to QEMU guest agent for %s",
>>> -                 vm->def->name);
>>> -        virResetLastError();
>>> -        priv->agentError = true;
>>> -    }
>>> -
>>> +    if (qemuConnectAgent(driver, vm) < 0)
>>> +        goto endjob;
>>>  
>>>      if (flags & VIR_MIGRATE_PERSIST_DEST) {
>>>          if (qemuMigrationPersist(driver, vm, mig, !v3proto) < 0) {
>>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>> index 42f7f84..d7c9ce3 100644
>>> --- a/src/qemu/qemu_process.c
>>> +++ b/src/qemu/qemu_process.c
>>> @@ -206,7 +206,6 @@ int
>>>  qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
>>>  {
>>>      qemuDomainObjPrivatePtr priv = vm->privateData;
>>> -    int ret = -1;
>>>      qemuAgentPtr agent = NULL;
>>>      virDomainChrDefPtr config = qemuFindAgentConfig(vm->def);
>>>  
>>> @@ -252,8 +251,7 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
>>>          qemuAgentClose(agent);
>>>          virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>>                         _("guest crashed while connecting to the guest agent"));
>>> -        ret = -2;
>>> -        goto cleanup;
>>> +        return -1;
>>>      }
>>>  
>>>      if (virSecurityManagerClearSocketLabel(driver->securityManager,
>>> @@ -264,18 +262,16 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
>>>          goto cleanup;
>>>      }
>>>  
>>> -
>>>      priv->agent = agent;
>>>  
>>> -    if (priv->agent == NULL) {
>>> -        VIR_INFO("Failed to connect agent for %s", vm->def->name);
>>
>> Interesting a "marker" of sorts to know the virAgentOpen "failed" is
>> that we'd get an Informational "Failed to connect..." followed shortly
>> thereafter by a Warning "Cannot connect..." (depending of course on
>> one's message display severity level).
>>
>> I think if you "restore" this without the goto cleanup (below) and of
>> course the removal of the {}, then at least message parity is
>> achieved...  I suppose it could move up into the "if (agent == NULL)"
>> too, but then it could be given even though an Error is reported.
>>
>> It's not that important, but it does leave some breadcrumbs.
> 
> Agree. Even though I'm still wondering how useful this message is as
> this patch is intended to be refactoring let's keep this message.
> 
>>
>> Again I'd like to see the breadcrumbs with just this patch applied to
>> reproduced what you're chasing in patches 1-4...
>>
> 
> Sorry, not possible. The situation I've described in patch 1 is based on analysis. However
> after applying the patches 1-3 the bug is not occured in our test system anymoree.
> 
> Nikolay
> 




More information about the libvir-list mailing list