[libvirt] [PATCH 04/10] qemu: agent: handle agent connection errors in one place
John Ferlan
jferlan at redhat.com
Thu Oct 27 15:51:23 UTC 2016
On 10/27/2016 07:34 AM, Nikolay Shirokovskiy wrote:
>
>
> On 26.10.2016 22:57, John Ferlan wrote:
>>
>> There's no commit message...
>
> This is simple refactoring.
>
Not really...
A simple refactor would have kept the -2 logic.
What's been done here is to let qemuConnectAgent make the decision on
"hard" or "soft" error and only use < 0 as failure
>>
>> You're altering qemuConnectAgent to return -1 on the only error and
>> perform the VIR_WARN plus agentError = true on other "soft" failures.
>>
>> Not exactly nothing going on!
>
> This is what already present in code, I've just moved soft failures
> in qemuConnectAgent.
>
>>
>> On 10/04/2016 02:56 AM, Nikolay Shirokovskiy wrote:
>>> ---
>>> src/qemu/qemu_driver.c | 8 +------
>>> src/qemu/qemu_migration.c | 13 ++---------
>>> src/qemu/qemu_process.c | 58 ++++++++++++-----------------------------------
>>> 3 files changed, 17 insertions(+), 62 deletions(-)
>>>
>>
>> This idea seems to have merit too - something that I've thought now that
>> I've read the first 3 patches...
>>
>> I think you should have started with this patch, it probably would have
>> made the others easier to think about. In fact, I'm curious if with just
>> this change and the suggestion in patch 3 for clearing agentError
>> whether you can reproduced the bug you're trying to fix/resolve without
>> any of the other patches.
>>
>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>> index 12ddbc0..edff973 100644
>>> --- a/src/qemu/qemu_driver.c
>>> +++ b/src/qemu/qemu_driver.c
>>> @@ -4397,7 +4397,6 @@ processSerialChangedEvent(virQEMUDriverPtr driver,
>>> virObjectEventPtr event = NULL;
>>> virDomainDeviceDef dev;
>>> qemuDomainObjPrivatePtr priv = vm->privateData;
>>> - int rc;
>>>
>>> if (connected)
>>> newstate = VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED;
>>> @@ -4450,13 +4449,8 @@ processSerialChangedEvent(virQEMUDriverPtr driver,
>>>
>>> if (STREQ_NULLABLE(dev.data.chr->target.name, "org.qemu.guest_agent.0")) {
>>> if (newstate == VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED) {
>>> - if (!priv->agent) {
>>> - if ((rc = qemuConnectAgent(driver, vm)) == -2)
>>> + if (!priv->agent && qemuConnectAgent(driver, vm) < 0)
>>
>> FWIW:
>> qemuConnectAgent returns 0 "if (priv->agent)", so there's one less check
>> here too. Could just be "if (qemuConnectAgent(driver, vm) < 0)"
>
> It depends on patch 3. If we clear the error early in qemuConnectAgent then
> we'd better check before calling this function.
>
Right my mind was already there or more succinctly said - I was trying
to consider this as the first patch as I think it'll make the reasoning
a bit more clear.
Since the "issue" has been described now as a pure shutdown and restart
after error problem without needing to consider the migration,
reconnect, and attach separately - it's easier to focus on.
So my feeling now after reading and thinking a bit about everything so
far is that this patch could go in (mostly) as is. That way there are
then 2 places to set agentError=true and two places to set
agentError=false (discluding original allocation and reconnect paths).
I don't think we can "combine" the Error and EOF paths. They're
separate. What happens if some day someone uses the VSERPORT logic to
somehow restart an agent after an error, after a close, and before a
shutdown. IOW: Some kind of error recovery logic.
In order to handle the issue where agentError is set, but we cannot
start again (e.g. the have error, do shutdown, do start path) - add an
unconditional priv->agentError = false before in qemuConnectAgent before
the virSecurityManagerSetDaemonSocketLabel call. The other agentError =
false settings can then be removed.
So we're left with 2 places to set error and one place to clear. I'll
continue to consider this, bug figured I'd get out a response to all
this sooner than later.
I'm less convinced mucking with combining EOF and Error is a good idea
in patches 7-10, but I haven't thought too much about it either.
John
>>
>>> goto endjob;
>>
>> The indention for this is off (remove leading 4 spaces)
>>
>>> -
>>> - if (rc < 0)
>>> - priv->agentError = true;
>>> - }
>>> } else {
>>> if (priv->agent) {
>>> qemuAgentClose(priv->agent);
>>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>>> index e2ca330..0a02236 100644
>>> --- a/src/qemu/qemu_migration.c
>>> +++ b/src/qemu/qemu_migration.c
>>> @@ -6171,7 +6171,6 @@ qemuMigrationFinish(virQEMUDriverPtr driver,
>>> unsigned short port;
>>> unsigned long long timeReceived = 0;
>>> virObjectEventPtr event;
>>> - int rc;
>>> qemuDomainJobInfoPtr jobInfo = NULL;
>>> bool inPostCopy = false;
>>> bool doKill = true;
>>> @@ -6244,16 +6243,8 @@ qemuMigrationFinish(virQEMUDriverPtr driver,
>>> QEMU_ASYNC_JOB_MIGRATION_IN) < 0)
>>> goto endjob;
>>>
>>> - if ((rc = qemuConnectAgent(driver, vm)) < 0) {
>>> - if (rc == -2)
>>> - goto endjob;
>>> -
>>> - VIR_WARN("Cannot connect to QEMU guest agent for %s",
>>> - vm->def->name);
>>> - virResetLastError();
>>> - priv->agentError = true;
>>> - }
>>> -
>>> + if (qemuConnectAgent(driver, vm) < 0)
>>> + goto endjob;
>>>
>>> if (flags & VIR_MIGRATE_PERSIST_DEST) {
>>> if (qemuMigrationPersist(driver, vm, mig, !v3proto) < 0) {
>>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>> index 42f7f84..d7c9ce3 100644
>>> --- a/src/qemu/qemu_process.c
>>> +++ b/src/qemu/qemu_process.c
>>> @@ -206,7 +206,6 @@ int
>>> qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
>>> {
>>> qemuDomainObjPrivatePtr priv = vm->privateData;
>>> - int ret = -1;
>>> qemuAgentPtr agent = NULL;
>>> virDomainChrDefPtr config = qemuFindAgentConfig(vm->def);
>>>
>>> @@ -252,8 +251,7 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
>>> qemuAgentClose(agent);
>>> virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>> _("guest crashed while connecting to the guest agent"));
>>> - ret = -2;
>>> - goto cleanup;
>>> + return -1;
>>> }
>>>
>>> if (virSecurityManagerClearSocketLabel(driver->securityManager,
>>> @@ -264,18 +262,16 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
>>> goto cleanup;
>>> }
>>>
>>> -
>>> priv->agent = agent;
>>>
>>> - if (priv->agent == NULL) {
>>> - VIR_INFO("Failed to connect agent for %s", vm->def->name);
>>
>> Interesting a "marker" of sorts to know the virAgentOpen "failed" is
>> that we'd get an Informational "Failed to connect..." followed shortly
>> thereafter by a Warning "Cannot connect..." (depending of course on
>> one's message display severity level).
>>
>> I think if you "restore" this without the goto cleanup (below) and of
>> course the removal of the {}, then at least message parity is
>> achieved... I suppose it could move up into the "if (agent == NULL)"
>> too, but then it could be given even though an Error is reported.
>>
>> It's not that important, but it does leave some breadcrumbs.
>
> Agree. Even though I'm still wondering how useful this message is as
> this patch is intended to be refactoring let's keep this message.
>
>>
>> Again I'd like to see the breadcrumbs with just this patch applied to
>> reproduced what you're chasing in patches 1-4...
>>
>
> Sorry, not possible. The situation I've described in patch 1 is based on analysis. However
> after applying the patches 1-3 the bug is not occured in our test system anymoree.
>
> Nikolay
>
More information about the libvir-list
mailing list