[libvirt] [PATCH 2/2] network: better log message when network is inactive during reconnect

Thu Apr 27 13:33:19 UTC 2017

On 04/27/2017 09:06 AM, John Ferlan wrote:
> 
> 
> On 04/26/2017 05:57 PM, Laine Stump wrote:
>> On 04/26/2017 02:39 PM, John Ferlan wrote:
>>>
>>>
>>> On 04/25/2017 12:34 PM, Laine Stump wrote:
>>>> If the network isn't active during networkNotifyActualDevice(), we
>>>> would log an error message stating that the bridge device didn't
>>>> exist. This patch adds a check to see if the network is active, making
>>>> the logs more useful in the case that it isn't.
>>>>
>>>> Partially resolves: https://bugzilla.redhat.com/1442700
>>>> ---
>>>>  src/network/bridge_driver.c | 7 +++++++
>>>>  1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
>>>> index d2d8557..e06f81b 100644
>>>> --- a/src/network/bridge_driver.c
>>>> +++ b/src/network/bridge_driver.c
>>>> @@ -4676,6 +4676,13 @@ networkNotifyActualDevice(virDomainDefPtr dom,
>>>>      }
>>>>      netdef = network->def;
>>>>  
>>>> +    if (!virNetworkObjIsActive(network)) {
>>>> +        virReportError(VIR_ERR_OPERATION_INVALID,
>>>> +                       _("network '%s' is not active"),
>>>> +                       netdef->name);
>>>> +        goto error;
>>>> +    }
>>>> +
>>>
>>> /me wonders whether this should just a goto cleanup - IOW: if the
>>> network isn't active, so what, why error. Once someone attempts to start
>>> it, they'll get errors I assume...
>>>
>>> Not that goto error or cleanup matters since commit id '4fee4e0' changed
>>> the goto cleanup to goto error and added:
>>>
>>> +
>>> +error:
>>> +    goto cleanup;
>>
>> That's missing the point of that commit. "cleanup:" is there only as a
>> place for error: to goto the cleanup code that is common to both the
>> "success:" exit and the "error:" exit (and the three labels are all
>> there so that that this function is structured similarly to
>> networkAllocateActualDevice() - I wanted them to be as close to the same
>> as possible). In the body of the function you either declare the
>> allocate/notification a success (with "goto success") in which case the
>> connect count for the network goes up, or you declare it a failure (with
>> "goto error") in which case the connect count for the network remains
>> unchanged, but you never directly goto the noncommittal "cleanup:".
>>
> 
> OK understood - it's just one of those odd looking constructs to have a
> goto error and goto cleanup immediately following it, but your reasoning
> for having the logic this way is understandable.
> 
>>>
>>> I guess I don't have the answer readily available in my head as to how
>>> much of the subsequent code would be called at network start time?
>>
>> None of this is called when the network is started - we have no way for
>> the networkStart() function to learn which interfaces in which domains
>> are supposed to be connected to a particular network. That needs more
>> infrastructure change than I wanted to put in right now (we need some
>> sort of event or callback that will allow the network driver to notify
>> all active hypervisor drivers whenever a network is started (or
>> destroyed?), giving the hypervisor driver a chance to call
>> networkNotifyActualDevice() for any affected domain interface).
>>
>> So have I explained myself well enough?
>>
> 
> Sure, but for my context...
> 
> I guess I was putting myself in the mindset of a libvirtd reconnect
> where "who knows" what was done with each network prior to reconnection
> and this is the discovery of that. For some paths it's a hard death
> issue - the physical connection had problems or went away. For other
> paths it's more of a soft or state issue - someone stopped/destroyed,
> but didn't undefine a network.  I was thinking more that in this latter
> case, does an error make sense?

I think it makes sense to log an error, but it doesn't make sense to
kill the guest. I think it would only make sense to kill the guest if an
error condition led to 1) a guest that was completely non-functioning
and/or unreachable anyway, so it would eventually need to be killed
anyway no matter what you tried to do, or 2) the guest being left
exposed/vulnerable in some serious way. Neither of those is the case here.