[libvirt] [PATCH REBASE 2/5] qemu: monitor: set error flag even in OOM conditions

Fri May 4 13:45:34 UTC 2018

On 05/03/2018 03:35 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 01.05.2018 01:03, John Ferlan wrote:
>>
>>
>> On 04/18/2018 10:44 AM, Nikolay Shirokovskiy wrote:
>>> lastError.code is used as flag in qemu monitor. Unfortunately
>>> due to temporary OOM conditions (very unlikely through as thread local
>>> error is allocated on first use) we can fail to set this flag
>>> in case of monitor eofs/errors. This can cause hangs.
>>>
>>> Let's make sure flag is always set in case eofs/errors.
>>>
>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
>>> ---
>>>  src/qemu/qemu_monitor.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
>>> index f642d9a..906a320 100644
>>> --- a/src/qemu/qemu_monitor.c
>>> +++ b/src/qemu/qemu_monitor.c
>>> @@ -757,6 +757,11 @@ qemuMonitorIO(int watch, int fd, int events, void *opaque)
>>>                  virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>>                                 _("Error while processing monitor IO"));
>>>              virCopyLastError(&mon->lastError);
>>> +
>>> +            /* set error code if due to OOM conditions we fail to set it before */
>>> +            if (mon->lastError.code == VIR_ERR_OK)
>>
>> So this can only happen if virCopyLastError fails, right?  And code == 0
> 
> Nope, virCopyLastError fail to set code field only in one case - if last
> error is NULL which can be only if in next lines even reporting fails:
> 
>             virErrorPtr err = virGetLastError();                                
>             if (!err)                                                           
>                 virReportError(VIR_ERR_INTERNAL_ERROR, "%s",                    
>                                _("Error while processing monitor IO"));     
> 

I read the code as if pthread_getspecific in virThreadLocalGet returns
NULL, then virLastErrorObject will try to VIR_ALLOC_QUIET and return
NULL if it fails. If that succeeds, then virThreadLocalSet is called to
pthread_setspecific and if it fails, then @err (from VIR_ALLOC_QUIET) is
VIR_FREE'd and a NULL is returned.

So NULL could be returned if VIR_ALLOC_QUIET or virThreadLocalSet fail.

>> because of memset(..., 0, ...).  Wouldn't you only be working around the
>> issue for one of the callers? and only setting *.code and not *.domain?
> 
> We use mon->lastError for 2 purposes:
> 
> - as flag, and it is critical that we set this flag appropriately. We
>   use only code field for this purpuse

Understood, but still ->domain would be 0; however, if you set it to
something (such as VIR_FROM_THREAD), then perhaps that too could be used
as a marker of sorts (e.g. code == value, thread == value, and message
== NULL means something fairly specific).

> 
> - as error message that client can retrieve in case if API call returns
>   error, it is no critical if we can not provide error message, moreover
>   we can fail to pass error message in RPC for example due to temporary OOM
>   conditions
> 
> I even think to remove flag usage from mon->lastError and introduce bool
> flag but this looks ugly like 2 fields to keep error. 
> 

In any case, perhaps you misinterpreted my remark. Only adjusting this
one caller fixes one condition, but if there are possibly more then
changing virCopyLastError instead of here would allow those others to
also have the adjustment...

>>
>> BTW: I feel we've been this way in rather recent history... See commit
>> 'e711a391' - I recall we had varying opinions on that one.
> 
> Yes, but this fix does not relate to usage mon->lastError as flag, rather
> then to second usage in the above distinction.
> 
>>
>> Anyway, assuming CopyLastError is the issue, then perhaps the right
>> place to fix this is there and I would think at least the following
>> would need to be set in the error path:
>>
>>    to->code = VIR_ERR_NO_MEMORY;
>>    to->domain = VIR_FROM_THREAD;
> 
> It can be misleading. For example if we forget to set last error
> and then try to copy it.
> 
>>
>> We cannot create a to->message...
>>
>> Of course, how can we be "sure" that the failure is because the
>> VIR_ALLOC_QUIET in virLastErrorObject caused the issue other than
>> changing it's return values or passing a "bool &memfail" that's only set
>> when the VIR_ALLOC_QUIET fails allowing the caller to decide what to do.
>> It's local, so change is limited. Thus virGetLastErrorMessage and
>> virCopyLastError could make use of the returned, while other callers
>> could just pass "&dummy".
>>
>> In the long run, it's a somewhat interesting corner case problem and we
>> may have some varying opinions over the "best way" to resolve.  Suffice
>> to say, libvirt probably isn't going to "survive" much longer, but
>> unless others read this response and have varying opinions and/or we
>> come to a resolution - can this be posted/consider on it's own merits?> 
>> Tks -
>>
>> John
> 
> May be it would be just more simple to not to use mon->lastError as flag
> in monitor.
> 

I think it's still possible. You've found two conditions where usage was
problematic (low memory and failure to set thread value).  So the patch
isn't without merit.

John

>>
>>> +                mon->lastError.code = VIR_ERR_INTERNAL_ERROR;
>>> +
>>>              virResetLastError();
>>>          }
>>>  
>>>