[libvirt] [PATCH v3 2/6] fdstream: Report error from the I/O thread

Tue Jun 13 13:38:14 UTC 2017

On 06/13/2017 03:13 PM, John Ferlan wrote:
> 
> 
> On 06/13/2017 07:31 AM, Michal Privoznik wrote:
>> On 06/12/2017 11:48 PM, John Ferlan wrote:
>>>
>>>
>>> On 06/05/2017 04:22 AM, Michal Privoznik wrote:
>>>> Problem with our error reporting is that the error object is a
>>>> thread local variable. That means if there's an error reported
>>>> within the I/O thread it gets logged and everything, but later
>>>> when the event loop aborts the stream it doesn't see the original
>>>> error. So we are left with some generic error. We can do better
>>>> if we copy the error message between the threads.
>>>>
>>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>>> ---
>>>>  daemon/stream.c        | 18 ++++++++++++------
>>>>  src/util/virfdstream.c |  9 ++++++---
>>>>  2 files changed, 18 insertions(+), 9 deletions(-)
>>>>
>>>
>>> Perhaps I'm lost in weeds of this code, but I thought the magic of error
>>> message details for the threads was handled through rerr (and
>>> virNetMessageSaveError).
>>
>> Not really. When you call virReportError()/virReportSystemError() these
>> functions:
>>
>> 1) fetch thread local error object
>> 2) if it doesn't exist they create one
>> 3) copy the error message and couple of other things into the object
>>
>> Therefore, every thread has its own error object and unless two threads
>> share a pointer, they cannot see each others error object. Now, we have
>> two threads when it comes to server side of streams: the event loop
>> (which sends data to the client), and I/O thread. Should something go
>> wrong in the latter, we need to save the error object somewhere so that
>> the former can pick it up.
>>
> 
> Right, I'm running through the weeds of error messages.
> 
>>>
>>> Still, based on later patches which seem to care about the value of
>>> errno, it interesting this one removes an errno setting and replaces it
>>> with an error message.
>>
>> With a pointer. That way we can still check if it is set or not. Prior
>> to this patch, threadErr is an int, and checks are: "if
>> (!fdst->threadErr)". After that threadErr is a pointer so that the
>> checks can stay the same.
>>
> 
> Right because coding practices have always been that if "(!0)" is OK;
> whereas, the check could have been "if (fdst->threadErr != 0)".
> 
>>>
>>> Maybe the answer is rather than replacing threadErr it should be
>>> 'augmented' with a 'threadErrMsg'.
>>
>> Why? Prior to this patch threadErr despite being type of was used as a
>> boolean really. What would be the benefits of having both int threadErr;
>> and virErrorPtr trheadErrMsg;?
>>
> 
> I read patch6 and started down the path of thinking which thread am I in
> and if saving/fetching errno was important for those callback functions
> - would it be something necessary to save in the fdst as well in
> addition to saving the last error message.

That is something different. Errno set by callbacks from patch 6 is on
the client side. The callbacks are used exclusively by client as it only
uses StreamRecvAll/StreamSendAll wrappers.

This errno is set on the server side by the I/O thread.

> Then of course the what if
> virSaveLastError returned NULL for some reason in virFDStreamThread, but
> errno was set.  Considering all the possibilities.

Well, the only case when virSaveLastError can return NULL is OOM. True,
we will not see it here, but something else will for sure. Then again,
on Linux you won't see malloc() failing. The program gets terminated
instead.

> 
>>>
>>>> diff --git a/daemon/stream.c b/daemon/stream.c
>>>> index 1d5b50ad7..5077ac8b0 100644
>>>> --- a/daemon/stream.c
>>>> +++ b/daemon/stream.c
>>>> @@ -231,17 +231,23 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
>>>>          int ret;
>>>>          virNetMessagePtr msg;
>>>>          virNetMessageError rerr;
>>>> +        virErrorPtr origErr = virSaveLastError();
>>>>  
>>>>          memset(&rerr, 0, sizeof(rerr));
>>>>          stream->closed = true;
>>>>          virStreamEventRemoveCallback(stream->st);
>>>>          virStreamAbort(stream->st);
>>>> -        if (events & VIR_STREAM_EVENT_HANGUP)
>>>> -            virReportError(VIR_ERR_RPC,
>>>> -                           "%s", _("stream had unexpected termination"));
>>>> -        else
>>>> -            virReportError(VIR_ERR_RPC,
>>>> -                           "%s", _("stream had I/O failure"));
>>>> +        if (origErr && origErr->code != VIR_ERR_OK) {
>>>> +            virSetError(origErr);
>>>> +            virFreeError(origErr);
>>>> +        } else {
>>>> +            if (events & VIR_STREAM_EVENT_HANGUP)
>>>> +                virReportError(VIR_ERR_RPC,
>>>> +                               "%s", _("stream had unexpected termination"));
>>>> +            else
>>>> +                virReportError(VIR_ERR_RPC,
>>>> +                               "%s", _("stream had I/O failure"));
>>>> +        }
>>>
>>> This seems fine - display the error that we got from some lower spot if
>>> it exists; otherwise,
>>>
>>> Might be important to note the saving of the error has a lot to do with
>>> virStreamAbort's clearing.
>>
>> Well, we don't have comment on other places like this, but okay, I can
>> add a comment that says "save original error in case something resets
>> its later", or something among those lines.
>>
> 
> A commit message noting would be fine. This only pops into my
> consciousness because of patch 4.
> 
>>>
>>> At the very least this is an error in our processing vs. something
>>> within the stream, correct? Or is that the thicket of weeds I'm lost in.
>>>
>>>>  
>>>>          msg = virNetMessageNew(false);
>>>>          if (!msg) {
>>>> diff --git a/src/util/virfdstream.c b/src/util/virfdstream.c
>>>> index cd24757e6..5b59765fa 100644
>>>> --- a/src/util/virfdstream.c
>>>> +++ b/src/util/virfdstream.c
>>>> @@ -106,7 +106,7 @@ struct virFDStreamData {
>>>>      /* Thread data */
>>>>      virThreadPtr thread;
>>>>      virCond threadCond;
>>>> -    int threadErr;
>>>> +    virErrorPtr threadErr;
>>>>      bool threadQuit;
>>>>      bool threadAbort;
>>>>      bool threadDoRead;
>>>> @@ -123,6 +123,7 @@ virFDStreamDataDispose(void *obj)
>>>>      virFDStreamDataPtr fdst = obj;
>>>>  
>>>>      VIR_DEBUG("obj=%p", fdst);
>>>> +    virFreeError(fdst->threadErr);
>>>>      virFDStreamMsgQueueFree(&fdst->msg);
>>>>  }
>>>>  
>>>> @@ -312,8 +313,10 @@ static void virFDStreamEvent(int watch ATTRIBUTE_UNUSED,
>>>>          return;
>>>>      }
>>>>  
>>>> -    if (fdst->threadErr)
>>>> +    if (fdst->threadErr) {
>>>>          events |= VIR_STREAM_EVENT_ERROR;
>>>> +        virSetError(fdst->threadErr);> +    }
>>>>  
>>>>      cb = fdst->cb;
>>>>      cbopaque = fdst->opaque;
>>>> @@ -637,7 +640,7 @@ virFDStreamThread(void *opaque)
>>>>      return;
>>>>  
>>>>   error:
>>>> -    fdst->threadErr = errno;
>>>> +    fdst->threadErr = virSaveLastError();
>>>
>>> "Typically" the processing is:
>>>
>>>     orig_err = virSaveLastError();
>>>     { do stuff };
>>>     if (orig_err) {
>>>         virSetError(orig_err);
>>>         virFreeError(orig_err);
>>>     }
>>
>> This is not typical. 
> 
> agreed!
> 
>> This hunk here (virSaveLastError()) is called from
>> within the I/O thread. And as I've explained earlier, error objects are
>> thread local. So what this hunk does is save any error reported earlier
>> so that the other hunk above (virSetError()) can set it (it runs in a
>> different threat). What's not so typical about this? I don't think that
>> we copy error objects between two threads anywhere else in the code.
>>
> 
> The only place I would think we could do that would be somewhere in
> migration code, vmagent code, or perhaps some qemu driver algorithm
> dealing with stats or snapshots.  I didn't look.

Yeah, maybe. But I haven't touched those areas in a while, therefore I
think what I think - we don't do it anywhere :-)

> 
>>>
>>> So, how do we know threadErr got something in return and if it didn't
>>> we've messed up "other" algorithms. In particular, the lines in the
>>> previous hunk won't set 'events' any more.
>>
>> How come? threadErr is set if and only if the I/O thread finished with
>> an error.
>>
> 
> virSaveLastError can return NULL even though @errno was non-zero,

Again, this can happen only on OOM in which case libvirtd is going to be
killed anyway.

> so
> even though in that instance we're not going to get much more done - it
> does mean that code that was going to fail previously because errno had
> been set will now succeed at least for a short while rather than causing
> an error like it should have.

Succeeding for a short period of time it takes oom_killer to kill a
process ;-)

> 
>>>
>>> I'm not worried about being called twice (we know that won't happen
>>> ;-)), but spreading out the error message processing through 3 functions
>>> causes me wonder whether that virFreeError should be done right after
>>> the virSetError rather than waiting for virFDStreamDataDispose.
>>
>> I don't think so. threadError is part of virFDStreamData object and as
>> such the dispose function is responsible for freeing the object.
>>
>> Michal
>>
> 
> 
> So threadA has an error message allocated that now gets set or copied
> into threadB. So, virFDStreamDataDispose is running for threadA, right?
> That's where I'm lost in the thicket. Still, as I'm typing this I'm
> thinking if we Free'd it right after the Set, then we'd have to set
> threadErr = NULL too and that causes other algorithm problems possibly.
> 
> OK - so I've convinced myself that it's better in Dispose!

Cool!

Michal