[libvirt] [PATCH v3 1/6] virfdstream: Check for thread error more frequently

Tue Jun 13 12:35:14 UTC 2017

On 06/13/2017 07:31 AM, Michal Privoznik wrote:
> On 06/12/2017 11:26 PM, John Ferlan wrote:
>>
>>
>> On 06/05/2017 04:22 AM, Michal Privoznik wrote:
>>> When the I/O thread quits (e.g. due to an I/O error, lseek()
>>> error, whatever), any subsequent virFDStream API should return
>>> error too. Moreover, when invoking stream event callback, we must
>>> set the VIR_STREAM_EVENT_ERROR flag so that the callback knows
>>> something bad happened.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>> ---
>>>  src/util/virfdstream.c | 15 ++++++++++++---
>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>
>>> diff --git a/src/util/virfdstream.c b/src/util/virfdstream.c
>>> index 7ee58be13..cd24757e6 100644
>>> --- a/src/util/virfdstream.c
>>> +++ b/src/util/virfdstream.c
>>> @@ -312,6 +312,9 @@ static void virFDStreamEvent(int watch ATTRIBUTE_UNUSED,
>>>          return;
>>>      }
>>>  
>>> +    if (fdst->threadErr)
>>> +        events |= VIR_STREAM_EVENT_ERROR;
>>> +
>>
>> This hunk almost feels separable... That is, it's updating the events
>> (VIR_STREAM_EVENT_ERROR) in the callback function feels different than
>> checking for threadErr more often, but I'll leave it up to you to decide
>> to split more or not.
> 
> I don't think that separation is needed. The commit says "Check for
> thread error more frequently". Every check comes from an action. In
> majority of the cases the action is reporting an error. In some cases it
> means setting a flag, a boolean that causes error to be reported later
> in the process.
> 

"almost" - hand grenades, atomic warfare, etc. ;-)

I'm fine with keeping as is.

>>
>>>      cb = fdst->cb;
>>>      cbopaque = fdst->opaque;
>>>      ff = fdst->ff;
>>> @@ -787,10 +790,10 @@ static int virFDStreamWrite(virStreamPtr st, const char *bytes, size_t nbytes)
>>>      if (fdst->thread) {
>>>          char *buf;
>>>  
>>> -        if (fdst->threadQuit) {
>>> +        if (fdst->threadQuit || fdst->threadErr) {
>>>              virReportSystemError(EBADF, "%s",
>>>                                   _("cannot write to stream"));
>>
>> Would this (and the subsequent similar check) possibly overwrite
>> something from a threadErr with this error message?
> 
> What do you mean? In case of thread error, threadErr is set to errno. So
> we should do:
> 
> virReportSystemError(fdst->threadErr, ...);
> 
> instead of EBADF? Well, look for the very next patch.
> 

As you saw in the next patch, it was too easy for me to be lost in the
weeds of which error is which...

The comment was more towards how virFDStreamThread would set threadErr
to errno after:

  1. virCondWait fails in which case a virReportSystemError was issued
  2. if (got < 0) on *DoRead and *DoWrite where I didn't chase
throughout the calls, but there were virReportSystemError calls already
made.

so the question in my mind was on the '|| fdst->threadErr' would a
message already be generated? or is the error message for ->thread
thread as opposed to this processing thread?

>>
>>> -            return -1;
>>> +            goto cleanup;
>>
>> ouch, <sigh> and this is separable too technically....
> 
> Yes, technically. On the other hand, how strictly do we want to follow
> the rules? ;-)
> 

Rules, we don't need no stinking rules ;-)...  I would have been remiss
if I didn't point it out though.  I'm fine with it as is since it's not
like this is being backported somewhere special to fix a specific
problem noted in some rogue QE test.

>>
>>>          }
>>>  
>>>          if (VIR_ALLOC(msg) < 0 ||
>>> @@ -866,7 +869,7 @@ static int virFDStreamRead(virStreamPtr st, char *bytes, size_t nbytes)
>>>          virFDStreamMsgPtr msg = NULL;
>>>  
>>>          while (!(msg = fdst->msg)) {
>>> -            if (fdst->threadQuit) {
>>> +            if (fdst->threadQuit || fdst->threadErr) {
>>>                  if (nbytes) {
>>>                      virReportSystemError(EBADF, "%s",
>>>                                           _("stream is not open"));
>>> @@ -959,6 +962,9 @@ virFDStreamSendHole(virStreamPtr st,
>>>          fdst->offset += length;
>>>      }
>>>  
>>> +    if (fdst->threadErr)
>>> +        goto cleanup;
>>> +
>>
>> So it's interesting in these paths the threadErr check is done outside
>> the fdst->thread check which is different than the virFDStreamRead and
>> Write API's.
> 
> Well, if there's no fdst->thread then no one sets the threadErr. But
> okay, for consistency purpose I can move this check.
> 

The question that popped into my mind was at this point was why not
alter the previous calls to make the check earlier. Especially
considering that both of the previous ones also potentially calls
virReportSystemError.

For this particular API, I agree it's not an issue here, but I was going
on the consistency route with the thought in the back of my mind about
someone looking at this code at some point in the future and wondering
why the checks were being made at different places.

>>
>>>      if (fdst->thread) {
>>>          /* Things are a bit complicated here. If FDStream is in a
>>>           * read mode, then if the message at the queue head is
>>> @@ -1018,6 +1024,9 @@ virFDStreamInData(virStreamPtr st,
>>>  
>>>      virObjectLock(fdst);
>>>  
>>> +    if (fdst->threadErr)
>>> +        goto cleanup;
>>> +
>>
>> This one in particular because threadQuit is checked in the subsequent
>> code similar to virFDStreamRead made me ponder a bit more, but I'm not
>> sure I could come to a conclusion...
> 
> This is slightly different. I mean, I can move the check, but this is
> sort of a special API in sense that if the I/O thread exits and the msg
> queue is empty we will signal EOF to the caller. The other three APIs do
> change state of the stream - they change the position in the stream.
> This is a query API and as such can be used to query "are we at EOF
> yet?". That is the reason why this API doesn't fail on threadQuit,
> rather than return EOF (in a specific way as described in
> virStreamInData description).
> 

Makes sense when you consider the totality of things. The new API's
treat threadErr as different than threadQuit, while the old API's treat
them the same which is what I was trying to consider.

I kept going back to that error message being generated for the earlier
two calls and wondering what, if any, downside there is to doing that.

John

>>
>> I guess I'd want to see some consistency over when threadErr is checked
>> between the 4 functions or I'd wonder why 2 do things one way and 2 the
>> other. If they need to be different for a specific reason, then I
>> suppose more separation could be done or at least the commit message
>> indicate why they're different.
> 
> 
> Okay. I can be more verbose there.
> 
> Michal
>