[libvirt] [PATCH] qemu: Report errors from iohelper

Tue Oct 23 15:06:29 UTC 2012

On 23.10.2012 16:51, Eric Blake wrote:
> On 10/23/2012 03:39 AM, Michal Privoznik wrote:
>> Currently, we use iohelper when saving/restoring a domain.
>> However, if there's some kind of error (like I/O) it is not
>> propagated to libvirt. Since it is not qemu who is doing
>> the actual write() it will not get error. The iohelper does.
>> Therefore we should check for iohelper errors as it makes
>> libvirt more user friendly.
>> ---
>>  src/libvirt_private.syms |    1 +
>>  src/qemu/qemu_driver.c   |    7 ++++-
>>  src/util/virfile.c       |   47 ++++++++++++++++++++++++++++++++++++++++++++++
>>  src/util/virfile.h       |    2 +
>>  4 files changed, 55 insertions(+), 2 deletions(-)
> 
> I agree that catching iohelper errors is needed, but I'm worried that
> you might introduce deadlock with this approach (then again, it may just
> be theoretical):
> 
>> +++ b/src/qemu/qemu_driver.c
>> @@ -2945,6 +2945,7 @@ endjob:
>>                  if (rc < 0)
>>                      VIR_WARN("Unable to resume guest CPUs after save failure");
>>              }
>> +            virFileWrapperFdCatchError(wrapperFd);
>>          }
>>          if (qemuDomainObjEndAsyncJob(driver, vm) == 0)
>>              vm = NULL;
>> @@ -3282,9 +3283,11 @@ doCoreDump(struct qemud_driver *driver,
>>  
>>  cleanup:
>>      VIR_FORCE_CLOSE(fd);
>> -    virFileWrapperFdFree(wrapperFd);
>> -    if (ret != 0)
>> +    if (ret != 0) {
>> +        virFileWrapperFdCatchError(wrapperFd);
>>          unlink(path);
> 
> You aren't doing any reading of the stderr file descriptor until you get
> to the cleanup label.  But suppose that iohelper encounters situations
> that cause it to write more than PIPE_BUF bytes to stderr, but while
> still continuing to run.  Then, because no one is reading the other end
> of the pipe, iohelper blocks on writes to stderr instead of proceeding
> on with the rest of its operations, and thus never gets to a point where
> it will exit in order to kick libvirtd over to the cleanup label to
> start reading the stderr pipe in the first place.
> 
>> +++ b/src/util/virfile.c
>> @@ -135,6 +135,7 @@ virFileDirectFdFlag(void)
>>   * read-write is not supported, just a single direction.  */
>>  struct _virFileWrapperFd {
>>      virCommandPtr cmd; /* Child iohelper process to do the I/O.  */
>> +    int err_fd; /* FD to read stderr of @cmd */
>>  };
>>  
>>  #ifndef WIN32
>> @@ -229,6 +230,14 @@ virFileWrapperFdNew(int *fd, const char *name, unsigned int flags)
>>          virCommandAddArg(ret->cmd, "0");
>>      }
>>  
>> +    /* In order to catch iohelper stderr, we must:
>> +     * - pass a FD to virCommand (-1 to auto-allocate one)
>> +     * - change iohelper's env so virLog functions print to stderr
>> +     */
>> +    ret->err_fd = -1;
>> +    virCommandSetErrorFD(ret->cmd, &ret->err_fd);
>> +    virCommandAddEnvPair(ret->cmd, "LIBVIRT_LOG_OUTPUTS", "1:stderr");
> 
> Would it work to use virCommandSetErrorBuffer() instead, or does that
> rely on using virCommandRun() to properly populate the buffer via
> poll(), where we are stuck using virCommandRunAsync() instead?  Or,
> since we are already using virCommandRunAsync(), does that mean we need
> to figure out how to incorporate a poll() to favorably handle stderr in
> addition to everything else that iohelper is already  handling?
> 

Yeah, I've tried te virCommandSetErrorBuffer() way firstly. But it
requires the poll() which is wrapped in virCommandProcessIO(). However,
it is static so it would needed to be dig out and run in a separate
thread. But we can run virCommandRun directly then. So I think I can
incorporate our event loop. Just add a callback once ret->err_fd is
initialized.

> Or am I worrying too much, and if iohelper hits any issue where it would
> be logging to stderr in the first place, it will be exiting before
> filling up the stderr pipe and thus never deadlocking?

Well, with current implementation of iohelper there is only one place
where it doesn't die right after reporting an error:
src/util/iohelper.c:160

I don't know how possible is to hit that.

> 
> At any rate, depending on how that question is answered, this code does
> indeed look useful.
> 

Okay, since we would have to keep this limitation in mind when changing
iohelper (and honestly, I'll forget immediately after the push) I think
we should make this safer and read the error pipe in the event loop.

I'll update the patch and resend.

Michal