[Avocado-devel] RFC: avocado.utils.process (or, how to handle the split stdout/stderr recording)

Thu Nov 9 15:30:27 UTC 2017

On Wed, 8 Nov 2017 10:01:22 -0500
Cleber Rosa <crosa at redhat.com> wrote:

> Hi everyone,
> 
> This is kind of brainstorm about the really annoying issue we've trying
> to deal with in Avocado.  The root problem we're trying to fix is that
> Avocado's "output check" feature can not be used when the test's
> generated content combines both stdout and stderr in a single file.  One
> example is QEMU's iotests[1].
> 
> The root cause of this limitation is that avocado.utils.process.run()[2]
> (because of avocado.utils.process.SubProcess[3]) decided a long time ago
> to split the content of a process STDOUT and STDERR to separate files
> (and different logging streams).  This is reflected on Avocado's "output
> check" feature[4].  Users can choose to record the test's generated
> STDOUT, STDERR, both (all) or none of them.  This is how the command
> line options look like:
> 
>   --output-check-record {none,all,stdout,stderr}
>                Record output streams of your tests to reference files
>                (valid options: none (do not record output streams),
>                all (record both stdout and stderr), stdout (record
>                only stderr), stderr (record only stderr). Current:
>                none
> 
> Which map to the following recorded files:
> 
> +----------+----------------------------+--------------------+
> | OPTION   | RECORDED FILES             | CONTENT            |
> +----------+----------------------------+--------------------+
> | none     |                            |                    |
> +----------+----------------------------+--------------------+
> |          | stdout                     | process FD 1       |
> + all      +----------------------------+--------------------+
> |          | stderr                     | process FD 2       |
> +----------+----------------------------+--------------------+
> | stdout   | stdout                     | process FD 1       |
> +----------+----------------------------+--------------------+
> | stderr   | stderr                     | process FD 2       |
> +----------+----------------------------+--------------------+
> 
> Notice how the "all" option still creates two separate files, with
> content coming from individual file descriptors.  Adding another record
> option would actually be pretty simple, that is, something like:
> 
> +----------+----------------------------+--------------------+
> | OPTION   | RECORDED FILES             | CONTENT            |
> +----------+----------------------------+--------------------+
> | combined | combined_stdout_stderr     | process FD 1 and 2 |
> +----------+----------------------------+--------------------+
> 
> What is *not* easily possible, is to have two options such as "all" and
> "combined" at the same time.  The reason is that it the order of the
> content written to "combined_stdout_stderr" can not be guaranteed to be
> correct.
> 
> I've attempted to allow for both options to be used at once, by using a
> pair of file-like objects with specialized writes that would either
> share single list of records or just share a secondary file with a lock
> (pseudo/prototype code):
> 
> class TeeFile(file):
>     def __init__(self, path, tee_path, tee_lock, mode='a'):
>         super(TeeFile, self).__init__(path, mode)
>         self._tee_path = tee_path
>         self._tee_lock = tee_lock
> 
>     def write(self, record):
>         self.tee_lock.acquire()
>         super(TeeFile, self).write(record)
>         with open(self._tee_path, 'a') as tee_file:
>             tee_file.write(record)
>             tee_file.flush()
>         self.tee_lock.release()
> 
> But the fact that the Python standard library "subprocess.Popen"
> implementation works with file descriptors doesn't make it possible to
> implement a shared/synchronized write strategy.  Rewriting parts of
> "subprocess.Popen" or "avocado.utils.process.SubProcess" is, IMO, a
> clear sign to not attempt something way more complicated and error prone
> for the sake of this feature.

Agreed!

> Accepting the fact that either combined (stdout+stderr) or a split
> (stdout, stderr) will be available, the point of option names comes into
> play.  Both "stdout", "stderr" and "none" are aptly named.  But, with
> the introduction of a third *real* output record option/file, "all"
> becomes a rather bad option name.
> 
> To keep things simple, I'd add an alias from "all" to "both",
> deprecating the former.  The new option would be named either "combined"
> and the generated file named "combined_stdout_stderr" by default.  It's
> a rather verbose name, but I'd go with explicit option names rather than
> confusing ones.
> 
> Comments?

I think this is a reasonable approach; drop "all" and use "both" to
indicate stdout and stderr. The generated file name looks fine.

Someone looking at these options (without knowing the history) will
see: none, both, combined, stdout and stderr. This looks OK, once they
understand the difference between both and combined.

-Jeff

> --
> 
> [1] -
> https://git.qemu.org/?p=qemu.git;a=blob;f=tests/qemu-iotests/check;h=e6b6ff7a047562099c2fb5f0fd35652780c9c006;hb=HEAD#l757
> [2] -
> http://avocado-framework.readthedocs.io/en/55.0/api/utils/avocado.utils.html#avocado.utils.process.run
> [3] -
> http://avocado-framework.readthedocs.io/en/55.0/api/utils/avocado.utils.html#avocado.utils.process.SubProcess
> [4] -
> http://avocado-framework.readthedocs.io/en/55.0/WritingTests.html#test-output-check-and-output-record-mode
>