[Avocado-devel] RFC: avocado.utils.process (or, how to handle the split stdout/stderr recording)

Thu Nov 9 15:39:14 UTC 2017

On Thu, Nov 9, 2017 at 12:54 PM, Cleber Rosa <crosa at redhat.com> wrote:
>
>
> On 11/09/2017 03:38 AM, Amador Pahim wrote:
>> On Wed, Nov 8, 2017 at 4:01 PM, Cleber Rosa <crosa at redhat.com> wrote:
>>> Hi everyone,
>>>
>>> This is kind of brainstorm about the really annoying issue we've trying
>>> to deal with in Avocado.  The root problem we're trying to fix is that
>>> Avocado's "output check" feature can not be used when the test's
>>> generated content combines both stdout and stderr in a single file.  One
>>> example is QEMU's iotests[1].
>>>
>>> The root cause of this limitation is that avocado.utils.process.run()[2]
>>> (because of avocado.utils.process.SubProcess[3]) decided a long time ago
>>> to split the content of a process STDOUT and STDERR to separate files
>>> (and different logging streams).  This is reflected on Avocado's "output
>>> check" feature[4].  Users can choose to record the test's generated
>>> STDOUT, STDERR, both (all) or none of them.  This is how the command
>>> line options look like:
>>>
>>>   --output-check-record {none,all,stdout,stderr}
>>>                Record output streams of your tests to reference files
>>>                (valid options: none (do not record output streams),
>>>                all (record both stdout and stderr), stdout (record
>>>                only stderr), stderr (record only stderr). Current:
>>>                none
>>>
>>> Which map to the following recorded files:
>>>
>>> +----------+----------------------------+--------------------+
>>> | OPTION   | RECORDED FILES             | CONTENT            |
>>> +----------+----------------------------+--------------------+
>>> | none     |                            |                    |
>>> +----------+----------------------------+--------------------+
>>> |          | stdout                     | process FD 1       |
>>> + all      +----------------------------+--------------------+
>>> |          | stderr                     | process FD 2       |
>>> +----------+----------------------------+--------------------+
>>> | stdout   | stdout                     | process FD 1       |
>>> +----------+----------------------------+--------------------+
>>> | stderr   | stderr                     | process FD 2       |
>>> +----------+----------------------------+--------------------+
>>>
>>> Notice how the "all" option still creates two separate files, with
>>> content coming from individual file descriptors.  Adding another record
>>> option would actually be pretty simple, that is, something like:
>>>
>>> +----------+----------------------------+--------------------+
>>> | OPTION   | RECORDED FILES             | CONTENT            |
>>> +----------+----------------------------+--------------------+
>>> | combined | combined_stdout_stderr     | process FD 1 and 2 |
>>> +----------+----------------------------+--------------------+
>>>
>>> What is *not* easily possible, is to have two options such as "all" and
>>> "combined" at the same time.  The reason is that it the order of the
>>> content written to "combined_stdout_stderr" can not be guaranteed to be
>>> correct.
>>>
>>> I've attempted to allow for both options to be used at once, by using a
>>> pair of file-like objects with specialized writes that would either
>>> share single list of records or just share a secondary file with a lock
>>> (pseudo/prototype code):
>>>
>>> class TeeFile(file):
>>>     def __init__(self, path, tee_path, tee_lock, mode='a'):
>>>         super(TeeFile, self).__init__(path, mode)
>>>         self._tee_path = tee_path
>>>         self._tee_lock = tee_lock
>>>
>>>     def write(self, record):
>>>         self.tee_lock.acquire()
>>>         super(TeeFile, self).write(record)
>>>         with open(self._tee_path, 'a') as tee_file:
>>>             tee_file.write(record)
>>>             tee_file.flush()
>>>         self.tee_lock.release()
>>>
>>> But the fact that the Python standard library "subprocess.Popen"
>>> implementation works with file descriptors doesn't make it possible to
>>> implement a shared/synchronized write strategy.  Rewriting parts of
>>> "subprocess.Popen" or "avocado.utils.process.SubProcess" is, IMO, a
>>> clear sign to not attempt something way more complicated and error prone
>>> for the sake of this feature.
>>
>> Agreed. You're trying to control from above something handled many layers below.
>> I don't see a problem in having a mutual exclusive behavior, as long
>> as we have clear documentation on that.
>>
>>>
>>> Accepting the fact that either combined (stdout+stderr) or a split
>>> (stdout, stderr) will be available, the point of option names comes into
>>> play.  Both "stdout", "stderr" and "none" are aptly named.  But, with
>>> the introduction of a third *real* output record option/file, "all"
>>> becomes a rather bad option name.
>>>
>>> To keep things simple, I'd add an alias from "all" to "both",
>>> deprecating the former.  The new option would be named either "combined"
>>> and the generated file named "combined_stdout_stderr" by default.  It's
>>> a rather verbose name, but I'd go with explicit option names rather than
>>> confusing ones.
>>
>> For the usability, "both" in a list with more than 2 other options,
>> even though clear enough in this case, seems essentially wrong.
>> I'd simply deprecate/drop "all" and allow "--output-check-record=stdout,stderr".
>>
>> In this case, the new list of options, after the deprecation, would be:
>>
>> {none,stdout,stderr,combined}
>>
>> Where "stdout" and "stderr" can be used together and "combined" can
>> only be used alone.
>>
>
> I liked your proposal, until I thought of
> avocado.utils.process.SubProcess API which backs this feature and is
> closely related to it.  The "allow_output_check" parameter takes a "str"
> parameter, so providing a set of options isn't natural.
>
> While we can change the API, say to take a list and implement the same
> exclusion logic, it'd steer away from that.  One even worse option is to
> keep the "str" parameter type, and allow for values such as "stdout,stderr".
>
> So, my counter proposal is to name the both the API and command line
> options:
>
> {none,stdout,sterr,combined,split}
>
> How does that sound?

Well, "both" > "split". Let's keep your original proposal and use "both" :)

>
> - Cleber.
>
>>>
>>> Comments?
>>>
>>> --
>>>
>>> [1] -
>>> https://git.qemu.org/?p=qemu.git;a=blob;f=tests/qemu-iotests/check;h=e6b6ff7a047562099c2fb5f0fd35652780c9c006;hb=HEAD#l757
>>> [2] -
>>> http://avocado-framework.readthedocs.io/en/55.0/api/utils/avocado.utils.html#avocado.utils.process.run
>>> [3] -
>>> http://avocado-framework.readthedocs.io/en/55.0/api/utils/avocado.utils.html#avocado.utils.process.SubProcess
>>> [4] -
>>> http://avocado-framework.readthedocs.io/en/55.0/WritingTests.html#test-output-check-and-output-record-mode
>>>
>>> --
>>> Cleber Rosa
>>> [ Sr Software Engineer - Virtualization Team - Red Hat ]
>>> [ Avocado Test Framework - avocado-framework.github.io ]
>>> [  7ABB 96EB 8B46 B94D 5E0F  E9BB 657E 8D33 A5F2 09F3  ]
>>>
>>>
>
> --
> Cleber Rosa
> [ Sr Software Engineer - Virtualization Team - Red Hat ]
> [ Avocado Test Framework - avocado-framework.github.io ]
> [  7ABB 96EB 8B46 B94D 5E0F  E9BB 657E 8D33 A5F2 09F3  ]
>