[Avocado-devel] [RFC] Environment Variables

Wed Jun 1 19:48:45 UTC 2016

On 06/01/2016 04:39 PM, Ademar Reis wrote:
> On Wed, Jun 01, 2016 at 04:02:54PM -0300, Cleber Rosa wrote:
>> On 06/01/2016 03:07 PM, Ademar Reis wrote:
>>> On Tue, May 31, 2016 at 07:30:43AM -0300, Cleber Rosa wrote:
>>>>
>>>
>>> I'm replying on top of Cleber because he already said a few
>>> things I was going to say.
>>>
>>>> On 05/25/2016 05:31 AM, Amador Pahim wrote:
>>>>> Hi folks,
>>>>>
>>>>> We have requests to handle the environment variables that we can set to
>>>>> the tests. This is the RFC in that regard, with a summary of the ideas
>>>>> already exposed in the original request and some additional planning.
>>>>>
>>>>> The original request is here:
>>>>> https://trello.com/c/Ddcly0oG/312-mechanism-to-provide-environment-variables-to-tests-run-on-a-virtual-machine-remote
>>>>>
>>>>>
>>>>> Motivation
>>>>> ==========
>>>>> Avocado tests are executed in a fork process or even in a remote
>>>>> machine. Regardless the fact that Avocado is hard coded to set some
>>>>> environment variables, they are for internal consumption and user is not
>>>>> allowed to control/configure its behavior.
>>>>
>>>> You mean this:
>>>>
>>>> http://avocado-framework.readthedocs.io/en/latest/WritingTests.html#environment-variables-for-simple-tests
>>>>
>>>> Right? Basically, the fact that Avocado sets some of the job/test state as
>>>> environment variables, that can be used by SIMPLE tests.
>>>>
>>>>> The motivation is the request to provide users an interface to set
>>>>> and/or keep environment variables for test consumption.
>>>
>>> I'm not sure if they're necessarily for test consumption. I think
>>> the motivation for the original request was to provide the
>>> standard Unix interface of environment variables for when tests
>>> are run remotely.
>>>
>>
>> If the motivation is basically about setting the env vars when running tests
>> remotely, than this brings the discussion about the *local* behavior to:
>>
>> 1. Should Avocado default to the standard UNIX behavior of cloning the
>> environment?
>>
>>  A: IMHO, yes.
>
> That's the current behavior (see my example at the end of the
> previous email). Except when one runs tests remotely, which is
> precisely the use case this feature would "fix".
>
>>
>> 2. Could Avocado have have a feature to start tests in a clean(er)
>> environment?
>>
>>  A: Possibly yes, but seems low priority.  The use case here could be seen
>> as a plus in predictability, helping to achieve expected test results in
>> spite of the runner environment.  A real world example could be a CI
>> environment that sets a VERBOSE environment variable. This env var will be
>> passed over to Avocado, to the test process and finally to a custom binary
>> (say a benchmark tool) that will produce different output depending on that
>> environment variable.  Doing that type of cleaning in the test code is
>> possible, but the framework could help with that.
>>
>> 2.1. If Avocado provides a "clean(er) test environment" feature, how to
>> determine which environment variables are passed along?
>>
>>  A: The "env-keep" approach seems like the obvious way to do it.  If the
>> mechanism is enabled, which I believe should be disabled by default (see
>> #1), its default list could contain the more or less standard UNIX
>> environment variables (TERM, SHELL, LANG, etc).
>
> Agree. But like you said such a feature would be low priority and
> optional. The important thing is that the implementation of what
> we're discussing in this RFC would not interfere with it.
>
>>
>>> These environment variables can change the behavior of both
>>> Avocado (the runner itself), the tests (after all nothing
>>> prevents the test writer from using them) and all sub-processes
>>> executed by the test.
>>>
>>
>> Right.
>>
>>> Locally, this is standard:
>>>
>>>   $ TMPDIR=/whatever/tmp VAR=foo ./avocado run test1.py
>>>
>>> But when running avocado remotely, there's no way to configure
>>> the environment in the destination. The environment variables set
>>> in the command line below will not be "forwarded" to the remote
>>> environment:
>>>
>>>   $ TMPDIR=/whatever/tmp VAR=foo ./avocado run test1.py \
>>>                                  --remote...
>>>
>>
>> Right.
>>
>>>>>
>>>>> Use cases
>>>>> =========
>>>>> 1) Use the command line or the config file to set the environment
>>>>> variables in tests processes environment; access those variables from
>>>>> inside the test.
>>>>> 2) Copy from current environment some environment variable(s) to the
>>>>> tests processes environment; access those variables from inside the test.
>>>
>>> I think we don't even have to go that far. We can simply say the
>>> intention is to set the environment variables in the environment
>>> where Avocado is run. The mechanism is quite standard and well
>>> understood.
>>>
>>> And here comes an important point: I don't think this should be a
>>> mechanism to pass variables to tests. Although, again,
>>> environment variables can be used for that purpose, Avocado
>>> should have a proper interface to provide a dictionary of
>>> configuration and variables to each test.
>>>
>>
>> The only valid reason for having such a mechanism to pass *different*
>> environment variables to tests, talking about local environment, would be
>> *if and only if* the same environment variable to be set when running
>> Avocado would change the behavior of Avocado itself.  Example:
>>
>>  $ AVOCADO_LOG_EARLY=1 avocado run avocado-self-tests.py
>>
>> This way, both the first level avocado process (our "real" runner) and other
>> instances run by the "avocado-self-test.py" code would react to that
>> variable. *BUT* this seems a corner case, and I wouldn't think it justifies
>> the implementation of such a feature at this point.
>
> I think we should KISS: environment variables are well understood
> in the Unix world. This feature will simply guarantee they (or a
> subset of them) are kept when tests are run in a different
> environment (remotely, in a VM, in a container -- or whatever
> else we implement in the future).
>
>>
>>> Currently, this is erroneously provided by the multiplexer
>>> (including --mux-inject), but it should be cleaned up in the near
>>> future.
>>>
>>
>> Right.
>>
>>>>>
>>>>> Proposal
>>>>> ========
>>>>> - To create a command line option, under the `run` command, to set
>>>>> environment variables that will be available in tests environment process:
>>>>>
>>>>>  $ avocado run --test-env='FOO=BAR,FOO1=BAR1' passtest.py
>>>>>
>>>>
>>>> I can relate to this use case...
>>>
>>> This would be a simple way of doing it, but something like
>>> "--env-keep=FOO,FOO1" could be a better approach (more about it
>>> below). So one would write it this way:
>>>
>>>   $ FOO=BAR FOO1=BAR1 avocado run --env-keep='FOO,FOO1' passtest.py
>>>
>>
>> Right. Again, the only drawback of --env-keep is something along the lines
>> of the "AVOCADO_LOG_EARLY" example I gave earlier.
>>
>>>>
>>>>> - To create an option in config file with a dictionary of environment
>>>>> variables to set in test process environment. It can be used as a
>>>>> replacement or complement to the command line option (with lower priority):
>>>>>
>>>>>  [tests.env]
>>>>>  test_env_vars = {'FOO': 'BAR', 'FOO1': 'BAR1'}
>>>>>
>>>>
>>>> ... while putting those in a config file does not seem like something one
>>>> would do.
>>>>
>>>> In all cases, and more explicitly in the config file example, this is only
>>>> really necessary if/when the environment variable to pass to the test
>>>> actually harms Avocado (considering a local execution, that is, in a forked
>>>> process).
>>>>
>>>> So, if Avocado and the test, share the use of environment variables by the
>>>> same name, then this is a must.  Also in the case of execution in other
>>>> "runners", such as remote/vm, this can be quite valuable.
>>>
>>> I don't like this interface because I think this opens the door
>>> for abuse. Like I said in a previous paragraph, this interface
>>> should not be a mechanism for passing variables to tests.
>>>
>>
>> I can also see it being abused very quickly.
>>
>>>>
>>>>> - Create an option in config file with a list of environment variable
>>>>> names to copy from avocado main process environment to the test process
>>>>> environment (similar to env_keep in the /etc/sudoers file):
>>>>>
>>>>>  [tests.env]
>>>>>  env_keep = ['FOO', 'FOO1', 'FOO2']
>>>>>
>>>
>>> I like this approach because it reinforces the message that we're
>>> keeping (or forwarding) some of the environment variables from
>>> the original environment where the test runner was run.
>>>
>>
>> Then, we need a "env_reset" mentality, and a default list of environment
>> variables that we forward by default (along the same lines of sudo):
>>
>> Defaults    env_reset
>> Defaults    env_keep =  "COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR
>> LS_COLORS"
>> Defaults    env_keep += "MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS
>> LC_CTYPE"
>> Defaults    env_keep += "LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT
>> LC_MESSAGES"
>> Defaults    env_keep += "LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER
>> LC_TELEPHONE"
>> Defaults    env_keep += "LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET
>> XAUTHORITY"
>>
>> Like I briefly mentioned before.
>
> That would be useful only for the use-case of resetting the
> environment where *tests* are run.
>
> As of today, there's no difference between the environment where
> a test is run and the environment where the test runner is run.
> They share the same variables (I like that).
>
> The feature we're discussing would forward (keep, clone, copy,
> whatever you want to call it) some of the env variables from the
> env where the *original runner* is running to the env where the
> *remote runner* will run. It says nothing about the env where the
> actual *tests* are run. It also doesn't create any variables. It
> simply forward them if they already exist in the environment.
>
> So env_reset is not necessary. If we ever implement it, it'll
> have no impact in this env-keep feature, because they'll be
> working on different layers (test runner vs test).
>
>>
>>>>>
>>>>
>>>> Right, this makes sense. But it also brings the point that we may actually
>>>> change the default behavior of keeping environment variables from Avocado in
>>>> the tests' process.  That is, they would get a much cleaner environment by
>>>> default.  While this sounds cleaner, it may break a lot of expectations.
>>>
>>> I wonder what the motivation would be to clean-up the environment
>>> where tests are run. Can you please elaborate?  If we indeed
>>> decide to implement this change, then I would say we should honor
>>> whatever is set in env_keep.
>>>
>>
>> I mentioned before a few points about test predictability, but the other
>> points that came later also support this idea.
>>
>> Again, if env-keep is applicable for tests running remotely *only*, this
>> changes a lot of things.  If we're talking about both local and remote, than
>> all what I said before applies.
>
> There's no "local" and "remotely", because local already works.
> So this feature is about extending the behavior we have locally
> to remote use-cases.
>
>>
>>>>
>>>>> For every configuration entry point, the setting have to be respected in
>>>>> local and remote executions.
>>>>>
>>>>> Drawbacks
>>>>> =========
>>>>>
>>>>> While setting an environment variable, user will be able to change the
>>>>> behavior of a test and probably the behavior of Avocado itself. Maybe
>>>>> even the OS behavior as well. We should:
>>>>> - Warn users about the danger when using such options.
>>>>
>>>> I fail to see where an environment variable, to be set by Avocado in the
>>>> test process, can or should impact Avocado itself.  If it does, then we'd
>>>> probably be doing something wrong.  I'm not sure we need warnings that
>>>> exceed documenting the intended behavior.
>>>
>>> I think the environment has to be set where Avocado is run, not
>>> where tests are run. Which is why I prefer --env-keep and
>>> [env-keep].
>>>
>>> So in the case of:
>>>
>>>   $ FOO=bla avocado run test.py --remote=...
>>>
>>> $FOO is available inside the environment where Avocado and its
>>> tests are run, both locally and remotely.
>>>
>>
>> Agreed, but let's just keep in mind that one (very much) corner case (where
>> the environment may affect Avocado itself).
>>
>>>>
>>>>> - Protect Avocado environment variables from overwriting.
>>>>
>>>> About protecting the Avocado's own environment variables: agreed.
>>>
>>> This is something that won't need any change. There are variables
>>> which are read by Avocado and others which are written by it.
>>>
>>> For example:
>>>
>>>  * TMPDIR will influence Avocado's behavior (standard Unix
>>>    variable)
>>>  * AVOCADO_VERSION is written by Avocado. Setting it externally
>>>    won't make any difference.
>>>
>>>   $ TMPDIR=/home/ademar/tmp avocado run examples/tests/env_variables.sh --show-job-log | grep Temporary
>>>   Temporary dir: /home/ademar/tmp/avocado_L9YiE4
>>>
>>>   $ AVOCADO_VERSION=0 ./scripts/avocado run examples/tests/env_variables.sh --show-job-log | grep Version
>>>   [stdout] Avocado Version: 35.0
>>>
>>
>> Exactly.  Now I think I missed that fact that Avocado's own environment
>> variables are indeed already "protected", because they're set at when the
>> SIMPLE tests' processes are created, which happens later and thus takes
>> precedence.
>
> Yes, that's why we don't have to worry too much about the
> details. The way environment variables work is quite standard and
> well understood.
>
>>
>>>>
>>>>>
>>>>> Looking forward to read your comments.
>>>>>
>>>>
>>>> Overall, this is definitely welcome.  Let's discuss possible implementation
>>>> issues, such as remote/vm support, because it wouldn't be nice to introduce
>>>> something like this with too many caveats.
>>>>
>>>
>>> We appear to have different understandings about what this
>>> feature should be about. IMO it should be about the standard unix
>>> environment where Avocado and tests are run. The primary use-case
>>> is for remote/vm support (in other words, whenever there's a
>>> change in the environment).
>>>
>>
>> Then we should make it clear what we want to tackle first.  If remote/vm
>> support is the primary goal, then maybe we should think about
>> --remote-env/--vm-env options to not poison the discussion and
>> implementation with the env_reset/env_keep like features.
>>
>
> If we think env_reset is a different use-case (and based on your
> use case #2 from the beginning of this e-mail looks like we do),
> then indeed --remote-keep-env and --vm-keep-env are the options
> we're looking for (with respective configuration file entries).
>
> I was hoping we could implement just one "--keep-env=" instead of
> multiple "--<something-remote>-keep-env=" (--remote-keep-env=,
> --vm-keep-env=, --docker-keep-env=, etc), but maybe that's not a
> good idea.
>

I also dislike the duplicated --remote-*/--vm-* options, but this looks 
more like a different design issue, that will have to eventually have to 
be addressed.  What I mean is, "--vm-*" is *mostly* about "provisioning" 
a VM, while still using the same remote connection mechanism.  Anyway, 
it's unrelated to this specific issue.

> Thanks.
>    - Ademar
>

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]