[Avocado-devel] Multiplexer and params retrieval

Mon Mar 2 08:27:06 UTC 2015

Dne 26.2.2015 v 23:23 Lucas Meneghel Rodrigues napsal(a):
> On Wed, Feb 25, 2015 at 11:54 AM, Lukáš Doktor <ldoktor at redhat.com> wrote:
>> Hello World,
>>
>> I'm working on the
>> https://trello.com/c/vKxylMFN/115-multiplexer-mechanism-for-tests-to-retrieve-variables
>> card and having some design decision concerns. I'd like to ask you for your
>> opinions, examples, concerns or demands :D
>>
>
> OK, below you'll find some comments.
>
>> Multiplexer
>> ===========
>>
>> Ultimate goal
>> -------------
>>
>> Possibility to specify test parameters, allowing simple upstream-downstream
>> modification, supplying different files on command line or even injecting
>> nodes/variables from cmdline to use different params for different runs.
>>
>> Additionally we must avoid params name clashes.
>>
>> Ideally this should be simple enough to be understood without any prior
>> knowledge.
>>
>>
>> Tree
>> ----
>>
>> Let's say tree is the way we want to go:
>>
>> -*- by_something -*- A
>>   |                \- B
>>   \- by_else -*- A
>>               \- B
>>
>> Produces:
>> 1. /by_something/A + /by_else/A
>> 2. /by_something/A + /by_else/B
>> 3. /by_something/B + /by_else/A
>> 4. /by_something/B + /by_else/B
>>
>> and we can ask for:
>> params.get(key) => can generate errors when multiple matches
>> params.get(key, path=/by_something)
>> params.get(key, path=/by_else)
>>
>> Upstream-downstream
>> -------------------
>>
>> One way I come up with is to use default multiplex files location:
>>
>> /etc/avocado/config.yaml    => avocado config (could be parsed from INI)
>> $AVOCADO/global.yaml        => default variables (use /etc/avocado/
>> instead?)
>> $PLUGIN                     => plugins should be able to specify default mux
>> too
>> $TEST_DATA/multiplex.yaml   => per-test additional combinations
>>
>> Not all of the files needs to be proceed. This should depend on the cmdline
>> option, which either disables multiplexation, uses only tests options,
>> (maybe) uses global options but multiplex only parts specified in test,
>> multiplex everything.
>>
>> We might also consider using `10_` prefix and automatically load all files
>> matching the name (eg.: when files "10_multiplex.yaml" and
>> "20_multiplex.yaml" exists, we should include booth of them. That way
>> extending up-down is just adding an additional file, which shouldn't clash
>> with our default files. Thus only fast-forward merges are required in
>> downstream projects). The same applies for "-m" cmdline option, when `-m
>> $PATH/10_myfile.yaml` is set, we should search for other `\d\d_myfile
>> files`.
>>
>>
>> Structure
>> ---------
>>
>> This one is tricky and it heavily affects the usage.
>>
>> 1. we might not care about the structure (allow everything for advanced
>> users)
>> 2. we can precisely define the structure (restrict everyone, but simplify
>> usage)
>> 3. we can automatically create locations (heavily affect tree modifications)
>> 4. combine the methods
>>
>>
>> Allow everything
>> ~~~~~~~~~~~~~~~~
>>
>> This one is the most flexible but can surprise users. They are responsible
>> for not touching elsewhere used variants, but if they succeed, they can do
>> absolutely everything.
>>
>> On the other hand it might be harder to write the tests as you need to know
>> the namespace you want to get the variable from. Also with multiple
>> multiplex files this can get nasty.
>>
>>
>> Precise structure
>> ~~~~~~~~~~~~~~~~~
>>
>> Let me demonstrate this on example (+ => !join, * => !mux):
>>
>> -+- config -+- datadir --- paths
>>   |          +- sysinfo
>>   |          \- runner
>>   +- plugins -*- virt -*- hw -*- disk -*- virtio
>>   |                    |      |        \- ide
>>   |                    |      \- nic -*- virtio
>>   |                    |              \- rtl8139
>>   |                    \- os ...
>>   +- tests -*- HASH1
>>   |         \- HASH2
>>   \- self
>>
>> This is a basic global structure. When we multiplex the file, we get
>> (instead of full path I used *+leaf)
>> 1. */paths + */sysinfo + */runner + */virtio + */virtio + */os* + /self
>> 2. */paths + */sysinfo + */runner + */ide + */virtio + */os* + /self
>> ...
>>
>> /tests are special. It's parsed out of the cmdline's urls and/or specified
>> by mapping file. For each run current test just extends the existing
>> "/self". That way we can specify per-test variants.
>>
>> All multiplex files goes automatically into "/self" by default. When we
>> specify !using, we can override other parts too (?except /config).
>>
>>
>> Automatic locations
>> ~~~~~~~~~~~~~~~~~~~
>>
>> Similar to previous only each file (which is not using !using) goes into
>> `/self/$UID` location.
>>
>> This brings one benefit and drawback. It supports simple files:
>>
>> # simple.yaml
>> short:
>> medium:
>> long:
>>
>> Which works fine with multiple independent files:
>>
>> # simple2.yaml
>> builtin:
>> bash:
>>
>> But makes it impossible to extend this file from another file:
>>
>> # extend.yaml
>> longest:
>>
>> which results in (without the simple2.yaml):
>> 1. */short + */longest
>> 2. */medium + */longest
>> 3. */long + */longest
>>
>> People could use !using to avoid this problem, but that somehow workarounds
>> the automatic location and creates the same structure as 1 or 2.
>>
>> Also it brings problems to params.get(key, "/self/by_something") as in
>> reality it would become "/self/$UID/by_something". Should we search for
>> "*/by_something" instead? Or differentiate between "/by_something" vs.
>> "by_something" and in case of relative path prepend "/self/*"?.
>>
>>
>> Combined
>> ~~~~~~~~
>>
>> Possibilities are unlimited.
>>
>>
>> Command line
>> ~~~~~~~~~~~~
>>
>> We also need to consider where -m files go (when not using !using). I'd vote
>> for /self by default, /self/$UID in case of automatic location.
>>
>> Also we might allow -m [$location:]$file to dynamically extend the location.
>> (I tend to prefer this to hard-coded locations inside the file).
>
> Sure, it makes sense. I wonder if there's any use case where using
> this over hard coded would bring problems. I'm saying that because we
> used to use 'hardcoded' in cartesian files.
>
The point is that this is optional. In complex examples you'd use 
hardcoded paths, but for simple changes (usually not often or usable for 
more purposes/tests) you'd be able to inject it on a given place. This 
feature should be for test-developers and for some ordinary developers 
who need to inject specific data to their tests. QA should always create 
well defined multiplex files with given structure.

>> Take
>> another look at the "Structure.Automatic locations" simple.yaml and
>> simle2.yaml. Instead of automatic handling we can just let users say:
>>
>> -m /self/by_duration:simple.yaml -m /self/by_tool:simple2.yaml -m
>> /self/by_duration:extend.yaml
>>
>> Which does booth; extends the simple.yaml by extend.yaml and adds second set
>> of variants from simple2.yaml.
>
> Does this mean that you start a branch /self/by_duration, and then
> inserts the contents of simple.yaml into it, then another
> /self/by_tool branch and then insert simple2.yaml into it, then extend
> what's on simple.yaml by adding extend.yaml? That sounds reasonable,
> although I'm not sure how I feel about starting branches from the
> cmdline. This is new stuff to me :)
>
Yep. As mentioned before, this is not for QA, but for the ones who just 
need to extend or modify something right now, or regularly for various 
different tests. The change to avocado is minimal as all of this already 
happens inside the multiplex files. So effectively this is only a 
shortcut to dynamically inject `!using` at the beginning of the file.

>>
>> This itself can help the basic users to create variants ad-hoc while not
>> tying our hands to write complex yaml files.
>>
>>
>> Getting the params
>> ------------------
>>
>> OK when we have the tree structure parsed into variants, we execute the
>> tests using the params extracted from variants:
>>
>> Let's say our variant1 is:
>> /config* {...}
>> /plugins/virt/hw/disk/virtio {}
>> /plugins/virt/hw/nic/virtio {}
>> /plugins/virt/os/linux/fedora/21/64 {file:img.qcow2, kickstart:ks.ks, ...}
>> /self {file:boot.py, tag:, exec:sleep 1}
>>
>> # the plugin uses
>> params.get_leaves(/plugins/virt/hw) => [*/disk/virtio, */nic/virtio]
>> params.get_leaf(/plugins/virt/hw/disk) => virtio
>> params.get_leaf(/plugins/virt/hw/nic) => virtio
>> params.get(file, /plugins/virt/os) => img.qcow2
>
> So get_leaves/get_leaf would return tree nodes/ list of tree nodes?
>
Yep, the `get_leaf` would be useful eg. for 
params.get_leaf('/virt/hw/nic') as it'd either raise exception, when 
multiple '/virt/hw/nic/*' are found, or return the current variant 
'/virt/hw/nic/rtl8139'. Then instead of the need to set 'driver = 
rtl8139' we might simply query for the leaf name. So consider this as a 
shortcut.

>> # in test we use
>> params.get(exec) => sleep 1
>
> OK, I guess we are doing something similar currently. I wonder if
> tests should receive tree nodes instead of plain dicts.
>
>>
>> Now imagine that second test defines multiple variants:
>>
>> ...
>> /self {file:boot.py, tag:}
>> /self/by_command/sleep {command:sleep 1}
>>
>> now we use params.get(command, /self/by_command) to avoid clashes with
>> variables from /self.
>
> So we'd have an API
>
> params.get('key', 'branch')
>
> To retrieve the value?
>
>>
>> Clashes
>> ~~~~~~~
>>
>> Simple example:
>>
>> /self/by_variable/oneliner {variable: "Single line example"}
>> /self/by_other/something {variable: 123}
>>
>> params.get('variable') => Exception("Multiple occurances found")
>> params.get('variable', '/self/by_variable') => "Single line example"
>
> OK, this makes sense. It also answers my question above.
>
>> Let's try this:
>>
>> /self {timeout: 10}     # Test's defaults
>> /self/by_duration/long {timeout: 60, sleep: 30}
>> /self/by_tool/builtin {timeout: 10}@/self {}
>>
>> params.get(timeout) => raises exception as it's defined multiple times, but
>> is it what we want?
>>
>> + namespaces are really separated
>> - we can't override defaults inside variants
>>
>> We can use params.get(timeout, /self/by_duration), but that would fail if we
>> remove the multiplex file with /by_duration (as there would be no
>> */by_duration in params).
>>
>> We can make /self special and say it contains defaults. So in case path or
>> key are not found, we look there:
>>
>> /self/by_duration/long {timeout: 60, sleep: 30}
>> /self/by_tool/builtin
>> ---
>> /defaults {timeout: 10}
>>
>> params.get(timeout) => 60   # if in other run timeout is not found, uses 10
>>
>>
>> This solution also has some holes. Users must make sure not to use keys from
>> /defaults multiple times inside variants:
>>
>> /self/by_duration/long {timeout: 60, sleep: 30}
>> /self/count/many {timeout: 120, count: 3}
>> ---
>> /defaults {timeout: 10, count: 1}
>>
>> params.get(timeout) => Exception("Multiple occurances")
>>
>> ^^ I know this example makes little sense, but it demonstrates possible
>> issues.
>
> Sure. Maybe enforce the use of branches to explicitly tell from where
> the variable comes from?
>
Yes, that's one way, but it only moves the problem into 2nd level 
(/self/by_duration can be again multiplexed). And it gets even worse if 
you extend the test in downstream.

>> Additionally we should consider another problem. We have similar example as
>> before:
>>
>> params.get(count, /self/count) => 3
>>
>> in next run we remove the multiplex file specifying /self/count* leafs
>>
>> params.get(count, /self/count) => Exception("Path not found")
>>
>> should we consider supplying /defaults if path not found? (=> 1)
>
> The way I'm seeing this, the exception is fine. Maybe we should stick
> with the 'better explicit than implicit' principle of py development?
> :)
>
Yep, I believe it makes sense here. And it's sane to require this from 
test developers as there is nothing worse than trying to see where this 
parameter came from (Autotest..., why the queue is still 4 when I 
explicitly set it to 0...)

>>
>> Logging
>> -------
>>
>> While thinking about optimization and usage, do we need to log all the
>> config variables? I remember how hard is it to find something when looking
>> at Autotest's output. So why not to log first occurrence of params.get()
>> instead?
>>
>> ...
>> DEBUG: PARAM $name ($path) => $value
>> ...
>
> The problem with this, of course, is that we'd lose some information,
> that could be useful during debugging. But maybe logging each instance
> that actually calls for parameters, is a solution.
>
Not necessarily. We maintain the access to variables so we can easily 
see which one is read or overwritten. And if some of the variables are 
not used at all, it means we don't miss them later when debugging the 
results (they were simply never used). And if you want to see all 
params, you can use `multiplex -v` for that.

>>
>> We can log them:
>>
>> 1. in main log when it's inquired (we might need to use grep to see them
>> all)
>> 2. in main log after the test (in case of panic we might not see them)
>> 3. in separate file (sorted alphabetically)
>> 4. in separate file (sorted chronologically)
>> 5. combination, eg: log all of them into special file and after the test
>> remove
>>     the file and put only used one there (or in main log).
>
> I like logging each time we retrieve a config value from the params
> and having all params on a separate file.
>
>> Benefits:
>>
>> 1. Speed&mem (some leafs might not been processed)
>> 2. See what variables were used and when (first inquire)
>> 3. Shorter logs
>
> For speed and mem, my proposal is bad, since we'd still have to
> evaluate/process all the leaves.
>
Well as I mentioned, we might not need to log unused variables as they 
were not used in the test. When you want to get all variables, you use 
`avocado multiplex ...`. (btw. that brought me an idea, that we should 
probably rework the `avocado multiplex` to support the exact same line 
as `avocado run`. If we have all tests in `/tests` then users would see 
the exact tree which is being executed by simply replacing `avocado run` 
for `avocado multiplex`. No other changes (with the possibility to queue 
for --multiplex-tree, --multiplex-values, --multiplex-debug, ...))

>> Drawbacks:
>>
>> 1. Necessity to look into another file or at the end instead of beginning.
>
> I'd say this is a negligible drawback. I'm all for shorter and more
> comprehensible log.
>
>>
>> Results
>> -------
>>
>> Ok, this was long as multiplexer is something new and very flexible right
>> now. Let me summarize it by providing example of what I'd choose, please
>> feel free to disagree, comment and give advises. (also I'm aware the
>> previous brain-dump has big holes in it, but I don't want to avoid rewriting
>> this using what I realize while writing it).
>>
>> Structure (optional)
>> ~~~~~~~~~~~~~~~~~~~~
>>
>> /config     # /etc/avocad/... (using yaml to allow multiplexation)
>> /plugins    # When plugin asks for it (eg. `test.VirtTest` would process
>>              #                          the virt's yaml file before
>>              #                          test execution.
>> /self       # Default location when -m used
>
> OK, so /self is a 'root' of sorts.
>
Well more-like python class `self`, or C++ `this`, or in BASH `~/` dir. 
It's bond to the currently executed test.

>> /test       # Parsed from cmdline urls (or by mapping file),
>>              # each variant extended of `Test.get_variants()`
>>              # not available inside the test, only during test parsing...
>>
>> Demonstration
>> ~~~~~~~~~~~~~
>>
>> Instead of drawing trees I'll write yaml representing the tree
>>
>> # /etc/avocado/avocado.yaml
>> !using : /config
>> datadir:
>>      log: "/tmp/avocado"
>>
>> # myfile.yaml
>> key: value
>> by_something:
>>      variant_a:
>>      variant_b:
>>
>> avocado run passtest.py boot.py -m myfile.yaml
>>
>> # After params parsing looks like this:
>> config:
>>      datadir:
>>          log: "/tmp/avocado"
>> self:
>>      key: value
>>      by_something:
>>          variant_a:
>>          variant_b:
>> tests:
>>      passtest.py:
>>          id: passtest.py
>>      boot.py:
>>          id: boot.py
>>
>> # Then it starts executing tests
>> # passtest.py:
>> # 1. executes Test.get_variants()
>> self:
>>      id: passtest.py
>> # 2. merges this tree and the original one (without tests)
>> config: # unchanged
>> self:
>>      id: passtest.py
>>      key: value
>>      by_something:
>>          variant_a:
>>          variant_b:
>> # 3. executes all variants (2)
>>
>> # boot.py:
>> # 1. executes Test.get_variants()
>> plugins:
>>      virt:
>>          hw:
>>              disk:
>>                  ide:
>>                  scsi:
>> self:
>>      id: boot.py
>> # 2. merges this tree and the original one (without tests)
>> config: # unchanged
>> plugins: # unchanged
>> self:
>>      id: boot.py
>>      key: value
>>      by_something:
>>          variant_a:
>>          variant_b:
>> 3) executes all variants (4)
>>
>> This requires not to reuse keys from /self as all children would share them
>> and params.get(key) would raise Exception. In my opinion this is sensible
>> and it simplifies the usage.
>
> I'm not sure how I feel about having keys in /self not reusable. This
> could work, although people would shoot themselves in the foot quite
> frequently.
>
Well in Autotest we had global names everywhere and I never had problems 
inside any complex test with param names. (not even in virtio_console 
which contains lots of variants). The problem was always in the 
framework. And framework (in avocado plugins) should define strict 
structure. For me tests are something more flexible, which you can 
easily extend on the command line using the -m $prefix:$file, or in 
up/downstream fashion. I'd expect users to use simple, usually none or 
one level of multiplexation (the biggest qemu virt-test cfg is 
`qemu_cpu` with 624 lines and it can be easily described with 2 levels 
of multiplexation in avocado, when you separate the subtests which in 
avocado are different files then you end up with couple of tests with 
1-level of multiplexation).

Therefor I think this is a sane requirement and users who want to define 
the same names multiple-times inside /self are free to do that, they 
only need to make sure their test will use the full paths.

^^ which reminds me a Al Viro's quote: Giving root the power to shoot 
himself in the foot is one thing. Giving root a loaded gun pointed at 
his foot with the hammer pulled back, and a sign that says I dare you to 
pull the trigger, seems like a bad idea.

(I believe this requirement is still on the "power to shoot himself" 
part of the scale)

>>
>> Params
>> ~~~~~~
>>
>> params.get(key, default=None, strict=False, path=None)
>>    * path:
>>      * None  => "/self[/.*]?"
>>      * PATH  => "/self/[/.*]?PATH[/.*]?"
>>      * /PATH => "/PATH[/.*]?
>>      * '*'   => .*    # eg: */PATH => .*/PATH[/.*]?
>>                       # eg: PA*TH => "/self/[/.*]?PA.*TH[/.*]?"
>>    * strict: exception on missing value?
>>    * default: default value when not found
>>    * Exception when multiple matches of !different! value
>>      In case values are the same, use them (inherited values)
>>
>> params.get_leaves(path) => return all leaves matching path
>>
>> params.get_leaf(path) => return single leaf or exception when 0 or 2+
>
> OK
>
>>
>> Locations
>> ~~~~~~~~~
>>
>> 1. /etc/avocado/avocado.yaml + conf.d/* files (usually !using: /config)
>> 2. API for plugins to inject tree (during init or Test.get_variants())
>
> So, for example, the virt plugin can inject a tree during
> initialization and therefore provide a standard matrix of guest
> os/QEMU hardware? That seems good.
>
>> 3. cmdline $test1 $test2 $test3 => /tests/HASH {url: $test1}, ...
>>     extended by Test.get_variants() during execution.
>
> I didn't understand 3.
>
I'm sorry I shortened this probably too much. It means that when you 
specify files/directories on the command line, they'd be put into the 
/tests using $HASH. This means you'd be able to specify tests booth on 
the command line and inside the multiplex file. Then after 
multiplexation you'd use params.get('uid', '/tests/'), which returns the 
current test. You'd execute the test's self.get_variants() 
(staticmethod) which returns the additional per-test variants 
(avocado.test.Test would only return the content of 
`$test.data/\d\d_multiplex.yaml`; `avocado-virt.test.VirtTest` would 
return the same + the virt.yaml, ...).

You'd be also able to override this function and return your custom 
variants using python...

>> 4. "-m $path" extends /self
>> 5. "-m $path:$file" extends $path (when $path starts with '/', extends
>>                                     global, otherwise injects into /self)
>> 6. when -m 10_file.yaml specified, all files matching \d\d_file.yaml
>>     files are loaded.
>
> OK, I understood 4,5,6, looks good.
>
>
>>
>>
>> Output
>> ~~~~~~
>>
>> 1. Dump first occurrence of the params.get() in the main log with common
>> prefix to be able to grep for it:
>>
>> ...
>> INFO | Starting machine
>> DEBUG| PARAM: /plugins/virt/os/fedora/21/64 image = image.qcow2
>> DEBUG| Logging into machine...
>> DEBUG| PARAM: /plugins/virt/os/fedora/21/64 password = 123456
>> DEBUG| PARAM: /self exec = sleep 1
>> DEBUG| Executing binary "sleep 1"
>> ...
>
> I'd say this + a file with a collection of all params used. It could
> be only the used params, so we could save some memory/cycles by not
> evaluating all values.
IMO saving another file with the same content is duplicitous as you can 
get the content by `grep -e DEBUG\| PARAM: `, but the True is we can 
sort it by something else to make it more readable...

>>
>> Execution
>> ~~~~~~~~~
>>
>> --mux NOMUX|DEFAULT|ALL
>>
>> where plugins and tests should look at /config/multiplex/mux value to
>> distinguish between variants.
>>
>> NOMUX => only execute single variant of the test with defaults
>> DEFAULT => basic multiplexation of the file
>> ALL => multiplex everything
>>
>> NOTE: It's possible to say --nomux -m /:$FILE to use only custom multiplex
>> file (/config and /test would be merged).
>> NOTE2: --mux DEFAULT with virt test would mean use default
>> Test.get_variants() but don't parse the $virttest.yaml.
>
> OK.
>
>>
>> Ugh, bzzz, grrr, kukikuki, agh... please feel free to comment or just
>> provide examples of where it fails and why. Examples are important.
>
> I know this wasn't the best feedback ever, it was just a dump of my
> thoughts during the reading process. I guess I'll have to sleep on it
> a bit more.
>
Thank you for this awesome feedback, even this document wasn't a 
specification, or detailed report. It was only a braindump and you 
provided valid concerns. I guess we can talk about this during the SCRUM 
meeting...

> Cheers,
>
> Lucas
>