[Avocado-devel] Multiplexer and params retrieval

Lukáš Doktor ldoktor at redhat.com
Wed Feb 25 14:54:34 UTC 2015


Hello World,

I'm working on the 
https://trello.com/c/vKxylMFN/115-multiplexer-mechanism-for-tests-to-retrieve-variables 
card and having some design decision concerns. I'd like to ask you for 
your opinions, examples, concerns or demands :D


Multiplexer
===========

Ultimate goal
-------------

Possibility to specify test parameters, allowing simple 
upstream-downstream modification, supplying different files on command 
line or even injecting nodes/variables from cmdline to use different 
params for different runs.

Additionally we must avoid params name clashes.

Ideally this should be simple enough to be understood without any prior 
knowledge.


Tree
----

Let's say tree is the way we want to go:

-*- by_something -*- A
  |                \- B
  \- by_else -*- A
              \- B

Produces:
1. /by_something/A + /by_else/A
2. /by_something/A + /by_else/B
3. /by_something/B + /by_else/A
4. /by_something/B + /by_else/B

and we can ask for:
params.get(key) => can generate errors when multiple matches
params.get(key, path=/by_something)
params.get(key, path=/by_else)

Upstream-downstream
-------------------

One way I come up with is to use default multiplex files location:

/etc/avocado/config.yaml    => avocado config (could be parsed from INI)
$AVOCADO/global.yaml        => default variables (use /etc/avocado/ 
instead?)
$PLUGIN                     => plugins should be able to specify default 
mux too
$TEST_DATA/multiplex.yaml   => per-test additional combinations

Not all of the files needs to be proceed. This should depend on the 
cmdline option, which either disables multiplexation, uses only tests 
options, (maybe) uses global options but multiplex only parts specified 
in test, multiplex everything.

We might also consider using `10_` prefix and automatically load all 
files matching the name (eg.: when files "10_multiplex.yaml" and 
"20_multiplex.yaml" exists, we should include booth of them. That way 
extending up-down is just adding an additional file, which shouldn't 
clash with our default files. Thus only fast-forward merges are required 
in downstream projects). The same applies for "-m" cmdline option, when 
`-m $PATH/10_myfile.yaml` is set, we should search for other 
`\d\d_myfile files`.


Structure
---------

This one is tricky and it heavily affects the usage.

1. we might not care about the structure (allow everything for advanced 
users)
2. we can precisely define the structure (restrict everyone, but 
simplify usage)
3. we can automatically create locations (heavily affect tree modifications)
4. combine the methods


Allow everything
~~~~~~~~~~~~~~~~

This one is the most flexible but can surprise users. They are 
responsible for not touching elsewhere used variants, but if they 
succeed, they can do absolutely everything.

On the other hand it might be harder to write the tests as you need to 
know the namespace you want to get the variable from. Also with multiple 
multiplex files this can get nasty.


Precise structure
~~~~~~~~~~~~~~~~~

Let me demonstrate this on example (+ => !join, * => !mux):

-+- config -+- datadir --- paths
  |          +- sysinfo
  |          \- runner
  +- plugins -*- virt -*- hw -*- disk -*- virtio
  |                    |      |        \- ide
  |                    |      \- nic -*- virtio
  |                    |              \- rtl8139
  |                    \- os ...
  +- tests -*- HASH1
  |         \- HASH2
  \- self

This is a basic global structure. When we multiplex the file, we get 
(instead of full path I used *+leaf)
1. */paths + */sysinfo + */runner + */virtio + */virtio + */os* + /self
2. */paths + */sysinfo + */runner + */ide + */virtio + */os* + /self
...

/tests are special. It's parsed out of the cmdline's urls and/or 
specified by mapping file. For each run current test just extends the 
existing "/self". That way we can specify per-test variants.

All multiplex files goes automatically into "/self" by default. When we 
specify !using, we can override other parts too (?except /config).


Automatic locations
~~~~~~~~~~~~~~~~~~~

Similar to previous only each file (which is not using !using) goes into 
`/self/$UID` location.

This brings one benefit and drawback. It supports simple files:

# simple.yaml
short:
medium:
long:

Which works fine with multiple independent files:

# simple2.yaml
builtin:
bash:

But makes it impossible to extend this file from another file:

# extend.yaml
longest:

which results in (without the simple2.yaml):
1. */short + */longest
2. */medium + */longest
3. */long + */longest

People could use !using to avoid this problem, but that somehow 
workarounds the automatic location and creates the same structure as 1 or 2.

Also it brings problems to params.get(key, "/self/by_something") as in 
reality it would become "/self/$UID/by_something". Should we search for 
"*/by_something" instead? Or differentiate between "/by_something" vs. 
"by_something" and in case of relative path prepend "/self/*"?.


Combined
~~~~~~~~

Possibilities are unlimited.


Command line
~~~~~~~~~~~~

We also need to consider where -m files go (when not using !using). I'd 
vote for /self by default, /self/$UID in case of automatic location.

Also we might allow -m [$location:]$file to dynamically extend the 
location. (I tend to prefer this to hard-coded locations inside the 
file). Take another look at the "Structure.Automatic locations" 
simple.yaml and simle2.yaml. Instead of automatic handling we can just 
let users say:

-m /self/by_duration:simple.yaml -m /self/by_tool:simple2.yaml -m 
/self/by_duration:extend.yaml

Which does booth; extends the simple.yaml by extend.yaml and adds second 
set of variants from simple2.yaml.

This itself can help the basic users to create variants ad-hoc while not 
tying our hands to write complex yaml files.


Getting the params
------------------

OK when we have the tree structure parsed into variants, we execute the 
tests using the params extracted from variants:

Let's say our variant1 is:
/config* {...}
/plugins/virt/hw/disk/virtio {}
/plugins/virt/hw/nic/virtio {}
/plugins/virt/os/linux/fedora/21/64 {file:img.qcow2, kickstart:ks.ks, ...}
/self {file:boot.py, tag:, exec:sleep 1}

# the plugin uses
params.get_leaves(/plugins/virt/hw) => [*/disk/virtio, */nic/virtio]
params.get_leaf(/plugins/virt/hw/disk) => virtio
params.get_leaf(/plugins/virt/hw/nic) => virtio
params.get(file, /plugins/virt/os) => img.qcow2

# in test we use
params.get(exec) => sleep 1


Now imagine that second test defines multiple variants:

...
/self {file:boot.py, tag:}
/self/by_command/sleep {command:sleep 1}

now we use params.get(command, /self/by_command) to avoid clashes with 
variables from /self.


Clashes
~~~~~~~

Simple example:

/self/by_variable/oneliner {variable: "Single line example"}
/self/by_other/something {variable: 123}

params.get('variable') => Exception("Multiple occurances found")
params.get('variable', '/self/by_variable') => "Single line example"

Let's try this:

/self {timeout: 10}     # Test's defaults
/self/by_duration/long {timeout: 60, sleep: 30}
/self/by_tool/builtin {timeout: 10}@/self {}

params.get(timeout) => raises exception as it's defined multiple times, 
but is it what we want?

+ namespaces are really separated
- we can't override defaults inside variants

We can use params.get(timeout, /self/by_duration), but that would fail 
if we remove the multiplex file with /by_duration (as there would be no 
*/by_duration in params).

We can make /self special and say it contains defaults. So in case path 
or key are not found, we look there:

/self/by_duration/long {timeout: 60, sleep: 30}
/self/by_tool/builtin
---
/defaults {timeout: 10}

params.get(timeout) => 60   # if in other run timeout is not found, uses 10


This solution also has some holes. Users must make sure not to use keys 
from /defaults multiple times inside variants:

/self/by_duration/long {timeout: 60, sleep: 30}
/self/count/many {timeout: 120, count: 3}
---
/defaults {timeout: 10, count: 1}

params.get(timeout) => Exception("Multiple occurances")

^^ I know this example makes little sense, but it demonstrates possible 
issues.

Additionally we should consider another problem. We have similar example 
as before:

params.get(count, /self/count) => 3

in next run we remove the multiplex file specifying /self/count* leafs

params.get(count, /self/count) => Exception("Path not found")

should we consider supplying /defaults if path not found? (=> 1)


Logging
-------

While thinking about optimization and usage, do we need to log all the 
config variables? I remember how hard is it to find something when 
looking at Autotest's output. So why not to log first occurrence of 
params.get() instead?

...
DEBUG: PARAM $name ($path) => $value
...


We can log them:

1. in main log when it's inquired (we might need to use grep to see them 
all)
2. in main log after the test (in case of panic we might not see them)
3. in separate file (sorted alphabetically)
4. in separate file (sorted chronologically)
5. combination, eg: log all of them into special file and after the test 
remove
    the file and put only used one there (or in main log).

Benefits:

1. Speed&mem (some leafs might not been processed)
2. See what variables were used and when (first inquire)
3. Shorter logs

Drawbacks:

1. Necessity to look into another file or at the end instead of beginning.


Results
-------

Ok, this was long as multiplexer is something new and very flexible 
right now. Let me summarize it by providing example of what I'd choose, 
please feel free to disagree, comment and give advises. (also I'm aware 
the previous brain-dump has big holes in it, but I don't want to avoid 
rewriting this using what I realize while writing it).

Structure (optional)
~~~~~~~~~~~~~~~~~~~~

/config     # /etc/avocad/... (using yaml to allow multiplexation)
/plugins    # When plugin asks for it (eg. `test.VirtTest` would process
             #                          the virt's yaml file before
             #                          test execution.
/self       # Default location when -m used
/test       # Parsed from cmdline urls (or by mapping file),
             # each variant extended of `Test.get_variants()`
             # not available inside the test, only during test parsing...

Demonstration
~~~~~~~~~~~~~

Instead of drawing trees I'll write yaml representing the tree

# /etc/avocado/avocado.yaml
!using : /config
datadir:
     log: "/tmp/avocado"

# myfile.yaml
key: value
by_something:
     variant_a:
     variant_b:

avocado run passtest.py boot.py -m myfile.yaml

# After params parsing looks like this:
config:
     datadir:
         log: "/tmp/avocado"
self:
     key: value
     by_something:
         variant_a:
         variant_b:
tests:
     passtest.py:
         id: passtest.py
     boot.py:
         id: boot.py

# Then it starts executing tests
# passtest.py:
# 1. executes Test.get_variants()
self:
     id: passtest.py
# 2. merges this tree and the original one (without tests)
config: # unchanged
self:
     id: passtest.py
     key: value
     by_something:
         variant_a:
         variant_b:
# 3. executes all variants (2)

# boot.py:
# 1. executes Test.get_variants()
plugins:
     virt:
         hw:
             disk:
                 ide:
                 scsi:
self:
     id: boot.py
# 2. merges this tree and the original one (without tests)
config: # unchanged
plugins: # unchanged
self:
     id: boot.py
     key: value
     by_something:
         variant_a:
         variant_b:
3) executes all variants (4)

This requires not to reuse keys from /self as all children would share 
them and params.get(key) would raise Exception. In my opinion this is 
sensible and it simplifies the usage.


Params
~~~~~~

params.get(key, default=None, strict=False, path=None)
   * path:
     * None  => "/self[/.*]?"
     * PATH  => "/self/[/.*]?PATH[/.*]?"
     * /PATH => "/PATH[/.*]?
     * '*'   => .*    # eg: */PATH => .*/PATH[/.*]?
                      # eg: PA*TH => "/self/[/.*]?PA.*TH[/.*]?"
   * strict: exception on missing value?
   * default: default value when not found
   * Exception when multiple matches of !different! value
     In case values are the same, use them (inherited values)

params.get_leaves(path) => return all leaves matching path

params.get_leaf(path) => return single leaf or exception when 0 or 2+


Locations
~~~~~~~~~

1. /etc/avocado/avocado.yaml + conf.d/* files (usually !using: /config)
2. API for plugins to inject tree (during init or Test.get_variants())
3. cmdline $test1 $test2 $test3 => /tests/HASH {url: $test1}, ...
    extended by Test.get_variants() during execution.
4. "-m $path" extends /self
5. "-m $path:$file" extends $path (when $path starts with '/', extends
                                    global, otherwise injects into /self)
6. when -m 10_file.yaml specified, all files matching \d\d_file.yaml
    files are loaded.


Output
~~~~~~

1. Dump first occurrence of the params.get() in the main log with common 
prefix to be able to grep for it:

...
INFO | Starting machine
DEBUG| PARAM: /plugins/virt/os/fedora/21/64 image = image.qcow2
DEBUG| Logging into machine...
DEBUG| PARAM: /plugins/virt/os/fedora/21/64 password = 123456
DEBUG| PARAM: /self exec = sleep 1
DEBUG| Executing binary "sleep 1"
...


Execution
~~~~~~~~~

--mux NOMUX|DEFAULT|ALL

where plugins and tests should look at /config/multiplex/mux value to 
distinguish between variants.

NOMUX => only execute single variant of the test with defaults
DEFAULT => basic multiplexation of the file
ALL => multiplex everything

NOTE: It's possible to say --nomux -m /:$FILE to use only custom 
multiplex file (/config and /test would be merged).
NOTE2: --mux DEFAULT with virt test would mean use default 
Test.get_variants() but don't parse the $virttest.yaml.


Ugh, bzzz, grrr, kukikuki, agh... please feel free to comment or just 
provide examples of where it fails and why. Examples are important.

Best regards,
Lukáš




More information about the Avocado-devel mailing list