[Avocado-devel] Multiplexer and params retrieval

Lucas Meneghel Rodrigues lookkas at gmail.com
Thu Feb 26 22:23:21 UTC 2015


On Wed, Feb 25, 2015 at 11:54 AM, Lukáš Doktor <ldoktor at redhat.com> wrote:
> Hello World,
>
> I'm working on the
> https://trello.com/c/vKxylMFN/115-multiplexer-mechanism-for-tests-to-retrieve-variables
> card and having some design decision concerns. I'd like to ask you for your
> opinions, examples, concerns or demands :D
>

OK, below you'll find some comments.

> Multiplexer
> ===========
>
> Ultimate goal
> -------------
>
> Possibility to specify test parameters, allowing simple upstream-downstream
> modification, supplying different files on command line or even injecting
> nodes/variables from cmdline to use different params for different runs.
>
> Additionally we must avoid params name clashes.
>
> Ideally this should be simple enough to be understood without any prior
> knowledge.
>
>
> Tree
> ----
>
> Let's say tree is the way we want to go:
>
> -*- by_something -*- A
>  |                \- B
>  \- by_else -*- A
>              \- B
>
> Produces:
> 1. /by_something/A + /by_else/A
> 2. /by_something/A + /by_else/B
> 3. /by_something/B + /by_else/A
> 4. /by_something/B + /by_else/B
>
> and we can ask for:
> params.get(key) => can generate errors when multiple matches
> params.get(key, path=/by_something)
> params.get(key, path=/by_else)
>
> Upstream-downstream
> -------------------
>
> One way I come up with is to use default multiplex files location:
>
> /etc/avocado/config.yaml    => avocado config (could be parsed from INI)
> $AVOCADO/global.yaml        => default variables (use /etc/avocado/
> instead?)
> $PLUGIN                     => plugins should be able to specify default mux
> too
> $TEST_DATA/multiplex.yaml   => per-test additional combinations
>
> Not all of the files needs to be proceed. This should depend on the cmdline
> option, which either disables multiplexation, uses only tests options,
> (maybe) uses global options but multiplex only parts specified in test,
> multiplex everything.
>
> We might also consider using `10_` prefix and automatically load all files
> matching the name (eg.: when files "10_multiplex.yaml" and
> "20_multiplex.yaml" exists, we should include booth of them. That way
> extending up-down is just adding an additional file, which shouldn't clash
> with our default files. Thus only fast-forward merges are required in
> downstream projects). The same applies for "-m" cmdline option, when `-m
> $PATH/10_myfile.yaml` is set, we should search for other `\d\d_myfile
> files`.
>
>
> Structure
> ---------
>
> This one is tricky and it heavily affects the usage.
>
> 1. we might not care about the structure (allow everything for advanced
> users)
> 2. we can precisely define the structure (restrict everyone, but simplify
> usage)
> 3. we can automatically create locations (heavily affect tree modifications)
> 4. combine the methods
>
>
> Allow everything
> ~~~~~~~~~~~~~~~~
>
> This one is the most flexible but can surprise users. They are responsible
> for not touching elsewhere used variants, but if they succeed, they can do
> absolutely everything.
>
> On the other hand it might be harder to write the tests as you need to know
> the namespace you want to get the variable from. Also with multiple
> multiplex files this can get nasty.
>
>
> Precise structure
> ~~~~~~~~~~~~~~~~~
>
> Let me demonstrate this on example (+ => !join, * => !mux):
>
> -+- config -+- datadir --- paths
>  |          +- sysinfo
>  |          \- runner
>  +- plugins -*- virt -*- hw -*- disk -*- virtio
>  |                    |      |        \- ide
>  |                    |      \- nic -*- virtio
>  |                    |              \- rtl8139
>  |                    \- os ...
>  +- tests -*- HASH1
>  |         \- HASH2
>  \- self
>
> This is a basic global structure. When we multiplex the file, we get
> (instead of full path I used *+leaf)
> 1. */paths + */sysinfo + */runner + */virtio + */virtio + */os* + /self
> 2. */paths + */sysinfo + */runner + */ide + */virtio + */os* + /self
> ...
>
> /tests are special. It's parsed out of the cmdline's urls and/or specified
> by mapping file. For each run current test just extends the existing
> "/self". That way we can specify per-test variants.
>
> All multiplex files goes automatically into "/self" by default. When we
> specify !using, we can override other parts too (?except /config).
>
>
> Automatic locations
> ~~~~~~~~~~~~~~~~~~~
>
> Similar to previous only each file (which is not using !using) goes into
> `/self/$UID` location.
>
> This brings one benefit and drawback. It supports simple files:
>
> # simple.yaml
> short:
> medium:
> long:
>
> Which works fine with multiple independent files:
>
> # simple2.yaml
> builtin:
> bash:
>
> But makes it impossible to extend this file from another file:
>
> # extend.yaml
> longest:
>
> which results in (without the simple2.yaml):
> 1. */short + */longest
> 2. */medium + */longest
> 3. */long + */longest
>
> People could use !using to avoid this problem, but that somehow workarounds
> the automatic location and creates the same structure as 1 or 2.
>
> Also it brings problems to params.get(key, "/self/by_something") as in
> reality it would become "/self/$UID/by_something". Should we search for
> "*/by_something" instead? Or differentiate between "/by_something" vs.
> "by_something" and in case of relative path prepend "/self/*"?.
>
>
> Combined
> ~~~~~~~~
>
> Possibilities are unlimited.
>
>
> Command line
> ~~~~~~~~~~~~
>
> We also need to consider where -m files go (when not using !using). I'd vote
> for /self by default, /self/$UID in case of automatic location.
>
> Also we might allow -m [$location:]$file to dynamically extend the location.
> (I tend to prefer this to hard-coded locations inside the file).

Sure, it makes sense. I wonder if there's any use case where using
this over hard coded would bring problems. I'm saying that because we
used to use 'hardcoded' in cartesian files.

> Take
> another look at the "Structure.Automatic locations" simple.yaml and
> simle2.yaml. Instead of automatic handling we can just let users say:
>
> -m /self/by_duration:simple.yaml -m /self/by_tool:simple2.yaml -m
> /self/by_duration:extend.yaml
>
> Which does booth; extends the simple.yaml by extend.yaml and adds second set
> of variants from simple2.yaml.

Does this mean that you start a branch /self/by_duration, and then
inserts the contents of simple.yaml into it, then another
/self/by_tool branch and then insert simple2.yaml into it, then extend
what's on simple.yaml by adding extend.yaml? That sounds reasonable,
although I'm not sure how I feel about starting branches from the
cmdline. This is new stuff to me :)

>
> This itself can help the basic users to create variants ad-hoc while not
> tying our hands to write complex yaml files.
>
>
> Getting the params
> ------------------
>
> OK when we have the tree structure parsed into variants, we execute the
> tests using the params extracted from variants:
>
> Let's say our variant1 is:
> /config* {...}
> /plugins/virt/hw/disk/virtio {}
> /plugins/virt/hw/nic/virtio {}
> /plugins/virt/os/linux/fedora/21/64 {file:img.qcow2, kickstart:ks.ks, ...}
> /self {file:boot.py, tag:, exec:sleep 1}
>
> # the plugin uses
> params.get_leaves(/plugins/virt/hw) => [*/disk/virtio, */nic/virtio]
> params.get_leaf(/plugins/virt/hw/disk) => virtio
> params.get_leaf(/plugins/virt/hw/nic) => virtio
> params.get(file, /plugins/virt/os) => img.qcow2

So get_leaves/get_leaf would return tree nodes/ list of tree nodes?

> # in test we use
> params.get(exec) => sleep 1

OK, I guess we are doing something similar currently. I wonder if
tests should receive tree nodes instead of plain dicts.

>
> Now imagine that second test defines multiple variants:
>
> ...
> /self {file:boot.py, tag:}
> /self/by_command/sleep {command:sleep 1}
>
> now we use params.get(command, /self/by_command) to avoid clashes with
> variables from /self.

So we'd have an API

params.get('key', 'branch')

To retrieve the value?

>
> Clashes
> ~~~~~~~
>
> Simple example:
>
> /self/by_variable/oneliner {variable: "Single line example"}
> /self/by_other/something {variable: 123}
>
> params.get('variable') => Exception("Multiple occurances found")
> params.get('variable', '/self/by_variable') => "Single line example"

OK, this makes sense. It also answers my question above.

> Let's try this:
>
> /self {timeout: 10}     # Test's defaults
> /self/by_duration/long {timeout: 60, sleep: 30}
> /self/by_tool/builtin {timeout: 10}@/self {}
>
> params.get(timeout) => raises exception as it's defined multiple times, but
> is it what we want?
>
> + namespaces are really separated
> - we can't override defaults inside variants
>
> We can use params.get(timeout, /self/by_duration), but that would fail if we
> remove the multiplex file with /by_duration (as there would be no
> */by_duration in params).
>
> We can make /self special and say it contains defaults. So in case path or
> key are not found, we look there:
>
> /self/by_duration/long {timeout: 60, sleep: 30}
> /self/by_tool/builtin
> ---
> /defaults {timeout: 10}
>
> params.get(timeout) => 60   # if in other run timeout is not found, uses 10
>
>
> This solution also has some holes. Users must make sure not to use keys from
> /defaults multiple times inside variants:
>
> /self/by_duration/long {timeout: 60, sleep: 30}
> /self/count/many {timeout: 120, count: 3}
> ---
> /defaults {timeout: 10, count: 1}
>
> params.get(timeout) => Exception("Multiple occurances")
>
> ^^ I know this example makes little sense, but it demonstrates possible
> issues.

Sure. Maybe enforce the use of branches to explicitly tell from where
the variable comes from?

> Additionally we should consider another problem. We have similar example as
> before:
>
> params.get(count, /self/count) => 3
>
> in next run we remove the multiplex file specifying /self/count* leafs
>
> params.get(count, /self/count) => Exception("Path not found")
>
> should we consider supplying /defaults if path not found? (=> 1)

The way I'm seeing this, the exception is fine. Maybe we should stick
with the 'better explicit than implicit' principle of py development?
:)

>
> Logging
> -------
>
> While thinking about optimization and usage, do we need to log all the
> config variables? I remember how hard is it to find something when looking
> at Autotest's output. So why not to log first occurrence of params.get()
> instead?
>
> ...
> DEBUG: PARAM $name ($path) => $value
> ...

The problem with this, of course, is that we'd lose some information,
that could be useful during debugging. But maybe logging each instance
that actually calls for parameters, is a solution.

>
> We can log them:
>
> 1. in main log when it's inquired (we might need to use grep to see them
> all)
> 2. in main log after the test (in case of panic we might not see them)
> 3. in separate file (sorted alphabetically)
> 4. in separate file (sorted chronologically)
> 5. combination, eg: log all of them into special file and after the test
> remove
>    the file and put only used one there (or in main log).

I like logging each time we retrieve a config value from the params
and having all params on a separate file.

> Benefits:
>
> 1. Speed&mem (some leafs might not been processed)
> 2. See what variables were used and when (first inquire)
> 3. Shorter logs

For speed and mem, my proposal is bad, since we'd still have to
evaluate/process all the leaves.

> Drawbacks:
>
> 1. Necessity to look into another file or at the end instead of beginning.

I'd say this is a negligible drawback. I'm all for shorter and more
comprehensible log.

>
> Results
> -------
>
> Ok, this was long as multiplexer is something new and very flexible right
> now. Let me summarize it by providing example of what I'd choose, please
> feel free to disagree, comment and give advises. (also I'm aware the
> previous brain-dump has big holes in it, but I don't want to avoid rewriting
> this using what I realize while writing it).
>
> Structure (optional)
> ~~~~~~~~~~~~~~~~~~~~
>
> /config     # /etc/avocad/... (using yaml to allow multiplexation)
> /plugins    # When plugin asks for it (eg. `test.VirtTest` would process
>             #                          the virt's yaml file before
>             #                          test execution.
> /self       # Default location when -m used

OK, so /self is a 'root' of sorts.

> /test       # Parsed from cmdline urls (or by mapping file),
>             # each variant extended of `Test.get_variants()`
>             # not available inside the test, only during test parsing...
>
> Demonstration
> ~~~~~~~~~~~~~
>
> Instead of drawing trees I'll write yaml representing the tree
>
> # /etc/avocado/avocado.yaml
> !using : /config
> datadir:
>     log: "/tmp/avocado"
>
> # myfile.yaml
> key: value
> by_something:
>     variant_a:
>     variant_b:
>
> avocado run passtest.py boot.py -m myfile.yaml
>
> # After params parsing looks like this:
> config:
>     datadir:
>         log: "/tmp/avocado"
> self:
>     key: value
>     by_something:
>         variant_a:
>         variant_b:
> tests:
>     passtest.py:
>         id: passtest.py
>     boot.py:
>         id: boot.py
>
> # Then it starts executing tests
> # passtest.py:
> # 1. executes Test.get_variants()
> self:
>     id: passtest.py
> # 2. merges this tree and the original one (without tests)
> config: # unchanged
> self:
>     id: passtest.py
>     key: value
>     by_something:
>         variant_a:
>         variant_b:
> # 3. executes all variants (2)
>
> # boot.py:
> # 1. executes Test.get_variants()
> plugins:
>     virt:
>         hw:
>             disk:
>                 ide:
>                 scsi:
> self:
>     id: boot.py
> # 2. merges this tree and the original one (without tests)
> config: # unchanged
> plugins: # unchanged
> self:
>     id: boot.py
>     key: value
>     by_something:
>         variant_a:
>         variant_b:
> 3) executes all variants (4)
>
> This requires not to reuse keys from /self as all children would share them
> and params.get(key) would raise Exception. In my opinion this is sensible
> and it simplifies the usage.

I'm not sure how I feel about having keys in /self not reusable. This
could work, although people would shoot themselves in the foot quite
frequently.

>
> Params
> ~~~~~~
>
> params.get(key, default=None, strict=False, path=None)
>   * path:
>     * None  => "/self[/.*]?"
>     * PATH  => "/self/[/.*]?PATH[/.*]?"
>     * /PATH => "/PATH[/.*]?
>     * '*'   => .*    # eg: */PATH => .*/PATH[/.*]?
>                      # eg: PA*TH => "/self/[/.*]?PA.*TH[/.*]?"
>   * strict: exception on missing value?
>   * default: default value when not found
>   * Exception when multiple matches of !different! value
>     In case values are the same, use them (inherited values)
>
> params.get_leaves(path) => return all leaves matching path
>
> params.get_leaf(path) => return single leaf or exception when 0 or 2+

OK

>
> Locations
> ~~~~~~~~~
>
> 1. /etc/avocado/avocado.yaml + conf.d/* files (usually !using: /config)
> 2. API for plugins to inject tree (during init or Test.get_variants())

So, for example, the virt plugin can inject a tree during
initialization and therefore provide a standard matrix of guest
os/QEMU hardware? That seems good.

> 3. cmdline $test1 $test2 $test3 => /tests/HASH {url: $test1}, ...
>    extended by Test.get_variants() during execution.

I didn't understand 3.

> 4. "-m $path" extends /self
> 5. "-m $path:$file" extends $path (when $path starts with '/', extends
>                                    global, otherwise injects into /self)
> 6. when -m 10_file.yaml specified, all files matching \d\d_file.yaml
>    files are loaded.

OK, I understood 4,5,6, looks good.


>
>
> Output
> ~~~~~~
>
> 1. Dump first occurrence of the params.get() in the main log with common
> prefix to be able to grep for it:
>
> ...
> INFO | Starting machine
> DEBUG| PARAM: /plugins/virt/os/fedora/21/64 image = image.qcow2
> DEBUG| Logging into machine...
> DEBUG| PARAM: /plugins/virt/os/fedora/21/64 password = 123456
> DEBUG| PARAM: /self exec = sleep 1
> DEBUG| Executing binary "sleep 1"
> ...

I'd say this + a file with a collection of all params used. It could
be only the used params, so we could save some memory/cycles by not
evaluating all values.
>
> Execution
> ~~~~~~~~~
>
> --mux NOMUX|DEFAULT|ALL
>
> where plugins and tests should look at /config/multiplex/mux value to
> distinguish between variants.
>
> NOMUX => only execute single variant of the test with defaults
> DEFAULT => basic multiplexation of the file
> ALL => multiplex everything
>
> NOTE: It's possible to say --nomux -m /:$FILE to use only custom multiplex
> file (/config and /test would be merged).
> NOTE2: --mux DEFAULT with virt test would mean use default
> Test.get_variants() but don't parse the $virttest.yaml.

OK.

>
> Ugh, bzzz, grrr, kukikuki, agh... please feel free to comment or just
> provide examples of where it fails and why. Examples are important.

I know this wasn't the best feedback ever, it was just a dump of my
thoughts during the reading process. I guess I'll have to sleep on it
a bit more.

Cheers,

Lucas




More information about the Avocado-devel mailing list