[Avocado-devel] RFC: Avocado multiplexer plugin

Mon Jul 18 10:46:31 UTC 2016

On 07/07/2016 10:44 AM, Lukáš Doktor wrote:
> 0Dne 5.7.2016 v 16:10 Ademar Reis napsal(a):
>> On Fri, Jul 01, 2016 at 03:57:31PM +0200, Lukáš Doktor wrote:
>>> Dne 30.6.2016 v 22:57 Ademar Reis napsal(a):
>>>> On Thu, Jun 30, 2016 at 06:59:39PM +0200, Lukáš Doktor wrote:
>>>>> Hello guys,
>>>>>
>>>>> the purpose of this RFC is to properly split and define the way test
>>>>> parameters are processed. There are several ways to split them
>>>>> apart, each
>>>>> with some benefits and drawbacks.
>>>>>
>>>>>
>>>>> Current params process
>>>>> ======================
>>>>>
>>>>> `tree.TreeNode` -> Object allowing to store things (environment,
>>>>> filters,
>>>>> ...) in tree-structure
>>>>> `multiplexer.Mux` -> Interface between job and multiplexer. Reports
>>>>> number
>>>>> of tests and yields modified test_suite templates

There has been ideas and suggestions about using the multiplexer for
purposes other than multiplexing tests.  While those other use cases are
only ideas, `Mux` and its `itertests` method are a good enough name, but
there may be the opportunity here to provide a common and more standard
interface for different "multiplexations".  Example:

1) For tests, a class TestMux, with an standard Python iterable
interface[1].
2) For test execution location (say different hosts), a HostMux class,
with the same standard Python interface.

Even if we find no real use for other Mux* classes, having a default
iterable implementation is a good idea.  So moving the `tests` from the
`itertests` method to the `Mux` class name, from `Mux` to `TestsMux`,
feels right to me.

UPDATE: then I looked at PR #1293, and noticed that it intends to take
the custom variant responsibility out of the `multiplexer.Mux()` class.
It feels right because it moves the variant processing into its own
domain, kind of what I had in mind by naming it `TestsMux()`.  Still,
having a standard iterable interface instead of `itertests()` feels
right to me.

>>>>> `multiplexer.MuxTree` -> Object representing part of the tree from
>>>>> the root
>>>>> to leaves or another multiplex domain. Recursively it creates
>>>>> multiplexed
>>>>> variants of the full tree.
>>>>> `multiplexer.AvocadoParams` -> Params object used to retrieve
>>>>> params from
>>>>> given path, allows defining several domains for relative paths
>>>>> matching
>>>>> defined as `mux_path`s.
>>>>> `multiplexer.AvocadoParam` -> Slice of the `AvocadoParams` which
>>>>> handles
>>>>> given `mux_path`.
>>>>> `test.Test.default_params` -> Dictionary which can define test's
>>>>> default
>>>>> values, it's intended for removal for some time.
>>>>>
>>>>>
>>>>> Creating variants
>>>>> -----------------
>>>>>
>>>>> 1. app
>>>>> 2. parser -> creates the root tree `args.default_avocado_params =
>>>>> TreeNode()`

This shows the broken link between other forms of configuration (other
than command line parameters) and the parameter mechanism.  I hope to
address some of it in the Configuration Management RFC.

>>>>> 3. plugins.* -> inject key/value into `args.default_avocado_params`
>>>>> (or it's
>>>>> children). One example is `plugins.run`'s --mux-inject, the other is
>>>>> `avocado_virt`'s default values.
>>>>> 4. job -> creates multiplexer.Mux() object
>>>>>     a. If "-m" specified, parses and filters the yaml file(s),
>>>>> otherwise
>>>>> creates an empty TreeNode() called `mux_tree`
>>>>>     b. If `args.default_avocado_params` exists, it merges it into the
>>>>> `mux_tree` (no filtering of the default params)
>>>>>     c. Initializes `multiplexer.MuxTree` object using the `mux_tree`

`mux_tree` is an internal variable name and not part of any interface,
right?  Also, the checking for multiplex files (and other arguments)
happen in the context of the `multiplexer.Mux()` class and the job is
only supposed to use the outcome of `multiplexer.Mux()`, say via
`get_number_of_tests()` or `itertests()`, right?

Finally, is `multiplexer.Mux().variants` something that should be
considered public in your opinion?

>>>>> 5. job -> asks the Mux() object for number of tests
>>>>>     a. Mux iterates all MuxTree variants and reports `no_variants *
>>>>> no_tests`
>>>>> 6. runner -> iterates through test_suite
>>>>>     a. runner -> iterates through Mux:
>>>>>         i.  multiplexer.Mux -> iterates through MuxTree
>>>>>             * multiplexer.MuxTree -> yields list of leaves of the
>>>>> `mux_tree`
>>>>>         ii, yields the modified test template
>>>>>     b. runs the test template:
>>>>>         i. Test.__init__: |
>>>>>             if isinstance(params, dict):
>>>>>                 # update test's default params
>>>>>             elif params is None:
>>>>>                 # New empty multiplexer.AvocadoParams are created
>>>>>             elif isinstance(params, tuple):
>>>>>                 # multiplexer.AvocadoParams are created from params
>>>>> 7. exit
>>>>>

This overview is pretty nice.  I really feel a lot less ignorant about
this subject now! :)

>>>>> AvocadoParams initialization
>>>>> ----------------------------
>>>>>
>>>>>     def __init__(self, leaves, test_id, tag, mux_path,
>>>>> default_params):
>>>>>         """
>>>>>         :param leaves: List of TreeNode leaves defining current
>>>>> variant
>>>>>         :param test_id: test id
>>>>>         :param tag: test tag
>>>>>         :param mux_path: list of entry points
>>>>>         :param default_params: dict of params used when no matches
>>>>> found
>>>>>         """
>>>>>
>>>>> 1. Iterates through `mux_path` and creates `AvocadoParam` slices
>>>>> containing
>>>>> params from only matching nodes, storing them in `self._rel_paths`
>>>>> 2. Creates `AvocadoParam` slice containing the remaining params,
>>>>> storing
>>>>> them in `self._abs_path`
>>>>>
>>>>> Test params
>>>>> -----------
>>>>>
>>>>>     def get(self, key, path=None, default=None):
>>>>>         """
>>>>>         Retrieve value associated with key from params
>>>>>         :param key: Key you're looking for
>>>>>         :param path: namespace ['*']
>>>>>         :param default: default value when not found
>>>>>         :raise KeyError: In case of multiple different values
>>>>> (params clash)
>>>>>         """
>>>>>
>>>>> 1. Tries to obtain the (key,  path, default) from cache
>>>>> 2. Tries to get the (key, path) from `self._rel_paths`
>>>>> 3. When the `path` is abs_path it tries to get the param from
>>>>> `self._abs_path`
>>>>> 4. Looks into the Test's default params dictionary
>>>>> 5. Returns `default`
>>>>>
>>>>> Overview
>>>>> --------
>>>>>
>>>>> Basically now there are 3 components:
>>>>>
>>>>> 1. params parser (yaml => tree)
>>>>> 2. variants generator (tree => iterator)
>>>>> 3. test parameters (key => value from variant values)
>>>>>
>>>>> and we need to decide how do we want to split it and what should be
>>>>> part of
>>>>> the plugin API and what core API.
>>>>>
>>>>>
>>>>> Plugin parser->tree
>>>>> ===================
>>>>>
>>>>> We can say the `tree` is the actual params API and we could only allow
>>>>> producing custom parsers to construct the tree. This would be the
>>>>> simplest
>>>>> method as it only requires us to move the yaml parsing into the
>>>>> module and
>>>>> the `AvocadoParams` would stay as they are. The cons is that the
>>>>> plugin
>>>>> writers would only be able to produce params compatible with the tree
>>>>> structure (flat, or tree-like).
>>>>>
>>>>> If we decide to chose this method, we can keep the current avocado
>>>>> arguments
>>>>> and only allow replacing the parser plugin, eg. by `--multiplex-plugin
>>>>> NAME`. Alternatively we might even detect the suitable plugin based
>>>>> on the
>>>>> multiplex file and even allow combining them (`.cfg vs. .yaml, ...)
>>>>>
>>>>> The plugin would have to support:
>>>>>
>>>>> * parse_file(FILE) => tree_node
>>>>> * check_file(FILE) => BOOL    // in case we want automatic
>>>>> detection of
>>>>> file->plugin
>>>>>

I really think that, at this point, this is sufficient.  This delivers
what most users would like to have, that is a flat structure of
parameters.  Having good documentation on the tree-like capabilities can
allow users to write, say, database backed implementations that map
record keys to tree nodes.

>>>>> Plugin parser->variant
>>>>> ======================
>>>>>
>>>>> This would require deeper changes, but allow greater flexibility.
>>>>> We'd also
>>>>> have to chose whether we want to allow combinations, or whether the
>>>>> plugin
>>>>> should handle the whole workflow. I don't think we should allow
>>>>> combinations
>>>>> as that would imply another convention for storing the parsed results.
>>>>>

If I understand this correctly, you mean that the plugin would have a
broader scope, and not only stop at the tree creation, but continue all
the way to the production of variants, right?

If that's the case, I'd avoid this at this moment.  IMHO, feeding the
tree from different sources sounds more like what users may need than
tweaking with the variants generation itself.

In the future, if it is really needed, we may introduce a pluggable
`multiplexer.TestsMux()` interface, so that users may stick in their own
variant creation code.

>>>>> The user would configure in config or on cmdline which plugin he
>>>>> wants to
>>>>> use and the arguments would stay the same (optionally extended by the
>>>>> plugin's arguments)
>>>>>
>>>>> The plugin would have to support:
>>>>>
>>>>> * configure(parser)    // optional, add extended options like
>>>>> --mux-version
>>>>> * parse_file(FILE)    // does not return as it's up to plugin to
>>>>> store the
>>>>> results
>>>>> * inject_value(key, value, path)    // used by avocado-virt to
>>>>> inject default
>>>>> values
>>>>> * __len__()            // Return number of variants (we might want
>>>>> to extend this to
>>>>> accept TEMPLATE and allow different number of variants per
>>>>> TEMPLATE. That is
>>>>> currently not supported, but it might be handy
>>>>> * itervariants(TEMPLATE)    // yields modified TEMPLATE with params
>>>>> set in
>>>>> AvocadoParams understandable format
>>>>>
>>>>>
>>>>> Plugin AvocadoParams
>>>>> ====================
>>>>>
>>>>> I don't think we should make the AvocadoParams replaceable, but if
>>>>> we want
>>>>> to we should strictly require `params.get` compatibility so all
>>>>> tests can
>>>>> run seamlessly with all params. Anyway if we decided to make
>>>>> AvocadoParams
>>>>> replaceable, then we can create a proxy between the params and the
>>>>> plugin.
>>>>>
>>>>>

I also don't think it's needed *at this point*.  But our code should be
clear enough that, given requests, we can add this interface for
pluggable parameter handling later with minimal changes.

>>>>> Conclusion
>>>>> ==========
>>>>>
>>>>> I'm looking forward to cleaner multiplexer API. I don't think
>>>>> people would
>>>>> like to invest much time in developing fancy multiplexer plugins so
>>>>> I'd go
>>>>> with the `parser->tree` variant, which allows easy extensibility
>>>>> with some
>>>>> level of flexibility. The flexibility is for example sufficient to
>>>>> implement
>>>>> cartesian_config parser.

+1.

>>>>>
>>>>> As for the automatic detection, I donẗ like the idea as people
>>>>> might want to
>>>>> use the same format with different custom tags.

From the Zen of Python: "explicit is better than implicit".  It's so
important it's the 2nd line.

>>>>
>>>> Hi Lukáš.
>>>>
>>>> I believe we're in sync, but I miss the high level overview, or
>>>> at least review, of how params, variants and the multiplexer or
>>>> other plugins are all related to each other.
>>>>
>>>> Please check the definitions/examples below to see if we're in
>>>> sync:
>>>>
>>>> Params
>>>> ------
>>>>
>>>>     A dictionary of key/values, with an optional path (we could
>>>>     simply call it prefix), which is used to identify the key
>>>>     when there are multiple versions of it. The path is
>>>>     interpreted from right to left to find a match.
>>>>
>>>>     The Params data structure can be populated by multiple
>>>>     sources.
>>>>
>>>>     Example:
>>>>     (implementation and API details are not discussed here)
>>>>
>>>>     key: var1=a
>>>>     path: /foo/bar/baz
>>>>
>>>>     key: var1=b
>>>>     path: /foo/bar
>>>>
>>>>     key: var2=c
>>>>     path: NULL (empty)
>>>>
>>>>     get(key=var1, path=/foo/) ==> error ("/foo/var1" not found)
>>>>     get(key=var1, path=/foo/*) ==> error (multiple var1)
>>>>     get(key=var1, path=/foo/bar/baz/weeee/) ==> error
>>>>     get(key=var1, path=/foo/bar/weeee/) ==> error
>>>>
>>>>     get(key=var2) ==> c
>>>>     get(key=var2, path=foobar) ==> error ("foobar/var2" not found)
>>>>
>>>>     get(key=var1, path=/foo/bar/baz/) ==> a
>>>>     (unique match for "/foo/bar/baz/var1")
>>>>
>>>>     get(key=var1, path=/foo/bar/) ==> b
>>>>     (unique match for "/foo/bar/var1/")
>>>>
>>>>     get(key=var1, path=baz) ==> a
>>>>     (unique match for "baz/var1")
>>>>
>>>>     get(key=var1, path=bar) ==> b
>>>>     (unique match for "bar/var1")
>>>>
>>>>     This kind of "get" API is exposed in the Test API.
>>>>
>>>>
>>>> Variants
>>>> --------
>>>>
>>>>     Multiple sets of params, all with the same set of keys and
>>>>     paths, but potentially different values. Each variant is
>>>>     identified by a "Variant ID" (see the "Test ID RFC").
>>>>
>>>>     The test runner is responsible for the association of tests
>>>>     and variants. That is, the component creating the
>>>>     variants has absolutely no visibility on which tests are
>>>>     going to be associated with variants.
>>>>
>>>>     This is also completely abstract to tests: they don't have
>>>>     any visibility about which variant they're using, or which
>>>>     variants exist.
>>>>
>>> Hello Ademar,
>>>
>>> Thank you for the overview, I probably should have included it. I
>>> omitted it
>>> as it's described in the documentation, so I only mentioned in the
>>> `Plugin
>>> AvocadoParams` that I don't think we should turn that part into a
>>> plugin.
>>>
>>> The variant, as described earlier, is the method which modifies the
>>> `test_template` and as you pointed out it compounds of `Variant ID` and
>>> `Params`. The way it works now it can go even further and alter all the
>>> test's arguments (name, methodName, params, base_logdir, tag, job,
>>> runner_queue) but it's not documented and might change in the future.
>>
>> OK, so I think we should change this. The layers should have
>> clear responsibilities and abstractions, with variants being
>> restricted to params only, as defined above.
>>
>> I don't think the component responsible for creating variants
>> needs any visibility or knowledge about tests.
>>
> Yes, there is no need for that, it was only simplification:
> 
>     https://github.com/avocado-framework/avocado/pull/1293
> 

BTW, why was this PR closed?  Inteded to be sent again with other work?

>>>
>>>> Given the above, the multiplexer (or any other component, like a
>>>> "cartesian config" implementation from Autotest) would be bound
>>>> to these APIs.
>>> The cartesian config is not related to params at all. Avocado-vt uses a
>>> hybrid mode and it replaces the params with their custom object, while
>>> keeping the `avocado` params in `test.avocado_params`. So in
>>> `avocado_vt`
>>> tests you don't have `self.params`, but you have `test.params` and
>>> `test.avocado_params`, where `test.params` is a dictionary and
>>> `test.avocado_params` the avocado params interface with path/key/value.
>>> Cartesian config produces variants not by creating test variants, but by
>>> adding multiple tests with different parameters to the test_suite.
>>
>> What I mean is that we probably could, in theory at least,
>> implement a plugin that parses a "cartesian config" and provides
>> the data as needed to fill the variants and param APIs I
>> described above. I'm not saying we should do that, much less that
>> it would be useful as a replacement for the current cartesian
>> config implementation in avocado-vt. I'm simply stating that once
>> we have a clear plugin API for Params and Variants, we should be
>> able to replace the multiplexer with other mechanisms that
>> provide a similar functionality.
>>
>> Thanks.
>>    - Ademar
>>
> 
> In that case yes. You can see it in the conclusion that even the simpler
> version (parser->tree) is capable of using cartesian_config as source of
> params.
> 

So, to make sure we're on the same page:  we intend to allow users to
write and choose their own tree producers (it's pluggable).  With a
given tree producer active, the multiplex mechanism is going to be, at
this point, a single, non-pluggable one.

Right?

> Regards,
> Lukáš
> 

[1] - https://docs.python.org/2.6/glossary.html#term-iterable

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/avocado-devel/attachments/20160718/181cb417/attachment.sig>