[Avocado-devel] Multiplexer: mechanism for tests to retrieve variables

Tue Jan 20 18:01:48 UTC 2015

On Tue, Jan 20, 2015 at 05:57:10PM +0100, Lukáš Doktor wrote:
> Hi guys,
> 
> I'm struggling a bit with the inverse version of the multiplexer, because I
> designed it with the `mechanism for tests to retrieve variables` in mind. I
> thought about what the inverse version represents and actually couldn't
> sleep last night because of it. (I know how it works, but can't think of
> what it represents and it's always necessary to see the big picture before
> choosing the path. So I have !nomux version already in my tree - the
> difference is about 10 lines, but I can't merge it before really
> understanding how it affects the concept)
> 
> 
> There is an example tree (incompatible with the old version; remove !nomux
> tags to execute with RFC or remove !multiplex tags to work with inverse
> version):
> 
> !multiplex
> hw:  !multiplex
>     nic:
>         rtl8139:
>             type = rtl8139
>         e1000:
>             type = e1000
>         virtio_net:
>             type = virtio_net
>         xennet:
>             type = xennet
>         spapr-vlan:
>             type = spapr-vlan
>         nic_custom:
>             type = nic_custom
>     smp:
>         up:
>             count = 1
>         smp2:
>             count = 2
>     drive_format:
>         ide:
>             type = ide
>         scsi:
>             type = scsi
>         sd:
>             type = sd
>         virtio_blk:
>             type = virtio_blk
>         virtio_scsi:
>             type = virtio_scsi
>         spapr_vscsi:
>             type = spapr_vscsi
>         lsi_scsi:
>             type = lsi_scsi
>         ahci:
>             type = ahci
>         usb2:
>             type = usb2
>         xenblk:
>             type = xenblk
>     image_format: !nomux
>         qcow2:
>             type = qcow2
>             2v3:
>                 params = compat=1.1
>             2:
>         vmdk:
>             type = vmdk
>         raw:
>             type = raw
>         raw_dd:
>             type = raw_dd
>         qed:
>             type = qed
>     pci_assignable:
>         no_pci_assignable:
>             type = false
>         pf_assignable:
>             type = pf
>         vf_assignable:
>             type = vf
>     pagesize:
>         smallpages:
>         hugepages:
>             type = hugepage
>     9p:
>         no_9p_export:
>         9p_export:
>             type = p9
>     gluster:
>         filesystem:
>         gluster:
>             type = gluster
>     lvm:
>         no_lvm_support:
>         lvm_partition:
>             type = lvm
>         emulated_lvm:
>             type = emulated
> 
> os: !multiplex
>     platform:
>         32:
>         64:
>     type: !nomux
>         Windows:
>             2000:
>             xp:
>             2003:
>             7:
>             8:
>             10:
>         Linux: !nomux
>             Fedora:
>                 10:
>                 11:
>                 12:
>                 13:
>                 14:
>                 15:
>                 16:
>                 17:
>                 18:
>                 19:
>                 20:
>                 21:
>             RHEL: !nomux
>                 3: !nomux
>                     0:
>                     1:
>                     2:
>                     3:
>                     4:
>                     5:
>                     6:
>                     7:
>                 4: !nomux
>                     0:
>                     1:
>                     2:
>                     3:
>                     4:
>                 5: !nomux
>                     0:
>                     1:
>                     2:
>                     3:
>                 6: !nomux
>                     0:
>                     1:
>                     2:
>                     3:
>                 7: !nomux
>                     0:
>                     1:
>                     beta:
> 
> machines:
>     i440fx:
>     q35:
>     pseries:
>     arm:

Looking at the tree above, I'd say we should simplify the format
and provide an automatic variable "name" (your "type") that just
returns the full variant name to the caller.

Or maybe better, just extend the API to have something like:

    params.get("key") --> returns "key" variable from current variant
    params.variant_name() --> returns the full variant name. Your
                              "type" above could be trivially extracted
                              from it.
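
To make the idea concrete, here's a rough Python sketch. The class
name `Params`, the list-of-leaves input, the "last match wins" rule
and the `queues` value are assumptions for illustration only, not an
existing Avocado API:

```python
class Params:
    """Hypothetical per-variant parameter view (illustration only)."""

    def __init__(self, leaves):
        # leaves: list of (path, {key: value}) pairs for the current
        # variant, one leaf per multiplex domain, in tree order.
        self._leaves = list(leaves)

    def get(self, key, default=None):
        # Assumption: the last matching leaf wins. Whether first or
        # last should win is still undecided in this thread.
        value = default
        for _path, values in self._leaves:
            if key in values:
                value = values[key]
        return value

    def variant_name(self):
        # Assumption: the "full variant name" is the list of leaf paths.
        return [path for path, _values in self._leaves]


params = Params([
    ("/hw/nic/e1000", {"queues": 4}),
    ("/hw/smp/smp2", {"count": 2}),
])

# No explicit "type = e1000" line is needed: derive it from the name.
nic_leaf = [p for p in params.variant_name() if p.startswith("/hw/nic/")][0]
nic_type = nic_leaf.rsplit("/", 1)[-1]
```

With something like this, the mux file only needs real key/value
pairs, and the "type" lines can go away.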

> 
> 
> There are the problems I had in mind separated into sections:
> 
> 
> [Namespace issue]
> 
> Current situation:
> 1) Variants are created as combination of non-sibling leaf nodes
> 2) In the end we pass only dictionary where some values might be rewritten
> from values from later nodes
> 3) We ask for a certain key without any namespaces
> === params.get('type') returns completely useless 'lvm'...

I'm going to rearrange the section below to make the discussion
easier to follow:

> !multiplex:
> 1) We gather leaves per each !multiplex domain (each child of !multiplex
> node is separate multiplex domain)
> 
> nomux:
> 1) the same as !multiplex

OK, no issue.

> !multiplex:
> 2) We pass an object, which contains multiplex domains with current variant's
> values (/hw/cpu, /hw/disk, ...)
> 
> nomux:
> 2) similar to !multiplex, only most of the nodes are !multiplex so it's
> harder to pinpoint the end-multiplex domains. (for humans, computer does it
> easily)

OK, no issue (it's expected that it's harder, it's a trade-off
we're willing to pay to make the simple case simpler)

> !multiplex:
> 3) We ask for a certain key. Without a namespace it returns either first or
> last match (needs to be decided).
> 
> nomux:
> 3) the same as !multiplex

OK, no issue.

> !multiplex:
> 4) We can ask for the value inside a given namespace, eg:
> params.get('/hw/nic', 'type'). Then in first variant it returns the value of
> /hw/nic/rtl8139, in second /hw/nic/e1000, ... (because we know which leaf
> belongs to which multiplex domain).
> 
> nomux:
> 4) similar, only this time all nodes are multiplexed. So we need to guess
> which one is the end-point, and it's much easier to get multiple matching
> leaves.
> === params.get('type') works the same way, we can also use
> params.get('/hw/nic', 'type') but as we are lazy and don't specify multiplex
> domains, we might accidentally query for bad nodes, eg.
> params.get('/hw/image_format/qcow', 'type'). For first 2 rounds this
> succeeds ['/hw/image_format/qcow/2', '/hw/image_format/qcow/2v3'], but in
> third variant it fails to find the leaf (because the current leaf is
> '/hw/image_format/vmdk').

It seems to me you're again describing something that is error
prone, or more complex in the complex case. I agree, but I see it
as, again, a necessary trade-off to keep the simple case simple.

> !multiplex:
> 5) Collisions might occur when using non-end-multiplex domain to ask for a
> value, eg: params.get('/hw', 'type'). We don't know whether user wants
> 'type' from '/hw/nic' or '/hw/disk'. As people create the structure, they
> should know which nodes are marked as !multiplex and they should always use
> them. Then the situation is clear.
> === params.get('type') returns useless 'lvm' as previous, but we can use
> params.get('/hw/nic', 'type') to get the real value

You don't describe the case for !nomux, but I believe it's
similar. Users are not expected to be retrieving variables from
the global namespace unless they know what they're doing.
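
A small sketch of the difference between the global and the namespaced
lookup. The function `get` and its (leaves, key, namespace) signature
are hypothetical, and it returns every candidate instead of picking
one, just to make the collision visible:

```python
def get(leaves, key, namespace="/"):
    """Return all values of `key` found in leaves below `namespace`."""
    prefix = namespace.rstrip("/") + "/"
    return [values[key] for path, values in leaves
            if path.startswith(prefix) and key in values]


# One variant taken from the example tree: one leaf per multiplex domain.
variant = [
    ("/hw/nic/rtl8139", {"type": "rtl8139"}),
    ("/hw/drive_format/ide", {"type": "ide"}),
    ("/hw/lvm/lvm_partition", {"type": "lvm"}),
]

global_matches = get(variant, "type")          # three colliding candidates
nic_matches = get(variant, "type", "/hw/nic")  # exactly one candidate
```

A "last match wins" rule applied to the global query is what yields
the useless 'lvm'; the namespaced query has nothing to disambiguate.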

> 
> 
> [Matching nodes - endswith]
> 
> the leaf nodes are usually something like '/hw/nic/rtl8139' or
> '/hw/nic/e1000'. where the last part varies over variants. Actually it's not
> only the last part, eg: /hw/image_format/qcow/2v3 is sibling to
> /hw/image_format/raw. So matching '/hw/image_format/qcow/2v3' makes no
> sense. We always need to match the last multiplex group (in this case
> '/hw/image_format').

ACK

> 
> On the other hand when we query only for '/hw', we get ['/hw/nic/rtl8139',
> '/hw/cpu/smp2', '/hw/drive_format/ide', ...] and we need to decide which key
> to return (for example try params.get('/hw', 'type')).

Like I said above, users are not expected to be retrieving
variables from the global namespace unless they know what they're
doing. It might be OK for trivial cases, but certainly not OK for
anything that needs to scale.

This is akin to using global variables in a C program. OK for
small programs, but won't scale and won't allow your code to be
used as a module inside a larger program or library.

> 
> [Matching nodes - startswith]
> 
> There is also an opposite problem with the beginning. Usually we encourage
> people to use simple yaml files to multiplex tests (eg. the sleeptest
> multiplex):
> 
>     short:
>         sleep_length: 0.5
>     medium:
>         sleep_length: 1
>     long:
>         sleep_length: 5
>     longest:
>         sleep_length: 10
> 
> The tree is:
> 
> --- short
>  |- medium
>  |- long
>  \- longest
> 
> so the result leaves are:
> 
> [/short, /medium, /long, /longest]
> 
> So when writing test for this simple version, we'd ask for params.get('/',
> 'sleep_length').

Or just params.get("sleep_length")

> 
> But what if someone wants a more complicated version and puts this into
> another branch, eg:
> 
>     tests:
>         sleeptest:
>             by_length:
>                 short:
>                 medium:
>                 long:
>                 longest:
> 
> When he develops the test, he'd use params.get('/tests/sleeptest/by_length',
> 'sleep_length') to obtain the value from the correct namespace. This
> might cause trouble when executing this test with the simple version (the
> issue is more serious as most of the time it'd work fine, but when the keys
> are duplicate, other value might win).

The simple call params.get("sleep_length") will still work if
the user has a small mux file.

Now, when the user combines multiple mux files, then you need a
hierarchical mechanism for the variable resolution (more below).

> 
> This might be eliminated a bit by separating framework-related and
> test-related multiplexing.

Or just make a hierarchical resolution when there are multiple
options. Avocado could have a few pre-defined namespaces and a
hierarchy like the following to resolve
params.get("sleep_length"):

 1. Local namespace: anything that doesn't belong to a
 pre-defined namespace.
 2. Pre-defined namespaces, whose order would be documented.
 Example:
   2.1 /config
   2.2 /plugins
   2.3 /environment
   2.4 /setup
   2.5 /tests
   [...]
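
The lookup order above could be sketched roughly like this. The
function names are hypothetical, and the order of the pre-defined
namespaces is simply copied from the example list; the real order
would be whatever we end up documenting:

```python
# Assumed search order for the pre-defined namespaces (from the list above).
PREDEFINED = ["/config", "/plugins", "/environment", "/setup", "/tests"]


def namespace_rank(path):
    """Rank 0 is the local namespace; pre-defined ones follow in order."""
    for rank, namespace in enumerate(PREDEFINED, start=1):
        if path == namespace or path.startswith(namespace + "/"):
            return rank
    return 0


def resolve(leaves, key, default=None):
    """Return `key` from the highest-priority leaf that defines it."""
    # sorted() is stable, so leaves within one namespace keep tree order.
    ordered = sorted(leaves, key=lambda leaf: namespace_rank(leaf[0]))
    for _path, values in ordered:
        if key in values:
            return values[key]
    return default


leaves = [
    ("/tests/sleeptest/by_length/long", {"sleep_length": 5}),
    ("/short", {"sleep_length": 0.5}),
]
local_first = resolve(leaves, "sleep_length")       # local "/short" wins
tests_only = resolve(leaves[:1], "sleep_length")    # falls back to /tests
```

The point is only that the unqualified call keeps working for the
small mux file and has a predictable answer when files are combined.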

> 
> 1) framework-related (plugin-related) should have defined structure so we
> can safely assume `/virt/hw/nic` defines each key only once and is used to
> obtain information about the current `nic`.
> 2) test-related should be unstructured and should extend the `/test`
> namespace. That way we don't mix values from other namespaces (other plugins
> or complex structures defined by users) and we only query for
> `params.get('/test', key)` or if we know we defined substructures for
> `params.get('/test/our_subvariant', key)`.
> 
> But let me know if you know of a better solution (params.get('/', key)
> returns all of the leaves including /hw/nic/rtl8139, /os/type/linux/Fedora/8,
> ... so one can only guess what's returned).

That's expected, as I discussed above: if both [...]/rtl8139 and
[...]/Fedora/8 have a key with the same name (say: "foobar"), a
simple call such as params.get("foobar") will need to follow a
resolution hierarchy to return the right key.

We should not encourage this kind of usage; it's bad practice.

> 
> [params.get_variant()]
> 
> Another simplification could be to provide `params.get_variant(path)` API,
> which would return the currently matching leaves to the provided path. This
> can simplify the yaml file as shown in '/os' (+below) and speed as for
> simple cases we won't need to query environment, which is expensive.
> 
> Instead of `params.get('/hw/nic', 'type')`, you could use
> `params.get_variant('/hw/nic')`. This returns `/hw/nic/rtl8139` (or
> `params.get_variant('/hw/nic', strip=True)` => `rtl8139`). This is
> sufficient information for us, so we don't need to specify `type = ...` on
> every line and can focus only on the actual key=value pairs (eg. queues = ...,
> if needed).
> 
> Note: `params.get_variant('/os/type', True)` returns 'Linux/RHEL/3/7'
> 
> on the other hand `params.get_variant('/os', True)` returns ['platform/32',
> 'type/Linux/RHEL/3/7'] as multiple leaves match.

ACK, same idea I described in the beginning of this e-mail.
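
For reference, a minimal sketch of the behaviour you describe, written
as a free function over the current variant's leaf paths. The
signature and the "single match returns a string, several return a
list" rule are my reading of your examples, not a settled API:

```python
def get_variant(leaves, path, strip=False):
    """Return the current leaf (or leaves) found below `path`."""
    prefix = path.rstrip("/") + "/"
    matches = [leaf for leaf in leaves if leaf.startswith(prefix)]
    if strip:
        # Drop the queried prefix, keeping only the varying part.
        matches = [leaf[len(prefix):] for leaf in matches]
    # Assumption: one match is returned bare, several as a list.
    return matches[0] if len(matches) == 1 else matches


current = ["/hw/nic/rtl8139", "/os/platform/32", "/os/type/Linux/RHEL/3/7"]

full = get_variant(current, "/hw/nic")
short = get_variant(current, "/hw/nic", strip=True)
os_type = get_variant(current, "/os/type", True)
os_all = get_variant(current, "/os", True)
```

Returning two different types depending on the number of matches is
convenient in the simple case, but callers then have to check what
they got; always returning a list would be the stricter alternative.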

> 
> 
> [INI config]
> 
> For safety reasons I think it might be good to reserve '/config' branch
> which would be only writable by INI config parser. On the other hand INI
> should be able to extend any part (eg. default qemu path)

ACK, makes sense to me.
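
Something along these lines, perhaps. `ParamTree`, the `source`
argument and the example paths and values are all made up for
illustration:

```python
class ParamTree:
    """Hypothetical (path, key) -> value store; not Avocado's real tree."""

    RESERVED = "/config"

    def __init__(self):
        self._values = {}

    def set(self, path, key, value, source="yaml"):
        # Only the INI parser may write below /config, while the INI
        # parser itself may extend any part of the tree.
        reserved = (path == self.RESERVED
                    or path.startswith(self.RESERVED + "/"))
        if reserved and source != "ini":
            raise PermissionError("%s is reserved for the INI parser" % path)
        self._values[(path, key)] = value

    def get(self, path, key, default=None):
        return self._values.get((path, key), default)


tree = ParamTree()
tree.set("/config", "qemu_path", "/usr/bin/qemu", source="ini")  # allowed
tree.set("/hw/nic/e1000", "queues", 4, source="ini")  # INI extends any part
try:
    tree.set("/config", "qemu_path", "/tmp/other")  # write from a YAML file
    rejected = False
except PermissionError:
    rejected = True
```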

> 
> 
> [Per-test variants]
> 
> I'm still a bit troubled about the tests variants. When we execute a single
> test ourselves, we can easily change the --mux to different setting. But
> correct me if I'm wrong, there is currently no way to execute various
> different tests and multiplex some tests with different variants. There are
> again multiple ways:
> 
> 1) use different runs per each test
> 2) define per-test variants in specific path (this option was discussed in
> my multiplexer RFC, put multiplex file into $test.data/$test.yaml directory
> and it'd extend the tests run when --mux-test specified)
> 3) having tests as part of the multiplex tree (this is very similar to how
> virttest worked), there is a need to map test names to test paths.
> ...

These options are not exclusive, all of them could be
implemented.

> 
> I liked the 3rd approach a lot, but as Avocado executes anything as test, I
> can't see the way to reliably map names to files (full path makes no sense
> as path usually varies over multiple machines).

That's what I prefer too, but the mapping has to come from
somewhere, it can't be automatic. The way I envision it, it
should use the relative test path as its test ID, which implies
the multiplex file will have to be local relative to the test
source:

  ./tests/whatever/test1.py
  ./tests/whatever/test2.py
  ./tests/foobar/test3.sh
  ./tests/foobar/test4.py
  ./mux.yaml
     |--> will reference the tests above using their relative
     path. mux.yaml should have filter-out/only in the test
     entries.

(one could argue that we can include hostname, source etc. in the
test ID, but for the sake of simplicity, let's consider it local
for this particular use case).
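
A sketch of what "relative test path as test ID" could mean in
practice. The helper name `relative_test_id` and the checkout
locations are hypothetical; only the file layout comes from the
example above:

```python
import os


def relative_test_id(test_path, mux_file):
    """Test ID = test path relative to the directory holding the mux file."""
    base = os.path.dirname(os.path.abspath(mux_file))
    relative = os.path.relpath(os.path.abspath(test_path), base)
    # Normalize separators so the ID looks the same on every platform.
    return relative.replace(os.sep, "/")


# The same tree checked out in two different places (made-up locations).
id_a = relative_test_id("/home/alice/suite/tests/foobar/test3.sh",
                        "/home/alice/suite/mux.yaml")
id_b = relative_test_id("/srv/ci/checkout/tests/foobar/test3.sh",
                        "/srv/ci/checkout/mux.yaml")
```

Both calls give the same ID regardless of where the tree lives, which
is the property the mux file needs in order to reference tests
reliably across machines.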

> 
> This leaves me with the 2). In this case I'd extend the tree on-the-fly
> with the tree from `$test.data/$test.yaml` file into `/test` path so people can
> safely use `params.get('/test', key)` or `params.get('/test/my_subvariant',
> 'key')`. Note that `/test` already contains all the `/` values... Anyway the
> problem is in modifying these (one can easily only filter the existing
> variants but not replace the multiplex files)

Like I said above, I don't think the options are exclusive. I can
certainly see a way to implement (3) in addition to (2) and (1).

> 
> Or we can just assume people always run a single test and combine the results
> themselves.
> 
> 
> Congratulations on reading such a long mail, all ideas are welcome.

:-)

> 
> Sincerely yours completely exhausted Lukáš.

Hopefully you're having fun exercising your brain with this
problem. :-)

Thanks for the analysis and questions. Have a good rest.

Cheers.
   - Ademar

-- 
Ademar de Souza Reis Jr.
Red Hat
