[Avocado-devel] Multiplexer: mechanism for tests to retrieve variables

Tue Jan 20 18:54:27 UTC 2015

On 20.1.2015 at 19:01, Ademar Reis wrote:
> On Tue, Jan 20, 2015 at 05:57:10PM +0100, Lukáš Doktor wrote:
>> Hi guys,
>>
>> I'm struggling a bit with the inverse version of the multiplexer, because I
>> designed it with the `mechanism for tests to retrieve variables` in mind. I
>> thought about what the inverse version represents and actually couldn't
>> sleep last night because of it. (I know how it works, but can't think of
>> what it represents, and it's always necessary to see the big picture before
>> choosing the path. So I already have the !nomux version in my tree - the
>> difference is about 10 lines - but I can't merge it before really
>> understanding how it affects the concept.)
>>
>>
>> There is an example tree (incompatible with the old version; remove !nomux
>> tags to execute with RFC or remove !multiplex tags to work with inverse
>> version):
>>
>> !multiplex
>> hw:  !multiplex
>>      nic:
>>          rtl8139:
>>              type = rtl8139
>>          e1000:
>>              type = e1000
>>          virtio_net:
>>              type = virtio_net
>>          xennet:
>>              type = xennet
>>          spapr-vlan:
>>              type = spapr-vlan
>>          nic_custom:
>>              type = nic_custom
>>      smp:
>>          up:
>>              count = 1
>>          smp2:
>>              count = 2
>>      drive_format:
>>          ide:
>>              type = ide
>>          scsi:
>>              type = scsi
>>          sd:
>>              type = sd
>>          virtio_blk:
>>              type = virtio_blk
>>          virtio_scsi:
>>              type = virtio_scsi
>>          spapr_vscsi:
>>              type = spapr_vscsi
>>          lsi_scsi:
>>              type = lsi_scsi
>>          ahci:
>>              type = ahci
>>          usb2:
>>              type = usb2
>>          xenblk:
>>              type = xenblk
>>      image_format: !nomux
>>          qcow2:
>>              type = qcow2
>>              2v3:
>>                  params = compat=1.1
>>              2:
>>          vmdk:
>>              type = vmdk
>>          raw:
>>              type = raw
>>          raw_dd:
>>              type = raw_dd
>>          qed:
>>              type = qed
>>      pci_assignable:
>>          no_pci_assignable:
>>              type = false
>>          pf_assignable:
>>              type = pf
>>          vf_assignable:
>>              type = vf
>>      pagesize:
>>          smallpages:
>>          hugepages:
>>              type = hugepage
>>      9p:
>>          no_9p_export:
>>          9p_export:
>>              type = p9
>>      gluster:
>>          filesystem:
>>          gluster:
>>              type = gluster
>>      lvm:
>>          no_lvm_support:
>>          lvm_partition:
>>              type = lvm
>>          emulated_lvm:
>>              type = emulated
>>
>> os: !multiplex
>>      platform:
>>          32:
>>          64:
>>      type: !nomux
>>          Windows:
>>              2000:
>>              xp:
>>              2003:
>>              7:
>>              8:
>>              10:
>>          Linux: !nomux
>>              Fedora:
>>                  10:
>>                  11:
>>                  12:
>>                  13:
>>                  14:
>>                  15:
>>                  16:
>>                  17:
>>                  18:
>>                  19:
>>                  20:
>>                  21:
>>              RHEL: !nomux
>>                  3: !nomux
>>                      0:
>>                      1:
>>                      2:
>>                      3:
>>                      4:
>>                      5:
>>                      6:
>>                      7:
>>                  4: !nomux
>>                      0:
>>                      1:
>>                      2:
>>                      3:
>>                      4:
>>                  5: !nomux
>>                      0:
>>                      1:
>>                      2:
>>                      3:
>>                  6: !nomux
>>                      0:
>>                      1:
>>                      2:
>>                      3:
>>                  7: !nomux
>>                      0:
>>                      1:
>>                      beta:
>>
>> machines:
>>      i440fx:
>>      q35:
>>      pseries:
>>      arm:
>
> Looking at the tree above, I say we should simplify the format
> and provide an automatic variable "name" (your "type") that just
> returns the full variant name to the caller?
>
> Or maybe better, just extend the API to have something like:
>
>      params.get("key") --> returns "key" variable from current variant
>      params.variant_name() --> returns the full variant name. Your
>                                "type" above could be trivially extracted
>                                from it.

yep, the same thinking...

>
>>
>>
>> There are the problems I had in mind separated into sections:
>>
>>
>> [Namespace issue]
>>
>> Current situation:
>> 1) Variants are created as combinations of non-sibling leaf nodes
>> 2) In the end we pass only a dictionary, in which some values might be
>> overwritten by values from later nodes
>> 3) We ask for a certain key without any namespaces
>> === params.get('type') returns completely useless 'lvm'...
>
> I'm going to rewrite the section below to make the discussion
> easy:
>
>> !multiplex:
>> 1) We gather leaves per each !multiplex domain (each child of !multiplex
>> node is separate multiplex domain)
>>
>> nomux:
>> 1) the same as !multiplex
>
> OK, no issue.
>
>> !multiplex:
>> 2) We pass an object, which contains multiplex domains with current variant's
>> values (/hw/cpu, /hw/disk, ...)
>>
>> nomux:
>> 2) similar to !multiplex, only most of the nodes are !multiplex so it's
>> harder to pinpoint the end-multiplex domains. (for humans, computer does it
>> easily)
>
> OK, no issue (it's expected that it's harder, it's a trade-off
> we're willing to pay to make the simple case simpler)
>
>
>> !multiplex:
>> 3) We ask for a certain key. Without namespace it returns either the first
>> or the last match (needs to be decided).
>>
>> nomux:
>> 3) the same as !multiplex
>
> OK, no issue.
>
>> !multiplex:
>> 4) We can ask for the value inside a given namespace, eg:
>> params.get('/hw/nic', 'type'). Then in first variant it returns the value of
>> /hw/nic/rtl8139, in second /hw/nic/e1000, ... (because we know which leaf
>> belongs to which multiplex domain).
>>
>> nomux:
>> 4) similar, only this time all nodes are multiplexed. So we need to guess
>> which one is the end-point, and it is much easier to get multiple matching
>> leaves.
>> === params.get('type') works the same way, we can also use
>> params.get('/hw/nic', 'type') but as we are lazy and don't specify multiplex
>> domains, we might accidentally query for bad nodes, eg.
>> params.get('/hw/image_format/qcow2', 'type'). For the first 2 rounds this
>> succeeds ['/hw/image_format/qcow2/2', '/hw/image_format/qcow2/2v3'], but in
>> the third variant it fails to find the leaf (because the current leaf is
>> '/hw/image_format/vmdk').
>
> It seems to me you're again describing something that is error
> prone, or more complex in the complex case. I agree, but I see it
> as, again, a necessary trade-off to keep the simple case simple.
>
>> !multiplex:
>> 5) Collisions might occur when using non-end-multiplex domain to ask for a
>> value, eg: params.get('/hw', 'type'). We don't know whether user wants
>> 'type' from '/hw/nic' or '/hw/disk'. As people create the structure, they
>> should know which nodes are marked as !multiplex and they should always use
>> them. Then the situation is clear.
>> === params.get('type') returns useless 'lvm' as previous, but we can use
>> params.get('/hw/nic', 'type') to get the real value
>
> You don't describe the case for !nomux, but I believe it's
> similar. Users are not expected to be retrieving variables from
> the global namespace unless they know what they're doing.

Yes, these are the same, only with !multiplex it's by nature easier to
remember the groups (you specifically name them). So these collisions
are less likely.
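
To make the collision concrete, here is a minimal Python sketch. It is
not Avocado's API: `flat_get` and `namespaced_get` are hypothetical
helpers, and the variant is just a hand-written list of leaves taken
from the example tree. It only contrasts "merge everything into one
dict" with "ask inside a multiplex domain".

```python
# Hypothetical sketch, not Avocado's API: flat merge vs. per-domain lookup.

# One variant: the leaf chosen in each multiplex domain, in tree order.
variant = [
    ("/hw/nic/rtl8139", {"type": "rtl8139"}),
    ("/hw/drive_format/ide", {"type": "ide"}),
    ("/hw/lvm/lvm_partition", {"type": "lvm"}),
]


def flat_get(leaves, key):
    """Merge all leaves into one dict; later leaves overwrite earlier ones."""
    merged = {}
    for _path, values in leaves:
        merged.update(values)
    return merged.get(key)


def namespaced_get(leaves, domain, key):
    """Return key from the first leaf that lives under the given domain."""
    prefix = domain.rstrip("/") + "/"
    for path, values in leaves:
        if path.startswith(prefix):
            return values.get(key)
    return None


print(flat_get(variant, "type"))                   # lvm
print(namespaced_get(variant, "/hw/nic", "type"))  # rtl8139
```

The flat lookup returns whatever the last leaf happened to define,
while the namespaced one returns the value the test actually meant.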

>
>>
>>
>> [Matching nodes - endswith]
>>
>> The leaf nodes are usually something like '/hw/nic/rtl8139' or
>> '/hw/nic/e1000', where the last part varies over variants. Actually it's not
>> only the last part, eg: /hw/image_format/qcow2/2v3 is a sibling of
>> /hw/image_format/raw. So matching '/hw/image_format/qcow2/2v3' makes no
>> sense. We always need to match the last multiplex group (in this case
>> '/hw/image_format').
>
> ACK
>
>>
>> On the other hand when we query only for '/hw', we get ['/hw/nic/rtl8139',
>> '/hw/smp/smp2', '/hw/drive_format/ide', ...] and we need to decide which key
>> to return (for example try params.get('/hw', 'type')).
>
> Like I said above, users are not expected to be retrieving
> variables from the global namespace unless they know what they're
> doing. It might be OK for trivial cases, but certainly not OK for
> anything that needs to scale.
>
> This is akin to using global variables in a C program. OK for
> small programs, but won't scale and won't allow your code to be
> used as a module inside a larger program or library.

Sure, this is here to demonstrate the need for the pre-defined 
namespaces... (or other solution)
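
One possible "other solution", sketched in Python under my own
assumptions: instead of silently picking the first or last match, the
lookup could refuse to answer when several leaves under the queried
path define the key. `get_strict` is a hypothetical name and the leaves
are written by hand; nothing like this exists in Avocado today.

```python
# Hypothetical helper, not part of Avocado: refuse ambiguous lookups.

leaves = {
    "/hw/nic/rtl8139": {"type": "rtl8139"},
    "/hw/smp/smp2": {"count": 2},
    "/hw/drive_format/ide": {"type": "ide"},
}


def get_strict(leaves, path, key):
    """Return key from leaves under path; fail if several leaves define it."""
    prefix = path.rstrip("/") + "/"
    hits = {p: v[key] for p, v in leaves.items()
            if p.startswith(prefix) and key in v}
    if len(hits) > 1:
        raise ValueError("ambiguous %r under %r: %s"
                         % (key, path, sorted(hits)))
    if not hits:
        raise KeyError(key)
    return next(iter(hits.values()))


print(get_strict(leaves, "/hw", "count"))     # 2, only one leaf defines it
print(get_strict(leaves, "/hw/nic", "type"))  # rtl8139
# get_strict(leaves, "/hw", "type") raises ValueError (nic vs. drive_format)
```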

>
>>
>> [Matching nodes - startswith]
>>
>> There is also an opposite problem with the beginning. Usually we encourage
>> people to use simple yaml files to multiplex tests (eg. the sleeptest
>> multiplex):
>>
>>      short:
>>          sleep_length: 0.5
>>      medium:
>>          sleep_length: 1
>>      long:
>>          sleep_length: 5
>>      longest:
>>          sleep_length: 10
>>
>> The tree is:
>>
>> --- short
>>   |- medium
>>   |- long
>>   \- longest
>>
>> so the result leaves are:
>>
>> [/short, /medium, /long, /longest]
>>
>> So when writing test for this simple version, we'd ask for params.get('/',
>> 'sleep_length').
>
> Or just params.get("sleep_length")

The problem here is where the `sleep_length` value comes from. It can be
defined in multiple places if you combine this simple multiplex file with
other existing files...

>
>>
>> But what if someone wants a more complicated version and puts this into
>> another branch, eg:
>>
>>      tests:
>>          sleeptest:
>>              by_length:
>>                  short:
>>                  medium:
>>                  long:
>>                  longest:
>>
>> When he develops the test, he'd use params.get('/tests/sleeptest/by_length',
>> 'sleep_length') to obtain the value from the correct namespace. This
>> might cause trouble when executing this test with the simple version (the
>> issue is more serious as most of the time it'd work fine, but when the keys
>> are duplicated, another value might win).
>
> The simple call params.get("sleep_length") will still work if
> the user has a small mux file.
>
> Now, when the user combines multiple mux files, then you need a
> hierarchical mechanism for the variable resolution (more below).

The main issue is that people might combine multiple yaml files and
multiple approaches. Again, this is here to demonstrate possible issues
which might be solved by the predefined namespaces.

>
>>
>> This might be eliminated a bit by separating framework-related and
>> test-related multiplexing.
>
> Or just make a hierarchical resolution when there are multiple
> options. Avocado could have a few pre-defined namespaces and a
> hierarchy like the following to resolve
> params.get("sleep_length"):
>
>   1. Local namespace: anything that doesn't belong to a
>   pre-defined namespace.
>   2. Pre-defined namespaces, whose order would be documented.
>   Example:
>     2.1 /config
>     2.2 /plugins
>     2.3 /environment
>     2.4 /setup
>     2.5 /tests
>     [...]

Well, that's the solution I prepared, only with more namespaces
predefined. I'd rather start with a few of them and let each plugin
define its own. Anyway I'd hate paths like '/plugins/virt/hw/image_format'...
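
Just to have something concrete to argue about, a rough Python sketch of
such a resolution order. The namespace list is the one from the quoted
proposal; the `resolve` function, the data layout and the values are
made up for illustration and are not an existing Avocado interface.

```python
# Sketch only: the namespace list comes from the proposal quoted above,
# the function, data layout and values are hypothetical.

PREDEFINED = ["/config", "/plugins", "/environment", "/setup", "/tests"]


def in_namespace(path, namespace):
    """True if path is the namespace itself or lies below it."""
    return path == namespace or path.startswith(namespace + "/")


def resolve(leaves, key):
    """Look in local leaves first, then in each pre-defined namespace."""
    local = [p for p in leaves
             if not any(in_namespace(p, ns) for ns in PREDEFINED)]
    groups = [local] + [[p for p in leaves if in_namespace(p, ns)]
                        for ns in PREDEFINED]
    for group in groups:
        for path in group:
            if key in leaves[path]:
                return leaves[path][key]
    raise KeyError(key)


leaves = {
    "/tests/sleeptest/by_length/long": {"sleep_length": 5},
    "/config/defaults": {"sleep_length": 1, "timeout": 60},
    "/short": {"sleep_length": 0.5},
}
print(resolve(leaves, "sleep_length"))  # 0.5, the local namespace wins
print(resolve(leaves, "timeout"))       # 60, found in /config
```

Note that inside one group the first leaf still wins, so this only
orders the namespaces; it does not remove collisions within them.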

>
>>
>> 1) framework-related (plugin-related) should have defined structure so we
>> can safely assume `/virt/hw/nic` defines each key only once and is used to
>> obtain information about the current `nic`.
>> 2) test-related should be unstructured and should extend the `/test`
>> namespace. That way we don't mix values from other namespaces (other plugins
>> or complex structures defined by users) and we only query for
>> `params.get('/test', key)` or if we know we defined substructures for
>> `params.get('/test/our_subvariant', key)`.
>>
>> But let me know if you know of a better solution (params.get('/', key)
>> returns all of the leaves including /hw/nic/rtl8139, /os/type/linux/Fedora/8,
>> ... so one can only guess what's returned).
>
> That's expected, as I discussed above: if both [...]/rtl8139 and
> [...]/Fedora/8 have a key with the same name (say: "foobar"), a
> simple call such as params.get("foobar") will need to follow a
> resolution hierarchy to return the right key.
>
> We should not encourage this kind of usage, it's a bad practice.
>

Sure, this was about the beauty of named multiplex domains. But I can 
see we don't share the same view on that (which is fine, we have 
different usage in mind)

>>
>> [params.get_variant()]
>>
>> Another simplification could be to provide `params.get_variant(path)` API,
>> which would return the currently matching leaves to the provided path. This
>> can simplify the yaml file as shown in '/os' (+below) and speed things up,
>> as for simple cases we won't need to query the environment, which is
>> expensive.
>>
>> Instead of `params.get('/hw/nic', 'type')`, you could use
>> `params.get_variant('/hw/nic')`. This returns `/hw/nic/rtl8139` (or
>> `params.get_variant('/hw/nic', strip=True)` => `rtl8139`). This is
>> sufficient information for us, so we don't need to specify `type = ...` on
>> every line and can focus only on the actual key=value pairs (eg.
>> queues = ..., if needed).
>>
>> Note: `params.get_variant('/os/type', True)` returns 'Linux/RHEL/3/7'
>>
>> on the other hand `params.get_variant('/os', True)` returns ['platform/32',
>> 'type/Linux/RHEL/3/7'] as multiple leaves match.
>
> ACK, same idea I described in the beginning of this e-mail.
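
For illustration, a minimal Python sketch of the behaviour described in
the quoted section. It is a standalone function taking the current
variant's leaves as a list, which is an assumption of mine; the real
thing would be a method on the params object and might differ in the
details.

```python
# Sketch of the proposed get_variant() semantics; standalone and
# hypothetical, the leaves of the current variant are passed explicitly.

def get_variant(leaves, path, strip=False):
    """Return the current leaf (or leaves) found under path."""
    prefix = path.rstrip("/") + "/"
    matches = [leaf for leaf in leaves if leaf.startswith(prefix)]
    if strip:
        # Drop the queried prefix, keep only the varying part
        matches = [leaf[len(prefix):] for leaf in matches]
    # A single match is returned as a string, several as a list
    return matches[0] if len(matches) == 1 else matches


current = ["/hw/nic/rtl8139", "/os/platform/32", "/os/type/Linux/RHEL/3/7"]

print(get_variant(current, "/hw/nic"))        # /hw/nic/rtl8139
print(get_variant(current, "/hw/nic", True))  # rtl8139
print(get_variant(current, "/os/type", True)) # Linux/RHEL/3/7
print(get_variant(current, "/os", True))
# ['platform/32', 'type/Linux/RHEL/3/7']
```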
>
>>
>>
>> [INI config]
>>
>> For safety reasons I think it might be good to reserve '/config' branch
>> which would be only writable by INI config parser. On the other hand INI
>> should be able to extend any part (eg. default qemu path)
>
> ACK, makes sense to me.
>
>>
>>
>> [Per-test variants]
>>
>> I'm still a bit troubled about the test variants. When we execute a single
>> test ourselves, we can easily change the --mux to a different setting. But
>> correct me if I'm wrong, there is currently no way to execute various
>> different tests and multiplex some tests with different variants. There are
>> again multiple ways:
>>
>> 1) use a different run for each test
>> 2) define per-test variants in a specific path (this option was discussed in
>> my multiplexer RFC: put the multiplex file into $test.data/$test.yaml
>> and it'd extend the test run when --mux-test is specified)
>> 3) have tests as part of the multiplex tree (this is very similar to how
>> virttest worked); there is a need to map test names to test paths.
>> ...
>
> These options are not exclusive, all of them could be
> implemented.

Sure.

>
>>
>> I liked the 3rd approach a lot, but as Avocado executes anything as a test,
>> I can't see a way to reliably map names to files (a full path makes no sense
>> as the path usually varies across machines).
>
> That's what I prefer too, but the mapping has to come from
> somewhere, it can't be automatic. The way I envision it, it
> should use the relative test path as its test ID, which implies
> the multiplex file will have to be local relative to the test
> source:
>
>    ./tests/whatever/test1.py
>    ./tests/whatever/test2.py
>    ./tests/foobar/test3.sh
>    ./tests/foobar/test4.py
>    ./mux.yaml
>       |--> will reference the tests above using their relative
>       path. mux.yaml should have filter-out/only in the test
>       entries.
>
> (one could argue that we can include hostname, source etc to the
> test ID, but for the sake of simplicity, let's consider it local
> for this particular use case).
>
>>
>> This leaves me with 2). In this case I'd extend the tree on-the-fly with
>> the tree from the `$test.data/$test.yaml` file under the `/test` path so
>> people can safely use `params.get('/test', key)` or
>> `params.get('/test/my_subvariant', 'key')`. Note that `/test` already
>> contains all the `/` values... Anyway the problem is in modifying these (one
>> can easily filter the existing variants, but not replace the multiplex files)
>
> Like I said above, I don't think the options are exclusive. I can
> certainly see a way to implement (3) in addition to (2) and (1).
>
>>
>> Or we can just assume people always run a single test and combine the
>> results themselves.
>>
>>
>> Congratulations on reading such a long mail, all ideas are welcome.
>
> :-)
>
>>
>> Sincerely yours completely exhausted Lukáš.
>
> Hopefully you're having fun exercising your brain with this
> problem. :-)

Definitely, exhausted, but aroused :-)

Also I thought about another approach I previously rejected for the sake
of a systematic approach. We can say that the top level is always
!multiplex. That way all simple yaml files would work the same way. You'd
easily extend the pre-defined namespaces (test, plugin, config, ...)
without even knowing !multiplex exists.

On the other hand you'd still be able to create deeper named !multiplex 
groups as proposed by me and welcomed by Paolo.

So this simple yaml file:

     short:
         sleep_length: 0.5
     medium:
         sleep_length: 1
     long:
         sleep_length: 5
     longest:
         sleep_length: 10

would work perfectly without the need for any tags.

More complex:

os:
     Windows:
         2000:
         xp:
     Linux:
         RHEL:
             7:
                 0:
                 1:

again no changes required as it's multiplexed only at the top level.

And our favorite:

hw:  !multiplex
     nic:
         rtl8139:
         virtio_net:
     smp:
         up:
         smp2:
     drive_format:
         ide:
         scsi:

Requires !multiplex to emphasize that nic/smp/drive_format can share the
same values and we should query for their values using their names
(/hw/$name). Also we specifically say /hw/{nic,smp,drive_format} are
different namespaces we might use in tests to say it makes sense to
multiplex this file with different nic types (and in another test it
makes sense to use different drive_formats, ...).
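
To show what the !multiplex tag buys us here, a small Python sketch of
how variants could be generated from the `hw` example: each child of
the !multiplex node is a separate domain and a variant takes one leaf
from each. The dict layout and the `variants` helper are only an
illustration, not the multiplexer's real data structures.

```python
import itertools

# Leaves of the `hw: !multiplex` example above, grouped by multiplex domain.
domains = {
    "/hw/nic": ["rtl8139", "virtio_net"],
    "/hw/smp": ["up", "smp2"],
    "/hw/drive_format": ["ide", "scsi"],
}


def variants(domains):
    """Yield one list of leaf paths per variant: one leaf from each domain."""
    names = list(domains)
    for combo in itertools.product(*(domains[name] for name in names)):
        yield ["%s/%s" % (name, leaf) for name, leaf in zip(names, combo)]


all_variants = list(variants(domains))
print(len(all_variants))  # 8
print(all_variants[0])
# ['/hw/nic/rtl8139', '/hw/smp/up', '/hw/drive_format/ide']
```

Each variant carries exactly one leaf per domain, which is why a query
like /hw/nic always has a single answer.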

To me this keeps the simplicity while forcing people to think about
the namespaces in more complex examples. What do you think about it?

>
> Thanks for the analysis and questions. Have a good rest.
>
> Cheers.
>     - Ademar
>