[Avocado-devel] [RFC] Introduce proper test IDs

Fri Mar 25 13:43:37 UTC 2016

Hi folks.

As of today, Avocado is inconsistent and confusing in the way it
handles test names, or references. In some places, they're
referenced as "URLs", in others as "test names" and also as "test
IDs". There's also the concept of a "test alias". When there are
clashes, we add integer suffixes to test names, the same way we
also add suffixes to multiplex variants.  Finally, test result
directories are difficult to navigate and sometimes there's no
proper relationship between the test name (from the UI and logs)
and the test result directories.

Initial versions of this RFC have been reviewed by Lukas Doktor
and Cleber Rosa already, so consider it a refined and polished
version of the original idea.

Let's start with some definitions to clarify the role of each
component in Avocado and have a clear terminology:

Job, Job IDs and Job Results
----------------------------

A Job is an execution of a set of tests by Avocado.  Everything
that happens related to test execution happens in the context of
a Job.

Each job is assigned a Job ID, which is a unique, random 160-bit
string (the output of SHA1 over a set of random bytes).
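A minimal sketch of that scheme using Python's standard library. The 64-byte input size and the helper name `generate_job_id` are assumptions for illustration, not taken from Avocado's code:

```python
import hashlib
import os


def generate_job_id() -> str:
    """Return a random 160-bit Job ID as 40 hexadecimal characters.

    SHA1 over a set of random bytes, as described above; the number of
    random bytes (64) is an arbitrary choice for this sketch.
    """
    random_bytes = os.urandom(64)
    return hashlib.sha1(random_bytes).hexdigest()


job_id = generate_job_id()
print(job_id)  # 40 hex digits, different on every call
```

SHA1 digests are always 20 bytes (160 bits), so the hex form always has 40 characters regardless of the input size.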

When a job is run, logs and all relevant information to
understand the environment where the tests were run are stored in
the "Job Result", which is usually a set of files in a directory.

Avocado is multi-job aware. As long as the tests themselves don't
share resources or interfere with each other, there can be an
arbitrary number of jobs running at the same time in the same
machine and even by the same user.

Examples:
  Job ID: 5e0033f472452ab5dc08bc1daa61c02499c9442b
  Job Result directory: ~/avocado/job-results/job-2016-03-18T10.32-5e0033f
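The example directory name appears to combine the job start time (to the minute) with the first seven hex digits of the Job ID. A sketch of that naming, inferred from the example only; `job_result_dir` is a hypothetical helper, and the base path is left unexpanded (pass it through `os.path.expanduser` before using it):

```python
import datetime
import posixpath


def job_result_dir(job_id: str, started: datetime.datetime,
                   base: str = "~/avocado/job-results") -> str:
    """Return a Job Result directory path shaped like the example above.

    The layout job-<YYYY-MM-DDTHH.MM>-<first 7 chars of the Job ID> is an
    assumption inferred from the example, not Avocado's implementation.
    """
    stamp = started.strftime("%Y-%m-%dT%H.%M")
    # Only a short Job ID prefix goes into the name, as in the example.
    return posixpath.join(base, f"job-{stamp}-{job_id[:7]}")


path = job_result_dir("5e0033f472452ab5dc08bc1daa61c02499c9442b",
                      datetime.datetime(2016, 3, 18, 10, 32))
print(path)  # ~/avocado/job-results/job-2016-03-18T10.32-5e0033f
```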

Test Resolver
-------------

The Test Resolver is the Avocado component responsible for
turning an arbitrary string into a list of tests to be run by
Avocado (a process we call 'test resolution'). In very simple
terms, a resolver receives a string and returns a set of unique
tests. The behavior of the test resolver can be tuned at runtime
via a configuration API.

Currently, the Avocado Test Resolver is called the "Test Loader"
and has a plugin architecture based on priorities: when a
particular string needs to be resolved into a set of tests, each
plugin is invoked in order until one or more tests matching
that string are found.  Once a match is found for a particular
string, the other plugins are ignored. See the Avocado
documentation for more details (including --loaders, an option to
the run command).
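The priority scheme can be illustrated with plain functions standing in for loader plugins. In this sketch, `resolve`, `file_plugin` and `exec_plugin` are hypothetical names, not Avocado's loader API: each plugin returns the tests it recognizes (or an empty list), and the first non-empty answer ends the search:

```python
from typing import Callable, List

# Hypothetical plugin interface: a reference goes in, matching test names
# come out (an empty list means "not mine").
Resolver = Callable[[str], List[str]]


def resolve(reference: str, plugins: List[Resolver]) -> List[str]:
    """Try plugins in priority order; the first non-empty answer wins."""
    for plugin in plugins:
        tests = plugin(reference)
        if tests:
            return tests  # lower-priority plugins are never consulted
    return []


def file_plugin(ref: str) -> List[str]:
    # Toy stand-in for a loader that inspects Python test files.
    known = {"multiple_tests.py": [
        "multiple_tests.py:MultipleTests.test_hello",
        "multiple_tests.py:MultipleTests.testIdentity",
    ]}
    return known.get(ref, [])


def exec_plugin(ref: str) -> List[str]:
    # Toy stand-in for a loader that treats absolute paths as executables.
    return [ref] if ref.startswith("/") else []


plugins = [file_plugin, exec_plugin]
print(resolve("multiple_tests.py", plugins))  # two tests from file_plugin
print(resolve("/bin/true", plugins))          # ['/bin/true']
print(resolve("unknown", plugins))            # []
```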

A more sophisticated Test Resolver could, for example, interpret
the string looking for prefixes that clearly identify which
loader it should use (or give priority to). For instance, it
could be implemented so that a string starting with 'vt:' would
be sent first to the avocado-vt resolver.
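One possible shape for such prefix handling, as a hedged sketch: the prefix table, the decision to strip the prefix before resolving, and all function names here are assumptions. The 'vt' prefix only moves its resolver to the front, so the remaining resolvers still act as a fallback, and unknown prefixes (such as 'file' in 'file:///tmp/passtest.py') are left untouched:

```python
from typing import Callable, Dict, List

# Hypothetical resolver interface: reference in, matching test names out.
Resolver = Callable[[str], List[str]]


def resolve_with_prefix(reference: str, by_prefix: Dict[str, Resolver],
                        default_order: List[Resolver]) -> List[str]:
    """Try the resolver named by a known '<prefix>:' first, then the others."""
    order = list(default_order)
    prefix, sep, rest = reference.partition(":")
    if sep and prefix in by_prefix:
        preferred = by_prefix[prefix]
        # Preferred resolver goes first; the rest keep their relative order.
        order = [preferred] + [r for r in order if r is not preferred]
        reference = rest  # the prefix is routing information, not the name
    for resolver in order:
        tests = resolver(reference)
        if tests:
            return tests
    return []


def catch_all(ref: str) -> List[str]:
    # Toy resolver that accepts any string as a single test.
    return [ref]


def vt_resolver(ref: str) -> List[str]:
    # Toy stand-in for the avocado-vt resolver.
    known = {"qemu.qemu_free": [
        "type_specific.io-github-autotest-qemu.systemtap_tracing.qemu.qemu_free"]}
    return known.get(ref, [])


prefixes = {"vt": vt_resolver}
default = [catch_all, vt_resolver]
print(resolve_with_prefix("qemu.qemu_free", prefixes, default))     # catch_all wins
print(resolve_with_prefix("vt:qemu.qemu_free", prefixes, default))  # vt_resolver wins
```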

Test References
---------------

A Test Reference is a string that can be resolved into
(interpreted as) one or more tests by the Avocado Test Resolver.
Given that resolver plugins are free to interpret a test reference,
this string is completely abstract to the other components of
Avocado.

In the current implementation of Avocado, Test References can be
provided explicitly in the command-line as parameters to 'avocado
run' or implicitly via command-line switches like --vt-config
from avocado-vt (which also works as a configuration mechanism of
the avocado-vt resolver).  In both cases, the Test Resolver
internally turns Test References into a list of Test Names to
be run.

Test Name
---------

A test name is an arbitrarily long string that unambiguously
points to the source of a single test. In other words, the Avocado
Test Resolver, as configured for a particular job, should return
one and only one test as the interpretation of this name.

This name can be as specific as necessary to make it unique.
Therefore it can contain an arbitrary number of variables,
prefixes, suffixes, tags, etc.  It all depends on user
preferences, what is supported by Avocado via its Test Resolvers and
the context of the job.

The output of the Test Resolver when resolving Test References
should always be a list of unambiguous Test Names (for that
particular job).

Notice that although the Test Name has to be unique, one test can
be run more than once inside a job.

By definition, a Test Name is a Test Reference, but the
converse is not necessarily true, as the latter can represent
more than one test.

Variant IDs
-----------

The multiplexer component creates different sets of variables
(known as "variants"), to allow tests to be run individually in
each of them.

A Variant ID is an arbitrary and abstract string created by the
multiplexer to identify each variant. It should be unique per
variant inside a set. In other words, the multiplexer generates a
set of variants, identified by unique IDs.

A simpler implementation of the multiplexer uses serial integers
as Variant IDs. A more sophisticated implementation could
generate more meaningful Variant IDs, potentially representing
their contents.
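For illustration only, here is a simple multiplexer over flat parameter lists that hands out serial Variant IDs, plus a descriptive ID derived from each variant's contents. Expanding a Cartesian product of flat lists is an assumption for this sketch (Avocado's multiplexer works on a YAML tree), and both helper names are hypothetical:

```python
import itertools
from typing import Dict, List, Tuple


def serial_variants(params: Dict[str, List[object]]) -> List[Tuple[str, Dict[str, object]]]:
    """Expand every combination of parameter values into a variant.

    Variant IDs are serial integers rendered as strings: "1", "2", ...
    """
    keys = sorted(params)
    combos = itertools.product(*(params[k] for k in keys))
    return [(str(i), dict(zip(keys, values)))
            for i, values in enumerate(combos, start=1)]


def semantic_variant_id(variant: Dict[str, object]) -> str:
    """A more descriptive ID built from the variant's contents.

    ';' is replaced because it separates the Test Name from the
    Variant ID in Test IDs.
    """
    text = ",".join(f"{k}={v}" for k, v in sorted(variant.items()))
    return text.replace(";", "_")


variants = serial_variants({"arch": ["x86_64", "ppc64"], "mem": [1024]})
for vid, content in variants:
    print(vid, semantic_variant_id(content))
# 1 arch=x86_64,mem=1024
# 2 arch=ppc64,mem=1024
```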

Test ID
-------

A test ID is a string that uniquely identifies a test in the
context of a job. When considering a single job, there are no two
tests with the same ID.

A test ID should encapsulate the Test Name and the Variant ID, to
allow direct identification of a test. In other words, by looking
at the test ID it should be possible to identify:

  - What's the test name
  - What's the variant used to run this test (if any)

Test IDs don't necessarily keep their uniqueness properties when
considered outside of a particular job, but two identical jobs
run in the exact same environment should generate identical
sets of Test IDs.

Test ID format and implementation details
-----------------------------------------

A test ID is composed of three parts:

Syntax:
  <unique-id>-<test-name>[;<variant-id>]

 - Unique ID: an arbitrary and unique (inside a job) alphanumeric
   id set by the test runner, assigned at job run time.  Outside
   of a particular job, it has no practical value and can be
   ignored.

   In the current implementation of Avocado, this ID should be
   just a zero-padded serial number starting from 1, for
   simplicity and practicality.

   It's defined as an arbitrary alphanumeric value anticipating a
   future where Avocado might run tests in parallel or run groups
   of tests; in these cases it can use a more sophisticated id
   (like time, group name, or something else).

 - Test name: As defined previously.

   Examples of test-names:

    '/bin/true'
    '/bin/grep foobar /etc/passwd'
    'passtest.py:Passtest.test'
    'file:///tmp/passtest.py:Passtest.test'
    'multiple_tests.py:MultipleTests.test_hello'
    'type_specific.io-github-autotest-qemu.systemtap_tracing.qemu.qemu_free'

   Examples of invalid test names, as they would result in
   ambiguous or multiple tests:

    'passtest.py' ## when run as an instrumented test
    'multiple_tests.py' ## from examples/tests/
    'qemu.qemu_free' ## from avocado-vt

 - Variant ID: As defined previously.

   Although the Variant ID is an arbitrary string, the ';'
   character is forbidden because it's used as the separator
   between the Test Name and the Variant ID.
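A minimal sketch of composing a Test ID from its three parts without interpreting them, checking only the two constraints stated above (alphanumeric Unique ID, no ';' in the Variant ID). `make_test_id` is a hypothetical helper. The syntax marks the variant part as optional, but the examples always end the ID with ';', so the sketch keeps the separator even when there is no variant:

```python
from typing import Optional


def make_test_id(unique_id: str, test_name: str,
                 variant_id: Optional[str] = None) -> str:
    """Compose <unique-id>-<test-name>;<variant-id> from opaque strings."""
    if not unique_id.isalnum():
        raise ValueError(f"unique ID must be alphanumeric: {unique_id!r}")
    variant_id = variant_id or ""
    if ";" in variant_id:
        raise ValueError("';' is reserved as the Test Name/Variant ID separator")
    # The test name itself is not inspected: it is an arbitrary string.
    return f"{unique_id}-{test_name};{variant_id}"


print(make_test_id("1", "/bin/true"))                        # 1-/bin/true;
print(make_test_id("05", "passtest.py:Passtest.test", "1"))  # 05-passtest.py:Passtest.test;1
```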

Given the above, clashes between test IDs in the results of a job
are impossible by definition (each one contains a unique ID).

When considered outside of a job, Test IDs have useful Test Names
and Variant IDs in them which can be used in subsequent jobs
(potentially resulting in the same test code being run, as long
as the job configuration stays the same).

Here are some examples of resulting Test IDs after Avocado is
invoked (with avocado-vt configured). Notice that the Avocado UI
is not shown here; these are just the resulting Test IDs:

Test References:
    /bin/true
    /bin/false
    passtest.py
    multiple_tests.py
    qemu.qemu_free

Test IDs:
    1-/bin/true;
    2-/bin/false;
    3-passtest.py:Passtest.test;
    4-multiple_tests.py:MultipleTests.test_hello;
    5-multiple_tests.py:MultipleTests.testIdentity;
    6-type_specific.io-github-autotest-qemu.systemtap_tracing.qemu.qemu_free;

Now the same tests, but with the addition of two variants, with
the Variant IDs being the strings "1" and "2":

    01-/bin/true;1
    02-/bin/true;2
    03-/bin/false;1
    04-/bin/false;2
    05-passtest.py:Passtest.test;1
    06-passtest.py:Passtest.test;2
    07-multiple_tests.py:MultipleTests.test_hello;1
    08-multiple_tests.py:MultipleTests.test_hello;2
    09-multiple_tests.py:MultipleTests.testIdentity;1
    10-multiple_tests.py:MultipleTests.testIdentity;2
    11-type_specific.io-github-autotest-qemu.systemtap_tracing.qemu.qemu_free;1
    12-type_specific.io-github-autotest-qemu.systemtap_tracing.qemu.qemu_free;2
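In both listings the Unique ID is padded to the width of the total test count (one digit for 6 tests, two for 12). The sketch below reproduces both listings, assuming the runner knows the total number of (test, variant) pairs before assigning IDs; `assign_test_ids` is a hypothetical helper:

```python
from typing import List, Optional


def assign_test_ids(test_names: List[str],
                    variant_ids: Optional[List[str]] = None) -> List[str]:
    """Number every (test name, variant) pair with zero-padded serial IDs."""
    variants = variant_ids or [""]  # no multiplexing: one empty variant
    pairs = [(name, v) for name in test_names for v in variants]
    width = len(str(len(pairs)))  # pad to the width of the largest ID
    return [f"{i:0{width}d}-{name};{v}"
            for i, (name, v) in enumerate(pairs, start=1)]


names = [
    "/bin/true",
    "/bin/false",
    "passtest.py:Passtest.test",
    "multiple_tests.py:MultipleTests.test_hello",
    "multiple_tests.py:MultipleTests.testIdentity",
    "type_specific.io-github-autotest-qemu.systemtap_tracing.qemu.qemu_free",
]
print("\n".join(assign_test_ids(names)))              # 1-/bin/true; ... 6-...;
print("\n".join(assign_test_ids(names, ["1", "2"])))  # 01-/bin/true;1 ... 12-...;2
```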

File-system serialization of Test IDs
-------------------------------------

When there's a need to represent a test ID in a file-system
friendly manner (for example, as directories for test results, or
as parts of a URL), a one-way serialization should be used. This
means there's no guaranteed way to extract the original test ID
from the serialized result.

The serialization should follow this protocol:

 - Replace each non-fs-friendly character (e.g. '/') of the
   test ID with '_'.
 - If the resulting string is too long to be used as a file name,
   Avocado should truncate the test ID, preferably truncating the
   <test-name> first and then the <variant-id>. The details
   of how this truncation happens are not specified, and therefore
   there should be no expectation that the resulting names will
   be stable across different jobs.
 - The Unique ID should never be truncated.
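A minimal sketch of that protocol. It assumes a 255-byte file name limit (common on Linux file systems) and treats anything outside [A-Za-z0-9._-] as not fs-friendly; both choices, the truncation strategy (cutting from the end), and the helper name `serialize_test_id` are assumptions, since the truncation details are deliberately left unspecified. Because each unsafe character becomes exactly one '_', lengths can be computed on the raw parts before substitution:

```python
import re

# Assumption: anything outside this set is treated as not fs-friendly.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def serialize_test_id(unique_id: str, test_name: str, variant_id: str = "",
                      max_len: int = 255) -> str:
    """One-way, file-system friendly form of <unique-id>-<test-name>;<variant-id>."""
    # The two separators ('-' and ';') are always present.
    overflow = len(unique_id) + len(test_name) + len(variant_id) + 2 - max_len
    # Truncate the test name first...
    cut = min(max(overflow, 0), len(test_name))
    test_name = test_name[:len(test_name) - cut]
    overflow -= cut
    # ...then the variant ID...
    cut = min(max(overflow, 0), len(variant_id))
    variant_id = variant_id[:len(variant_id) - cut]
    overflow -= cut
    # ...but never the unique ID.
    if overflow > 0:
        raise ValueError("unique ID too long; it must never be truncated")
    # Each unsafe character becomes one '_', so the length is preserved
    # and the result is pure ASCII (one byte per character).
    return _UNSAFE.sub("_", f"{unique_id}-{test_name};{variant_id}")


print(serialize_test_id("1", "/bin/true"))  # 1-_bin_true_
long_name = serialize_test_id("12", "a" * 300, "v1")
print(len(long_name), long_name[:3], long_name[-3:])  # 255 12- _v1
```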

Using Test IDs in Avocado
-------------------------

- UI and Logs:

  All references to Avocado tests in logs should use the full
  Test ID string, unformatted.

  The UI can interpret the test ID to make it look "nicer" by
  hiding or highlighting fields or separators, but the three
  parts should be completely abstract and handled as strings (as
  defined), without any parsing or interpretation.

  This RFC doesn't cover the specifics of how the UI will format
  test IDs, but based on the description and definitions above,
  the current UI is actually compliant, although a few minor
  changes would be welcome.

  A couple of hypothetical examples:

    ## based on the current UI (the <unique-id> is hidden)
    $ avocado run /bin/true passtest --multiplex 2variants.yaml
    ...
    TESTS: 4
     (1/4) /bin/true;1: PASS
     (2/4) /bin/true;2: PASS
     (3/4) passtest.py:PassTest.test_foobar;1: PASS
     (4/4) passtest.py:PassTest.test_foobar;2: PASS
     ....

    # the <unique-id> is hidden and the <variant-id> is
    # highlighted
    $ avocado run /bin/true passtest --multiplex 2variants.yaml
    TESTS: 4
     (1/4) /bin/true [1]: PASS
     (2/4) /bin/true [2]: PASS
     (3/4) passtest.py:PassTest.test_foobar [1]: PASS
     (4/4) passtest.py:PassTest.test_foobar [2]: PASS
     ....
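Since the UI should not parse Test IDs, one way to produce both layouts is to format from the separate parts the runner already holds before composing the ID. That access to the parts is an assumption, and `ui_line` is a hypothetical helper; the Unique ID is simply never printed:

```python
def ui_line(index: int, total: int, test_name: str, variant_id: str,
            status: str, highlight_variant: bool = False) -> str:
    """Render one progress line; the <unique-id> is hidden in both styles."""
    if highlight_variant:
        # Bracketed variant, or the bare name when there is no variant.
        label = f"{test_name} [{variant_id}]" if variant_id else test_name
    else:
        # Current style: <test-name>;<variant-id> verbatim.
        label = f"{test_name};{variant_id}"
    return f" ({index}/{total}) {label}: {status}"


print(ui_line(1, 4, "/bin/true", "1", "PASS"))
print(ui_line(1, 4, "/bin/true", "1", "PASS", highlight_variant=True))
```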

- Using Test References in Avocado (e.g.: in 'avocado run'):

  A full Test ID cannot be safely parsed and split when used as a
  Test Reference because there's no way to unambiguously split
  the fields. If used as a Test Reference, a full Test ID will be
  interpreted as a raw string.

  There's a special case for the usage of the combination
  <test-name>;<variant-id>, but it requires explicit
  configuration of Avocado. The suggested mechanism for this
  would be:

   --extract-variant-ids={on|off} (default: off)
   config:extract-variant-ids={on|off} (default: off)
     Tells Avocado to try to extract Variant IDs from Test
     References. With this enabled, the rightmost ';', if
     present, will be interpreted as a separator between the Test
     Reference and a Variant ID.

   --strict-test-references={on|off} (default: off)
   config:strict-test-references={on|off} (default: off)
     Forces Avocado to interpret Test References as Test Names,
     meaning that only tests which are a perfect 1:1 match for a
     Test Reference will be loaded.

   Examples:

   $ avocado run '1-foobar;2'

     --> will use the raw string '1-foobar;2' as a Test
     Reference. The resulting tests will depend on how the Test
     Resolvers interpret this string.

   $ avocado run 'foobar;2'

     --> will use 'foobar;2' as a Test Reference. The resulting
     tests will depend on the behavior of the available Test
     Resolvers.

   $ avocado run 'foobar;2' --multiplex 2variants.yaml

     --> ditto (Test Names and References are arbitrary strings,
     so there's no way for Avocado to tell if ';2' is a Variant
     ID, or if it's part of the Test Reference)

   $ avocado run 'foobar;2' --multiplex 2variants.yaml \
     --extract-variant-ids=on --strict-test-references=on

     --> will interpret 'foobar' as the Test Name (not just a
     Test Reference) and '2' as a Variant ID. In this case, only
     the test 'foobar' with a variant '2' will be run (if a match
     is found). The resulting Test ID would be '1-foobar;2'.

   $ avocado run '1-foobar;2' --multiplex 2variants.yaml \
     --extract-variant-ids=on --strict-test-references=on

     --> will interpret '1-foobar' as a Test Name and '2' as a
     Variant ID. If a match is found, the resulting Test ID will
     be '1-1-foobar;2'.
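The two proposed options can be modeled as one step before resolution (splitting the reference) and one after it (filtering the resolver's output). A minimal sketch: `split_reference` and `strict_filter` are hypothetical helpers, and the resolver is reduced to a plain list of candidate names:

```python
from typing import List, Optional, Tuple


def split_reference(reference: str,
                    extract_variant_ids: bool = False) -> Tuple[str, Optional[str]]:
    """Apply the proposed extract-variant-ids option to one Test Reference."""
    if extract_variant_ids and ";" in reference:
        # Only the rightmost ';' separates; Variant IDs cannot contain ';'.
        ref, _, variant = reference.rpartition(";")
        return ref, variant
    # Option off (the default): the whole string is an opaque reference.
    return reference, None


def strict_filter(reference: str, resolved: List[str], strict: bool) -> List[str]:
    """Apply the proposed strict-test-references option to resolver output."""
    if strict:
        # Only an exact Test Name match for the reference survives.
        return [name for name in resolved if name == reference]
    return list(resolved)


print(split_reference("1-foobar;2"))                            # ('1-foobar;2', None)
print(split_reference("1-foobar;2", extract_variant_ids=True))  # ('1-foobar', '2')

ref, variant = split_reference("1-foobar;2", extract_variant_ids=True)
matches = strict_filter(ref, ["1-foobar", "1-foobar-extra"], strict=True)
print([f"1-{name};{variant}" for name in matches])              # ['1-1-foobar;2']
```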

Thanks.
   - Ademar

-- 
Ademar Reis
Red Hat
