[Avocado-devel] RFC: N(ext) Runner - A proposal to the finish line

Tue May 26 15:25:36 UTC 2020

Hi Cleber,

I will try to be brief this time:

On 2020-05-21 02:32, Cleber Rosa wrote:
> Intro
> =====
> 
> This is a more technical follow up to the points given in a previous
> thread.  Because that thread and the current N(ext) Runner documentation
> form a good context for this proposal, I encourage everyone to read them
> first:
> 
>   https://www.redhat.com/archives/avocado-devel/2020-May/msg00009.html
> 
>   https://avocado-framework.readthedocs.io/en/79.0/future/core/nrunner.html
> 
> The N(ext) Runner allows for greater flexibility than the current
> runner, so to be effective in delivering the N(ext) Runner for general
> usage, we must define the bare minimum that still needs to be
> implemented.

In fact, I would prefer that we get more technical, so that we have clearer mental
models of what we are talking about and what remains to be decided.

> Basic Job and Task execution
> ============================
> 
> A Task, within the context of the N(ext) Runner, is described as "one
> specific instance/occurrence of the execution of a runnable with its
> respective runner".
> 
> A Task is a very important building block for Avocado Job, and running
> an Avocado Job means, to a large extent, running a number of Tasks.
> The Tasks that need to be executed in a Job, are created during
> the ``create_test_suite()`` phase:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.create_test_suite
> 
> And are kept in the Job's ``test_suite`` attribute:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.test_suite

So I guess varianters are meant to be incorporated as resolvers that no longer
provide simple test factories but resolutions wrapped into tasks. Since we
already support this, a single reference on the command line can result in
thousands of loaded/resolved tests/runnables. It could reach the point where
splitting tens of thousands of tests into separate tasks becomes a parallelism
bottleneck. Have we considered the possibility of wrapping multiple tests or
runnables into a single task? Is there any good way to resolve this, given that
task handling will be enforced per runnable?
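
To make the batching idea concrete, here is a minimal sketch of what I have in
mind. It is not based on any existing Avocado API: ``batch_runnables`` and the
``batch_size`` parameter are hypothetical names, and plain strings stand in for
runnables.

```python
from itertools import islice


def batch_runnables(runnables, batch_size):
    """Yield lists of at most batch_size runnables, preserving order.

    Hypothetical helper: each yielded list would become one task.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(runnables)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Plain strings stand in for resolved runnables.
references = ["/bin/true", "/bin/false", "/bin/uname", "/bin/echo", "/bin/ls"]
batches = list(batch_runnables(references, 2))
print(batches)
```

With something like this, tens of thousands of runnables would turn into a much
smaller number of tasks, at the cost of coarser status reporting per task.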

> Running the tests, then, happens during the ``run_tests()`` phase:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.run_tests
> 
> During the ``run_tests()`` phase, a plugin that runs test suites on a
> job is chosen, based on the ``run.test_runner`` configuration.
> The current "work in progress" implementation for the N(ext) Runner
> can be activated by setting that configuration key to ``nrunner``,
> which can be easily done on the command line too::
> 
>   avocado run --test-runner=nrunner /bin/true
> 
> A general rule for measuring the quality and completeness of the
> ``nrunner`` implementation is to run the same jobs with the current
> runner, and compare its behavior and output with that of the
> ``nrunner``.  From here on, we'll call this simply the "nrunner
> plugin".

Then I guess custom runners that implement extra behavior (in my case, for
instance, a graph traversal of test nodes) should simply inherit from the
nrunner? Or will the order and overall management of how and which tests are
run no longer be controllable? Or should this be delegated to classes
inheriting from only parts of the runner implementation, like custom schedulers
and/or spawners? In that case, won't the setting for selecting runners become
redundant, since we will always be able to select just one runner?

> Known issues and limitations of the current implementation
> ==========================================================
> 
> Different Test IDs
> ------------------
> 
> When running tests with the current runner, the Test IDs are different::
> 
>    $ avocado run --test-runner=runner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
>             "id": "1-/bin/true",
>             "id": "2-/bin/false",
>             "id": "3-/bin/uname",
> 
>    $ avocado run --test-runner=nrunner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
>             "id": "1-1-/bin/true",
>             "id": "2-2-/bin/false",
>             "id": "3-3-/bin/uname",
> 
> The goal is to make the IDs the same.

I guess this is only necessary for compatibility once the switch is done. Our custom
runner produces a different type of IDs that is really architecture specific, and I
think the freedom to define the test IDs should remain with the developers, at least as
long as they still have the freedom to resolve and control the running process of their
own test suites, while hopefully keeping some of the large parallelism improvements above.
Is this the idea behind calling this a limitation, or am I missing something? I saw no
explicit explanation of why this is a problem, so I guess it must be compatibility.

> Inability to run Tasks other than exec, exec-test, python-unittest (and noop)
> -----------------------------------------------------------------------------
> 
> The current implementation of the nrunner plugin is based on the fact that
> Tasks are already present at ``test_suite`` job attribute, and that running
> Tasks can be (but shouldn't always be) a matter of iterating over the result
> of its ``run()`` method.  This is part of the actual code::
> 
>     for status in task.run():
>       result_dispatcher.map_method('test_progress', False)
>       statuses.append(status)
> 
> The problem here is that only the Python classes implemented in the core
> "avocado.core.nrunner" module, and registered at:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.nrunner.RUNNERS_REGISTRY_PYTHON_CLASS
> 
> The goal is to have all other Python classes that inherit from
> "avocado.core.nrunner.BaseRunner" available in such a registry.

I guess this relates to making task management flexible, although the next-to-last sentence
there is unfinished. If so, then I guess this reduces to the custom test suite running
limitations I mentioned above.

> Inability to run Tasks with Spawners
> ------------------------------------
> 
> While the "avocado nrun" command makes use of the Spawners, the
> current implementation of the nrunner plugin described earlier,
> calls a Task's ``run()`` method directly, and clearly doesn't
> use spawners.
> 
> The goal here is to leverage spawners so that other isolation
> models (or execution environments, depending how you look at
> processes, containers, etc) are supported.

+1 on prioritizing this, since the current blueprint is confusing with tasks handled both
directly and through spawners.

> Unoptimized execution of Tasks (extra serialization/deserialization)
> --------------------------------------------------------------------
> 
> At this time, the nrunner plugin runs a Task directly through its
> ``run()`` method.  Besides the earlier point of not supporting
> other isolation models/execution environments (that means not using
> spawners), there's an extra layer of work happening when running
> a task which is most often not necessary: turning a Task instance
> into a command line, and within its execution, turning it into a
> Task instance again.
> 
> The goal is to support an optimized execution of the tasks, without
> having to turn them into command lines, and back into Task instances.
> The idea is already present in the spawning method definitions:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.spawners.html#avocado.core.spawners.common.SpawnMethod.PYTHON_CLASS
> 
> And a PoC on top of the ``nrun`` command was implemented here:
> 
>   https://github.com/avocado-framework/avocado/pull/3766/commits/ae57ee78df7f2935e40394cdfc72a34b458cdcef

Speaking from the perspective of the past, what Autotest used to do to run code on
guest VMs (or containers, or anything for which only a remote address is assumed) was to
deploy control files to the remote/precooked environment and run them there. I have
preserved this in the "remote door" utility mentioned in the previous thread and, in
addition, added minimal metaprogramming capabilities (Python scripts writing minimal
Python scripts and a few others) and Python remote object serialization through Pyro.

I mention this because I think it could be useful to take a look at some of these
approaches and reuse as much as possible from them. Some of them also offer optimizations
for more specific test scenarios.

> Task execution coordination goals
> ---------------------------------
> 
> As stated earlier, to run a job, tasks must be executed. Differently
> than the current runner, the N(ext) Runner architecture allows those
> to be executed in a much more decoupled way. This characteristic will
> be maintained, but it needs to be adapted into the current Job
> execution.
> 
> From a high level view, the nrunner plugin needs to:
> 
> 1. Break apart from the "one at a time" Task execution model that it
>    currently employs;
> 
> 2. Check if a Task can be executed, that is, if its requirements can
>    be fulfilled (the most basic requirement for a task is a matching
>    runner);
> 
> 3. Prepare for the execution of a task, such as the fulfillment of
>    extra task requirements. The requirements resolver is one, if not
>    the only, component that should be given a chance to act here;

It should remain possible to inherit from and customize the requirements resolver, though.
Downloading all assets from the internet has security implications, adds bandwidth
overhead, and is simply not universal for all test dependency scenarios. So at least
keeping the door open here will no doubt be useful in the long term.

> 4. Execute a task in the prepared environment;
> 
> 5. Monitor the execution of a task (from an external PoV);
> 
> 6. Collect the status messages that tasks will send;
> 
>    a. Forward the status messages to the appropriate job components,
>       such as the result plugins.
> 
>    b. Depending on the content of messages, such as the ones
>       containing "status: started" or "status: finished", interfere in
>       the Task execution status, and consequently, in the Job
>       execution status.
> 
> 7. Verify, warn the user, and attempt to clean up stray tasks.  This
>    may be necessary, for instance, if a Task on a container seems to
>    be stuck, and the container can not be destroyed.  The same applies
>    to processes in some type of uninterruptible sleep.

+1

> Parallelization
> ---------------
> 
> Because the N(ext) Runner features allow for parallel execution of tasks,
> all other aspects of task execution coordination (fulfilling requirements,
> collecting results, etc) should not block each other.
> 
> There are a number of strategies for concurrent programming in Python
> these days, and the "avocado nrun" command currently makes use of
> asyncio to have coroutines that spawn tasks and collect results
> concurrently (in a cooperative multitasking model).  The actual language
> or library features used is, IMO, less important than the end result.

I think this requirement might be too strong. I am aware one could disable parallel runs and
go entirely sequential, but that is too strong as well. I think the most configurable approach
would be in the middle: if tasks were allowed to have sequential asset or even mutual
dependencies, and a scheduler executed only the tasks whose requirements are currently
satisfied, this would be far more flexible.
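
As a rough illustration of such a middle ground, here is a minimal sketch of a
scheduler that only starts tasks whose dependencies have already finished, and
runs everything that is ready concurrently. All names here (``schedule``,
``run_task``, the task names) are hypothetical and not part of the nrunner code;
``run_task`` merely stands in for spawning a real task. Note that this sketch
rejects circular dependencies instead of trying to resolve them.

```python
import asyncio


async def run_task(name, log):
    # Stand-in for spawning a real task; only records that it ran.
    await asyncio.sleep(0)
    log.append(name)


async def schedule(dependencies):
    """Run tasks in waves, each wave holding all currently ready tasks.

    dependencies maps a task name to the set of task names it waits for.
    """
    finished = set()
    pending = {name: set(deps) for name, deps in dependencies.items()}
    log = []
    while pending:
        ready = sorted(n for n, deps in pending.items() if deps <= finished)
        if not ready:
            raise RuntimeError(
                "unsatisfiable or circular dependencies: %s" % sorted(pending)
            )
        # Every task in the wave runs concurrently with the others.
        await asyncio.gather(*(run_task(n, log) for n in ready))
        finished.update(ready)
        for n in ready:
            del pending[n]
    return log


deps = {
    "fetch-asset": set(),
    "install-gcc": set(),
    "build": {"fetch-asset", "install-gcc"},
    "test": {"build"},
}
order = asyncio.run(schedule(deps))
print(order)
```

Waiting for a whole wave to finish is simpler than it is optimal; a real
scheduler would probably start each task as soon as its own dependencies are
done.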

> Suggested terminology
> ---------------------

+1 on all items

> Task workflow
> -------------
> ...
> Iteration I
> ~~~~~~~~~~~
> 
> Task #1 is selected on the first iteration, and it's found that:
> 
> 1. A suitable runner for tasks of kind ``python-unittest`` exists
> 
> 2. The ``mylib.py`` requirement is already present on the current
>    environment
> 
> 3. The ``gcc`` and ``libc-devel`` packages are not installed in the
>    current environment
> 
> 4. The system is capable of *attempting* to fulfill "package" types of
>    requirements.

I guess by a capable system you mean a system that performs additional actions to modify
the state of the environment. Could there be any kind of support for undoing such changes?
I guess if there are 1000 tasks with the same dependency, the first iteration will provide
it and the remaining 999 tasks will reuse it, which is great, but what if a later one
requires the environment to be brought back to the previous state? We use LVM to switch
between and track snapshots, which could be something useful to think about here for the
future. Such an implementation would be faster than downloading/preparing new environments
from scratch.

> Tallying results
> ~~~~~~~~~~~~~~~~
> 
> The nrunner plugin should be able to provide meaningful results to the Job,
> and consequently to the user, based on the resulting information on the
> final iteration.
> 
> Notice that some information, such as the ``PASS`` for the
> first test, will come from the "result" given in a status message from
> the task itself.  Some other status, such as the ``INTERRUPTED``
> status for the second test, will not come from a status message
> received, but from a realization of the actual management of the task
> execution.  It's expected that other information will also have to be
> inferred, and "filled in" by the nrunner plugin implementation.
> 
> In the end, it's expected that results similar to this would be
> presented::
> 
>   JOB ID     : f59bd40b8ac905864c4558dc02b6177d4f422ca3
>   JOB LOG    : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/job.log
>    (1/2) tests.py:Test.test_2: PASS (2.56 s)
>    (2/2) tests.py:Test.test_1: INTERRUPT (900 s)
>   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
>   JOB TIME   : 0.19 s
>   JOB HTML   : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/results.html
> 
> Notice how Task #2 shows up before Task #1, because it was both started first,
> and finished earlier.  There may be issues associated with the current UI to
> deal with regarding out of order task status updates.

Perhaps the status collection could regularly update itself and provide an ongoing view of
the number of tasks collected as PASS, ERROR, INTERRUPT, and so on? If it provides such a
view, it could always sort the received statuses.
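
A minimal sketch of such a running view, assuming statuses arrive out of order
and are keyed by the task's position in the suite. ``ResultTally`` and its
methods are hypothetical names used only for illustration, not an existing
Avocado class:

```python
from collections import Counter


class ResultTally:
    """Keeps the latest status per task and renders a sorted view."""

    def __init__(self, total):
        self.total = total
        self.results = {}  # task index -> (name, status)

    def update(self, index, name, status):
        # Statuses may arrive in any order; the latest one wins.
        self.results[index] = (name, status)

    def counts(self):
        return Counter(status for _, status in self.results.values())

    def lines(self):
        # Always presented sorted by task index, regardless of arrival order.
        return [
            "(%d/%d) %s: %s" % (index, self.total, name, status)
            for index, (name, status) in sorted(self.results.items())
        ]


tally = ResultTally(total=2)
tally.update(2, "tests.py:Test.test_2", "PASS")
tally.update(1, "tests.py:Test.test_1", "INTERRUPT")
print("\n".join(tally.lines()))
print(dict(tally.counts()))
```

The view can be re-rendered after every update, so the user sees both the
running totals and a stable ordering.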

> Summary
> =======
> 
> This proposal contains a number of items that can become GitHub issues
> at this stage.  It also contains a general explanation of what I believe
> are the crucial missing features to make the N(ext) Runner implementation
> available to the general public.
> 
> Feedback is highly appreciated, and it's expected that this document will
> evolve into a better version, and possibly become a formal Blue Print.
> 
> Thanks,
> - Cleber.
> 

I hope you find some of my comments useful and are willing to provide further comments, so
that we can all contribute and coordinate on what is to become of such a large reimplementation.

Best,
Plamen
