[Avocado-devel] RFC: N(ext) Runner - A proposal to the finish line

Cleber Rosa crosa at redhat.com
Wed Jun 10 02:42:33 UTC 2020


On Tue, May 26, 2020 at 06:25:36PM +0300, Plamen Dimitrov wrote:
> Hi Cleber,
> 
> I will try to be brief this time:
> 
> On 2020-05-21 02:32, Cleber Rosa wrote:
> > Intro
> > =====
> > 
> > This is a more technical follow-up to the points given in a previous
> > thread.  Because that thread and the current N(ext) Runner documentation
> > provide good context for this proposal, I encourage everyone to read them
> > first:
> > 
> >   https://www.redhat.com/archives/avocado-devel/2020-May/msg00009.html
> > 
> >   https://avocado-framework.readthedocs.io/en/79.0/future/core/nrunner.html
> > 
> > The N(ext) Runner allows for greater flexibility than the current
> > runner, so to be effective in delivering the N(ext) Runner for general
> > usage, we must define the bare minimum that still needs to be
> > implemented.
> 
> In fact, I would prefer if we get more technical, so that we get clearer mental
> models of what we are talking about and what remains to be decided.
>

Hi Plamen,

Sorry for taking this long to reply.  That's very valid... hopefully
the discussion here can meet your expectations too.  Don't refrain from
going into any level of detail you feel like.

> > Basic Job and Task execution
> > ============================
> > 
> > A Task, within the context of the N(ext) Runner, is described as "one
> > specific instance/occurrence of the execution of a runnable with its
> > respective runner".
> > 
> > A Task is a very important building block for an Avocado Job, and running
> > an Avocado Job means, to a large extent, running a number of Tasks.
> > The Tasks that need to be executed in a Job are created during
> > the ``create_test_suite()`` phase:
> > 
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.create_test_suite
> > 
> > And are kept in the Job's ``test_suite`` attribute:
> > 
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.test_suite
> 
> So I guess varianters are meant to be incorporated as resolvers, providing
> no longer simple test factories but resolutions wrapped into tasks. As we
> already support, there is the possibility of a given reference on the
> command line resulting in thousands of loaded/resolved tests/runnables.

Right.

> It could reach the point where splitting tens of thousands of tests into
> separate tasks might become a parallelism bottleneck. Have we considered the
> possibility of wrapping multiple tests or runnables into a single task? Is there
> any good way to resolve this, given that task handling will be enforced
> per runnable?

Yes, I've thought about it.  Right now, to keep things simple, our
runners' interface contains a "task-run" command, e.g.:

  $ avocado-runner task-run -k exec -i 1-/bin/name -u /bin/uname
  {'status': 'started', 'time': 1591753851.3962588, 'output_dir': '/tmp/.avocado-task-yw3wf9fc', 'id': '1-/bin/name'}
  {'status': 'running', 'time': 1591753851.4065444, 'id': '1-/bin/name'}
  {'status': 'finished', 'time': 1591753851.4067187, 'returncode': 0, 'stdout': b'Linux\n', 'stderr': b'', 'id': '1-/bin/name'}

There's absolutely nothing in that output that would prevent
more than one task from being run at once.  The command line would not
be flexible enough as a means of transporting the information about
the tests (id, uri, arguments), but there's already support for
using JSON-based files for that, which could be extended to support
multiple files.  The interface could be something like:

  $ avocado-runner task-run-multiple-from-recipe /path/to/json_files

Producing:

  {'status': 'started', 'time': 1591754043.5634456, 'output_dir': '/tmp/.avocado-task-kn5p5gc3', 'id': '1-/bin/name'}
  {'status': 'running', 'time': 1591754043.5737987, 'id': '1-/bin/name'}
  {'status': 'finished', 'time': 1591754043.5739698, 'returncode': 0, 'stdout': b'Linux\n', 'stderr': b'', 'id': '1-/bin/name'}
  {'status': 'started', 'time': 1591754055.003983, 'output_dir': '/tmp/.avocado-task-wlcfz9f6', 'id': '2-/bin/true'}
  {'status': 'running', 'time': 1591754055.0142286, 'id': '2-/bin/true'}
  {'status': 'finished', 'time': 1591754055.0143316, 'returncode': 0, 'stdout': b'', 'stderr': b'', 'id': '2-/bin/true'}
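
For reference, one of those JSON "recipe" files could look something
like this (illustrative only; the exact set of supported keys is still
a work in progress):

  $ cat /path/to/json_files/1.json
  {"kind": "exec", "uri": "/bin/uname", "args": ["-a"]}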

Implementing parallelization in the runners themselves is *not*
planned or expected at the moment, though.  It'd happen at a higher
layer.

> 
> > Running the tests, then, happens during the ``run_tests()`` phase:
> > 
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.run_tests
> > 
> > During the ``run_tests()`` phase, a plugin that runs test suites on a
> > job is chosen, based on the ``run.test_runner`` configuration.
> > The current "work in progress" implementation for the N(ext) Runner
> > can be activated by setting that configuration key to ``nrunner``,
> > which can easily be done on the command line too::
> > 
> >   avocado run --test-runner=nrunner /bin/true
> > 
> > A general rule for measuring the quality and completeness of the
> > ``nrunner`` implementation is to run the same jobs with the current
> > runner, and compare its behavior and output with that of the
> > ``nrunner``.  From here on, we'll call this simply the "nrunner
> > plugin".
> 
> Then I guess custom runners that implement extra behavior (in my case for
> instance a graph traversal of test nodes) should simply inherit from the
> nrunner? Or perhaps the order and overall management of how and which tests

From day one, when the nrunner is available as a supported runner, it
should be possible to set its parallelism level and the test order.
Those *concepts* are currently available in the "nrun" command (to be
removed) as the "--parallel-tasks" and "--disable-task-randomization"
options.

> are run will no longer be controllable? Or should that be delegated to classes
> inheriting from only parts of the runner implementation, like a custom scheduler
> and/or spawners? In this case, won't the settings for selection of runners become
> redundant, as we will always be able to select just one runner?
>

If tweaks such as the parallelism level and order are not enough, it
should indeed be possible to implement a custom runner.  I'm not sure
if this specific use case of yours would extend into custom
spawners... initially, I think it would not.

> > Known issues and limitations of the current implementation
> > ==========================================================
> > 
> > Different Test IDs
> > ------------------
> > 
> > When running tests with the current runner, the Test IDs are different::
> > 
> >    $ avocado run --test-runner=runner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
> >             "id": "1-/bin/true",
> >             "id": "2-/bin/false",
> >             "id": "3-/bin/uname",
> > 
> >    $ avocado run --test-runner=nrunner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
> >             "id": "1-1-/bin/true",
> >             "id": "2-2-/bin/false",
> >             "id": "3-3-/bin/uname",
> > 
> > The goal is to make the IDs the same.
> 
> I guess this is only necessary for compatibility once the switch is done. Our custom

Right.  I'm really thinking of users transitioning from the current
runner to the nrunner here.

> runner produces a different type of ID that is really architecture specific, and I
> think the freedom to define the test IDs should remain with the developers, at least as
> long as they still have the freedom to resolve and control the running process of their
> own test suites, while hopefully keeping some of the large parallelism improvements above.

So, a "Task's ID", in the nrunner terminology, can be *anything at all*.  Here is
just what the runner will use to populate that field.

> Is this the idea behind calling this a limitation, or am I missing something? I saw no
> explicit explanation of why this is a problem, so I guess it must be compatibility.
>

You got that right.  In theory, runners could set any type of ID they want.  Now,
I'm not sure the base nrunner implementation should let users go wild with
the IDs, but writing your own runner will definitely allow it.

Finally, on this topic, this specific issue was already addressed in the
80.0 version, and should give you some ideas about what we've discussed
here:

  https://github.com/avocado-framework/avocado/pull/3871

> > Inability to run Tasks other than exec, exec-test, python-unittest (and noop)
> > -----------------------------------------------------------------------------
> > 
> > The current implementation of the nrunner plugin is based on the fact that
> > Tasks are already present in the ``test_suite`` job attribute, and that running
> > Tasks can be (but shouldn't always be) a matter of iterating over the result
> > of its ``run()`` method.  This is part of the actual code::
> > 
> >     for status in task.run():
> >       result_dispatcher.map_method('test_progress', False)
> >       statuses.append(status)
> > 
> > The problem here is that only the Python classes implemented in the core
> > "avocado.core.nrunner" module, and registered at:
> > 
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.nrunner.RUNNERS_REGISTRY_PYTHON_CLASS
> > 
> > The goal is to have all other Python classes that inherit from
> > "avocado.core.nrunner.BaseRunner" available in such a registry.
> 
> I guess this relates to making task management flexible, as the next to last sentence
> there is not finished. If so, then I guess this reduces to the custom test suite running
> limitations I mentioned above.
>

This relates to the fact that the nrunner can run a task by gathering the
information it has about it, and running a command such as:

  avocado-runner task-run $information-about-a-task

Such a command will, in the end, construct a Task instance from the given
information and run it.  But, if the user requested a local execution
(a process-based spawner, see the next point), and the runner for such
a task is based on Python code (there's no requirement for a runner
such as avocado-runner-go to be written in Python), then it should be
possible to use the Python-based class directly.  For that, knowledge
about it is required.

Example:

  $ pip install avocado-framework
  $ pip install avocado-framework-runner-pytest
  $ grep BaseRunner ~/.local/lib/python3.7/site-packages/avocado-framework-runner-pytest/*
  __init__.py: class PyTestRunner(nrunner.BaseRunner)

Assuming the nrunner plugin is now the default (--test-runner=nrunner),
running this:

  $ avocado run /path/to/pytests

Avocado would know about PyTestRunner, and would not *need* to run the
"avocado-runner-pytest" command line executable that is also installed.
Let me know if that makes sense to you; for more info about this
BaseRunner class, see:

  https://avocado-framework.readthedocs.io/en/80.0/future/core/nrunner.html#writing-new-runner-scripts
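
To illustrate the idea (a rough sketch only, assuming the registry is a
mapping from runnable kind to runner class; "run_task_locally" is just
a hypothetical helper name):

  from avocado.core import nrunner

  def run_task_locally(task):
      # If a Python class capable of running this kind of runnable is
      # registered, use it directly, skipping the serialization of the
      # Task into an "avocado-runner task-run ..." command line.
      klass = nrunner.RUNNERS_REGISTRY_PYTHON_CLASS.get(task.runnable.kind)
      if klass is not None:
          return klass(task.runnable).run()
      # otherwise, fall back to spawning the standalone runner executable
      ...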

> > Inability to run Tasks with Spawners
> > ------------------------------------
> > 
> > While the "avocado nrun" command makes use of the Spawners, the
> > current implementation of the nrunner plugin described earlier,
> > calls a Task's ``run()`` method directly, and clearly doesn't
> > use spawners.
> > 
> > The goal here is to leverage spawners so that other isolation
> > models (or execution environments, depending how you look at
> > processes, containers, etc) are supported.
> 
> +1 on prioritizing this since the current blueprint is confusing with both tasks handled
> directly and through spawners
>

I agree.

> > Unoptimized execution of Tasks (extra serialization/deserialization)
> > --------------------------------------------------------------------
> > 
> > At this time, the nrunner plugin runs a Task directly through its
> > ``run()`` method.  Besides the earlier point of not supporting
> > other isolation models/execution environments (that means not using
> > spawners), there's an extra layer of work happening when running
> > a task which is most often not necessary: turning a Task instance
> > into a command line, and within its execution, turning it into a
> > Task instance again.
> > 
> > The goal is to support an optimized execution of the tasks, without
> > having to turn them into command lines, and back into Task instances.
> > The idea is already present in the spawning method definitions:
> > 
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.spawners.html#avocado.core.spawners.common.SpawnMethod.PYTHON_CLASS
> > 
> > And a PoC on top of the ``nrun`` command was implemented here:
> > 
> >   https://github.com/avocado-framework/avocado/pull/3766/commits/ae57ee78df7f2935e40394cdfc72a34b458cdcef
> 
> Speaking from the perspective of the past, what Autotest used to do to run code on
> guest VMs (or containers, or anything assuming just a remote address) was deploy
> control files to the remote/precooked environment and run the control file. I still
> preserved this in the "remote door" utility mentioned in the previous thread, and in
> addition added minimal metaprogramming capabilities (python scripts writing minimal
> python scripts and a few others) and python remote object serialization through Pyro.
>

Right, and I can't say that I feel too excited about writing control
files.  I know that if it's kept limited, it will *probably* not shoot
you in the foot, but I feel that the model of describing what you want
to execute wins.  I'm clearly talking about packing that information in
an avocado.core.nrunner.Task, and, if parameters are not enough, it should
be easy to write a new "runnable kind".

For instance, this model should allow for a "python-code" kind of
runnable, which takes as its parameters just that.  It should even
allow for an "autotest-control" runnable type!  OK, please don't take
me seriously on this last point :).
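
A minimal sketch of what such a "python-code" runner could look like
(the kind name, the use of the runnable's URI to carry the code, and
the exact status message fields are all assumptions here):

  import time

  from avocado.core import nrunner

  class PythonCodeRunner(nrunner.BaseRunner):
      """Hypothetical runner for a "python-code" runnable kind."""

      def run(self):
          yield {'status': 'started', 'time': time.time()}
          try:
              exec(self.runnable.uri)  # the code is carried in the URI
              result = 'pass'
          except Exception:
              result = 'fail'
          yield {'status': 'finished', 'result': result,
                 'time': time.time()}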

> I mention this because I think it could be useful to take a look at some of these
> approaches and reuse as much as possible from them. Some of them also offer optimizations
> for more specific test scenarios.
>

Sure, I'll definitely look into them.  But, one of the limitations
we're trying to break away from here is Python-specific solutions.  So
far, it looks like this model would allow Avocado to manage tests in
a more diverse set of environments, which is important in itself, and
even more so in integration testing environments, where different
components may have been written in different languages.

> > Task execution coordination goals
> > ---------------------------------
> > 
> > As stated earlier, to run a job, tasks must be executed. Differently
> > from the current runner, the N(ext) Runner architecture allows those
> > to be executed in a much more decoupled way. This characteristic will
> > be maintained, but it needs to be adapted into the current Job
> > execution.
> > 
> > From a high level view, the nrunner plugin needs to:
> > 
> > 1. Break apart from the "one at a time" Task execution model that it
> >    currently employs;
> > 
> > 2. Check if a Task can be executed, that is, if its requirements can
> >    be fulfilled (the most basic requirement for a task is a matching
> >    runner);
> > 
> > 3. Prepare for the execution of a task, such as the fulfillment of
> >    extra task requirements. The requirements resolver is one, if not
> >    the only way, component that should be given a chance to act here;
>

FYI, I meant s/only way/only one/.

> The requirements resolver should remain possible to inherit from and customize, though.
> Downloading all assets from the internet has security implications and bandwidth
> overhead, and is simply not universal for all test dependency scenarios. So at
> least keeping the door open here will no doubt be useful in the long term.
>

Right, the "requirement" concept is abstract on purpose.

> > 4. Execute a task in the prepared environment;
> > 
> > 5. Monitor the execution of a task (from an external PoV);
> > 
> > 6. Collect the status messages that tasks will send;
> > 
> >    a. Forward the status messages to the appropriate job components,
> >       such as the result plugins.
> > 
> >    b. Depending on the content of messages, such as the ones
> >       containing "status: started" or "status: finished", interfere in
> >       the Task execution status, and consequently, in the Job
> >       execution status.
> > 
> > 7. Verify, warn the user about, and attempt to clean up stray tasks.
> >    This may be necessary, for instance, if a Task on a container seems
> >    to be stuck and the container can not be destroyed.  The same applies
> >    to processes in some kind of uninterruptible sleep.
> 
> +1
> 
> > Parallelization
> > ---------------
> > 
> > Because the N(ext) Runner features allow for parallel execution of tasks,
> > all other aspects of task execution coordination (fulfilling requirements,
> > collecting results, etc) should not block each other.
> > 
> > There are a number of strategies for concurrent programming in Python
> > these days, and the "avocado nrun" command currently makes use of
> > asyncio to have coroutines that spawn tasks and collect results
> > concurrently (in a cooperative multitasking model).  The actual language
> > or library features used are, IMO, less important than the end result.
> 
> I think this requirement might be too strong. I am aware one could disable parallel runs and
> go entirely sequential, but that is too strong as well. I think the most configurable approach
> would be in the middle - if tasks could be allowed to have sequential asset or even mutual
> dependencies, and a scheduler could execute only tasks with currently satisfied requirements,
> that would be way more flexible.
>

That's what I mean by "should not block each other".  While a given
requirement is being worked on, that is, while its fulfillment is being
attempted, tasks that have all their requirements available should be
allowed to run.
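
In asyncio terms, the idea is roughly this (a sketch only; the
"fulfill_requirements" helper is hypothetical, and the spawner
interface shown here is simplified):

  import asyncio

  async def fulfill_requirements(task):
      # hypothetical helper: fulfill this one task's requirements
      ...

  async def fulfill_and_spawn(task, spawner):
      # fulfilling one task's requirements must not block the others;
      # each task is spawned as soon as its own requirements are ready
      await fulfill_requirements(task)
      await spawner.spawn_task(task)

  async def run_all(tasks, spawner):
      await asyncio.gather(*(fulfill_and_spawn(task, spawner)
                             for task in tasks))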

> > Suggested terminology
> > ---------------------
> 
> +1 on all items
> 
> > Task workflow
> > -------------
> > ...
> > Iteration I
> > ~~~~~~~~~~~
> > 
> > Task #1 is selected on the first iteration, and it's found that:
> > 
> > 1. A suitable runner for tasks of kind ``python-unittest`` exists
> > 
> > 2. The ``mylib.py`` requirement is already present on the current
> >    environment
> > 
> > 3. The ``gcc`` and ``libc-devel`` packages are not installed in the
> >    current environment
> > 
> > 4. The system is capable of *attempting* to fulfill "package" types of
> >    requirements.
> 
> I guess by a capable system you mean the system performing additional actions to modify
> the state of the environment. Could there be any type of support for undoing such changes?

I believe you mean something like Ansible, with its "state:
installed|removed|..." support in some of its modules, right?
TBH, I have not addressed this type of problem.  So far, it's my
understanding that we'll be able to leverage environments that can be
reset to a previous state externally, such as container or VM images.
But, it doesn't mean that some requirement fulfillers could not
implement such a behavior in the future; it'd basically be another
hook, called if the user requests it and the fulfiller supports it.
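
Purely to illustrate the shape of such an opt-in hook (all names here
are hypothetical):

  class PackageFulfiller:
      """Hypothetical fulfiller for "package" type requirements."""

      def fulfill(self, requirement):
          ...  # e.g. install the package

      def undo(self, requirement):
          # opt-in hook: only called if the user requests it and the
          # fulfiller supports reverting its changes
          ...  # e.g. remove the package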

> I guess if there are 1000 tasks with the same dependency, the first iteration will provide
> it and the remaining 999 tasks will reuse it, which is great, but what if a later one would
> require the environment to be brought back to the previous state? We make use of LVM to
> switch and track the provisioning of snapshots, which could be something useful to think
> about here for the future. Such an implementation would be faster than downloading/preparing
> new environments from scratch.
>

That's right on the spot.  The goal is to evolve such a resolver to keep
track of what has been done, and avoid doing it again.  Think of it as
an evolution of the current "asset fetch/cache" idea, but made generic.

Say, if the "gcc" package is a requirement, and the spawner in use is
a container based one, configured to use image "fedora:32", Avocado
would realize that:

 1. for spawner==container, I don't know about gcc being installed
    on a fedora:32 image, this is a "cache miss"

 2. let me run a Task("package-install", args=("gcc",)) with a container
    spawner, using the fedora:32 image... success!  Don't remove that
    image.

 3. keep a note now that, for spawner==container, and fedora:32 image,
    there's an image ready.

On a subsequent execution, that container image would be used.    
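
A rough illustration of that bookkeeping (hypothetical code, not
Avocado's actual implementation):

  # remember which (spawner, image, requirement) combinations have
  # already been fulfilled, so the work is not repeated
  fulfilled = set()

  key = ('container', 'fedora:32', ('package', 'gcc'))
  if key not in fulfilled:
      # run the "package-install" task in that environment, then keep
      # the resulting image around for reuse
      fulfilled.add(key)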

> > Tallying results
> > ~~~~~~~~~~~~~~~~
> > 
> > The nrunner plugin should be able to provide meaningful results to the Job,
> > and consequently to the user, based on the resulting information on the
> > final iteration.
> > 
> > Notice that some information, such as the ``PASS`` for the
> > first test, will come from the "result" given in a status message from
> > the task itself.  Some other statuses, such as the ``INTERRUPTED``
> > status for the second test, will not come from a received status
> > message, but from a realization of the actual management of the task
> > execution.  It's expected that other information will also have to be
> > inferred, and "filled in", by the nrunner plugin implementation.
> > 
> > In the end, it's expected that results similar to this would be
> > presented::
> > 
> >   JOB ID     : f59bd40b8ac905864c4558dc02b6177d4f422ca3
> >   JOB LOG    : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/job.log
> >    (1/2) tests.py:Test.test_2: PASS (2.56 s)
> >    (2/2) tests.py:Test.test_1: INTERRUPT (900 s)
> >   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
> >   JOB TIME   : 0.19 s
> >   JOB HTML   : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/results.html
> > 
> > Notice how Task #2 shows up before Task #1, because it was both started first
> > and finished earlier.  There may be issues associated with the current UI to
> > be dealt with regarding out-of-order task status updates.
> 
> Perhaps the status collection could regularly update itself and provide an ongoing view of
> the number of collected PASS, ERROR, INTERRUPT (and so on) tasks? If it provides such a view,
> it could always sort the received statuses.
>

Right, that's the goal.  It's going to involve a bit of terminal /
curses hackery and improvements to the current UI, but it's definitely
where we want to get.

> > Summary
> > =======
> > 
> > This proposal contains a number of items that can become GitHub issues
> > at this stage.  It also contains a general explanation of what I believe
> > are the crucial missing features to make the N(ext) Runner implementation
> > available to the general public.
> > 
> > Feedback is highly appreciated, and it's expected that this document will
> > evolve into a better version, and possibly become a formal Blue Print.
> > 
> > Thanks,
> > - Cleber.
> > 
> 
> I hope you find some of my comments useful and are willing to provide some further comments, so
> that we can all contribute and coordinate on what is to become of such a large reimplementation.
> 
> Best,
> Plamen
> 

I sure did!  Let me know if my comments made our goals and ideas
clearer, and I look forward to getting patches too! :)

Take care!
- Cleber.