[Avocado-devel] RFC: N(ext) Runner - A proposal to the finish line

Willian Rampazzo wrampazz at redhat.com
Fri May 22 20:45:23 UTC 2020


Hello Cleber,

Thanks for this RFC, it is appreciated. I see you have different
points for discussion in this RFC; it would be better to discuss them
in different places/ways. I will try to give my contribution to those
that I can, but I will hold my comments about the task scheduler. For
this kind of architecture-related discussion, the format of a
blueprint, with a motivation section and divided into sections, would
be easier for me to follow.

On Wed, May 20, 2020 at 8:33 PM Cleber Rosa <crosa at redhat.com> wrote:
>
> Intro
> =====
>
> This is a more technical follow up to the points given in a previous
> thread.  Because that thread and the current N(ext) Runner documentation
> form a good context for this proposal, I encourage everyone to read
> them first:
>
>   https://www.redhat.com/archives/avocado-devel/2020-May/msg00009.html
>
>   https://avocado-framework.readthedocs.io/en/79.0/future/core/nrunner.html
>
> The N(ext) Runner allows for greater flexibility than the current
> runner, so to be effective in delivering the N(ext) Runner for general
> usage, we must define the bare minimum that still needs to be
> implemented.
>
> Basic Job and Task execution
> ============================
>
> A Task, within the context of the N(ext) Runner, is described as "one
> specific instance/occurrence of the execution of a runnable with its
> respective runner".
>
> A Task is a very important building block for an Avocado Job, and running
> an Avocado Job means, to a large extent, running a number of Tasks.
> The Tasks that need to be executed in a Job are created during
> the ``create_test_suite()`` phase:
>
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.create_test_suite
>
> And are kept in the Job's ``test_suite`` attribute:
>
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.test_suite
>
> Running the tests, then, happens during the ``run_tests()`` phase:
>
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.run_tests
>
> During the ``run_tests()`` phase, a plugin that runs test suites on a
> job is chosen, based on the ``run.test_runner`` configuration.
> The current "work in progress" implementation of the N(ext) Runner
> can be activated by setting that configuration key to ``nrunner``,
> which can easily be done on the command line too::
>
>   avocado run --test-runner=nrunner /bin/true
>
> A general rule for measuring the quality and completeness of the
> ``nrunner`` implementation is to run the same jobs with the current
> runner, and compare its behavior and output with that of the
> ``nrunner``.  From here on, we'll call this simply the "nrunner
> plugin".
>
> Known issues and limitations of the current implementation
> ==========================================================
>
> Different Test IDs
> ------------------
>
> When running tests with the current runner, the Test IDs are different::
>
>    $ avocado run --test-runner=runner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
>             "id": "1-/bin/true",
>             "id": "2-/bin/false",
>             "id": "3-/bin/uname",
>
>    $ avocado run --test-runner=nrunner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
>             "id": "1-1-/bin/true",
>             "id": "2-2-/bin/false",
>             "id": "3-3-/bin/uname",
>
> The goal is to make the IDs the same.
>

In my opinion, this seems to be a simple issue that is easily tracked
on GitHub. If we are going to keep the output of the nrunner just like
that of the current runner, there is not much to discuss, only implement.
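
If it helps the implementation, the fix is probably just dropping the
duplicated suite index when composing the ID. A minimal sketch of the
expected numbering (all names here are hypothetical, not the actual
nrunner code):

```python
# Hypothetical sketch: build "index-name" Test IDs the way the current
# runner does, with a single 1-based prefix, so the nrunner can match
# the current runner's output.

def make_test_ids(references):
    """Prefix each test reference with its 1-based position, once."""
    return ["%u-%s" % (index, ref)
            for index, ref in enumerate(references, start=1)]

ids = make_test_ids(["/bin/true", "/bin/false", "/bin/uname"])
```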

> Inability to run Tasks other than exec, exec-test, python-unittest (and noop)
> -----------------------------------------------------------------------------
>
> The current implementation of the nrunner plugin is based on the fact that
> Tasks are already present in the ``test_suite`` job attribute, and that running
> Tasks can be (but shouldn't always be) a matter of iterating over the result
> of its ``run()`` method.  This is part of the actual code::
>
>     for status in task.run():
>         result_dispatcher.map_method('test_progress', False)
>         statuses.append(status)
>
> The problem here is that this only works for the Python classes
> implemented in the core "avocado.core.nrunner" module and registered at:
>
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.nrunner.RUNNERS_REGISTRY_PYTHON_CLASS
>
> The goal is to have all other Python classes that inherit from
> "avocado.core.nrunner.BaseRunner" available in such a registry.
>

Agreed, we need to find a way to centralize supported runners, not
only those implemented in the Avocado core. A registration method like
the one we are using for the new avocado parameters is an option.
Another option is to register runners the same way we register plugins
today, via setup.py. The problem I see with both solutions is breaking
the "standalone" nature of nrunner.py. Right now, I don't have a
better solution for it.
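
One way to centralize runners without a setup.py dependency could be
automatic registration of subclasses. A rough sketch, with all class
and variable names hypothetical (not the actual Avocado API):

```python
# Hypothetical sketch of a runner registry based on subclass tracking.
# Any module defining a BaseRunner subclass gets it registered simply
# by being imported; no setup.py entry point is required.

RUNNERS_REGISTRY = {}

class BaseRunner:
    """Base class; subclasses register themselves by their 'kind'."""
    kind = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.kind is not None:
            RUNNERS_REGISTRY[cls.kind] = cls

class ExecRunner(BaseRunner):
    kind = 'exec'

class PythonUnittestRunner(BaseRunner):
    kind = 'python-unittest'
```

This keeps the "standalone" property in the sense that nrunner.py
itself needs no extra machinery, but it still requires third-party
runner modules to be imported somehow.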

> Inability to run Tasks with Spawners
> ------------------------------------
>
> While the "avocado nrun" command makes use of the Spawners, the
> current implementation of the nrunner plugin described earlier,
> calls a Task's ``run()`` method directly, and clearly doesn't
> use spawners.
>
> The goal here is to leverage spawners so that other isolation
> models (or execution environments, depending how you look at
> processes, containers, etc) are supported.
>

Agreed! If tasks are the default way to run a test on nrunner,
Spawners should be the default "way of transportation" to achieve it.
This discussion and its issues can be tracked as an epic on GitHub.
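
For reference, the contract a spawner has to fulfill is quite small.
A simplified sketch of a process-based one (the real
avocado.core.spawners API differs; this only illustrates the "way of
transportation" idea):

```python
# Simplified sketch of a process-based spawner; not the real
# avocado.core.spawners API, just an illustration of spawners as the
# task's "way of transportation".
import subprocess
import sys

class ProcessSpawner:
    """Starts a task's command line as a local process and tracks it."""

    def __init__(self):
        self._procs = {}

    def spawn_task(self, task_id, command):
        self._procs[task_id] = subprocess.Popen(command)

    def is_task_alive(self, task_id):
        proc = self._procs.get(task_id)
        return proc is not None and proc.poll() is None

spawner = ProcessSpawner()
# A no-op payload, standing in for an "avocado-runner-* task-run" command
spawner.spawn_task('1-noop', [sys.executable, '-c', 'pass'])
```

A container-based spawner would implement the same two methods on top
of, say, podman, which is what makes the isolation model pluggable.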

> Unoptimized execution of Tasks (extra serialization/deserialization)
> ---------------------------------------------------------------------
>
> At this time, the nrunner plugin runs a Task directly through its
> ``run()`` method.  Besides the earlier point of not supporting
> other isolation models/execution environments (that means not using
> spawners), there's an extra layer of work happening when running
> a task which is most often not necessary: turning a Task instance
> into a command line, and within its execution, turning it into a
> Task instance again.
>
> The goal is to support an optimized execution of the tasks, without
> having to turn them into command lines, and back into Task instances.
> The idea is already present in the spawning method definitions:
>
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.spawners.html#avocado.core.spawners.common.SpawnMethod.PYTHON_CLASS
>
> And a PoC on top of the ``nrun`` command was implemented here:
>
>   https://github.com/avocado-framework/avocado/pull/3766/commits/ae57ee78df7f2935e40394cdfc72a34b458cdcef
>

If I understood correctly, starting here, you discuss the architecture
for a task scheduler. I understood the phases, but this discussion
should be self-contained and decoupled from the previous content,
ideally in the blueprint format, with a motivation and divided into
sections.
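
To make the extra layer concrete for other readers: today a Task is
turned into a command line and parsed back into a Task inside the
spawned process, even when both sides live in the same interpreter. A
toy illustration of that round trip (not the actual serialization
format used by avocado-runner-*):

```python
# Toy illustration of the serialize/deserialize overhead described in
# the quoted section; the real command-line format differs.

def task_to_command(task):
    """Turn a task dict into an 'avocado-runner-* task-run' command line."""
    return ['avocado-runner-%s' % task['kind'], 'task-run',
            '-i', task['id'], '-u', task['uri']]

def command_to_task(command):
    """Inside the spawned process: parse the command line back into a task."""
    args = dict(zip(command[2::2], command[3::2]))
    return {'kind': command[0].split('avocado-runner-', 1)[1],
            'id': args['-i'], 'uri': args['-u']}

task = {'kind': 'python-unittest', 'id': '1-test.py:Test.test_1',
        'uri': 'test.py:Test.test_1'}
# With SpawnMethod.PYTHON_CLASS this whole round trip could be skipped.
round_tripped = command_to_task(task_to_command(task))
```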

> Proposal
> ========
>
> Besides the known limitations listed previously, there are others that
> will appear along the way, and certainly some new challenges as we
> solve them.
>
> The goal of this proposal is to attempt to identify those challenges,
> and lay out a plan that can be tackled by the Avocado team/community
> and not by a single person.
>
> Task execution coordination goals
> ---------------------------------
>
> As stated earlier, to run a job, tasks must be executed. Unlike
> the current runner, the N(ext) Runner architecture allows those
> to be executed in a much more decoupled way. This characteristic will
> be maintained, but it needs to be adapted into the current Job
> execution.
>
> From a high level view, the nrunner plugin needs to:
>
> 1. Break apart from the "one at a time" Task execution model that it
>    currently employs;
>
> 2. Check if a Task can be executed, that is, if its requirements can
>    be fulfilled (the most basic requirement for a task is a matching
>    runner);
>
> 3. Prepare for the execution of a task, such as the fulfillment of
>    extra task requirements. The requirements resolver is one
>    component, if not the only one, that should be given a chance to
>    act here;
>
> 4. Execute a task in a prepared environment;
>
> 5. Monitor the execution of a task (from an external PoV);
>
> 6. Collect the status messages that tasks will send;
>
>    a. Forward the status messages to the appropriate job components,
>       such as the result plugins.
>
>    b. Depending on the content of messages, such as the ones
>       containing "status: started" or "status: finished", interfere in
>       the Task execution status, and consequently, in the Job
>       execution status.
>
> 7. Verify, warn the user about, and attempt to clean up stray tasks.
>    This may be necessary, for instance, if a Task in a container seems
>    to be stuck and the container can not be destroyed.  The same
>    applies to processes in some kind of uninterruptible sleep.
>
> Parallelization
> ---------------
>
> Because the N(ext) Runner features allow for parallel execution of tasks,
> all other aspects of task execution coordination (fulfilling requirements,
> collecting results, etc) should not block each other.
>
> There are a number of strategies for concurrent programming in Python
> these days, and the "avocado nrun" command currently makes use of
> asyncio to have coroutines that spawn tasks and collect results
> concurrently (in a cooperative, non-preemptive model).  The actual
> language or library features used are, IMO, less important than the
> end result.
>
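
For illustration, the spawn/collect split described above is
essentially the classic producer/consumer pattern over a queue. A
self-contained sketch (not the actual "avocado nrun" code; all names
are made up):

```python
# Self-contained sketch of coroutines that spawn tasks and collect their
# statuses concurrently, as "avocado nrun" does with asyncio.
import asyncio

async def spawn_tasks(task_ids, status_queue):
    """Producer: spawn each task and report its terminal status."""
    for task_id in task_ids:
        await asyncio.sleep(0)          # stand-in for real spawner work
        await status_queue.put({'id': task_id, 'status': 'finished'})

async def collect_statuses(status_queue, expected):
    """Consumer: drain the queue until all statuses have arrived."""
    statuses = []
    while len(statuses) < expected:
        statuses.append(await status_queue.get())
    return statuses

async def main():
    queue = asyncio.Queue()
    ids = ['1-/bin/true', '2-/bin/false']
    _, statuses = await asyncio.gather(spawn_tasks(ids, queue),
                                       collect_statuses(queue, len(ids)))
    return statuses

statuses = asyncio.run(main())
```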
> Suggested terminology
> ---------------------
>
> Task execution has been requested
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> A Task whose execution was requested by the user.  All of the tasks on
> a Job's ``test_suite`` attribute are requested tasks.
>
> If a software component deals with this type of task, it's advisable
> that it refers to ``TASK_REQUESTED`` or ``requested_tasks`` or a
> similar name that links to this definition.
>
> Task is being triaged
> ~~~~~~~~~~~~~~~~~~~~~
>
> The details of the task are being analyzed, including and most
> importantly the ability of the system to *attempt* to fulfill its
> requirements. A task leaves triage and it's either considered
> "discarded" or proceeds to be prepared and then executed.
>
> If a software component deals with this type of task, for instance if
> a "task scheduler" is looking for runners matching the Task's kind, it
> should keep it under a ``tasks_under_triage`` structure or mark the
> tasks as ``UNDER_TRIAGE`` or ``TRIAGING``, or a similar name that
> links to this definition.
>
> Task is being prepared
> ~~~~~~~~~~~~~~~~~~~~~~
>
> Task has left triage, and it has not been discarded, that is, it's
> a candidate to be set up and, if that goes well, executed.
>
> The requirements for a task are being prepared in its respective
> isolation model/execution environment, that is, the spawner it'll
> be executed with is known, and the setup actions will be visible
> to the task.
>
> If a software component deals with this type of task, for instance the
> implementation of resolution of specific requirements, it should
> keep it under a ``tasks_preparing`` structure or mark the tasks as
> ``PREPARING`` or a similar name that links to this definition.
>
> Task is ready to be started
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Task has been prepared successfully, and can now be executed.
>
> If a software component deals with this type of task, it should
> keep it under a ``tasks_ready`` structure or mark the tasks as
> ``READY`` or a similar name that links to this definition.
>
> Task is being started
> ~~~~~~~~~~~~~~~~~~~~~
>
> A hopefully short-lived state, in which a task that is ready to be started
> (see previous point) will be given to the respective spawner to be started.
>
> If a software component deals with this type of task, it should
> keep it under a ``tasks_starting`` structure or mark the tasks as
> ``STARTING`` or a similar name that links to this definition.
>
> The spawner should know if the starting of the task succeeded or failed,
> and the task should be categorized accordingly.
>
> Task has been started
> ~~~~~~~~~~~~~~~~~~~~~
>
> A task was successfully started by a spawner.
>
> Note that it does *not* mean that the test that the task runner (say,
> an "avocado-runner-$kind task-run" command) will run has already been
> started.  This will be signalled by a "status: started" kind of
> message.
>
> If a software component deals with this type of task, it should
> keep it under a ``tasks_started`` structure or mark the tasks as
> ``STARTED`` or a similar name that links to this definition.
>
> Task has failed to start
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> Quite self-explanatory. If the spawner failed to start a task, it
> should be kept under a ``tasks_failed_to_start`` structure or be
> marked as ``FAILED_TO_START`` or a similar name that links to this
> definition.
>
> Task is finished
> ~~~~~~~~~~~~~~~~
>
> This means that the task has started, and is now finished.  There's no
> associated meaning here about the pass/fail output of the test payload
> executed by the task.
>
> It should be kept under a ``tasks_finished`` structure or be marked as
> ``FINISHED`` or a similar name that links to this definition.
>
> Task has been interrupted
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> This means that the task has started, but has not finished and it's
> past due.
>
> It should be kept under a ``tasks_interrupted`` structure or be marked
> as ``INTERRUPTED`` or a similar name that links to this definition.
>
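
The terminology above maps naturally onto a single enumeration that
all components could share. A sketch, with the exact member names
obviously still up for discussion:

```python
# Sketch: the suggested task states as one shared enumeration, so every
# component refers to the same definitions.
import enum

class TaskStatus(enum.Enum):
    REQUESTED = 'requested'
    TRIAGING = 'triaging'
    PREPARING = 'preparing'
    READY = 'ready'
    STARTING = 'starting'
    STARTED = 'started'
    FAILED_TO_START = 'failed to start'
    FINISHED = 'finished'
    INTERRUPTED = 'interrupted'

# Example: a scheduler keeping tasks bucketed by state.
tasks_by_status = {status: [] for status in TaskStatus}
tasks_by_status[TaskStatus.REQUESTED].append('1-test.py:Test.test_1')
```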
> Task workflow
> -------------
>
> A task will usually be created from a Runnable.  A Runnable will, in
> turn, almost always be created as part of the "avocado.core.resolver"
> module.  Let's consider the following output of a resolution::
>
>   +--------------------------------------+
>   | ReferenceResolution #1               |
>   +--------------------------------------+
>   | Reference: test.py                   |
>   | Result: SUCCESS                      |
>   | +----------------------------------+ |
>   | | Resolution #1 (Runnable):        | |
>   | |  - kind: python-unittest         | |
>   | |  - uri: test.py:Test.test_1      | |
>   | |  - requirements:                 | |
>   | |    + file: mylib.py              | |
>   | |    + package: gcc                | |
>   | |    + package: libc-devel         | |
>   | +----------------------------------+ |
>   | +----------------------------------+ |
>   | | Resolution #2 (Runnable):        | |
>   | |  - kind: python-unittest         | |
>   | |  - uri: test.py:Test.test_2      | |
>   | |  - requirements:                 | |
>   | |    + file: mylib.py              | |
>   | +----------------------------------+ |
>   +--------------------------------------+
>
> Two Runnables here will be transformed into Tasks.  The process
> usually includes adding an identification (I) and a status URI (II)::
>
>   +----------------------------------+    +----------------------------------+
>   | Resolution #1 (Runnable):        |    | Resolution #2 (Runnable):        |
>   |  - kind: python-unittest         |    |  - kind: python-unittest         |
>   |  - uri: test.py:Test.test_1      |    |  - uri: test.py:Test.test_2      |
>   |  - requirements:                 |    |  - requirements:                 |
>   |    + file: mylib.py              |    |    + file: mylib.py              |
>   |    + package: gcc                |    +----------------------------------+
>   |    + package: libc-devel         |                    ||
>   +----------------------------------+                    ||
>                   ||                                      ||
>                   ||                                      ||
>                   \/                                      \/
>   +----------------------------------+    +----------------------------------+
>   | Task #1:                         |    | Task #2:                         |
>   |  - id: 1-test.py:Test.test_1  (I)|    |  - id: 2-test.py:Test.test_2  (I)|
>   |  - kind: python-unittest         |    |  - kind: python-unittest         |
>   |  - uri: test.py:Test.test_1      |    |  - uri: test.py:Test.test_2      |
>   |  - requirements:                 |    |  - requirements:                 |
>   |    + file: mylib.py              |    |    + file: mylib.py              |
>   |    + package: gcc                |    |  - status uris:                  |
>   |    + package: libc-devel         |    |    + 127.0.0.1:8080          (II)|
>   |  - status uris:                  |    +----------------------------------+
>   |    + 127.0.0.1:8080          (II)|
>   +----------------------------------+
>
> In the end, a job will contain a ``test_suite`` with "Task #1" and
> "Task #2".  It means that the execution of both tasks was requested
> by the Job owner::
>
>   +---------------------------------------------------------------------------+
>   | REQUESTED                                                                 |
>   +---------------------------------------------------------------------------+
>   | +----------------------------------+ +----------------------------------+ |
>   | | Task #1:                         | | Task #2:                         | |
>   | |  - id: 1-test.py:Test.test_1     | |  - id: 2-test.py:Test.test_2     | |
>   | |  - kind: python-unittest         | |  - kind: python-unittest         | |
>   | |  - uri: test.py:Test.test_1      | |  - uri: test.py:Test.test_2      | |
>   | |  - requirements:                 | |  - requirements:                 | |
>   | |    + file: mylib.py              | |    + file: mylib.py              | |
>   | |    + package: gcc                | |  - status uris:                  | |
>   | |    + package: libc-devel         | |    + 127.0.0.1:8080              | |
>   | |  - status uris:                  | +----------------------------------+ |
>   | |    + 127.0.0.1:8080              |                                      |
>   | +----------------------------------+                                      |
>   +---------------------------------------------------------------------------+
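
The Runnable-to-Task transformation pictured above is essentially just
wrapping the runnable with an identification and the status URIs.
Roughly (simplified dict-based data structures, not the real Runnable
and Task classes):

```python
# Simplified sketch of turning Runnables into Tasks by adding an
# identification (I) and a status URI (II); not the real Avocado classes.

def runnables_to_tasks(runnables, status_uri):
    tasks = []
    for index, runnable in enumerate(runnables, start=1):
        task = dict(runnable)
        task['id'] = '%u-%s' % (index, runnable['uri'])        # (I)
        task['status_uris'] = [status_uri]                     # (II)
        tasks.append(task)
    return tasks

runnables = [{'kind': 'python-unittest', 'uri': 'test.py:Test.test_1'},
             {'kind': 'python-unittest', 'uri': 'test.py:Test.test_2'}]
tasks = runnables_to_tasks(runnables, '127.0.0.1:8080')
```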
>
> These tasks now will be triaged.  A suitable implementation will
> move those tasks to a ``tasks_under_triage`` queue, mark them as
> ``UNDER_TRIAGE``, or use some other strategy to differentiate the
> tasks at this stage::
>
>   +---------------------------------------------------------------------------+
>   | UNDER_TRIAGE                                                              |
>   +---------------------------------------------------------------------------+
>   | +----------------------------------+ +----------------------------------+ |
>   | | Task #1:                         | | Task #2:                         | |
>   | |  - id: 1-test.py:Test.test_1     | |  - id: 2-test.py:Test.test_2     | |
>   | |  - kind: python-unittest         | |  - kind: python-unittest         | |
>   | |  - uri: test.py:Test.test_1      | |  - uri: test.py:Test.test_2      | |
>   | |  - requirements:                 | |  - requirements:                 | |
>   | |    + file: mylib.py              | |    + file: mylib.py              | |
>   | |    + package: gcc                | |  - status uris:                  | |
>   | |    + package: libc-devel         | |    + 127.0.0.1:8080              | |
>   | |  - status uris:                  | +----------------------------------+ |
>   | |    + 127.0.0.1:8080              |                                      |
>   | +----------------------------------+                                      |
>   +---------------------------------------------------------------------------+
>
> Iteration I
> ~~~~~~~~~~~
>
> Task #1 is selected on the first iteration, and it's found that:
>
> 1. A suitable runner for tasks of kind ``python-unittest`` exists
>
> 2. The ``mylib.py`` requirement is already present on the current
>    environment
>
> 3. The ``gcc`` and ``libc-devel`` packages are not installed in the
>    current environment
>
> 4. The system is capable of *attempting* to fulfill "package" types of
>    requirements.
>
> Task #1 will then be prepared.  No further action is performed on the
> first iteration, because no other relevant state exists (Task #2, the
> only other requested task, has not progressed beyond its initial
> stage)::
>
>   +---------------------------------------------------------------------------+
>   | UNDER_TRIAGE                                                              |
>   +---------------------------------------------------------------------------+
>   |                                      +----------------------------------+ |
>   |                                      | Task #2:                         | |
>   |                                      |  - id: 2-test.py:Test.test_2     | |
>   |                                      |  - kind: python-unittest         | |
>   |                                      |  - uri: test.py:Test.test_2      | |
>   |                                      |  - requirements:                 | |
>   |                                      |    + file: mylib.py              | |
>   |                                      |  - status uris:                  | |
>   |                                      |    + 127.0.0.1:8080              | |
>   |                                      +----------------------------------+ |
>   |                                                                           |
>   |                                                                           |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | PREPARING                                                                 |
>   +---------------------------------------------------------------------------+
>   | +----------------------------------+                                      |
>   | | Task #1:                         |                                      |
>   | |  - id: 1-test.py:Test.test_1     |                                      |
>   | |  - kind: python-unittest         |                                      |
>   | |  - uri: test.py:Test.test_1      |                                      |
>   | |  - requirements:                 |                                      |
>   | |    + file: mylib.py              |                                      |
>   | |    + package: gcc                |                                      |
>   | |    + package: libc-devel         |                                      |
>   | |  - status uris:                  |                                      |
>   | |    + 127.0.0.1:8080              |                                      |
>   | +----------------------------------+                                      |
>   +---------------------------------------------------------------------------+
>
> Iteration II
> ~~~~~~~~~~~~
>
> On the second iteration, Task #2 is selected, and it's found that:
>
> 1. A suitable runner for tasks of kind ``python-unittest`` exists
>
> 2. The ``mylib.py`` requirement is already present on the current
>    environment
>
> Task #2 is now ready to be started.  Possibly concurrently, the
> setup of Task #1, selected as the single entry being prepared,
> is having its requirements prepared::
>
>   +---------------------------------------------------------------------------+
>   | UNDER_TRIAGE                                                              |
>   +---------------------------------------------------------------------------+
>   |                                                                           |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | READY                                                                     |
>   +---------------------------------------------------------------------------+
>   |                                      +----------------------------------+ |
>   |                                      | Task #2:                         | |
>   |                                      |  - id: 2-test.py:Test.test_2     | |
>   |                                      |  - kind: python-unittest         | |
>   |                                      |  - uri: test.py:Test.test_2      | |
>   |                                      |  - requirements:                 | |
>   |                                      |    + file: mylib.py              | |
>   |                                      |  - status uris:                  | |
>   |                                      |    + 127.0.0.1:8080              | |
>   |                                      +----------------------------------+ |
>   |                                                                           |
>   |                                                                           |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | PREPARING                                                                 |
>   +---------------------------------------------------------------------------+
>   | +----------------------------------+                                      |
>   | | Task #1:                         |                                      |
>   | |  - id: 1-test.py:Test.test_1     |                                      |
>   | |  - kind: python-unittest         |                                      |
>   | |  - uri: test.py:Test.test_1      |                                      |
>   | |  - requirements:                 |                                      |
>   | |    + file: mylib.py              |                                      |
>   | |    + package: gcc                |                                      |
>   | |    + package: libc-devel         |                                      |
>   | |  - status uris:                  |                                      |
>   | |    + 127.0.0.1:8080              |                                      |
>   | +----------------------------------+                                      |
>   +---------------------------------------------------------------------------+
>
> Iteration III
> ~~~~~~~~~~~~~
>
> On the third iteration, there are no tasks left under triage, so
> the action is now limited to tasks being prepared and ready to
> be started.
>
> Supposing that the "status uri" 127.0.0.1:8080 was set by the job as
> its internal status server, the server must be started before any
> task, to avoid any status message being lost.
>
> At this stage, Task #2 is started, and Task #1 is now ready::
>
>   +---------------------------------------------------------------------------+
>   | UNDER_TRIAGE                                                              |
>   +---------------------------------------------------------------------------+
>   |                                                                           |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | STARTED                                                                   |
>   +---------------------------------------------------------------------------+
>   |                                      +----------------------------------+ |
>   |                                      | Task #2:                         | |
>   |                                      |  - id: 2-test.py:Test.test_2     | |
>   |                                      |  - kind: python-unittest         | |
>   |                                      |  - uri: test.py:Test.test_2      | |
>   |                                      |  - requirements:                 | |
>   |                                      |    + file: mylib.py              | |
>   |                                      |  - status uris:                  | |
>   |                                      |    + 127.0.0.1:8080              | |
>   |                                      +----------------------------------+ |
>   |                                                                           |
>   |                                                                           |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | READY                                                                     |
>   +---------------------------------------------------------------------------+
>   | +----------------------------------+                                      |
>   | | Task #1:                         |                                      |
>   | |  - id: 1-test.py:Test.test_1     |                                      |
>   | |  - kind: python-unittest         |                                      |
>   | |  - uri: test.py:Test.test_1      |                                      |
>   | |  - requirements:                 |                                      |
>   | |    + file: mylib.py              |                                      |
>   | |    + package: gcc                |                                      |
>   | |    + package: libc-devel         |                                      |
>   | |  - status uris:                  |                                      |
>   | |    + 127.0.0.1:8080              |                                      |
>   | +----------------------------------+                                      |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | STATUS SERVER "127.0.0.1:8080"                                            |
>   +---------------------------------------------------------------------------+
>   | Status Messages: []                                                       |
>   +---------------------------------------------------------------------------+
>
>
> Iteration IV
> ~~~~~~~~~~~~
>
> On the fourth iteration, Task #1 is started::
>
>   +---------------------------------------------------------------------------+
>   | STARTED                                                                   |
>   +---------------------------------------------------------------------------+
>   | +----------------------------------+ +----------------------------------+ |
>   | | Task #1:                         | | Task #2:                         | |
>   | |  - id: 1-test.py:Test.test_1     | |  - id: 2-test.py:Test.test_2     | |
>   | |  - kind: python-unittest         | |  - kind: python-unittest         | |
>   | |  - uri: test.py:Test.test_1      | |  - uri: test.py:Test.test_2      | |
>   | |  - requirements:                 | |  - requirements:                 | |
>   | |    + file: mylib.py              | |    + file: mylib.py              | |
>   | |    + package: gcc                | |  - status uris:                  | |
>   | |    + package: libc-devel         | |    + 127.0.0.1:8080              | |
>   | |  - status uris:                  | +----------------------------------+ |
>   | |    + 127.0.0.1:8080              |                                      |
>   | +----------------------------------+                                      |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | STATUS SERVER "127.0.0.1:8080"                                            |
>   +---------------------------------------------------------------------------+
>   | Status Messages:                                                          |
>   | - {id: 2-test.py:Test.test_2, status: started}                            |
>   +---------------------------------------------------------------------------+
>
> Note: the ideal level of parallelization is still to be defined; that
> is, triaging, preparing, and starting tasks may all end up running
> concurrently.  An initial implementation that, on each iteration,
> looks at all Task states and attempts to advance them further,
> blocking other Tasks as little as possible, should be acceptable.
>
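That per-iteration "look at all states and advance" loop could be sketched
roughly like this (the state names, their ordering, and ``iterate()`` are
hypothetical placeholders, not actual nrunner API):

```python
import enum


class TaskState(enum.Enum):
    REQUESTED = "requested"
    TRIAGING = "triaging"
    READY = "ready"
    STARTED = "started"
    FINISHED = "finished"


# Hypothetical linear ordering of the states a Task advances through
NEXT_STATE = {
    TaskState.REQUESTED: TaskState.TRIAGING,
    TaskState.TRIAGING: TaskState.READY,
    TaskState.READY: TaskState.STARTED,
    TaskState.STARTED: TaskState.FINISHED,
}


def iterate(tasks):
    """One scheduler iteration: try to advance every task one state.

    ``tasks`` maps task id -> TaskState.  Returns how many tasks
    changed state, so a caller can stop once nothing moves.
    """
    advanced = 0
    for task_id, state in list(tasks.items()):
        nxt = NEXT_STATE.get(state)
        if nxt is not None:
            tasks[task_id] = nxt
            advanced += 1
    return advanced


tasks = {"1-test.py:Test.test_1": TaskState.REQUESTED,
         "2-test.py:Test.test_2": TaskState.STARTED}
iterate(tasks)
```

Calling ``iterate()`` repeatedly until it returns 0 would correspond to the
iterations described in the RFC, with each Task advanced independently of
the others.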
> Iteration V
> ~~~~~~~~~~~
>
> On the fifth iteration, the spawner reports that Task #2 is not alive anymore,
> and the status server has received a message about it (and also a message about
> Task #1 having started)::
>
>   +---------------------------------------------------------------------------+
>   | STATUS SERVER "127.0.0.1:8080"                                            |
>   +---------------------------------------------------------------------------+
>   | Status Messages:                                                          |
>   | - {id: 2-test.py:Test.test_2, status: started}                            |
>   | - {id: 1-test.py:Test.test_1, status: started}                            |
>   | - {id: 2-test.py:Test.test_2, status: finished, result: pass}             |
>   +---------------------------------------------------------------------------+
>
> Because of that, Task #2 is now considered ``FINISHED``::
>
>   +---------------------------------------------------------------------------+
>   | FINISHED                                                                  |
>   +---------------------------------------------------------------------------+
>   |                                      +----------------------------------+ |
>   |                                      | Task #2:                         | |
>   |                                      |  - id: 2-test.py:Test.test_2     | |
>   |                                      |  - kind: python-unittest         | |
>   |                                      |  - uri: test.py:Test.test_2      | |
>   |                                      |  - requirements:                 | |
>   |                                      |    + file: mylib.py              | |
>   |                                      |  - status uris:                  | |
>   |                                      |    + 127.0.0.1:8080              | |
>   |                                      +----------------------------------+ |
>   +---------------------------------------------------------------------------+
>
> And Task #1 is still a ``STARTED`` task::
>
>   +---------------------------------------------------------------------------+
>   | STARTED                                                                   |
>   +---------------------------------------------------------------------------+
>   | +----------------------------------+                                      |
>   | | Task #1:                         |                                      |
>   | |  - id: 1-test.py:Test.test_1     |                                      |
>   | |  - kind: python-unittest         |                                      |
>   | |  - uri: test.py:Test.test_1      |                                      |
>   | |  - requirements:                 |                                      |
>   | |    + file: mylib.py              |                                      |
>   | |    + package: gcc                |                                      |
>   | |    + package: libc-devel         |                                      |
>   | |  - status uris:                  |                                      |
>   | |    + 127.0.0.1:8080              |                                      |
>   | +----------------------------------+                                      |
>   +---------------------------------------------------------------------------+
>
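The way the spawner's liveness report combines with the status server's
messages to decide a Task's state could look roughly like this
(``classify_task`` is a hypothetical helper, not the plugin's actual code):

```python
def classify_task(task_id, spawner_alive, status_messages):
    """Classify a task from the spawner's liveness report and the
    status messages collected by the status server.

    Hypothetical helper; the real nrunner plugin may differ.
    """
    finished = any(msg.get("id") == task_id and msg.get("status") == "finished"
                   for msg in status_messages)
    if finished:
        return "FINISHED"
    if spawner_alive:
        return "STARTED"
    return "UNKNOWN"  # not alive, but no final status message received


messages = [
    {"id": "2-test.py:Test.test_2", "status": "started"},
    {"id": "1-test.py:Test.test_1", "status": "started"},
    {"id": "2-test.py:Test.test_2", "status": "finished", "result": "pass"},
]
classify_task("2-test.py:Test.test_2", False, messages)  # "FINISHED"
classify_task("1-test.py:Test.test_1", True, messages)   # "STARTED"
```

With the messages shown in the fifth iteration, Task #2 is classified as
``FINISHED`` even though the spawner no longer reports it alive, while
Task #1 remains ``STARTED``.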
> Final Iteration
> ~~~~~~~~~~~~~~~
>
> After a number of iterations with no status changes, and because of a
> timeout implementation at the job level, it's decided that Task #1 is
> not to be waited on.
>
> The spawner continues to report that Task #1 is alive (from its point
> of view), but no further status messages have been received.  Provided the spawner
> has support for that, it may attempt to clean up the task (such as
> destroying a container or killing a process).  In the end, it's left
> with::
>
>   +---------------------------------------------------------------------------+
>   | STATUS SERVER "127.0.0.1:8080"                                            |
>   +---------------------------------------------------------------------------+
>   | Status Messages:                                                          |
>   | - {id: 2-test.py:Test.test_2, status: started}                            |
>   | - {id: 1-test.py:Test.test_1, status: started}                            |
>   | - {id: 2-test.py:Test.test_2, status: finished, result: pass}             |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | FINISHED                                                                  |
>   +---------------------------------------------------------------------------+
>   |                                      +----------------------------------+ |
>   |                                      | Task #2:                         | |
>   |                                      |  - id: 2-test.py:Test.test_2     | |
>   |                                      |  - kind: python-unittest         | |
>   |                                      |  - uri: test.py:Test.test_2      | |
>   |                                      |  - requirements:                 | |
>   |                                      |    + file: mylib.py              | |
>   |                                      |  - status uris:                  | |
>   |                                      |    + 127.0.0.1:8080              | |
>   |                                      +----------------------------------+ |
>   +---------------------------------------------------------------------------+
>
>   +---------------------------------------------------------------------------+
>   | INTERRUPTED                                                               |
>   +---------------------------------------------------------------------------+
>   | +----------------------------------+                                      |
>   | | Task #1:                         |                                      |
>   | |  - id: 1-test.py:Test.test_1     |                                      |
>   | |  - kind: python-unittest         |                                      |
>   | |  - uri: test.py:Test.test_1      |                                      |
>   | |  - requirements:                 |                                      |
>   | |    + file: mylib.py              |                                      |
>   | |    + package: gcc                |                                      |
>   | |    + package: libc-devel         |                                      |
>   | |  - status uris:                  |                                      |
>   | |    + 127.0.0.1:8080              |                                      |
>   | +----------------------------------+                                      |
>   +---------------------------------------------------------------------------+
>
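The job-level timeout and spawner cleanup described in this final
iteration might be sketched like this (``check_timeout`` and
``ProcessSpawner`` are hypothetical stand-ins, not actual Avocado API):

```python
import time


class ProcessSpawner:
    """Minimal stand-in for a spawner that can check and clean tasks."""

    def is_alive(self, task_id):
        # pretend the task's process still exists from the spawner's PoV
        return True

    def cleanup(self, task_id):
        # e.g. kill the process or destroy the container, if supported
        return True


def check_timeout(task_id, started_at, timeout, spawner, now=None):
    """If a task has exceeded the job-level ``timeout`` (in seconds),
    stop waiting on it and ask the spawner to clean it up.

    Hypothetical job-level helper.
    """
    now = time.monotonic() if now is None else now
    if now - started_at < timeout:
        return "STARTED"
    if spawner.is_alive(task_id):
        spawner.cleanup(task_id)
    return "INTERRUPTED"


check_timeout("1-test.py:Test.test_1", started_at=0, timeout=900,
              spawner=ProcessSpawner(), now=901)  # "INTERRUPTED"
```

The key point the sketch tries to capture is that ``INTERRUPTED`` is
decided by the job, not by any status message from the task itself.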

I have attached a diagram with the phases of your proposal and the
example you gave, for those who like diagrams.

> Tallying results
> ~~~~~~~~~~~~~~~~
>
> The nrunner plugin should be able to provide meaningful results to the Job,
> and consequently to the user, based on the resulting information on the
> final iteration.
>
> Notice that some information, such as the ``PASS`` for the first
> test, will come from the "result" given in a status message from the
> task itself.  Some other statuses, such as the ``INTERRUPTED`` status
> for the second test, will not come from a received status message,
> but from the actual management of the task execution.  It's expected
> that other information will also have to be inferred and "filled in"
> by the nrunner plugin implementation.
>
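That tallying step could be approximated like this (``tally`` is a
hypothetical helper; the real nrunner plugin will differ):

```python
from collections import Counter


def tally(task_ids, status_messages):
    """Build per-task results from the status server's messages.

    Tasks with a final message use its ``result``; tasks that never
    sent one are filled in as INTERRUPT by the job.  Hypothetical
    sketch of the inference described above.
    """
    results = {}
    for msg in status_messages:
        if msg.get("status") == "finished":
            results[msg["id"]] = msg["result"].upper()
    for task_id in task_ids:
        results.setdefault(task_id, "INTERRUPT")
    return results


messages = [
    {"id": "2-test.py:Test.test_2", "status": "started"},
    {"id": "1-test.py:Test.test_1", "status": "started"},
    {"id": "2-test.py:Test.test_2", "status": "finished", "result": "pass"},
]
results = tally(["1-test.py:Test.test_1", "2-test.py:Test.test_2"], messages)
summary = Counter(results.values())
```

Given the final iteration's messages, this yields one ``PASS`` and one
``INTERRUPT``, matching the summary line in the sample output.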
> In the end, it's expected that results similar to this would be
> presented::
>
>   JOB ID     : f59bd40b8ac905864c4558dc02b6177d4f422ca3
>   JOB LOG    : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/job.log
>    (1/2) tests.py:Test.test_2: PASS (2.56 s)
>    (2/2) tests.py:Test.test_1: INTERRUPT (900 s)
>   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
>   JOB TIME   : 0.19 s
>   JOB HTML   : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/results.html
>
> Notice how Task #2 shows up before Task #1, because it was both started
> first and finished earlier.  There may be issues in the current UI to
> be dealt with regarding out-of-order task status updates.
>
> Summary
> =======
>
> This proposal contains a number of items that can become GitHub issues
> at this stage.  It also contains a general explanation of what I believe
> are the crucial missing features to make the N(ext) Runner implementation
> available to the general public.
>
> Feedback is highly appreciated, and it's expected that this document will
> evolve into a better version, and possibly become a formal Blueprint.
>
> Thanks,
> - Cleber.

I think the idea for the task scheduler is promising. I have some
suggestions, but, as I said before, if the text is structured as a
self-contained blueprint, it will be better for the discussion and
documentation.

Thanks,

Willian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Avocado task scheduler - Cleber.png
Type: image/png
Size: 77894 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/avocado-devel/attachments/20200522/7c33d120/attachment.png>

