[Avocado-devel] RFC: N(ext) Runner - A proposal to the finish line
Cleber Rosa
crosa at redhat.com
Thu Jun 11 02:00:55 UTC 2020
On Fri, May 22, 2020 at 05:45:23PM -0300, Willian Rampazzo wrote:
> Hello Cleber,
>
> Thanks for this RFC, it is appreciated. I see you have different
> points for discussion in this RFC; it would be better to discuss them
> in different places/ways. I will try to give my contribution to those
> that I can, but I will hold my comments about the task scheduler. The
> format of a blueprint, using the motivation and dividing into sections,
> would be better for my understanding of this kind of
> architecture-related discussion.
>
Hi Willian,
Ack. The individual smaller issues have been turned into "GitHub
issues", so we can move the discussion about the scheduler to its
blueprint.
> On Wed, May 20, 2020 at 8:33 PM Cleber Rosa <crosa at redhat.com> wrote:
> >
> > Intro
> > =====
> >
> > This is a more technical follow up to the points given in a previous
> > thread. Because that thread and the current N(ext) Runner documentation
> > form a good context for this proposal, I encourage everyone to read them
> > first:
> >
> > https://www.redhat.com/archives/avocado-devel/2020-May/msg00009.html
> >
> > https://avocado-framework.readthedocs.io/en/79.0/future/core/nrunner.html
> >
> > The N(ext) Runner allows for greater flexibility than the current
> > runner, so to be effective in delivering the N(ext) Runner for general
> > usage, we must define the bare minimum that still needs to be
> > implemented.
> >
> > Basic Job and Task execution
> > ============================
> >
> > A Task, within the context of the N(ext) Runner, is described as "one
> > specific instance/occurrence of the execution of a runnable with its
> > respective runner".
> >
> > A Task is a very important building block of an Avocado Job, and running
> > an Avocado Job means, to a large extent, running a number of Tasks.
> > The Tasks that need to be executed in a Job are created during
> > the ``create_test_suite()`` phase:
> >
> > https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.create_test_suite
> >
> > And are kept in the Job's ``test_suite`` attribute:
> >
> > https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.test_suite
> >
> > Running the tests, then, happens during the ``run_tests()`` phase:
> >
> > https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.run_tests
> >
> > During the ``run_tests()`` phase, a plugin that runs test suites on a
> > job is chosen, based on the ``run.test_runner`` configuration.
> > The current "work in progress" implementation of the N(ext) Runner
> > can be activated by setting that configuration key to ``nrunner``,
> > which can also be easily done on the command line::
> >
> > avocado run --test-runner=nrunner /bin/true
> >
> > A general rule for measuring the quality and completeness of the
> > ``nrunner`` implementation is to run the same jobs with the current
> > runner, and compare its behavior and output with that of the
> > ``nrunner``. From here on, we'll call this simply the "nrunner
> > plugin".
> >
> > Known issues and limitations of the current implementation
> > ==========================================================
> >
> > Different Test IDs
> > ------------------
> >
> > When running the same tests, the current runner and the nrunner
> > produce different Test IDs::
> >
> > $ avocado run --test-runner=runner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
> > "id": "1-/bin/true",
> > "id": "2-/bin/false",
> > "id": "3-/bin/uname",
> >
> > $ avocado run --test-runner=nrunner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
> > "id": "1-1-/bin/true",
> > "id": "2-2-/bin/false",
> > "id": "3-3-/bin/uname",
> >
> > The goal is to make the IDs the same.
> >
>
> In my opinion, this seems to be a simple issue that is easily tracked
> on GitHub. If we are going to keep the output of the nrunner just like
> the current runner, there is not much to discuss, only implement.
>
Ack, and it's done now.
> > Inability to run Tasks other than exec, exec-test, python-unittest (and noop)
> > -----------------------------------------------------------------------------
> >
> > The current implementation of the nrunner plugin is based on the fact that
> > Tasks are already present in the ``test_suite`` job attribute, and that running
> > Tasks can be (but shouldn't always be) a matter of iterating over the result
> > of its ``run()`` method. This is part of the actual code::
> >
> > for status in task.run():
> > result_dispatcher.map_method('test_progress', False)
> > statuses.append(status)
> >
> > The problem here is that only the Python classes implemented in the core
> > "avocado.core.nrunner" module, and registered at:
> >
> > https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.nrunner.RUNNERS_REGISTRY_PYTHON_CLASS
> >
> > can currently be used. The goal is to have all other Python classes
> > that inherit from "avocado.core.nrunner.BaseRunner" available in such
> > a registry.
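> >
> > One way to picture that goal is a registry keyed by task kind, which
> > plugins could extend. A minimal sketch, assuming a hypothetical
> > decorator-based registration (``register_runner`` and the shape of
> > ``BaseRunner`` here are illustrative, not Avocado's actual API):

```python
# Hypothetical sketch of a runner registry keyed by task kind.
# The register_runner/BaseRunner names are illustrative only.
RUNNERS_REGISTRY = {}


class BaseRunner:
    """Base class for all runners; each subclass handles one task kind."""


def register_runner(kind):
    """Class decorator that records a BaseRunner subclass by its kind."""
    def decorator(klass):
        RUNNERS_REGISTRY[kind] = klass
        return klass
    return decorator


@register_runner("noop")
class NoOpRunner(BaseRunner):
    """A runner that does nothing and reports success."""
    def run(self):
        yield {"status": "finished", "result": "pass"}
```

> > With something along these lines, the nrunner plugin could look up a
> > runner for any registered kind, instead of being limited to the
> > classes in the core module.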
> >
>
> Agreed, we need to find a way to centralize supported runners, not
> only those implemented in the Avocado core. A registration method like
> we are using for the new avocado parameters is an option. Another
> option is to register them the same way we register plugins today,
> utilizing the setup.py. The problem I see with both solutions is
> breaking the "standalone" effect of nrunner.py. Right now, I don't
> have a better solution for it.
>
Exactly. In fact, I got to work on something which I believe matches
your comment:
https://github.com/avocado-framework/avocado/pull/3908
> > Inability to run Tasks with Spawners
> > ------------------------------------
> >
> > While the "avocado nrun" command makes use of the Spawners, the
> > current implementation of the nrunner plugin described earlier
> > calls a Task's ``run()`` method directly, and clearly doesn't
> > use spawners.
> >
> > The goal here is to leverage spawners so that other isolation
> > models (or execution environments, depending on how you look at
> > processes, containers, etc.) are supported.
> >
>
> Agreed! If tasks are the default way to run a test on nrunner,
> Spawners should be the default "way of transportation" to achieve it.
> This discussion and its issues can be tracked as an epic on GitHub.
>
Ack, issue is here:
https://github.com/avocado-framework/avocado/issues/3866
> > Unoptimized execution of Tasks (extra serialization/deserialization)
> > --------------------------------------------------------------------
> >
> > At this time, the nrunner plugin runs a Task directly through its
> > ``run()`` method. Besides the earlier point of not supporting
> > other isolation models/execution environments (that means not using
> > spawners), there's an extra layer of work happening when running
> > a task which is most often not necessary: turning a Task instance
> > into a command line, and within its execution, turning it into a
> > Task instance again.
> >
> > The goal is to support an optimized execution of tasks, without
> > having to turn them into command lines, and back into Task instances.
> > The idea is already present in the spawn method definitions:
> >
> > https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.spawners.html#avocado.core.spawners.common.SpawnMethod.PYTHON_CLASS
> >
> > And a PoC on top of the ``nrun`` command was implemented here:
> >
> > https://github.com/avocado-framework/avocado/pull/3766/commits/ae57ee78df7f2935e40394cdfc72a34b458cdcef
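> >
> > A sketch of what that optimization could look like, inspired by the
> > ``SpawnMethod.PYTHON_CLASS`` definition linked above (the enum
> > values, the ``spawn()`` helper and the command line shown are
> > illustrative assumptions):

```python
# Sketch: skip the command-line round trip when the spawner supports
# taking a Python object directly.  Names are illustrative only.
import enum


class SpawnMethod(enum.Enum):
    PYTHON_CLASS = "python_class"
    STANDALONE_EXECUTABLE = "standalone_executable"


def spawn(task, supported_methods):
    if SpawnMethod.PYTHON_CLASS in supported_methods:
        # No serialization: run the Task instance in this interpreter.
        return list(task.run())
    # Fallback: serialize the task into a command line (sketched only).
    return ["avocado-runner", "task-run", "--uri", task.uri]


class FakeTask:
    """Minimal stand-in for a Task, for illustration."""
    uri = "/bin/true"

    def run(self):
        yield {"status": "finished"}
```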
> >
>
> If I understood correctly, starting here, you discuss the
> architecture for a task scheduler. I understood the phases, but this
> discussion should be self-contained, decoupled from the previous
> content, and would be better in the blueprint format with motivation,
> divided into sections.
>
Ack.
> > Proposal
> > ========
> >
> > Besides the known limitations listed previously, there are others that
> > will appear along the way, and certainly some new challenges as we
> > solve them.
> >
> > The goal of this proposal is to attempt to identify those challenges,
> > and lay out a plan that can be tackled by the Avocado team/community
> > and not by a single person.
> >
> > Task execution coordination goals
> > ---------------------------------
> >
> > As stated earlier, to run a job, tasks must be executed. Unlike
> > the current runner, the N(ext) Runner architecture allows those
> > to be executed in a much more decoupled way. This characteristic will
> > be maintained, but it needs to be adapted to the current Job
> > execution.
> >
> > From a high level view, the nrunner plugin needs to:
> >
> > 1. Break apart from the "one at a time" Task execution model that it
> > currently employs;
> >
> > 2. Check if a Task can be executed, that is, if its requirements can
> > be fulfilled (the most basic requirement for a task is a matching
> > runner);
> >
> > 3. Prepare for the execution of a task, such as the fulfillment of
> > extra task requirements. The requirements resolver is one, if not
> > the only, component that should be given a chance to act here;
> >
> > 4. Execute a task in the prepared environment;
> >
> > 5. Monitor the execution of a task (from an external PoV);
> >
> > 6. Collect the status messages that tasks will send;
> >
> > a. Forward the status messages to the appropriate job components,
> > such as the result plugins.
> >
> > b. Depending on the content of messages, such as the ones
> > containing "status: started" or "status: finished", interfere in
> > the Task execution status, and consequently, in the Job
> > execution status.
> >
> > 7. Verify, warn the user about, and attempt to clean up stray tasks.
> > This may be necessary, for instance, if a Task on a container seems
> > to be stuck and the container can not be destroyed. The same applies
> > to processes in some kind of uninterruptible sleep.
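> >
> > To make step 2 concrete, a sketch of a triage check, where a task
> > passes only if a runner for its kind exists and every requirement
> > type can at least be *attempted* (names and data shapes here are
> > illustrative, not Avocado's actual structures):

```python
# Sketch of the triage step: a task may proceed to preparation only
# if a matching runner exists and each requirement type is one the
# system can attempt to fulfill.  Names are illustrative only.
KNOWN_RUNNERS = {"exec", "exec-test", "python-unittest", "noop"}
FULFILLABLE_REQUIREMENT_TYPES = {"file", "package"}


def triage(task):
    """Return True if the task may proceed to preparation."""
    if task["kind"] not in KNOWN_RUNNERS:
        return False
    return all(req["type"] in FULFILLABLE_REQUIREMENT_TYPES
               for req in task.get("requirements", []))
```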
> >
> > Parallelization
> > ---------------
> >
> > Because the N(ext) Runner features allow for parallel execution of tasks,
> > all other aspects of task execution coordination (fulfilling requirements,
> > collecting results, etc) should not block each other.
> >
> > There are a number of strategies for concurrent programming in Python
> > these days, and the "avocado nrun" command currently makes use of
> > asyncio to have coroutines that spawn tasks and collect results
> > concurrently (in a cooperative, non-preemptive model). The actual
> > language or library features used are, IMO, less important than the
> > end result.
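> >
> > As a toy illustration of that cooperative model, one coroutine
> > "spawns" tasks while another collects their status messages over a
> > shared ``asyncio.Queue`` (this mirrors the idea, not "avocado nrun"'s
> > actual code):

```python
# Toy sketch of cooperative concurrency: a producer coroutine
# "spawns" tasks while a consumer collects status messages, both
# sharing an asyncio.Queue.  Illustrative only.
import asyncio


async def spawn_tasks(names, queue):
    for name in names:
        await asyncio.sleep(0)            # yield control cooperatively
        await queue.put({"id": name, "status": "finished"})
    await queue.put(None)                 # sentinel: no more tasks


async def collect(queue):
    statuses = []
    while (msg := await queue.get()) is not None:
        statuses.append(msg)
    return statuses


async def main(names):
    queue = asyncio.Queue()
    _, statuses = await asyncio.gather(spawn_tasks(names, queue),
                                       collect(queue))
    return statuses
```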
> >
> > Suggested terminology
> > ---------------------
> >
> > Task execution has been requested
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > A Task whose execution was requested by the user. All of the tasks on
> > a Job's ``test_suite`` attribute are requested tasks.
> >
> > If a software component deals with this type of task, it's advisable
> > that it refers to ``TASK_REQUESTED`` or ``requested_tasks`` or a
> > similar name that links to this definition.
> >
> > Task is being triaged
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > The details of the task are being analyzed, including and most
> > importantly the ability of the system to *attempt* to fulfill its
> > requirements. A task that leaves triage is either considered
> > "discarded" or proceeds to be prepared and then executed.
> >
> > If a software component deals with this type of task, for instance if
> > a "task scheduler" is looking for runners matching the Task's kind, it
> > should keep it under a ``tasks_under_triage`` structure or mark the
> > tasks as ``UNDER_TRIAGE`` or ``TRIAGING`` or a similar name that links
> > to this definition.
> >
> > Task is being prepared
> > ~~~~~~~~~~~~~~~~~~~~~~
> >
> > Task has left triage, and it has not been discarded, that is, it's
> > a candidate to be set up, and, if that goes well, executed.
> >
> > The requirements for a task are being prepared in its respective
> > isolation model/execution environment, that is, the spawner it'll
> > be executed with is known, and the setup actions will be visible
> > to the task.
> >
> > If a software component deals with this type of task, for instance the
> > implementation of resolution of specific requirements, it should
> > keep it under a ``tasks_preparing`` structure or mark the tasks as
> > ``PREPARING`` or a similar name that links to this definition.
> >
> > Task is ready to be started
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Task has been prepared successfully, and can now be executed.
> >
> > If a software component deals with this type of task, it should
> > keep it under a ``tasks_ready`` structure or mark the tasks as
> > ``READY`` or a similar name that links to this definition.
> >
> > Task is being started
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > A hopefully short-lived state, in which a task that is ready to be started
> > (see previous point) will be given to the respective spawner to be started.
> >
> > If a software component deals with this type of task, it should
> > keep it under a ``tasks_starting`` structure or mark the tasks as
> > ``STARTING`` or a similar name that links to this definition.
> >
> > The spawner should know if the starting of the task succeeded or failed,
> > and the task should be categorized accordingly.
> >
> > Task has been started
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > A task was successfully started by a spawner.
> >
> > Note that it does *not* mean that the test that the task runner (say,
> > an "avocado-runner-$kind task-run" command) will run has already been
> > started. This will be signalled by a "status: started" kind of
> > message.
> >
> > If a software component deals with this type of task, it should
> > keep it under a ``tasks_started`` structure or mark the tasks as
> > ``STARTED`` or a similar name that links to this definition.
> >
> > Task has failed to start
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Quite self-explanatory. If the spawner failed to start a task, it
> > should be kept under a ``tasks_failed_to_start`` structure or be
> > marked as ``FAILED_TO_START`` or a similar name that links to this
> > definition.
> >
> > Task is finished
> > ~~~~~~~~~~~~~~~~
> >
> > This means that the task has started, and is now finished. There's no
> > associated meaning here about the pass/fail output of the test payload
> > executed by the task.
> >
> > It should be kept under a ``tasks_finished`` structure or be marked as
> > ``FINISHED`` or a similar name that links to this definition.
> >
> > Task has been interrupted
> > ~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > This means that the task has started, but has not finished and it's
> > past due.
> >
> > It should be kept under a ``tasks_interrupted`` structure or be marked
> > as ``INTERRUPTED`` or a similar name that links to this definition.
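> >
> > For reference, the terminology above can be collected into a single
> > enum, together with the forward transitions it implies (the enum
> > itself is a sketch; discarding during triage is omitted for
> > brevity):

```python
# The suggested terminology collected into one enum, plus the legal
# forward transitions it implies.  A sketch, not Avocado code.
import enum


class TaskState(enum.Enum):
    REQUESTED = "requested"
    TRIAGING = "triaging"
    PREPARING = "preparing"
    READY = "ready"
    STARTING = "starting"
    STARTED = "started"
    FAILED_TO_START = "failed_to_start"
    FINISHED = "finished"
    INTERRUPTED = "interrupted"


TRANSITIONS = {
    TaskState.REQUESTED: {TaskState.TRIAGING},
    TaskState.TRIAGING: {TaskState.PREPARING},
    TaskState.PREPARING: {TaskState.READY},
    TaskState.READY: {TaskState.STARTING},
    TaskState.STARTING: {TaskState.STARTED, TaskState.FAILED_TO_START},
    TaskState.STARTED: {TaskState.FINISHED, TaskState.INTERRUPTED},
}
```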
> >
> > Task workflow
> > -------------
> >
> > A Task will usually be created from a Runnable. A Runnable will, in
> > turn, almost always be created by the "avocado.core.resolver"
> > module. Let's consider the following output of a resolution::
> >
> > +--------------------------------------+
> > | ReferenceResolution #1 |
> > +--------------------------------------+
> > | Reference: test.py |
> > | Result: SUCCESS |
> > | +----------------------------------+ |
> > | | Resolution #1 (Runnable): | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | + package: gcc | |
> > | | + package: libc-devel | |
> > | +----------------------------------+ |
> > | +----------------------------------+ |
> > | | Resolution #2 (Runnable): | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | +----------------------------------+ |
> > +--------------------------------------+
> >
> > Two Runnables here will be transformed into Tasks. The process
> > usually includes adding an identification (I) and a status URI (II)::
> >
> > +----------------------------------+ +----------------------------------+
> > | Resolution #1 (Runnable): | | Resolution #2 (Runnable): |
> > | - kind: python-unittest | | - kind: python-unittest |
> > | - uri: test.py:Test.test_1 | | - uri: test.py:Test.test_2 |
> > | - requirements: | | - requirements: |
> > | + file: mylib.py | | + file: mylib.py |
> > | + package: gcc | +----------------------------------+
> > | + package: libc-devel | ||
> > +----------------------------------+ ||
> > || ||
> > || ||
> > \/ \/
> > +----------------------------------+ +----------------------------------+
> > | Task #1: | | Task #2: |
> > | - id: 1-test.py:Test.test_1 (I)| | - id: 2-test.py:Test.test_2 (I)|
> > | - kind: python-unittest | | - kind: python-unittest |
> > | - uri: test.py:Test.test_1 | | - uri: test.py:Test.test_2 |
> > | - requirements: | | - requirements: |
> > | + file: mylib.py | | + file: mylib.py |
> > | + package: gcc | | - status uris: |
> > | + package: libc-devel | | + 127.0.0.1:8080 (II)|
> > | - status uris: | +----------------------------------+
> > | + 127.0.0.1:8080 (II)|
> > +----------------------------------+
> >
> > In the end, a job will contain a ``test_suite`` with "Task #1" and
> > "Task #2". It means that the execution of both tasks was requested
> > by the Job owner::
> >
> > +---------------------------------------------------------------------------+
> > | REQUESTED |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ +----------------------------------+ |
> > | | Task #1: | | Task #2: | |
> > | | - id: 1-test.py:Test.test_1 | | - id: 2-test.py:Test.test_2 | |
> > | | - kind: python-unittest | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | | - requirements: | |
> > | | + file: mylib.py | | + file: mylib.py | |
> > | | + package: gcc | | - status uris: | |
> > | | + package: libc-devel | | + 127.0.0.1:8080 | |
> > | | - status uris: | +----------------------------------+ |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > These tasks now will be triaged. A suitable implementation will
> > move those tasks to a ``tasks_under_triage`` queue, mark them as
> > ``UNDER_TRIAGE`` or some other strategy to differentiate the
> > tasks at this stage::
> >
> > +---------------------------------------------------------------------------+
> > | UNDER_TRIAGE |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ +----------------------------------+ |
> > | | Task #1: | | Task #2: | |
> > | | - id: 1-test.py:Test.test_1 | | - id: 2-test.py:Test.test_2 | |
> > | | - kind: python-unittest | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | | - requirements: | |
> > | | + file: mylib.py | | + file: mylib.py | |
> > | | + package: gcc | | - status uris: | |
> > | | + package: libc-devel | | + 127.0.0.1:8080 | |
> > | | - status uris: | +----------------------------------+ |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > Iteration I
> > ~~~~~~~~~~~
> >
> > Task #1 is selected on the first iteration, and it's found that:
> >
> > 1. A suitable runner for tasks of kind ``python-unittest`` exists
> >
> > 2. The ``mylib.py`` requirement is already present on the current
> > environment
> >
> > 3. The ``gcc`` and ``libc-devel`` packages are not installed in the
> > current environment
> >
> > 4. The system is capable of *attempting* to fulfill "package" types of
> > requirements.
> >
> > Task #1 will then be prepared. No further action is performed on the
> > first iteration, because no other relevant state exists (Task #2, the
> > only other requested task, has not progressed beyond its initial
> > stage)::
> >
> > +---------------------------------------------------------------------------+
> > | UNDER_TRIAGE |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #2: | |
> > | | - id: 2-test.py:Test.test_2 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > | |
> > | |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | PREPARING |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #1: | |
> > | | - id: 1-test.py:Test.test_1 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | + package: gcc | |
> > | | + package: libc-devel | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > Iteration II
> > ~~~~~~~~~~~~
> >
> > On the second iteration, Task #2 is selected, and it's found that:
> >
> > 1. A suitable runner for tasks of kind ``python-unittest`` exists
> >
> > 2. The ``mylib.py`` requirement is already present on the current
> > environment
> >
> > Task #2 is now ready to be started. Possibly concurrently, Task #1,
> > selected as the single entry being prepared, is having its
> > requirements fulfilled::
> >
> > +---------------------------------------------------------------------------+
> > | UNDER_TRIAGE |
> > +---------------------------------------------------------------------------+
> > | |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | READY |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #2: | |
> > | | - id: 2-test.py:Test.test_2 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > | |
> > | |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | PREPARING |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #1: | |
> > | | - id: 1-test.py:Test.test_1 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | + package: gcc | |
> > | | + package: libc-devel | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > Iteration III
> > ~~~~~~~~~~~~~
> >
> > On the third iteration, there are no tasks left under triage, so
> > the action is now limited to tasks being prepared and ready to
> > be started.
> >
> > Supposing that the "status uri" 127.0.0.1:8080 was set by the job as
> > its internal status server, it must be started before any task, to
> > avoid any status message being lost.
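> >
> > A minimal sketch of such a status server, where tasks connect and
> > send one JSON message per line (the wire format, port and class
> > shape here are assumptions):

```python
# Sketch of a status server in the spirit of the "127.0.0.1:8080"
# status uri: tasks connect over TCP and send one JSON message per
# line.  The wire format is an assumption.
import asyncio
import json


class StatusServer:
    def __init__(self):
        self.messages = []

    async def _handle(self, reader, writer):
        # Read newline-delimited JSON messages until the client closes.
        while line := await reader.readline():
            self.messages.append(json.loads(line))
        writer.close()

    async def serve(self, host="127.0.0.1", port=8080):
        return await asyncio.start_server(self._handle, host, port)
```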
> >
> > At this stage, Task #2 is started, and Task #1 is now ready::
> >
> > +---------------------------------------------------------------------------+
> > | UNDER_TRIAGE |
> > +---------------------------------------------------------------------------+
> > | |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | STARTED |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #2: | |
> > | | - id: 2-test.py:Test.test_2 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > | |
> > | |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | READY |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #1: | |
> > | | - id: 1-test.py:Test.test_1 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | + package: gcc | |
> > | | + package: libc-devel | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | STATUS SERVER "127.0.0.1:8080" |
> > +---------------------------------------------------------------------------+
> > | Status Messages: [] |
> > +---------------------------------------------------------------------------+
> >
> >
> > Iteration IV
> > ~~~~~~~~~~~~
> >
> > On the fourth iteration, Task #1 is started::
> >
> > +---------------------------------------------------------------------------+
> > | STARTED |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ +----------------------------------+ |
> > | | Task #1: | | Task #2: | |
> > | | - id: 1-test.py:Test.test_1 | | - id: 2-test.py:Test.test_2 | |
> > | | - kind: python-unittest | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | | - requirements: | |
> > | | + file: mylib.py | | + file: mylib.py | |
> > | | + package: gcc | | - status uris: | |
> > | | + package: libc-devel | | + 127.0.0.1:8080 | |
> > | | - status uris: | +----------------------------------+ |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | STATUS SERVER "127.0.0.1:8080" |
> > +---------------------------------------------------------------------------+
> > | Status Messages: |
> > | - {id: 2-test.py:Test.test_2, status: started} |
> > +---------------------------------------------------------------------------+
> >
> > Note: the ideal level of parallelization is still to be defined, that
> > is, it may be that triaging, preparing and starting tasks all run
> > concurrently. An initial implementation that, on each iteration,
> > looks at all Task states and attempts to advance them further,
> > blocking other Tasks as little as possible, should be
> > acceptable.
> >
> > Iteration V
> > ~~~~~~~~~~~
> >
> > On the fifth iteration, the spawner reports that Task #2 is not alive anymore,
> > and the status server has received a message about it (and also a message about
> > Task #1 having started)::
> >
> > +---------------------------------------------------------------------------+
> > | STATUS SERVER "127.0.0.1:8080" |
> > +---------------------------------------------------------------------------+
> > | Status Messages: |
> > | - {id: 2-test.py:Test.test_2, status: started} |
> > | - {id: 1-test.py:Test.test_1, status: started} |
> > | - {id: 2-test.py:Test.test_2, status: finished, result: pass} |
> > +---------------------------------------------------------------------------+
> >
> > Because of that, Task #2 is now considered ``FINISHED``::
> >
> > +---------------------------------------------------------------------------+
> > | FINISHED |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #2: | |
> > | | - id: 2-test.py:Test.test_2 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > And Task #1 is still a ``STARTED`` task::
> >
> > +---------------------------------------------------------------------------+
> > | STARTED |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #1: | |
> > | | - id: 1-test.py:Test.test_1 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | + package: gcc | |
> > | | + package: libc-devel | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > Final Iteration
> > ~~~~~~~~~~~~~~~
> >
> > After a number of iterations with no status changes, and because of a
> > timeout implementation at the job level, it's decided that Task #1 is
> > not to be waited on.
> >
> > The spawner continues to inform that Task #1 is alive (from its PoV),
> > but no further status message has been received. Provided the spawner
> > has support for that, it may attempt to clean up the task (such as
> > destroying a container or killing a process). In the end, it's left
> > with::
> >
> > +---------------------------------------------------------------------------+
> > | STATUS SERVER "127.0.0.1:8080" |
> > +---------------------------------------------------------------------------+
> > | Status Messages: |
> > | - {id: 2-test.py:Test.test_2, status: started} |
> > | - {id: 1-test.py:Test.test_1, status: started} |
> > | - {id: 2-test.py:Test.test_2, status: finished, result: pass} |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | FINISHED |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #2: | |
> > | | - id: 2-test.py:Test.test_2 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_2 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
> > +---------------------------------------------------------------------------+
> > | INTERRUPTED |
> > +---------------------------------------------------------------------------+
> > | +----------------------------------+ |
> > | | Task #1: | |
> > | | - id: 1-test.py:Test.test_1 | |
> > | | - kind: python-unittest | |
> > | | - uri: test.py:Test.test_1 | |
> > | | - requirements: | |
> > | | + file: mylib.py | |
> > | | + package: gcc | |
> > | | + package: libc-devel | |
> > | | - status uris: | |
> > | | + 127.0.0.1:8080 | |
> > | +----------------------------------+ |
> > +---------------------------------------------------------------------------+
> >
>
> I have attached a diagram with the phases of your proposal and the
> example you gave, for those who like diagrams.
>
> > Tallying results
> > ~~~~~~~~~~~~~~~~
> >
> > The nrunner plugin should be able to provide meaningful results to the Job,
> > and consequently to the user, based on the resulting information on the
> > final iteration.
> >
> > Notice that some information, such as the ``PASS`` for the first
> > test, will come from the "result" given in a status message from
> > the task itself. Some other status, such as the ``INTERRUPTED``
> > status for the second test, will not come from a received status
> > message, but from a realization of the actual management of the task
> > execution. It's expected that other information will also have to be
> > inferred, and "filled in", by the nrunner plugin implementation.
> >
> > In the end, it's expected that results similar to this would be
> > presented::
> >
> > JOB ID : f59bd40b8ac905864c4558dc02b6177d4f422ca3
> > JOB LOG : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/job.log
> > (1/2) tests.py:Test.test_2: PASS (2.56 s)
> > (2/2) tests.py:Test.test_1: INTERRUPT (900 s)
> > RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
> > JOB TIME : 0.19 s
> > JOB HTML : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/results.html
> >
> > Notice how Task #2 shows up before Task #1, because it was both started
> > and finished earlier. There may be issues associated with the current UI
> > to be dealt with regarding out-of-order task status updates.
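> >
> > A sketch of that tallying, combining "result" values received in
> > status messages with statuses the plugin has to infer itself (here,
> > interrupted tasks that never sent a final message; the function and
> > data shapes are illustrative):

```python
# Sketch of result tallying: "result" values come from "finished"
# status messages; INTERRUPT is inferred by the plugin for tasks
# that never reported a final status.  Illustrative only.
from collections import Counter


def tally(status_messages, interrupted_task_ids):
    results = Counter()
    for msg in status_messages:
        if msg.get("status") == "finished":
            results[msg["result"].upper()] += 1
    results["INTERRUPT"] += len(interrupted_task_ids)
    return dict(results)
```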
> >
> > Summary
> > =======
> >
> > This proposal contains a number of items that can become GitHub issues
> > at this stage. It also contains a general explanation of what I believe
> > are the crucial missing features to make the N(ext) Runner implementation
> > available to the general public.
> >
> > Feedback is highly appreciated, and it's expected that this document will
> > evolve into a better version, and possibly become a formal Blue Print.
> >
> > Thanks,
> > - Cleber.
>
> I think the idea for the task scheduler is promising. I have some
> suggestions, but, as I said before, if the text is structured in a
> self-contained blueprint way, it will be better for the discussion
> and documentation.
>
Cool, and thanks for providing the blueprint "kickstart" PR. I'll
work on top of that.
> Thanks,
>
> Willian
Thanks,
- Cleber.