[Avocado-devel] RFC: multi-stream test (previously multi-test) [v4]

Cleber Rosa crosa at redhat.com
Tue May 3 00:32:29 UTC 2016



On 04/29/2016 05:35 AM, Lukáš Doktor wrote:
> Dne 29.4.2016 v 00:48 Ademar Reis napsal(a):
>> On Thu, Apr 28, 2016 at 05:10:07PM +0200, Lukáš Doktor wrote:
>>> Hello again,
>>>
>>> This version removes the rejected variants and hopefully clarifies all
>>> the goals needed for multi-stream (and also multi-host) tests available.
>>
>> Hi Lukas.
>>
>> Thank you for following up with v4. It's more concise and simple
>> to discuss. Feedback below:
>>
>>>
>>> Changes:
>>>
>>>     v2: Rewritten from scratch
>>>     v2: Added examples for the demonstration to avoid confusion
>>>     v2: Removed the mht format (which was there to demonstrate manual
>>>         execution)
>>>     v2: Added 2 solutions for multi-tests
>>>     v2: Described ways to support synchronization
>>>     v3: Renamed to multi-stream as it befits the purpose
>>>     v3: Improved introduction
>>>     v3: Workers are renamed to streams
>>>     v3: Added example which uses library, instead of new test
>>>     v3: Multi-test renamed to nested tests
>>>     v3: Added section regarding Job API RFC
>>>     v3: Better description of the Synchronization section
>>>     v3: Improved conclusion
>>>     v3: Removed the "Internal API" section (it was a transition between
>>>         no support and "nested test API", not a "real" solution)
>>>     v3: Using per-test granularity in nested tests (requires plugins
>>>         refactor from Job API, but allows greater flexibility)
>>>     v4: Removed "Standard python libraries" section (rejected)
>>>     v4: Removed "API backed by cmdline" (rejected)
>>>     v4: Simplified "Synchronization" section (only describes the
>>>         purpose)
>>>     v4: Refined all sections
>>>     v4: Improved the complex example and added comments
>>>     v4: Formulated the problem of multiple tasks in one stream
>>>     v4: Rejected the idea of bounding it inside MultiTest class
>>>         inherited from avocado.Test, using a library-only approach
>>>
>>>
>>> The problem
>>> ===========
>>>
>>> Allow tests to have some of their blocks of code run in separate stream(s).
>>> We'll discuss the range of "block of code" further in the text, as well
>>> as what the streams stand for.
>>>
>>> One example could be a user, who wants to run netperf on 2 machines,
>>> which requires following manual steps:
>>>
>>>     stream1: netserver -D
>>>     stream1: # Wait till netserver is initialized
>>>     stream2: netperf -H $machine1 -l 60
>>>     stream2: # Wait till it finishes and report the results
>>>     stream1: # stop the netserver and report possible failures
>>>
>>> The test would have to contain the code for both stream1 and stream2,
>>> and it would execute them in two separate streams, which might or might
>>> not run on the same machine.
>>>
>>> Some other examples might be:
>>>
>>> 1. A simple stress routine being executed in parallel (the same or
>>> different hosts)
>>>    * utilize a service under testing from multiple hosts (stress test)
>>> 2. Several code blocks being combined into a complex scenario(s)
>>>    * netperf + other test
>>>    * multi-host QEMU migration
>>>    * migrate while changing interfaces and running cpu stress
>>> 3. Running the same test along with stress test in background
>>>    * cpu stress test + cpu hotplug test
>>>    * memory stress test + migration
>>
>> During the discussions in v3 you mentioned some real tests from
>> avocado-vt that run other tests... I was hoping to see them
>> mentioned here, they would be really useful. What are they?
>>
> memory stress test + migration ;-) (and not just memory stress test)
>

Or, more specifically, let's give a quick rundown of the 
"type_specific.io-github-autotest-qemu.migrate.with_autotest.dbench.tcp"[1] 
Avocado-VT test.

This test is obviously a *migration* test. While at it, it also 
executes extra code.  The extra code happens to be the dbench 
benchmark suite, and it happens to be executed on the (virtual) 
machine being migrated.

Since running parts of a test in a "separate execution stream" is not a 
first-class citizen in Avocado-VT, this relies on code that is neither 
particularly pretty nor flexible.  Since it's not abstract enough, it's 
written to run previously existing Autotest tests.  For completeness, 
this one uses the Autotest dbench test[2].

References:

[1] This test is not compatible with the JeOS guest OS, so it can only 
be listed/run with a compatible guest OS, such as:

  $ avocado list --vt-guest-os Linux.RHEL.7.2.x86_64.i440fx
  $ avocado run \
      type_specific.io-github-autotest-qemu.migrate.with_autotest.dbench.tcp \
      --vt-guest-os Linux.RHEL.7.2.x86_64.i440fx

[2] https://github.com/autotest/autotest-client-tests/tree/master/dbench

>>>
>>>
>>> Solution
>>> ========
>>>
>>> Stream
>>> ------
>>>
>>> From the introduction you can see that "Stream" stands for a "Worker"
>>> which executes code in parallel to the main test routine, and to which
>>> the main test routine can offload tasks. The primary requirement is to
>>> allow this execution on the same machine as well as on a different
>>> one.
>>>
>>>
>>> Block of code
>>> -------------
>>>
>>> Throughout the first 3 versions we discussed what the "block of code"
>>> should be. The result is an avocado.Test compatible class, which follows
>>> the same workflow as normal test and reports the results back to the
>>> stream. It is not the smallest piece of code that could be theoretically
>>> executed (think of functions), but it has many benefits:
>>>
>>> 1. Well known structure including information in case of failure
>>> 2. Allows simple development of components (in form of tests)
>>> 3. Allows to re-use existing tests and combine them into complex
>>>    scenarios
>>>
>>> Note: Smaller pieces of code can be still executed in parallel without
>>> the framework support using standard python libraries (multiprocessing,
>>> threading). This RFC is focusing on simplifying the development of
>>> complex cases, where test as a minimal block of code fits quite well.
>>>
>>
>> I think the definitions are still confusing. Even though you
>> tried to define the "block of code" in a more abstract way, you
>> keep using "tests" as actual references to it all over this RFC:
>> "resolving the tests", "tests inside a stream", "combine
>> tests"... You even call them "nested tests" somewhere. We need a
>> clear vocabulary to use.
>>

This is also something that I relate to.  I suggested, verbally, that a 
useful check is to ask: "what if we decide to use <something> as the 
code block unit?"  Would all the concepts and the proposed architecture 
still hold true?

This would help to validate that the proposed architecture indeed has 
layers whose implementations can be swapped.  Also, it would achieve, 
without much greater cost, far more flexibility than the Avocado-VT 
test in the rundown mentioned earlier.
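
As a rough illustration of that check (not a proposed API), the sketch 
below defines a stream that accepts any callable as its code block unit.  
`LocalThreadStream`, `FakeTest` and `plain_function` are names made up 
for this example; only `run_bg` and `wait` echo names used in the RFC.

```python
import threading
from typing import Any, Callable, List


class LocalThreadStream:
    """Hypothetical stream: runs code blocks in background threads."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.results: List[Any] = []
        self._threads: List[threading.Thread] = []

    def run_bg(self, block: Callable[[], Any]) -> None:
        # The stream only requires "something callable".  It does not care
        # whether the block is a function, a bound method or a test wrapper.
        def _runner() -> None:
            self.results.append(block())

        thread = threading.Thread(target=_runner)
        self._threads.append(thread)
        thread.start()

    def wait(self) -> None:
        # Block until every scheduled code block has finished.
        for thread in self._threads:
            thread.join()


class FakeTest:
    """Stand-in for a test-like class, only to show the unit can be swapped."""

    def test(self) -> str:
        return "test-like block done"


def plain_function() -> str:
    return "function block done"


stream = LocalThreadStream("stream1")
stream.run_bg(plain_function)    # code block unit: a plain function
stream.run_bg(FakeTest().test)   # code block unit: a test-like method
stream.wait()
print(sorted(stream.results))
```

If the stream layer keeps working when the unit changes from a function 
to a test-like method, the layers are probably separated well enough.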

> Well I don't see the big problem in calling them tests (although I tried
> to replace some places with tasks) as it was introduced here.
>
> Regarding the nested test it's a copy&paste mistake, I'm sorry about it.
>
>>>
>>> Resolving the tests
>>> -------------------
>>>
>>> String
>>> ~~~~~~
>>>
>>> As mentioned earlier, the `stream` should be able to handle
>>> avocado.Test-like classes, which means the test needs to find one.
>>> Luckily, avocado already has such feature as part of internal API. I'd
>>> like to use it by passing string `test reference` to the stream, which
>>> should resolve it and execute.
>>
>> I believe using tests as the abstraction for what is run in a
>> stream/worker is fundamentally wrong, because it breaks the
>> abstractions of Job and Test. You're basically introducing
>> sub-tests, or even "sub-jobs" (more about it later).
>>
> I'd be lost if I keep talking about it as a block of code. We need to
> pass the message that the actual usage of this would require "Test-like"
> classes to be resolved and passed to the stream. I'm not sure how to
> describe it better.
>

The problem is not the four letters that give it a name (t-e-s-t), 
but what's behind it.  By calling it a test, you're most probably going 
to think of tests, forget the abstract design, and model it as yet another 
test runner.  One inside the other.  We're not after writing a 
no-limits-nested-test-runner.  But if we stick with that terminology, 
then the idea forms, which "infects" the design, and finally the code.

If we go that route, we should stop using "stream" and talk about an SSH 
connection.  "Test" is the last non-abstract player we have in this 
RFC... it's having a hard time dying, but it really should.

>>>
>>> Resolver
>>> ~~~~~~~~
>>>
>>> Some users might prefer tweaking the resolver. This is currently not
>>> supported, but is part of the "JobAPI RFC". Once it's developed, we
>>> should be able to benefit from it and use it to resolve the `test
>>> references` to test-like definitions and pass it over to the stream.
>>>
>>> Local reference
>>> ~~~~~~~~~~~~~~~
>>>
>>> Last but not least, some users might prefer keeping the code in one
>>> file. This is currently also not possible as the in-stream-test-class
>>> would either be also resolved as a main test or they would not be
>>> resolved by the stream.
>>>
>>> We faced a similar problem with the deep inheritance and we solved it by
>>> a docstring tag:
>>>
>>>     class MyTest(Test):
>>>         '''
>>>         Some description
>>>         :avocado: disable
>>>         '''
>>>         def test(self):
>>>             pass
>>>
>>> which tells the resolver to avoid this class. We can expand it and use
>>> for example "strict" to only be executed when the full path
>>> ($FILE:$TEST.$METHOD) is used. This way we could put all the parts in a
>>> single file and reference the tasks by a full path.
>>>
>>> Alternatively we could introduce another class
>>>
>>>     class Worker(avocado.Test):
>>>         pass
>>>
>>> and the file loader would detect it and only yield it when full path is
>>> provided (similarly to SimpleTest class).
>>>
>>
>> From these three, what would be your recommendation?
>>
> If out of three means string/resolver/local reference, then all of them,
> ideally. Actually, as I mentioned on reply to Cleber's response, I could
> live without the first one, if we have the resolver available soon. (and
> I agree it'd be way cleaner in terms of layers and complexity). I'm not
> in favor of the last one as it makes the identification on the remote
> side harder (without --remote-no-copy it means you need to copy the
> current file and reference the class+method. With --remote-no-copy it
> means you have to have the test in the same location on the remote
> host). Anyway for local execution it could be quite convenient to bundle
> the whole multi-test in one, so I can live with it.
>

First, can you see how many possible implementation details you give 
away here?  This shows that you most probably have an implementation 
plan already, and that it is heavily influencing your thoughts, even though 
this should all be design-level stuff at this point.

Secondly, there's the "multi-test" again.  I'm starting to regret having 
suggested the use of `avocado.Test` as the default "block of code", not 
because it doesn't have its uses or doesn't make sense, but because it's 
forcing too much implementation into the design phase.

> If you mean the methods for implementing "Local reference" then I'd go
> with the docstring approach, although I'm not in favor of the term
> "strict", but I couldn't think of a better name.
>

This method of resolution indeed impacts the user experience, but let's 
just remind ourselves that it's a resolution-level problem.  If it's 
beneficial to our users to have a tag that would, by default, make a 
code-block "unseen", and only "seen" if specified by a fully qualified 
name, great!  The rules look pretty simple, and the stream itself would 
not have anything to do with that.
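
To make the "unseen by default, seen by fully qualified name" rule 
concrete, here is a toy resolver.  It is only a sketch: the `resolve` 
and `docstring_tag` helpers and the `candidates` mapping are invented 
for illustration, and "strict" is merely the tag name floated above, 
not an existing Avocado tag.  Names follow the $FILE:$TEST.$METHOD 
form mentioned in the RFC.

```python
import re
from typing import Dict, List, Optional

TAG_RE = re.compile(r":avocado:\s+(\w+)")


def docstring_tag(cls: type) -> Optional[str]:
    """Return the word following ':avocado:' in the class docstring, if any."""
    match = TAG_RE.search(cls.__doc__ or "")
    return match.group(1) if match else None


def resolve(reference: str, candidates: Dict[str, type]) -> List[type]:
    """Toy resolver; `candidates` maps fully qualified names to classes."""
    if reference in candidates:
        # Fully qualified name: "strict" classes are seen, "disable" never.
        cls = candidates[reference]
        return [] if docstring_tag(cls) == "disable" else [cls]
    # Anything else is treated as a file name: only untagged classes match.
    return [cls for name, cls in candidates.items()
            if name.startswith(reference + ":") and docstring_tag(cls) is None]


class Main:
    """Regular test, always visible."""


class Helper:
    """
    Code block meant to run inside a stream.
    :avocado: strict
    """


class Hidden:
    """
    Base class that must never be resolved.
    :avocado: disable
    """


candidates = {
    "netperf.py:Main.test": Main,
    "netperf.py:Helper.test": Helper,
    "netperf.py:Hidden.test": Hidden,
}
print([c.__name__ for c in resolve("netperf.py", candidates)])
print([c.__name__ for c in resolve("netperf.py:Helper.test", candidates)])
print([c.__name__ for c in resolve("netperf.py:Hidden.test", candidates)])
```

Note that nothing in this sketch knows about streams, which is the 
point: the rule lives entirely in the resolution layer.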

>>>
>>> Synchronization
>>> ---------------
>>>
>>> Some tests do not need any synchronization, users just need to run them.
>>> But some multi-stream tests need to be precisely synchronized or they
>>> need to exchange data.
>>>
>>> For synchronization purposes usually "barriers" are used, where a barrier
>>> guards the entry into a section identified by "name" and "number of
>>> clients". All parties asking to enter the section will be delayed
>>> until the "number of clients" reach the section (or a timeout occurs).
>>> Then they are resumed and can enter the section. Any failure while
>>> waiting for a barrier propagates to the other waiting parties.
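
The barrier semantics described above can be tried locally with the 
standard library's `threading.Barrier`.  This is only a single-process 
sketch, not the TCP-based sync service the RFC proposes; the `party` 
function, the `barriers` dict and the `events` list are made up for 
the example.

```python
import threading

NUMBER_OF_CLIENTS = 2
# One barrier per section name, as in the "name" + "number of clients" idea.
barriers = {"setup": threading.Barrier(NUMBER_OF_CLIENTS)}
events = []
events_lock = threading.Lock()


def party(name: str) -> None:
    with events_lock:
        events.append(f"{name}: before barrier")
    # Blocks until NUMBER_OF_CLIENTS parties have arrived.  On timeout, or
    # if another party aborts, every waiter gets threading.BrokenBarrierError,
    # which is how a failure propagates to the other waiting parties.
    barriers["setup"].wait(timeout=60)
    with events_lock:
        events.append(f"{name}: after barrier")


threads = [threading.Thread(target=party, args=(name,))
           for name in ("server", "client")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Nobody gets past the barrier until everybody has reached it.
print(events)
```

Whatever order the two threads are scheduled in, both "before barrier" 
entries are recorded ahead of either "after barrier" entry.
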
>>>
>>> One way is to use existing python libraries, but they usually require
>>> some boilerplate code around. One of the tasks on the multi-stream tests
>>> should be to implement basic barrier interface, which would be
>>> initialized in `avocado.Streams` and details should be propagated to the
>>> parts executed inside streams.
>>>
>>> The way I see this is to implement simple tcp-based protocol (to allow
>>> manual debug) and pass the details to tests inside streams via params.
>>> So `avocado.Streams` init would start the daemon and one would connect
>>> to it from the test by:
>>>
>>>     from avocado.plugins.sync import Sync
>>>     # Connect the sync server on address stored in params
>>>     # which could be injected by the multi-stream test
>>>     # or set manually.
>>>     sync = Sync(self, params.get("sync_server", "/plugins/sync_server"))
>>>     # wait until 2 tests ask to enter "setup" barrier (60s timeout)
>>>     sync.barrier("setup", 2, 60)
>>>
>>> The new protocol is quite necessary as we need support for re-connection
>>> and other tweaks which are not supported by multiprocessing library.
>>>
>>>
>>> Very simple example
>>> -------------------
>>>
>>> This example demonstrates a test, which tries to access "example.org"
>>> concurrently from N machines without any synchronization.
>>>
>>>     import avocado
>>>
>>>     class WgetExample(avocado.Test):
>>>         def setUp(self):
>>>             # Initialize streams
>>>             self.streams = avocado.Streams(self)
>>>             for machine in machines:
>>>                 # Add one stream per machine, create the connection
>>>                 # and prepare for execution.
>>>                 self.streams.add_stream(machine)
>>>         def test(self):
>>>             for stream in self.streams:
>>>                 # Resolve the "/usr..." into
>>>                 # SimpleTest("/usr/bin/wget example.org") and
>>>                 # schedule the execution inside the current stream
>>>                 stream.run_bg("/usr/bin/wget example.org")
>>>             # Wait till both streams finish all tasks and fail the test
>>>             # in case any of them fails.
>>>             self.streams.wait(ignore_errors=False)
>>>
>>> where the `avocado.Stream` represents a worker (local or remote) which
>>> allows running avocado tests in it (foreground or background). This
>>> should provide enough flexibility to combine existing tests in complex
>>> tests.
>>>
>>>
>>> Advanced example
>>> ----------------
>>>
>>> MultiNetperf.py:
>>>
>>>     class MultiNetperf(avocado.NestedTest):
>>>         def setUp(self):
>>>             # Initialize streams (start sync server, ...)
>>>             self.streams = avocado.Streams(self)
>>>             self.machines = ["localhost", "192.168.122.2"]
>>>             for machine in self.machines:
>>>                 # Add one stream per machine
>>>                 self.streams.add_stream(machine)
>>>         def test(self):
>>>             # Ask the first stream to resolve "NetServer", pass the {}
>>>             # params to it (together with sync-server url),
>>>             # schedule the job in stream and return to main thread
>>>             # while the stream executes the code.
>>>             self.streams[0].run_bg("NetServer",
>>>                                    {"no_clients": len(self.streams)})
>>>             for stream in self.streams[1:]:
>>>                 # Resolve "NetPerf", pass the {} params to it,
>>>                 # schedule the job in stream and return to main
>>>                 # thread while the stream executes the code
>>>                 stream.run_bg("NetPerf",
>>>                               {"no_clients": len(self.streams),
>>>                                "server_ip": self.machines[0]})
>>>             # Wait for all streams to finish all scheduled tasks
>>>             self.streams.wait(ignore_errors=False)
>>>
>>> NetServer.py:
>>>
>>>     class NetServer(avocado.NestedTest):
>>>         def setUp(self):
>>>             # Initialize sync client
>>>             self.sync = avocado.Sync(self)
>>>             process.run("netserver")
>>>             # Contact sync server (url was passed in `stream.run_bg`)
>>>             # and ask to enter "setup" barrier with "no_clients"
>>>             # clients
>>>             self.sync.barrier("setup", self.params.get("no_clients"))
>>>         def test(self):
>>>             pass
>>>         def tearDown(self):
>>>             self.sync.barrier("finished", self.params.get("no_clients"))
>>>             process.run("killall netserver")
>>>
>>> NetPerf:
>>
>> I guess you mean "NetPerf.py:"
>>
> yep, but it's unix, the ext is optional :-) (will fix in v5)
>
>>>
>>>     class NetPerf(avocado.NestedTest):
>>>         def setUp(self):
>>>             # Initialize sync client
>>>             self.sync = avocado.Sync(self)
>>>             process.run("netserver")
>>>             # Contact sync server (url was passed in `stream.run_bg`)
>>>             # and ask to enter "setup" barrier with "no_clients"
>>>             # clients
>>>             self.sync.barrier("setup", self.params.get("no_clients"))
>>>         def test(self):
>>>             process.run("netperf -H %s -l 60"
>>>                         % self.params.get("server_ip"))
>>>             self.sync.barrier("finished", self.params.get("no_clients"))
>>
>> Do you really believe having the NetServer.py:NetPerf:test above
>> available as a standalone test has any value *at all*?
>>
>> To me, splitting this test in three files is a very good example
>> of how *not to* write a NetPerf test.
> Well you could re-use just parts of it from other tests, but sure, it
> can be in one file too and one would reference the
> `NetPerf:NetServer.test` instead of `Netserver`. But if I was writing
> the test I'd split it into multiple files.
>
>>
>> Finally, is there a need to use setUp() and teardown() in the
>> code that runs in the streams, or is it just your preference?
>>
> Nope, it'd just be easier to understand that the test failed during
> setup, or when running the actual test. The only reason for passing the
> current test to it is to get the result location (and later maybe more,
> like params, ...)
>

First of all, I think I fast-forwarded through this in my previous reply, 
so I apologize. Now, hello again to the {sub,nested}-test.  Now even 
with params and who-knows-what-else that only *real* tests can have!

Can you see Matrix coming to swallow us alive? The end of humanity draws 
near! Run! Run! :)

Now, about the setUp/tearDown only example.  This indeed makes a point 
against using `avocado.Test` as code blocks.  I'm saying that because a 
test *like* that, with `setUp()` and `tearDown()`, but without test 
code, would probably never be written in real life.  So it may still be 
useful to have a code block with automatic setUp/tearDown, but we'd be 
obviously exploiting something that was not meant for that.

Maybe it was just that example, and the majority of other examples map 
to this pattern.  Still, it's making me think more about it.

>>>
>>> Possible implementation
>>> -----------------------
>>>
>>> _Previously: API backed by internal API_
>>>
>>> One way to drive this is to use existing internal API and create a layer
>>> in between, which invokes runner (local/remote based on the stream
>>> machine) to execute the code on `stream.run_bg` calls.
>>>
>>> This means the internal API would stay internal and (roughly) the same,
>>> but we'd develop a class to invoke the internal API. This class would
>>> have to be public and supported.
>>>
>>> + runs native python
>>> + easy interaction and development
>>> + easily extensible by either using internal API (and risk changes) or
>>> by inheriting and extending the features.
>>> - lots of internal API will be involved, thus with almost every change
>>> of internal API we'd have to adjust this code to keep the NestedTest working
>>> - fabric/paramiko is not thread/parallel process safe and fails badly so
>>> first we'd have to rewrite our remote execution code (use autotest's
>>> worker, or aexpect+ssh)
>>>
>>>
>>> Queue vs. single task
>>> ---------------------
>>>
>>> Up to this point I always talked about stream as an entity, which drives
>>> the execution of "a code block". A big question is, whether it should
>>> behave like a queue, or only a single task:
>>>
>>> queue - allows scheduling several tasks and reports list of results
>>> single task - stream would only accept one task and produce one result
>>>
>>> I'd prefer the queue-like approach as it's more natural to me to first
>>> prepare streams and then keep adding tasks until all my work is done and
>>> I'd expect per-stream results to be bundled together, so I can know what
>>> happened. This means I could run `stream.run_bg(first);
>>> stream.run_bg(second); stream.run_fg(third); stream.run_bg(fourth)` and
>>> the stream should start task "first", queue task "second", queue task
>>> "third", wait for it to finish and report "third" results. Then it
>>> should resume the main thread and queue the "fourth" task (FIFO queue).
>>> Each stream should then allow to query for all results (list of
>>> json-results) as well as it should create a directory inside results and
>>> per-task sub-directory with task results.
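
For reference, the queue-like behaviour described above (background 
tasks queued in FIFO order, a foreground task blocking the caller until 
it finishes) can be sketched with one worker thread and `queue.Queue`.  
This is an assumption-laden toy: `QueueStream` is a made-up class, tasks 
are plain callables, and nothing here is remote.

```python
import queue
import threading
from typing import Any, Callable, Dict, List


class QueueStream:
    """Hypothetical queue-like stream: one worker thread, FIFO order."""

    def __init__(self) -> None:
        self._tasks: "queue.Queue" = queue.Queue()
        self.results: List[Any] = []
        # Daemon thread, so the sketch exits even though the loop never ends.
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self) -> None:
        while True:
            task, ticket = self._tasks.get()
            try:
                ticket["result"] = task()
            except Exception as exc:  # keep the worker alive, record failure
                ticket["result"] = exc
            self.results.append(ticket["result"])
            ticket["done"].set()
            self._tasks.task_done()

    def run_bg(self, task: Callable[[], Any]) -> Dict[str, Any]:
        # Queue the task and return immediately with a ticket to query later.
        ticket = {"done": threading.Event(), "result": None}
        self._tasks.put((task, ticket))
        return ticket

    def run_fg(self, task: Callable[[], Any]) -> Any:
        # Queue the task behind everything else, then block until it is done.
        ticket = self.run_bg(task)
        ticket["done"].wait()
        return ticket["result"]

    def wait(self) -> None:
        # Block until every queued task has been processed.
        self._tasks.join()


stream = QueueStream()
stream.run_bg(lambda: "first")
stream.run_bg(lambda: "second")
third = stream.run_fg(lambda: "third")   # returns only after "third" ran
stream.run_bg(lambda: "fourth")
stream.wait()
print(third, stream.results)
```

Because a single worker drains the queue, the per-stream result list 
comes out in submission order.
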
>>>
>>> On the other hand the "single task" should always establish the new
>>> connection and create separate results per-each task added. This means
>>> preparing the streams is not needed as each added task is executed
>>> inside a different stream. So the interface could be
>>> `self.streams.run_bg(where, what, details)` and it should report the
>>> task id or task results in case of `run_fg`. The big question is what
>>> should happen when a task resolves in multiple tasks (eg: `gdbtest`).
>>> Should it fail or create streams per each task? What should it report,
>>> then? I can imagine a function `run_all_{fg,bg}` which would create a
>>> stream for each worker and return list of id/results in case the writer
>>> is not sure (or knows) that the test reference resolves into several tasks.
>>>
>>> See more details in the next chapter
>>>
>>>
>>> Results directory
>>> -----------------
>>>
>>> This demonstrates the results for a modified "MultiNetperf" test. The
>>> difference is that it runs 2 variants of netperf:
>>>
>>> * Netperf.bigbuf    # netperf using big buffers
>>> * Netperf.smallbuf  # netperf using small buffers
>>>
>>> Queue-like approach:
>>>
>>>     job-2016-04-15T.../
>>>     ├── id
>>>     ├── job.log
>>>     └── test-results
>>>         └── 1-MultiNetperf
>>>             ├── debug.log
>>>             ├── stream1       # one could provide custom name/host
>>>             │   ├── 1-Netperf.bigbuf
>>>             │   │   ├── debug.log
>>>             │   │   └── whiteboard
>>>             │   └── 2-Netperf.smallbuf
>>>             │       ├── debug.log
>>>             │       └── whiteboard
>>>             ├── stream2
>>>             │   └── 1-NetServer
>>>             │       ├── debug.log
>>>             │       └── whiteboard
>>>             └── whiteboard
>>
>> Here we see a good example of why the abstraction is being broken
>> and why what you're proposing is actually "support for
>> sub-tests": "1-Netperf.bigbuf", "2-Netperf.smallbuf" and
>> "1-NetServer" look like Test IDs to me. Was this intentional?
>>
> Yes, because people are (will be soon) used to it. They know what to
> expect there. I don't see a point in using `01 Netperf` just to make it
> look different. Knowing what was the order of executed tasks is
> essential. (that's why I pushed so hard to get the serialized test ids
> in, because I really really hated going through results and trying to
> remember what was the exact name of the 4th test, which I know failed).
>

The key point here, IMHO, is that the {sub,nested}-test idea is very 
strong in different parts of the proposal.

>>>
>>> Single task approach:
>>>
>>>     job-2016-04-16T.../
>>>     ├── id
>>>     ├── job.log
>>>     └── test-results
>>>         └── 1-MultiNetperf
>>>             ├── debug.log
>>>             ├── whiteboard
>>>             ├── 1-Netperf.bigbuf
>>>             │   ├── debug.log
>>>             │   └── whiteboard
>>>             ├── 2-Netperf.smallbuf
>>>             │   ├── debug.log
>>>             │   └── whiteboard
>>>             └── 3-Netperf.smallbuf
>>>                 ├── debug.log
>>>                 └── whiteboard
>>>
>>> The difference is that queue-like approach bundles the result
>>> per-worker, which could be useful when using multiple machines.
>>>
>>> The single-task approach makes it easier to follow how the execution
>>> went, but one needs to see the log to see on which machine was the task
>>> executed.
>>>
>>
>> This is what I proposed:
>>
>>     $ tree job-2016-04-15T.../
>>     job-2016-04-15T.../
>>     ├── id
>>     ├── job.log
>>     ├── replay/
>>     ├── sysinfo/
>>     └── test-results/
>>         ├── 01-NetPerf:NetPerf.test/ (the serialized Test ID)
>>         │   ├── data/
>>         │   ├── sysinfo/
>>         │   ├── debug.log
>>         │   ├── whiteboard
>>         │   ├── ...
>>         │   ├── NetServer/ (a name, not a Test ID)
>>         │   │   ├── data/
>>         │   │   ├── whiteboard
>>         │   │   └── debug.log
>>         │   └── NetClient (a name, not a Test ID)
>>         │       ├── data/
>>         │       ├── whiteboard
>>         │       └── debug.log
>>         ├── 02... (other Tests from the same job)
>>         ├── 03... (other Tests from the same job)
>>         ...
>>
>> Or even better: put the streams under its own directory (or even
>> inside data), to keep it consistent with non-multi-stream tests:
>>
>>     $ tree job-2016-04-15T.../
>>     job-2016-04-15T.../
>>     ├── id
>>     ├── job.log
>>     ├── replay/
>>     ├── sysinfo/
>>     └── test-results/
>>         ├── 01-NetPerf:NetPerf.test/ (the serialized Test ID)
>>         │   ├── sysinfo/
>>         │   ├── debug.log
>>         │   ├── whiteboard
>>         │   ├── data/
>>         │   └── streams/
>>         │       ├── NetServer/ (a name, not a Test ID)
>>         │       │   ├── whiteboard
>>         │       │   ├── data/
>>         │       │   ├── streams/
>>         │       │   └── debug.log
>>         │       └── NetClient (a name, not a Test ID)
>>         │           ├── whiteboard
>>         │           ├── data/
>>         │           ├── streams/
>>         │           └── debug.log
> Yes, but what happens when you execute NetClient twice? And how can you
> tell which one was executed first? Those are essential information when
> you have more than just 2 tasks offloaded to results.
>

How do you check if a given line of code was executed before another 
one?  You add log statements to your code, and check the log file.  If 
the stream is running "blocks of code", that's how you'd do it.  If the 
stream is yet another test runner, then having ordering and the same 
directory structure applies.
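
A minimal sketch of the "log statements" approach, using only the 
standard `logging` module.  The `net_client` function and the logger 
names are invented for the example; a real stream would write to its 
own debug.log rather than to an in-memory buffer.

```python
import io
import logging

# Collect everything logged under "streams" in one buffer (stand-in for a
# per-stream debug.log file).
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
stream_log = logging.getLogger("streams")
stream_log.setLevel(logging.INFO)
stream_log.addHandler(handler)
stream_log.propagate = False


def net_client(run: int) -> None:
    """Hypothetical code block; it says what it does as it goes."""
    block_log = logging.getLogger("streams.NetClient")
    block_log.info("run %d started", run)
    block_log.info("run %d finished", run)


# The same code block executed twice: the log tells the runs apart and
# shows which one came first, without any per-run directory or ID.
net_client(1)
net_client(2)
lines = log_buffer.getvalue().splitlines()
print(lines)
```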

Is it easier to create code that supports running "code blocks" in 
another (possibly remote) place, or to write a test runner?  If we're 
still doing Avocado, it's because the second is much harder.

Are we confident that adding a test runner, this time called a stream, 
inside a test runner will not cost *a lot* more in terms of development 
time? Or of our ability to support the development outcome? That it won't 
entangle the project's future? That it won't confuse our users?

> Actually this is the only reason why I prefer this solution to my
> original queue-like proposal, because I know the global overview of the
> executed tasks.
>
> Additionally, in reply to Cleber's response I mentioned that I'd
> extend the name of the stream-tag/id. So the result would be a bit
> different:
>
>     1-server-NetServer
>     2-client-NetClient
>     ...
>
> I don't see a point in denying it's test-like, it shares the structure,
> shares the workflow only it does not execute the optional plugins.
> Basically it's just a raw test offloaded into a separate stream and as
> we need to reference them in case we execute them in background, they in
> fact require per-(main)-test unique id. It's still the same story, still
> the same problems.
>
>

By saying it's test-like, we will very soon "acknowledge" that it's 
indeed a test.  This series of RFCs supports this theory, because there 
have been so many uses of "test" to refer to what runs on a stream.

Now the user reads the docs, and sees that feature "X" is available for 
their tests... let's say profilers.  Isn't it natural the user asks: 
"How do I enable a profiler for my test that runs inside a stream" ?

Do we want that?  Oh, too late, the Matrix already caught me! :)

>>         ├── 02... (other Tests from the same job)
>>         ├── 03... (other Tests from the same job)
>>         ...
>>
>>
>>>
>>> Job API RFC
>>> ===========
>>>
>>> The recently introduced Job API RFC covers a very similar topic to "nested
>>> test", but it's not the same. The Job API enables users to modify
>>> the job execution, eventually even to write a runner which would suit them
>>> to run groups of tests. In contrast, this RFC covers a way to combine
>>> code-blocks/tests and reuse them within a single test. In a hackish way,
>>> they can supplement each other, but the purpose is different.
>>>
>>> One of the most obvious differences is, that a failed "nested" test can
>>> be intentional (eg. reusing the NetPerf test to check if unreachable
>>> machines can talk to each other), while in Job API it's always a failure.
>>>
>>> I hope you see the pattern. They are similar, but on a different layer.
>>>
>>> Internally, though, they can share some pieces like executing the
>>> individual tests concurrently with different params/plugins
>>> (locally/remotely). All the needed plugin modifications would also be
>>> useful for both of these RFCs.
>>>
>>> Some examples:
>>>
>>> User1 wants to run "compile_kernel" test on a machine followed by
>>> "install_compiled_kernel passtest failtest warntest" on "machine1
>>> machine2". They depend on the status of the previous test, but they
>>> don't create a scenario. So the user should use Job API (or execute 3
>>> jobs manually).
>>>
>>> User2 wants to create migration test, which starts migration from
>>> machine1 and receives the migration on machine2. It requires cooperation
>>> and together it creates one complex usecase so the user should use
>>> multi-stream test.
>>>
>>>
>>> Conclusion
>>> ==========
>>>
>>> This RFC proposes to add a simple API to allow triggering
>>> avocado.Test-like instances on local or remote machine. The main point
>>> is it should allow very simple code-reuse and modular test development.
>>> I believe it'll be easier, than having users to handle the
>>> multiprocessing library, which might allow similar features, but with a
>>> lot of boilerplate code and even more code to handle possible exceptions.
>>>
>>> This concept also plays nicely with the Job API RFC, it could utilize
>>> most of tasks needed for it and together they should allow amazing
>>> flexibility with known and similar structure (therefore easy to learn).
>>>
>>
>> I see you are trying to make the definitions more clear and a bit
>> less strict, but at the end of the day, what you're proposing is
>> that a test should be able to run other tests, plain and simple.
>> Maybe even worse, a Test would be able to run "jobs", disguised
>> as streams that run multiple tests.
>>
>> This is basically what you've been proposing since the beginning
>> and in case it's not crystal clear yet, I'm strongly against it
>> because I think it's a fundamental breakage of the abstractions
>> present in Avocado.
>>
>> I insist on something more abstract, like this:
>>
>>    Tests can run multiple streams, which can be defined as
>>    different processes or threads that run parts of the test
>>    being executed. These parts are implemented in the form of
>>    classes that inherit from avocado.Test.
>>
>>    (My initial feeling is that these parts should not even have
>>    setUp() and tearDown() methods; or if they have, they should
>>    be ignored by default when the implementation is run in a
>>    stream. In my view, these parts should be defined as "one
>>    method in a class that inherits from avocado.Test", with the
>>    class being instantiated in the actual stream runtime
>>    environment.  But this probably deserves some discussion; I'm
>>    missing some real-world use cases here.)
>>
>>    The only runtime variable that can be configured per-stream is
>>    the execution location (or where it's run): a VM, a container,
>>    remotely, etc. For everything else, Streams are run under the
>>    same environment as the test is.
>>
>>    Notice Streams are not handled as tests: they are not visible
>>    outside of the test that is running them. They don't have
>>    individual variants, don't have Test IDs, don't trigger
>>    pre/post hooks, can't change the list of plugins
>>    enabled/disabled (or configure them) and their results are not
>>    visible at the Job level.  The actual Test is responsible for
>>    interpreting and interacting with the code that is run in a
>>    stream.
>>
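
To make the model quoted above concrete, here is a minimal sketch,
assuming nothing about the eventual Avocado API. The names Stream,
MigrationPart and run_migration_test are hypothetical, the "location" is
only a label (nothing is executed remotely), and threads stand in for
whatever process or remote transport a real implementation would use.

```python
from concurrent.futures import ThreadPoolExecutor


class Stream:
    """Hypothetical stream: runs parts of a test at one location."""

    def __init__(self, location="localhost"):
        # In this model the location is the only per-stream setting.
        self.location = location
        self._pool = ThreadPoolExecutor(max_workers=1)

    def run(self, part_cls, method_name):
        # Instantiate the part inside the stream and call a single method.
        # No setUp()/tearDown(), no test ID, no hooks are involved.
        def call():
            return getattr(part_cls(self.location), method_name)()
        return self._pool.submit(call)

    def close(self):
        self._pool.shutdown(wait=True)


class MigrationPart:
    """Stand-in for a class that would inherit from avocado.Test."""

    def __init__(self, location):
        self.location = location

    def send(self):
        return "sent from " + self.location

    def receive(self):
        return "received on " + self.location


def run_migration_test():
    """The enclosing test: the only place where stream results are seen."""
    src, dst = Stream("machine1"), Stream("machine2")
    try:
        futures = [dst.run(MigrationPart, "receive"),
                   src.run(MigrationPart, "send")]
        # The test itself interprets whatever the streams return.
        return [f.result(timeout=10) for f in futures]
    finally:
        src.close()
        dst.close()
```

Nothing here is reported outside run_migration_test(), which is the
point of the model: the stream results exist only for the test that
started the streams.
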
> So basically you're proposing to extract the method, copy it over to the
> other host and trigger it, and in the end copy back the results, right?
>
> That would work when there are no failures. But if anything goes wrong,
> you have absolutely no idea what happened, unless you prepared the code
> intended for execution for that. I really prefer being able to trigger
> real tests in a remote environment from my tests, because:
>
> 1. I need to write the test just once and either use it as one test, or
> combine it with other existing tests to create a complex scenario.
> 2. I know exactly what happened and where, because test execution
> follows a certain workflow. I'm used to the workflow from normal
> execution, so if anything goes wrong, I get quite an extensive set of
> information regarding the failure, without any need to adjust the test
> code.
> 3. While writing the "inner" test, I don't need to handle the results. I
> use the streams available to me, and I get `self` containing all the
> information like results dir, whiteboard, ... It's very convenient, and
> using just methods with some arguments (or just stdout?) would be a huge
> step back.
>
> I mean, for normal execution it's usable (though it loses the
> possibility of reusing existing tests, for example as stressers), but
> when the "inner" test fails, I know nothing unless I pay great attention
> and add a lot of debug information while writing the test.
>
>> Now let me repeat something from a previous e-mail, originally
>> written as feedback to v3:
>>
>> I'm convinced that your proposal breaks the abstraction and will
>> result in numerous problems in the future.
>>
>> To me whatever we run inside a stream is not and should not be
>> defined as a test.  It's simply a block of code that gets run
>> under the control of the actual test. The fact we can find these
>> "blocks of code" using the resolver is secondary. A nice and
>> useful feature, but secondary. The fact we can reuse the avocado
>> test runner remotely is purely an implementation detail. A nice
>> detail that will help with debugging and make our lives easier
>> when implementing the feature, but again, purely an
>> implementation detail.
>>
>> The test writer should have strict control of what gets run in a
>> stream, with a constrained API where the concepts are very clear.
>> We should not, under any circumstances, induce users to think of
>> streams as something that runs tests. To me this is utterly
>> important.
>>
>> For example, if we allow streams to run tests, or Test
>> References, then running `avocado run *cpuid*` and
>> `stream.run("*cpuid*")` will look similar at first, but with
>> several subtle differences in behavior, confusing users.
>>
>> Users will inevitably ask questions about these differences and
>> we'll end up having to revisit some concepts and refine the
>> documentation, a result of breaking the abstraction.
>>
>> A few examples of these differences which might not be
>> immediately clear:
>>
>>    * No pre/post hooks for jobs or tests get run inside a stream.
>>    * No per-test sysinfo collection inside a stream.
>>    * No per-job sysinfo collection inside a stream.
>>    * Per-stream, there's basically nothing that can be configured
>>      about the environment other than *where* it runs.
>>      Everything is inherited from the actual test. Streams should
>>      have access to the exact same APIs that *tests* have.
>>    * If users see streams as something that runs tests, it's
>>      inevitable that they will start asking for knobs
>>      to fine-tune the runtime environment:
>>      * Should there be a timeout per stream?
>>      * Hmm, at least support enabling/disabling gdb or wrappers
>>        in a stream? No? Why not!?
>>      * Hmm, maybe allow multiplex="file" in stream.run()?
>>      * Why can't I disable or enable plugins per-stream? Or at
>>        least configure them?
>>
> So basically just running a RAW test, without any of the features the
> default avocado runner provides. I'm fine with that.
>
> I slightly disagree that there is no way of modifying the environment,
> as the resolver resolves into a template, which contains all the params
> given to the test. So one could modify basically everything regarding
> the test. The only things one can neither configure nor use are the job
> features (like the pre/post hooks, plugins, ...).
>
>> And here are some other questions, which seem logical at first:
>>
>>    * Hey, you know what would be awesome? Let me upload the
>>      test results from a stream as if it was a job! Maybe a
>>      tool to convert stream test results to job results? Or a
>>      plugin that handles them!
>>    * Even more awesome: a feature to replay a stream!
>>    * And since I can run multiple tests in a stream, why can't I
>>      run a job there? It's a logical next step!
>>
>> The simple fact the questions above are being asked is a sign the
>> abstraction is broken: we shouldn't have to revisit previous
>> concepts to clarify the behavior when something is being added in
>> a different layer.
>>
>> Am I making sense?
>>
> IMO you're describing a different situation. We should have the Job API,
> which should suit users who need the features you described, so they
> don't need to work around it using this API.
>
> Other users might prefer multiprocessing, fabric or autotest's
> remote_commander to execute just plain, simple methods/scripts on
> other machines.
>
> But if you need to run something complex, you need a runner, which gives
> you the neat features to avoid the boilerplate code used to produce
> outputs in case of failure, plus other features (like streams,
> datadirs, ...).
>
> Therefore I believe that allowing tests to be triggered in the
> background from a test would be very useful, and it is the best way of
> solving this I can imagine. As a test writer I would not want to learn
> yet another way of expressing myself when splitting the task into
> several streams. I want the same development, I expect the same results
> and yes, I don't expect the full job. Having just a raw test without any
> extra job features is sufficient and easy to understand.
>
> Btw the only controversial thing I can imagine is that some (myself
> included) would have nothing against offloading multi-stream tests into
> a stream (so basically nesting). And yes, I expect it to work and create
> yet another directory inside the stream's results (e.g. to run
> multi-host netperf as a stresser while running multi-host migration). I
> could either reference each party - netserver, netclient, migrate_from,
> migrate_to - or I can just say - multi_netperf, multi_migrate - and
> expect the netserver+netclient streams to be created inside the
> multi_netperf results, and the same for migrate. Conceptually I have no
> problem with that, and as a test writer I'd use the second, because
> putting together building blocks is IMO the way to go.
>

I can only say that, at this time, it's very clear to me what's 
nested-test support and what's multi-stream test support.  Let's call 
them by different names, because they're indeed different, and decide on 
one.
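
As a rough illustration of the two notions only, and not a proposal for
the actual API, the sketch below contrasts them. Everything in it is
hypothetical: UnreachableNetPerf is a plain class standing in for a
reusable test, run_nested() mimics a full test lifecycle whose status is
handed back to the caller (so a FAIL can be the expected outcome), and
run_in_stream() merely runs a block of code in the background.

```python
from concurrent.futures import ThreadPoolExecutor


class UnreachableNetPerf:
    """Stand-in for a complete, reusable test (not a real avocado.Test)."""

    def __init__(self):
        self.log = []

    def setUp(self):
        self.log.append("setUp")

    def test(self):
        self.log.append("test")
        raise AssertionError("peer unreachable")

    def tearDown(self):
        self.log.append("tearDown")


def run_nested(test_cls):
    """Nested-test flavour: full lifecycle, status returned to the caller."""
    inst = test_cls()
    status = "PASS"
    try:
        inst.setUp()
        inst.test()
    except AssertionError:
        # The outer test decides whether this failure was intended.
        status = "FAIL"
    finally:
        inst.tearDown()
    return status, inst.log


def run_in_stream(func, *args):
    """Multi-stream flavour: run a block of code in the background."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(func, *args).result(timeout=10)
```

In the first case the caller gets a test status and the usual workflow;
in the second it only gets whatever the block of code returns.
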

- Cleber.

> Lukáš
>
>> Thanks.
>>    - Ademar
>>
>
>

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]



