[Avocado-devel] RFC: multi-stream test (previously multi-test) [v4]

Thu Apr 28 20:28:21 UTC 2016

On 04/28/2016 12:10 PM, Lukáš Doktor wrote:
> Hello again,
>
> This version removes the rejected variants and hopefully clarifies all
> the goals needed for multi-stream (and also multi-host) tests available.
>
> Changes:
>
>     v2: Rewritten from scratch
>     v2: Added examples for the demonstration to avoid confusion
>     v2: Removed the mht format (which was there to demonstrate manual
>         execution)
>     v2: Added 2 solutions for multi-tests
>     v2: Described ways to support synchronization
>     v3: Renamed to multi-stream as it befits the purpose
>     v3: Improved introduction
>     v3: Workers are renamed to streams
>     v3: Added example which uses library, instead of new test
>     v3: Multi-test renamed to nested tests
>     v3: Added section regarding Job API RFC
>     v3: Better description of the Synchronization section
>     v3: Improved conclusion
>     v3: Removed the "Internal API" section (it was a transition between
>         no support and "nested test API", not a "real" solution)
>     v3: Using per-test granularity in nested tests (requires plugins
>         refactor from Job API, but allows greater flexibility)
>     v4: Removed "Standard python libraries" section (rejected)
>     v4: Removed "API backed by cmdline" (rejected)
>     v4: Simplified "Synchronization" section (only describes the
>         purpose)
>     v4: Refined all sections
>     v4: Improved the complex example and added comments
>     v4: Formulated the problem of multiple tasks in one stream
>     v4: Rejected the idea of bounding it inside MultiTest class
>         inherited from avocado.Test, using a library-only approach
>
>
> The problem
> ===========
>
> Allow tests to have some of their blocks of code run in separate
> stream(s). We'll discuss the range of "block of code" further in the
> text, as well as what the streams stand for.
>
> One example could be a user, who wants to run netperf on 2 machines,
> which requires following manual steps:
>
>     stream1: netserver -D
>     stream1: # Wait till netserver is initialized
>     stream2: netperf -H $machine1 -l 60
>     stream2: # Wait till it finishes and report the results
>     stream1: # stop the netserver and report possible failures
>
> The test would have to contain the code for both stream1 and stream2,
> and it would execute them in two separate streams, which might or might
> not be executed on the same machine.
>

Right, this clearly shows that the use case is "user wants to write/run 
a test", which is right on Avocado's business.  Then, "by the way", he 
wants to leverage netperf for that, which is fine (but not directly related 
to this proposal).  Oh, and BTW (again), test requires a part of it to 
be run on a different place.  Checks all necessary boxes IMO.

> Some other examples might be:
>
> 1. A simple stress routine being executed in parallel (the same or
> different hosts)
>    * utilize a service under testing from multiple hosts (stress test)

The "test requires a part of it to be run in a different place" requirement 
again. Fine.

> 2. Several code blocks being combined into a complex scenario(s)
>    * netperf + other test
>    * multi-host QEMU migration
>    * migrate while changing interfaces and running cpu stress

Yep, sounds like the very same requirement, just more imaginative 
combinations.

> 3. Running the same test along with stress test in background
>    * cpu stress test + cpu hotplug test
>    * memory stress test + migration
>

Here, "a different place" is "the same place", but it's still seen as a 
separate execution stream.  The big difference is that you mention "test 
x" + "test y".  I know what's coming, but it gives it away that we're 
possibly talking about having "blocks of code" made out of other tests. 
IMO, it sounds good.

>
> Solution
> ========
>
>
> Stream
> ------
>
> From the introduction you can see that "Stream" stands for a "Worker"
> which allows executing code in parallel to the main test routine,
> and the main test routine can offload tasks to it. The primary
> requirement is to allow this execution on the same as well as on a
> different machine.
>
>

Is a "Worker" a proper entity?  In status parity with a "Stream"?  Or is 
it a synonym for "Stream"?

Or maybe you meant that "Stream stands for a worker" (lowercase)?

> Block of code
> -------------
>
> Throughout the first 3 versions we discussed what the "block of code"
> should be. The result is an avocado.Test compatible class, which follows
> the same workflow as normal test and reports the results back to the
> stream. It is not the smallest piece of code that could be theoretically
> executed (think of functions), but it has many benefits:
>
> 1. Well known structure including information in case of failure
> 2. Allows simple development of components (in form of tests)
> 3. Allows to re-use existing tests and combine them into complex
>    scenarios
>
> Note: Smaller pieces of code can be still executed in parallel without
> the framework support using standard python libraries (multiprocessing,
> threading). This RFC is focusing on simplifying the development of
> complex cases, where test as a minimal block of code fits quite well.
>
>

Sounds good.

> Resolving the tests
> -------------------
>
> String
> ~~~~~~
>
> As mentioned earlier, the `stream` should be able to handle
> avocado.Test-like classes, which means the test needs to find one.
> Luckily, avocado already has such a feature as part of internal API. I'd
> like to use it by passing string `test reference` to the stream, which
> should resolve it and execute.
>

Just to make it even more clear, this could also be (re-)written as:

"... which means the test needs to unambiguously identify its block of 
code, which also happens to be a valid avocado.Test."

Right?

By setting the reference to be evaluated by the stream, IMHO, you add 
responsibilities to the stream.  How will it behave on the various error 
scenarios?  Internally, the stream will most likely use the 
loader/resolver, but then it would need to communicate the 
loader/resolver status/exceptions back to the test.  Looks like this 
could be better layered.

> Resolver
> ~~~~~~~~
>
> Some users might prefer tweaking the resolver. This is currently not
> supported, but is part of the "JobAPI RFC". Once it's developed, we
> should be able to benefit from it and use it to resolve the `test
> references` to test-like definitions and pass it over to the stream.
>

What do you mean by "tweaking"?  I don't think developers of a 
multi-stream test would tweak a resolver.

Now, let's look at: "... use it to resolve the `test references` to 
test-like definitions ...".  There's something slipping there, and 
lacking a more clear definition.

You probably mean: ".. use it to resolve the `test references` to "code 
blocks" that would be passed to the stream.".  Although I don't support 
defining the result of the resolver as a "code block", the important 
thing here is to define that a resolver API can either return the 
`<class user_module.UserTest>` "Python reference/pointer" or some other 
opaque structure that is well understood as being a valid and 
unambiguous reference to a "code block".

I see the result of a "resolve()" call returning something that packs 
more information besides the `<class user_module.UserTest>` "pointer". 
Right now, the closest we have to this are the "test factories".
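
To make the "packs more information" idea concrete, here is a minimal
sketch. `ResolvedBlock`, `resolve()` and the dict-based registry are
hypothetical names used only for illustration, not existing Avocado API,
and `UserTest` merely stands in for a real avocado.Test subclass:

```python
from dataclasses import dataclass, field


@dataclass
class ResolvedBlock:
    """Hypothetical resolver result: the class "pointer" plus context."""
    reference: str                 # the string the test author wrote
    klass: type                    # e.g. <class user_module.UserTest>
    method: str = "test"           # which method of klass to run
    params: dict = field(default_factory=dict)


class UserTest:
    """Stand-in for an avocado.Test subclass living in user_module."""

    def test(self):
        return "PASS"


def resolve(reference, registry):
    """Return one ResolvedBlock, or raise if the reference is unknown."""
    try:
        klass = registry[reference]
    except KeyError:
        raise LookupError("cannot resolve %r" % reference) from None
    return ResolvedBlock(reference=reference, klass=klass)


# The caller, not the stream, deals with resolution errors.
block = resolve("user_module.UserTest",
                {"user_module.UserTest": UserTest})
result = getattr(block.klass(), block.method)()
```

The point of the extra fields is that whoever executes the block does not
need to go back to the resolver to learn how to run it.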

> Local reference
> ~~~~~~~~~~~~~~~
>
> Last but not least, some users might prefer keeping the code in one
> file. This is currently also not possible as the in-stream-test-class
> would either be also resolved as a main test or they would not be
> resolved by the stream.
>
> We faced a similar problem with the deep inheritance and we solved it by
> a docstring tag:
>
>     class MyTest(Test):
>         '''
>         Some description
>         :avocado: disable
>         '''
>         def test(self):
>             pass
>
> which tells the resolver to avoid this class. We can expand it and use
> for example "strict" to only be executed when the full path
> ($FILE:$TEST.$METHOD) is used. This way we could put all the parts in a
> single file and reference the tasks by a full path.
>
> Alternatively we could introduce another class
>
>     class Worker(avocado.Test):
>         pass
>
> and the file loader would detect it and only yield it when full path is
> provided (similarly to SimpleTest class).
>
>

If the loader acknowledges those nested classes as valid `avocado.Test`, 
then the resolver can certainly return information about them in the 
analog to our current "test factories".  This way, the internal (same 
file) referencing could indeed be cleanly implemented.
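
A rough sketch of how such a docstring tag could drive the decision.
It assumes the classes are already imported (a real loader may well
inspect the source instead), and `avocado_tag()` and `should_yield()` are
made-up helper names. "disable" is the tag from the example above, while
"strict" is only the extension proposed in this RFC:

```python
import re

# Matches docstring directives such as ":avocado: disable".
_TAG_RE = re.compile(r"^\s*:avocado:\s+(\w+)\s*$", re.MULTILINE)


def avocado_tag(klass):
    """Return the ":avocado:" directive in klass' docstring, or None."""
    match = _TAG_RE.search(klass.__doc__ or "")
    return match.group(1) if match else None


def should_yield(klass, full_path_given):
    """Decide whether a (hypothetical) loader would yield klass."""
    tag = avocado_tag(klass)
    if tag == "disable":
        return False               # never picked up as a test
    if tag == "strict":
        return full_path_given     # only via $FILE:$TEST.$METHOD
    return True


class Plain:
    """A regular test."""


class Hidden:
    """
    Some description
    :avocado: disable
    """


class Part:
    """
    Only meant to be run inside a stream
    :avocado: strict
    """


decisions = [(cls.__name__, should_yield(cls, False), should_yield(cls, True))
             for cls in (Plain, Hidden, Part)]
```

With this, `Part` can live in the same file as the main test without being
listed as a test of its own, yet it stays reachable by its full path.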

> Synchronization
> ---------------
>
> Some tests do not need any synchronization, users just need to run them.
> But some multi-stream tests need to be precisely synchronized or they
> need to exchange data.
>
> For synchronization purposes usually "barriers" are used, where a barrier
> guards the entry into a section identified by "name" and "number of
> clients". All parties asking to enter the section will be delayed
> until the "number of clients" reach the section (or a timeout occurs).
> Then they are resumed and can enter the section. Any failure while
> waiting for a barrier propagates to the other waiting parties.
>
> One way is to use existing python libraries, but they usually require
> some boilerplate code around. One of the tasks on the multi-stream tests
> should be to implement basic barrier interface, which would be
> initialized in `avocado.Streams` and details should be propagated to the
> parts executed inside streams.
>
> The way I see this is to implement simple tcp-based protocol (to allow
> manual debug) and pass the details to tests inside streams via params.
> So `avocado.Streams` init would start the daemon and one would connect
> to it from the test by:
>
>     from avocado.plugins.sync import Sync
>     # Connect the sync server on address stored in params
>     # which could be injected by the multi-stream test
>     # or set manually.
>     sync = Sync(self, self.params.get("sync_server", "/plugins/sync_server"))
>     # wait until 2 tests ask to enter "setup" barrier (60s timeout)
>     sync.barrier("setup", 2, 60)
>

OK, so the execution streams can react to "test wide" synchronization 
parameters.  I don't see anything wrong with that at this point.
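
For reference, the barrier semantics described above (named section,
number of clients, timeout, failure propagating to the other waiters) can
be sketched in-process with the standard library. This is only a local
stand-in under that assumption: `LocalSync` is a hypothetical name, and the
proposed TCP-based protocol with re-connection support is not modelled.

```python
import threading


class LocalSync:
    """In-process stand-in for the proposed sync server (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._barriers = {}

    def barrier(self, name, no_clients, timeout=None):
        """Block until no_clients callers asked to enter barrier "name".

        On timeout threading.BrokenBarrierError is raised here and in
        every other party waiting on the same barrier.
        """
        with self._lock:
            bar = self._barriers.get(name)
            if bar is None:
                bar = self._barriers[name] = threading.Barrier(no_clients)
        bar.wait(timeout)


sync = LocalSync()
order = []


def server():
    order.append("netserver started")
    sync.barrier("setup", 2, 5)


def client():
    # Cannot proceed before the server reached the "setup" barrier.
    sync.barrier("setup", 2, 5)
    order.append("netperf started")


threads = [threading.Thread(target=client), threading.Thread(target=server)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Whichever thread is scheduled first, the client part only continues after
the server part announced itself at the "setup" barrier.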

> The new protocol is quite necessary as we need support for re-connection
> and other tweaks which are not supported by multiprocessing library.
>
>
> Very simple example
> -------------------
>
> This example demonstrates a test, which tries to access "example.org"
> concurrently from N machines without any synchronization.
>
>     import avocado
>
>     class WgetExample(avocado.Test):
>         def setUp(self):
>             # Initialize streams
>             self.streams = avocado.Streams(self)
>             for machine in machines:
>                 # Add one stream per machine, create the connection
>                 # and prepare for execution.
>                 self.streams.add_stream(machine)
>         def test(self):
>             for stream in self.streams:
>                 # Resolve the "/usr..." into
>                 # SimpleTest("/usr/bin/wget example.org") and
>                 # schedule the execution inside the current stream
>                 stream.run_bg("/usr/bin/wget example.org")
>             # Wait till all streams finish all tasks and fail the test
>             # in case any of them fails.
>             self.streams.wait(ignore_errors=False)
>
> where the `avocado.Stream` represents a worker (local or remote) which
> allows running avocado tests in it (foreground or background). This
> should provide enough flexibility to combine existing tests in complex
> tests.
>
>

Of course questions such as "where do machines come from?" would arise, 
but I understand the possibilities.  My only very strong opinion here is 
to not link the resolution and execution on the primary APIs.  Maybe a 
`resolve_and_run()` utility could exist, but I'm not entirely convinced. 
  I really see the two things (resolution and execution) as two 
different layers.
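
A sketch of what keeping the two layers apart could look like. All names
(`resolve`, `Stream.run`, `ResolutionError`) and the prefix-matching rule
are assumptions made for the sake of the example, not a proposed API:

```python
class ResolutionError(Exception):
    """Raised by the resolver layer; streams never see bad references."""


def resolve(reference, known_tests):
    """Map a test reference to exactly one callable "block of code"."""
    matches = [name for name in known_tests if name.startswith(reference)]
    if len(matches) != 1:
        raise ResolutionError("%r matched %d blocks: %s"
                              % (reference, len(matches), sorted(matches)))
    return known_tests[matches[0]]


class Stream:
    """Execution layer only: runs what it is handed, records the result."""

    def __init__(self, name):
        self.name = name
        self.results = []

    def run(self, block):
        try:
            block()
            self.results.append("PASS")
        except Exception as details:
            self.results.append("FAIL: %s" % details)
        return self.results[-1]


known = {"NetServer": lambda: None,
         "Netperf.bigbuf": lambda: None,
         "Netperf.smallbuf": lambda: None}

stream = Stream("localhost")
# Resolution errors surface in the test, before any stream is involved.
status = stream.run(resolve("NetServer", known))
```

Here `resolve("Netperf", known)` raises `ResolutionError` because the
reference matches two blocks, so the stream never has to report loader
problems back to the test.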

> Advanced example
> ----------------
>
> MultiNetperf.py:
>
>     class MultiNetperf(avocado.NestedTest):
>         def setUp(self):
>             # Initialize streams (start sync server, ...)
>             self.streams = avocado.Streams(self)
>             # Keep the list on the instance, test() needs it as well
>             self.machines = ["localhost", "192.168.122.2"]
>             for machine in self.machines:
>                 # Add one stream per machine
>                 self.streams.add_stream(machine)
>         def test(self):
>             # Ask the first stream to resolve "NetServer", pass the {}
>             # params to it (together with sync-server url),
>             # schedule the job in stream and return to main thread
>             # while the stream executes the code.
>             self.streams[0].run_bg("NetServer",
>                                    {"no_clients": len(self.streams)})
>             for stream in self.streams[1:]:
>                 # Resolve "NetPerf", pass the {} params to it,
>                 # schedule the job in stream and return to main
>                 # thread while the stream executes the code
>                 stream.run_bg("NetPerf",
>                               {"no_clients": len(self.streams),
>                                "server_ip": self.machines[0]})
>             # Wait for all streams to finish all scheduled tasks
>             self.streams.wait(ignore_failures=False)
>

You lost me here with `avocado.NestedTest`...

> NetServer.py:
>
>     class NetServer(avocado.NestedTest):
>         def setUp(self):
>             # Initialize sync client
>             self.sync = avocado.Sync(self)
>             process.run("netserver")
>             # Contact sync server (url was passed in `stream.run_bg`)
>             # and ask to enter "setup" barrier with "no_clients"
>             # clients
>             self.sync.barrier("setup", self.params.get("no_clients"))
>         def test(self):
>             pass
>         def tearDown(self):
>             self.sync.barrier("finished", self.params.get("no_clients"))
>             process.run("killall netserver")
>
> NetPerf.py:
>
>     class NetPerf(avocado.NestedTest):
>         def setUp(self):
>             # Initialize sync client
>             self.sync = avocado.Sync(self)
>             process.run("netserver")
>             # Contact sync server (url was passed in `stream.run_bg`)
>             # and ask to enter "setup" barrier with "no_clients"
>             # clients
>             self.sync.barrier("setup", self.params.get("no_clients"))
>         def test(self):
>             process.run("netperf -H %s -l 60"
>                         % self.params.get("server_ip"))
>             self.sync.barrier("finished", self.params.get("no_clients"))
>
>
> Possible implementation
> -----------------------
>
> _Previously: API backed by internal API_
>
> One way to drive this is to use existing internal API and create a layer
> in between, which invokes runner (local/remote based on the stream
> machine) to execute the code on `stream.run_bg` calls.
>
> This means the internal API would stay internal and (roughly) the same,
> but we'd develop a class to invoke the internal API. This class would
> have to be public and supported.
>
> + runs native python
> + easy interaction and development
> + easily extensible by either using internal API (and risk changes) or
> by inheriting and extending the features.
> - lots of internal API will be involved, thus with almost every change
> of internal API we'd have to adjust this code to keep the NestedTest working
> - fabric/paramiko is not thread/parallel process safe and fails badly so
> first we'd have to rewrite our remote execution code (use autotest's
> worker, or aexpect+ssh)
>
>
> Queue vs. single task
> ---------------------
>
> Up to this point I always talked about stream as an entity, which drives
> the execution of "a code block". A big question is, whether it should
> behave like a queue, or only a single task:
>
> queue - allows scheduling several tasks and reports list of results
> single task - stream would only accept one task and produce one result
>
> I'd prefer the queue-like approach as it's more natural to me to first
> prepare streams and then keep adding tasks until all my work is done and
> I'd expect per-stream results to be bounded together, so I can know what
> happened. This means I could run `stream.run_bg(first);
> stream.run_bg(second); stream.run_fg(third); stream.run_bg(fourth)` and
> the stream should start task "first", queue task "second", queue task
> "third", wait for it to finish and report "third" results. Then it
> should resume the main thread and queue the "fourth" task (FIFO queue).
> Each stream should then allow to query for all results (list of
> json-results) as well as it should create a directory inside results and
> per-task sub-directory with task results.
>

I do see that the "queue" approach is more powerful, and I would love 
having something like that for my own use.  But (there's always a but), 
to decide on that approach we also have to consider:

* Increased complexity
* Increased development cost
* Passing the wrong message to users, that could look at this as a way 
to, say, build conditional executions on the same stream and have now a 
bunch of "micro" code blocks

These are the questions that come to my mind, and they may all be dismissed 
as discussion progresses.  I'm just playing devil's advocate at this point.
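
To help weigh the complexity, this is roughly the smallest queue-like
stream one could write locally. `QueueStream` is a hypothetical toy: it
relies on a single-worker `ThreadPoolExecutor` to get FIFO ordering and
ignores remote execution, result directories and error reporting (a
failing task simply re-raises from `result()`).

```python
from concurrent.futures import ThreadPoolExecutor


class QueueStream:
    """Toy FIFO stream: one worker runs tasks in submission order."""

    def __init__(self, name):
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._futures = []          # (task name, future), in FIFO order

    def run_bg(self, task_name, func):
        """Queue the task and return to the caller immediately."""
        future = self._pool.submit(func)
        self._futures.append((task_name, future))
        return future

    def run_fg(self, task_name, func):
        """Queue the task and block until it, and all earlier ones, ran."""
        return self.run_bg(task_name, func).result()

    def wait(self):
        """Wait for all tasks; return [(task name, result)] in order."""
        results = [(name, future.result()) for name, future in self._futures]
        self._pool.shutdown()
        return results


log = []
stream = QueueStream("stream1")
stream.run_bg("first", lambda: log.append("first"))
stream.run_bg("second", lambda: log.append("second"))
# Blocks the main thread until "first", "second" and "third" are done.
third = stream.run_fg("third", lambda: log.append("third") or "third done")
stream.run_bg("fourth", lambda: log.append("fourth"))
results = stream.wait()
```

Even this toy needs a worker, a queue and per-task bookkeeping, which
gives a feeling for the extra cost compared to a one-shot "single task".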

> On the other hand the "single task" should always establish the new
> connection and create separate results per-each task added. This means
> preparing the streams is not needed as each added task is executed
> inside a different stream. So the interface could be
> `self.streams.run_bg(where, what, details)` and it should report the
> task id or task results in case of `run_fg`. The big question is what
> should happen when a task resolves into multiple tasks (eg: `gdbtest`).

That's why the "block of code" reference should be unambiguous.  No 
special situation to deal with.  It'd be a major confusion to have more 
than one "block of code" executed unintentionally.

> Should it fail or create streams per each task? What should it report,
> then? I can imagine a function `run_all_{fg,bg}` which would create a
> stream for each worker and return list of id/results in case the writer
> is not sure (or knows) that the test reference resolves into several tasks.
>

Let's try to favor simpler interfaces, which would not introduce this 
number of special scenarios.

> See more details in the next chapter
>
>
> Results directory
> -----------------
>
> This demonstrates the results for a modified "MultiNetperf" test. The
> difference is that it runs 2 variants of netperf:
>
> * Netperf.bigbuf    # netperf using big buffers
> * Netperf.smallbuf  # netperf using small buffers
>
> Queue-like approach:
>
>     job-2016-04-15T.../
>     ├── id
>     ├── job.log
>     └── test-results
>         └── 1-MultiNetperf
>             ├── debug.log
>             ├── stream1       # one could provide custom name/host
>             │   ├── 1-Netperf.bigbuf
>             │   │   ├── debug.log
>             │   │   └── whiteboard
>             │   └── 2-Netperf.smallbuf
>             │       ├── debug.log
>             │       └── whiteboard
>             ├── stream2
>             │   └── 1-NetServer
>             │       ├── debug.log
>             │       └── whiteboard
>             └── whiteboard
>
> Single task approach:
>
>     job-2016-04-16T.../
>     ├── id
>     ├── job.log
>     └── test-results
>         └── 1-MultiNetperf
>             ├── debug.log
>             ├── whiteboard
>             ├── 1-Netperf.bigbuf
>             │   ├── debug.log
>             │   └── whiteboard
>             ├── 2-Netperf.smallbuf
>             │   ├── debug.log
>             │   └── whiteboard
>             └── 3-NetServer
>                 ├── debug.log
>                 └── whiteboard
>
> The difference is that queue-like approach bundles the result
> per-worker, which could be useful when using multiple machines.
>
> The single-task approach makes it easier to follow how the execution
> went, but one needs to check the log to see on which machine the task
> was executed.
>
>

The logs can indeed be useful.  And the choice of single vs. queue 
wouldn't really depend on this... this is, quite obviously, the *result* 
of that choice.
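
The two layouts can be derived from the same input, which shows that the
directory structure follows from the model rather than driving it. The
function names are made up for this sketch, and it assumes the NetServer
task takes the third slot in the flat layout:

```python
import posixpath


def queue_layout(test_dir, streams):
    """Per-stream directories, each with its own task counter."""
    paths = []
    for stream_name, tasks in streams:
        for index, task in enumerate(tasks, 1):
            paths.append(posixpath.join(test_dir, stream_name,
                                        "%d-%s" % (index, task)))
    return paths


def single_task_layout(test_dir, streams):
    """Flat directories with one global counter; no host information."""
    tasks = [task for _, task_list in streams for task in task_list]
    return [posixpath.join(test_dir, "%d-%s" % (index, task))
            for index, task in enumerate(tasks, 1)]


streams = [("stream1", ["Netperf.bigbuf", "Netperf.smallbuf"]),
           ("stream2", ["NetServer"])]
queue_dirs = queue_layout("1-MultiNetperf", streams)
single_dirs = single_task_layout("1-MultiNetperf", streams)
```

In the flat variant the stream name is dropped from the path, which is
exactly why one has to look into the logs to learn where a task ran.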

> Job API RFC
> ===========
>
> The recently introduced Job API RFC covers a very similar topic as
> "nested test", but it's not the same. The Job API enables users to modify
> the job execution, eventually even to write a runner which would suit
> them to run groups of tests. On the contrary, this RFC covers a way to
> combine code-blocks/tests and reuse them in a single test. In a hackish
> way, they can supplement each other, but the purpose is different.
>

"nested", without a previous definition, really confuses me.  Other than 
that, ACK.

> One of the most obvious differences is, that a failed "nested" test can
> be intentional (eg. reusing the NetPerf test to check if unreachable
> machines can talk to each other), while in Job API it's always a failure.
>

It may just be me, but I fail to see how this is one obvious difference.

> I hope you see the pattern. They are similar, but on a different layer.
> Internally, though, they can share some pieces like executing the
> individual tests concurrently with different params/plugins
> (locally/remotely). All the needed plugin modifications would also be
> useful for both of these RFCs.
>

The layers involved, and the proposed usage, should be the obvious 
differences.  If they're not cleanly seen, we're doing something wrong.

> Some examples:
>
> User1 wants to run "compile_kernel" test on a machine followed by
> "install_compiled_kernel passtest failtest warntest" on "machine1
> machine2". They depend on the status of the previous test, but they
> don't create a scenario. So the user should use Job API (or execute 3
> jobs manually).
>
> User2 wants to create migration test, which starts migration from
> machine1 and receives the migration on machine2. It requires cooperation
> and together it creates one complex usecase so the user should use
> multi-stream test.
>
>

OK.

> Conclusion
> ==========
>
> This RFC proposes to add a simple API to allow triggering
> avocado.Test-like instances on local or remote machine. The main point
> is it should allow very simple code-reuse and modular test development.
> I believe it'll be easier than having users handle the
> multiprocessing library, which might allow similar features, but with a
> lot of boilerplate code and even more code to handle possible exceptions.
>
> This concept also plays nicely with the Job API RFC, it could utilize
> most of tasks needed for it and together they should allow amazing
> flexibility with known and similar structure (therefore easy to learn).
>

Thanks for the much cleaner v4!  I see that consensus and a common view 
are now approaching.

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]