[Avocado-devel] RFC: multi-stream test (previously multi-test) [v4]

Lukáš Doktor ldoktor at redhat.com
Thu Apr 28 15:10:07 UTC 2016


Hello again,

This version removes the rejected variants and hopefully clarifies all
the goals needed to make multi-stream (and also multi-host) tests available.

Changes:

    v2: Rewritten from scratch
    v2: Added examples for the demonstration to avoid confusion
    v2: Removed the mht format (which was there to demonstrate manual
        execution)
    v2: Added 2 solutions for multi-tests
    v2: Described ways to support synchronization
    v3: Renamed to multi-stream as it befits the purpose
    v3: Improved introduction
    v3: Workers are renamed to streams
    v3: Added example which uses library, instead of new test
    v3: Multi-test renamed to nested tests
    v3: Added section regarding Job API RFC
    v3: Better description of the Synchronization section
    v3: Improved conclusion
    v3: Removed the "Internal API" section (it was a transition between
        no support and "nested test API", not a "real" solution)
    v3: Using per-test granularity in nested tests (requires plugins
        refactor from Job API, but allows greater flexibility)
    v4: Removed "Standard python libraries" section (rejected)
    v4: Removed "API backed by cmdline" (rejected)
    v4: Simplified "Synchronization" section (only describes the
        purpose)
    v4: Refined all sections
    v4: Improved the complex example and added comments
    v4: Formulated the problem of multiple tasks in one stream
    v4: Rejected the idea of bounding it inside MultiTest class
        inherited from avocado.Test, using a library-only approach


The problem
===========

Allow tests to have some of their blocks of code run in separate
stream(s). We'll discuss the range of "block of code" further in the
text, as well as what a stream stands for.

One example could be a user who wants to run netperf on 2 machines,
which requires the following manual steps:

    stream1: netserver -D
    stream1: # Wait till netserver is initialized
    stream2: netperf -H $machine1 -l 60
    stream2: # Wait till it finishes and report the results
    stream1: # stop the netserver and report possible failures

The test would have to contain the code for both stream1 and stream2
and execute them in two separate streams, which might or might not run
on the same machine.

Some other examples might be:

1. A simple stress routine being executed in parallel (on the same or
   different hosts)
   * utilize a service under testing from multiple hosts (stress test)
2. Several code blocks being combined into complex scenarios
   * netperf + other test
   * multi-host QEMU migration
   * migrate while changing interfaces and running cpu stress
3. Running a test along with a stress test in the background
   * cpu stress test + cpu hotplug test
   * memory stress test + migration


Solution
========


Stream
------

From the introduction you can see that a "Stream" stands for a "Worker"
which executes code in parallel to the main test routine, so the main
test routine can offload tasks to it. The primary requirement is to
allow this execution on the same machine as well as on a different one.


Block of code
-------------

Throughout the first 3 versions we discussed what the "block of code"
should be. The result is an avocado.Test-compatible class, which follows
the same workflow as a normal test and reports the results back to the
stream. It is not the smallest piece of code that could theoretically be
executed (think of functions), but it has many benefits:

1. Well known structure including information in case of failure
2. Allows simple development of components (in form of tests)
3. Allows to re-use existing tests and combine them into complex
   scenarios

Note: Smaller pieces of code can still be executed in parallel without
framework support, using standard python libraries (multiprocessing,
threading); see the sketch below. This RFC focuses on simplifying the
development of complex cases, where a test as the minimal block of code
fits quite well.
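
For completeness, a minimal sketch of such framework-less parallelism
using only the standard library (nothing avocado-specific is involved):

    import multiprocessing

    def stress():
        pass  # placeholder for a small piece of code to run in parallel

    # run it alongside the main test routine, without any stream support
    proc = multiprocessing.Process(target=stress)
    proc.start()
    # ... main test routine continues here ...
    proc.join()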


Resolving the tests
-------------------

String
~~~~~~

As mentioned earlier, the `stream` should be able to handle
avocado.Test-like classes, which means the test needs to find one.
Luckily, avocado already has such a feature as part of its internal API.
I'd like to use it by passing a string `test reference` to the stream,
which should resolve and execute it.

Resolver
~~~~~~~~

Some users might prefer tweaking the resolver. This is currently not
supported, but it is part of the "Job API RFC". Once it's developed, we
should be able to benefit from it and use it to resolve the `test
references` into test-like definitions and pass them over to the stream.

Local reference
~~~~~~~~~~~~~~~

Last but not least, some users might prefer keeping the code in one
file. This is currently not possible either, as the in-stream test
classes would either also be resolved as main tests, or they would not
be resolved by the stream at all.

We faced a similar problem with the deep inheritance and we solved it by
a docstring tag:

    class MyTest(Test):
        '''
        Some description
        :avocado: disable
        '''
        def test(self):
            pass

which tells the resolver to avoid this class. We can expand it and use,
for example, "strict" to only resolve the class when the full path
($FILE:$TEST.$METHOD) is used. This way we could put all the parts in a
single file and reference the tasks by their full paths.
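
For illustration, a part intended only for in-stream execution could
then look like this ("strict" is just the proposed tag from the
paragraph above and the class name is made up, nothing is implemented
yet):

    class NetServerPart(Test):
        '''
        Server part of the scenario, kept in the main test's file
        :avocado: strict
        '''
        def test(self):
            pass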

Alternatively, we could introduce another class:

    class Worker(avocado.Test):
        pass

and the file loader would detect it and only yield it when a full path
is provided (similarly to the SimpleTest class).


Synchronization
---------------

Some tests do not need any synchronization; users just need to run them.
But some multi-stream tests need to be precisely synchronized or need to
exchange data.

For synchronization purposes usually "barriers" are used, where a
barrier guards the entry into a section identified by a "name" and a
"number of clients". All parties asking to enter the section are delayed
until the "number of clients" reach the section (or a timeout expires).
Then they are resumed and can enter the section. Any failure while
waiting for a barrier propagates to the other waiting parties.
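
For illustration, the same semantics can be seen on a single host with
Python 3's standard threading.Barrier; the proposal below targets
multiple machines, which this obviously cannot do:

    import threading

    # barrier guarding the "setup" section for 2 clients, 60s timeout
    setup = threading.Barrier(2, timeout=60)

    def stream_body():
        # blocks until 2 parties have called wait(); a timeout raises
        # BrokenBarrierError in every waiting party
        setup.wait()
        # ... the "setup" section ...

    threads = [threading.Thread(target=stream_body) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()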

One way is to use existing python libraries, but they usually require
some boilerplate code around them. One of the tasks of the multi-stream
work should be to implement a basic barrier interface, which would be
initialized in `avocado.Streams`, with the details propagated to the
parts executed inside streams.

The way I see this is to implement a simple tcp-based protocol (to allow
manual debugging) and pass the details to the tests inside streams via
params. So the `avocado.Streams` init would start the daemon and one
would connect to it from the test by:

    from avocado.plugins.sync import Sync
    # Connect the sync server on address stored in params
    # which could be injected by the multi-stream test
    # or set manually.
    sync = Sync(self, self.params.get("sync_server", "/plugins/sync_server"))
    # wait until 2 tests ask to enter "setup" barrier (60s timeout)
    sync.barrier("setup", 2, 60)

A new protocol is quite necessary, as we need support for re-connection
and other tweaks which are not supported by the multiprocessing library.
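
A purely illustrative client side of such a tcp-based barrier follows
(no wire format has been decided; the "BARRIER"/"GO" messages are
assumptions made only for this sketch):

    import socket

    def barrier(server_addr, name, no_clients, timeout=60):
        # server_addr is a (host, port) tuple of the sync server
        sock = socket.create_connection(server_addr, timeout=timeout)
        try:
            # ask the sync server to enter the named section
            sock.sendall(("BARRIER %s %d\n" % (name, no_clients)).encode())
            # block until the server reports that all clients arrived
            reply = sock.recv(4096).decode().strip()
            if reply != "GO":
                raise RuntimeError("barrier '%s' failed: %s"
                                   % (name, reply))
        finally:
            sock.close()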


Very simple example
-------------------

This example demonstrates a test which tries to access "example.org"
concurrently from N machines, without any synchronization.

    import avocado

    class WgetExample(avocado.Test):
        def setUp(self):
            # Initialize streams
            self.streams = avocado.Streams(self)
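            # "machines" is assumed to be a list of hostnames available
            # to the test (e.g. coming from the test params)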
            for machine in machines:
                # Add one stream per machine, create the connection
                # and prepare for execution.
                self.streams.add_stream(machine)
        def test(self):
            for stream in self.streams:
                # Resolve the "/usr..." into
                # SimpleTest("/usr/bin/wget example.org") and
                # schedule the execution inside the current stream
                stream.run_bg("/usr/bin/wget example.org")
            # Wait till all streams finish all their tasks and fail the
            # test in case any of them fails.
            self.streams.wait(ignore_errors=False)

where `avocado.Stream` represents a worker (local or remote) which
allows running avocado tests in it (in the foreground or background).
This should provide enough flexibility to combine existing tests into
complex tests.


Advanced example
----------------

MultiNetperf.py:

    import avocado

    class MultiNetperf(avocado.NestedTest):
        def setUp(self):
            # Initialize streams (start sync server, ...)
            self.streams = avocado.Streams(self)
            self.machines = ["localhost", "192.168.122.2"]
            for machine in self.machines:
                # Add one stream per machine
                self.streams.add_stream(machine)
        def test(self):
            # Ask the first stream to resolve "NetServer", pass the {}
            # params to it (together with sync-server url),
            # schedule the job in stream and return to main thread
            # while the stream executes the code.
            self.streams[0].run_bg("NetServer",
                                   {"no_clients": len(self.streams)})
            for stream in self.streams[1:]:
                # Resolve "NetPerf", pass the {} params to it,
                # schedule the job in stream and return to main
                # thread while the stream executes the code
                stream.run_bg("NetPerf",
                              {"no_clients": len(self.workers),
                               "server_ip": machines[0]})
            # Wait for all streams to finish all scheduled tasks
            self.streams.wait(ignore_errors=False)

NetServer.py:

    import avocado
    from avocado.utils import process

    class NetServer(avocado.NestedTest):
        def setUp(self):
            # Initialize sync client
            self.sync = avocado.Sync(self)
            process.run("netserver")
            # Contact sync server (url was passed in `stream.run_bg`)
            # and ask to enter "setup" barrier with "no_clients"
            # clients
            self.sync.barrier("setup", self.params.get("no_clients"))
        def test(self):
            pass
        def tearDown(self):
            self.sync.barrier("finished", self.params.get("no_clients"))
            process.run("killall netserver")

NetPerf.py:

    import avocado
    from avocado.utils import process

    class NetPerf(avocado.NestedTest):
        def setUp(self):
            # Initialize sync client
            self.sync = avocado.Sync(self)
            # Contact sync server (url was passed in `stream.run_bg`)
            # and ask to enter "setup" barrier with "no_clients"
            # clients
            self.sync.barrier("setup", self.params.get("no_clients"))
        def test(self):
            process.run("netperf -H %s -l 60"
                        % params.get("server_ip"))
            barrier("finished", params.get("no_clients"))


Possible implementation
-----------------------

_Previously: API backed by internal API_

One way to drive this is to use the existing internal API and create a
layer in between, which invokes a runner (local/remote, based on the
stream's machine) to execute the code on `stream.run_bg` calls.

This means the internal API would stay internal and (roughly) the same,
but we'd develop a class to invoke it. This class would have to be
public and supported.

+ runs native python
+ easy interaction and development
+ easily extensible by either using the internal API (and risking
  changes) or by inheriting and extending the features
- lots of the internal API would be involved, thus with almost every
  change of the internal API we'd have to adjust this code to keep
  nested tests working
- fabric/paramiko is not thread/parallel-process safe and fails badly,
  so we'd first have to rewrite our remote execution code (use
  autotest's worker, or aexpect+ssh)


Queue vs. single task
---------------------

Up to this point I always talked about a stream as an entity which
drives the execution of "a code block". A big question is whether it
should behave like a queue, or accept only a single task:

queue - allows scheduling several tasks and reports a list of results
single task - the stream accepts only one task and produces one result

I'd prefer the queue-like approach as it's more natural to me to first
prepare the streams and then keep adding tasks until all my work is
done, and I'd expect per-stream results to be bundled together, so I
know what happened where. This means I could run `stream.run_bg(first);
stream.run_bg(second); stream.run_fg(third); stream.run_bg(fourth)` and
the stream should start task "first", queue task "second", queue task
"third", wait for "third" to finish and report its results. Then it
should resume the main thread and queue the "fourth" task (FIFO queue).
Each stream should then allow querying all results (a list of
json-results) and it should create a directory inside the results with
a per-task sub-directory for each task's results, as sketched below.
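
A sketch of how that queue-like usage could look from a test
(`run_bg`/`run_fg` come from the examples above; `get_results` is only
a hypothetical name for querying the per-stream results):

    stream = self.streams[0]
    stream.run_bg("first")            # starts immediately
    stream.run_bg("second")           # queued behind "first"
    result = stream.run_fg("third")   # queued; blocks until it finishes
    stream.run_bg("fourth")           # queued once the main thread resumes
    results = stream.get_results()    # hypothetical: list of json-results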

On the other hand, the "single task" approach would always establish a
new connection and create separate results for each added task. This
means preparing the streams is not needed, as each added task is
executed inside a different stream. So the interface could be
`self.streams.run_bg(where, what, details)` and it would report the
task id, or the task results in case of `run_fg`. The big question is
what should happen when a reference resolves into multiple tasks (eg:
`gdbtest`). Should it fail, or create a stream per task? And what
should it report then? I can imagine functions `run_all_{fg,bg}` which
would create a stream per task and return a list of ids/results, for
cases where the writer is not sure (or knows) that the test reference
resolves into several tasks; see the sketch below.
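
For comparison, a sketch of the single-task interface described above
(the signatures are taken from this paragraph only and are not an
existing API):

    # each call creates its own stream/connection and its own result dir
    task_id = self.streams.run_bg("machine1", "NetServer",
                                  {"no_clients": 2})
    result = self.streams.run_fg("machine2", "NetPerf",
                                 {"server_ip": "machine1"})
    # a reference resolving into several tasks would use run_all_fg/bg
    # and get back a list of results/ids
    results = self.streams.run_all_fg("machine3", "gdbtest")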

See more details in the next chapter.


Results directory
-----------------

This demonstrates the results for a modified "MultiNetperf" test. The
difference is that it runs 2 variants of netperf:

* Netperf.bigbuf    # netperf using big buffers
* Netperf.smallbuf  # netperf using small buffers

Queue-like approach:

    job-2016-04-15T.../
    ├── id
    ├── job.log
    └── test-results
        └── 1-MultiNetperf
            ├── debug.log
            ├── stream1       # one could provide custom name/host
            │   ├── 1-Netperf.bigbuf
            │   │   ├── debug.log
            │   │   └── whiteboard
            │   └── 2-Netperf.smallbuf
            │       ├── debug.log
            │       └── whiteboard
            ├── stream2
            │   └── 1-NetServer
            │       ├── debug.log
            │       └── whiteboard
            └── whiteboard

Single task approach:

    job-2016-04-16T.../
    ├── id
    ├── job.log
    └── test-results
        └── 1-MultiNetperf
            ├── debug.log
            ├── whiteboard
            ├── 1-Netperf.bigbuf
            │   ├── debug.log
            │   └── whiteboard
            ├── 2-Netperf.smallbuf
            │   ├── debug.log
            │   └── whiteboard
            └── 3-Netperf.smallbuf
                ├── debug.log
                └── whiteboard

The difference is that the queue-like approach bundles the results per
stream, which could be useful when using multiple machines.

The single-task approach makes it easier to follow how the execution
went, but one needs to look into the log to see on which machine each
task was executed.


Job API RFC
===========

The recently introduced Job API RFC covers a very similar topic to
"nested tests", but it's not the same. The Job API enables users to
modify the job execution, eventually even to write a runner which suits
their way of running groups of tests. This RFC, on the contrary, covers
a way to combine code blocks/tests and reuse them inside a single test.
In a hackish way they can supplement each other, but the purpose is
different.

One of the most obvious differences is that a failed "nested" test can
be intentional (eg. reusing the NetPerf test to check whether supposedly
unreachable machines can talk to each other), while in the Job API a
failed test always means a failure.

I hope you see the pattern. They are similar, but on a different layer.
Internally, though, they can share some pieces, like executing the
individual tests concurrently with different params/plugins
(locally/remotely). All the needed plugin modifications would also be
useful for both of these RFCs.

Some examples:

User1 wants to run a "compile_kernel" test on one machine followed by
"install_compiled_kernel passtest failtest warntest" on "machine1
machine2". The tests depend on the status of the previous test, but
they don't form a single scenario. So the user should use the Job API
(or execute 3 jobs manually).

User2 wants to create a migration test which starts migration from
machine1 and receives the migration on machine2. It requires
cooperation and together it forms one complex use case, so the user
should use a multi-stream test.


Conclusion
==========

This RFC proposes to add a simple API to allow triggering
avocado.Test-like instances on a local or remote machine. The main
point is that it should allow very simple code re-use and modular test
development. I believe it'll be easier than having users handle the
multiprocessing library, which might allow similar features, but with a
lot of boilerplate code and even more code to handle possible
exceptions.

This concept also plays nicely with the Job API RFC; it could utilize
most of the tasks needed for that RFC, and together they should allow
amazing flexibility with a known and similar structure (therefore easy
to learn).
