[Avocado-devel] RFC: Multi tests (previously multi-host test) [v2]

Cleber Rosa crosa at redhat.com
Mon Apr 4 05:14:28 UTC 2016


On 03/31/2016 12:55 PM, Lukáš Doktor wrote:
> Hello guys,
>
> This is a v2 of the multi tests RFC, previously known as multi-host RFC.
>
> Changes:
>
>      v2: Rewritten from scratch
>      v2: Added examples for the demonstration to avoid confusion
>      v2: Removed the mht format (which was there to demonstrate manual
> execution)
>      v2: Added 2 solutions for multi-tests
>      v2: Described ways to support synchronization
>
> The problem
> ===========
>

I believe a formal definition of the problem may help us keep the
possible solutions in sharper focus. I would describe the problem as:

"Allow tests to have some of its blocks of code run in separate 
stream(s)[1]".

"Blocks of code", for now, is a rather abstract concept, to be discussed 
later.

> A user wants to run netperf on 2 machines, which requires the following
> manual steps:
>
>      machine1: netserver -D
>      machine1: # Wait till netserver is initialized
>      machine2: netperf -H $machine1 -l 60
>      machine2: # Wait till it finishes, store and report the results
>      machine1: # stop the netserver and report possible failures
>

Using the definition given above, all code prefixed with "machine1:"
would be one execution stream and all code prefixed with "machine2:"
would be a second stream.

The test itself would be a single entity, composed of its own code in
addition to the code to be run on machine1 and machine2, as covered before.

> Another use-cases might be:
>

Using the same definition, these use cases would become:

> 1. triggering several un-related tests in parallel

"Running blocks of code in parallel".

> 2. triggering several tests in parallel with synchronization

"Running blocks of code in parallel with synchronization".

> 3. spreading several tests into multiple machines

"Running blocks of code in multiple external machines". A point here: 
this could mean either sequentially or in parallel.

> 4. triggering several various tests on multiple machines

"Running varied blocks of code in multiple external machines".

>
> The problem is not only about running tests on multiple machines, but
> generally about ways to trigger tests/set of tests in whatever way the
> user needs to.
>

Based on the definition given, "running tests on multiple machines" is
not the *direct* scope of this RFC. Running *tests* on multiple machines
(either sequentially or in parallel) could be the scope of a "multi host
*job*" RFC, that is, a Job that encompasses tests to be run on multiple
different machines. In such a (multi host) job, there would be a 1:N
relationship between a job and machines, and a 1:1 relationship between
a test and a machine.

>
> Running the tests
> =================
>
> In v1 we rejected the idea of running custom code from inside the tests
> in background, as it requires implementing the remote-tests again and
> again, and we decided that executing full tests or sets of tests with
> support for remote synchronization/data exchange is the way to go. There
> were two or three bigger categories, so let's describe each so we can
> pick the most suitable one (at this moment).
>

My current understanding is that the approach of implementing the remote
execution of code by running multiple "avocado" instances, with different
command line options to reflect the multiple executions, was abandoned.

> For demonstration purposes I'll be writing a very simple multi-host test
> which triggers "/usr/bin/wget example.org" on 3 machines to simulate a
> very basic stress test.
>

Again revising all statements in light of the definition given before, 
this clearly means one single test, with the same "block of code" 
(/usr/bin/wget example.org) to be executed on 3 machines.

> Synchronization and parametrization will not be covered in this section,
> as synchronization will be described in the next chapter (and is the same
> for all solutions) and parametrization is a standard avocado feature.
>
>
> Internal API
> ------------
>
> One of the ways to allow people to trigger tests and sets of tests (jobs)
> from inside a test is to pick the minimal required set of internal API
> which handles remote job execution, make it public (and supported) and
> refactor it so it can realistically be called from inside a test.
>
> Example (pseudocode)
>
>      class WgetExample(avocado.Test):
>          jobs = []
>          for i, machine in enumerate(["127.0.0.1", "192.168.122.2",
>                                       "192.168.122.3"]):
>              jobs.append(avocado.Job(urls=["/usr/bin/wget example.org"],
>                                      remote_machine=machine,
>                                      logdir=os.path.join(self.logdir,
>                                                          i)))

The example given may lead a reader into thinking that the problem being
solved here is one of remote execution of commands. So let's just remind
ourselves that the problem at stake, IMHO, is:

"Allow tests to have some of its blocks of code run in separate stream(s)".

>          for job in jobs:
>              job.run_background()
>          errors = []
>          for i, job in enumerate(jobs):
>              result = job.wait()     # returns json results
>              if result["pass"] != result["total"]:
>                  errors.append("Tests on worker %s (%s) failed"
>                                % (i, machines[i]))
>          if errors:
>              self.fail("Some workers failed:\n%s" % "\n".join(errors))
>

This example defines the "code block" unit as an Avocado Job. So,
essentially, using the previous definition I gave, the suggestion would
translate to:

"Allow Avocado tests to have Avocado Jobs run in separate stream(s)"

The most striking aspect of this example is of course the use of an
Avocado Job inside an Avocado Test. An Avocado Job, by definition and
implementation, is a "logical container" for tests. Having a *test*
fire *jobs* as part of the official solution crosses the layers we
defined and designed ourselves.

Given that an Avocado Job includes most of the functionality of Avocado 
(as a whole), too many questions can be raised with regards to what 
aspects of these (intra test) Jobs are to be supported.

To summarize, I'm skeptical that an Avocado Job should be the "code
block" unit for the problem at hand.

> alternatively even require the user to define the whole workflow:
>
> 1. discover test (loader)
> 2. add params/variants
> 3. setup remote execution (RemoteTestRunner)
> 4. setup results (RemoteResults)
>
> which would require even more internal API to be turned public.
>
> + easy to develop, we simply identify a set of classes and make them public
> - hard to maintain as the API would have to stay stable, therefore
> realistically it requires a big cleanup before doing this step
>
> Multi-tests API
> ---------------
>
> To avoid the need to make the API which drives testing public, we can
> also introduce an API to trigger jobs/sets of jobs. It would be a sort of
> proxy between the internal API, which can and does change more often, and
> the public multi-host API, which would be supported and kept stable.
>
> I see two basic backends supporting this API, but they both share the
> same public API.
>
> Example (pseudocode)
>
>      class WgetExample(avocado.MultiTest):
>          for machine in ["127.0.0.1", "192.168.122.2", "192.168.122.3"]:
>              self.add_worker(machine)
>          for worker in self.workers:
>              worker.add_test("/usr/bin/wget example.org")
>          #self.start()
>          #results = self.wait()
>          #if results["failures"]:
>          #    self.fail(results["failures"])
>          self.run()  # does the above
>

I have hopes, maybe naive ones, that regular Avocado tests can have some
of their code blocks run on different streams with the aid of some APIs.
What I mean is that Avocado would not *require* a specialized class
for that.
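
Just to make that hope concrete, here is a rough sketch of the kind of
developer experience I have in mind. Note that "streams", "get_stream",
"run_background" and "wait" below are hypothetical names, not existing
Avocado API:

     from avocado import Test
     from avocado import streams   # hypothetical module providing streams

     class WgetExample(Test):
         def test(self):
             # each execution stream wraps one machine (hypothetical API)
             workers = [streams.get_stream(host)
                        for host in ("127.0.0.1", "192.168.122.2",
                                     "192.168.122.3")]
             for worker in workers:
                 worker.run_background("/usr/bin/wget example.org")
             # wait for all streams and fail if any of them failed
             if not all(worker.wait(timeout=60) for worker in workers):
                 self.fail("Some execution streams failed")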

> The basic set of API should contain:
>
> * MultiTest.workers - list of defined workers
> * MultiTest.add_worker(machine="localhost") - to add new sub-job
> * MultiTest.run(timeout=None) - to start all workers, wait for results
> and fail the current test if any of the workers reported failure
> * MultiTest.start() - start testing in background (allow this test to
> monitor or interact with the workers)
> * MultiTest.wait(timeout=None) - wait till all workers finish
> * Worker.add_test(url) - add test to be executed
> * Worker.add_tests(urls) - add list of tests to be executed
> * Worker.abort() - abort the execution
>
> I didn't want to talk about params, but they are essential for
> multi-tests. I think we should allow passing default params for all tests:
>
> * Worker.params(params) - where params should be in any supported format
> by Test class (currently AvocadoParams or dict)
>
> or per test during "add_test":
>
> * Worker.add_test(url, params=None) - again, params should be any
> supported format (currently only possible via internal API, but even
> without multi-tests I'm fighting for such support on the command line)
>
> Another option could be to allow supplying all "test" arguments using
> **kwargs inside the "add_test":
>
> * Worker.add_test(url, **kwargs=None) -> discover_url and override test
> arguments if provided (currently only possible via internal API,
> probably never possible on the command line, but the arguments are
> methodName, name, params, base_logdir, tag, job, runner_queue and I
> don't see a value in overriding any of them but the params)
>

This example now suggests an Avocado Test as the "code block" unit. That 
is, the problem definition would translate roughly to:

"Allow Avocado tests to have Avocado tests run in separate stream(s)"

We now have a smaller "code block" unit, but still one that can make the 
design and definitions a little confusing at first. Questions that 
immediately arise:

Q) Is a test the smaller "code block" unit we could use?
A) Definitely not. We could use single code statements, a function, or a
class (inheriting from a "builtin" object).

Q) Is it common to have "existing tests" run as part of "multi host" tests?
A) Pretty common (think of benchmark tests).

Q) Is there value in letting developers keep the same development flow 
and use the same test APIs for the "code blocks"?
A) IMHO, yes.

Q) Should a test (which is not a container) hold test results?
A) IMHO, no.

My understanding is that it is still possible to keep the design
predictable by setting an "Avocado Test" as the code block unit. One more
aspect that supports this view is that there's consensus and ongoing work
to make an Avocado Test slimmer, removing from it the machinery needed to
make/help tests run.

This way, from the Avocado design and results perspective we'd still have:

[Job [test #1] [test #2] [...] [test #n]]

And from the developer perspective, we'd have:

     from avocado_misc_tests.perf.stress import Stress
     class StressVMOnHost(avocado.Test):
         def test(self):
             ...
             worker1.add(Stress)
             worker2.add(Stress)
             ...
             if not require_pass(worker1, worker2):
                 self.fail(reason)

One alternative approach would be to formally introduce the concept of 
"code blocks", and allow Avocado Tests to be used as such:

     from avocado_misc_tests.perf.stress import Stress
     class StressVMOnHost(avocado.Test):
         def test(self):
             code_block = code_block_from_test(Stress)
             ...
             worker1.add(code_block)
             worker2.add(code_block)
             ...
             if not require_success(worker1, worker2):
                 self.fail(reason)


>
> API backed by internal API
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> This would implement the multi-test API using the internal API (from
> avocado.core).
>
> + runs native python
> + easy interaction and development
> + easily extensible by either using internal API (and risk changes) or
> by inheriting and extending the features.
> - lots of internal API will be involved, thus with almost every change
> of internal API we'd have to adjust this code to keep the MultiTest working
> - fabric/paramiko is not thread/parallel process safe and fails badly so
> first we'd have to rewrite our remote execution code (use autotest's
> worker, or aexpect+ssh)
>

Even with the listed challenges (and certainly more to come), this is
the way to go.

>
> API backed by cmdline
> ~~~~~~~~~~~~~~~~~~~~~
>
> This would implement the multi-test API by translating it into "avocado
> run" commands during "self.start()".
>
> + easy to debug as users are used to the "avocado run" syntax and issues
> + allows manual mode where users trigger the "avocado run" manually
> + cmdline args are part of public API so they should stay stable
> + no issues with fabric/paramiko as each process is separate
> + even more easily extensible, as one just needs to implement the feature
> for "avocado run" and can then use it as extra_params in the worker, or
> send a PR to support it in the stable environment.
> - only features available on the cmdline can be supported (currently not
> limiting)
> - rely on stdout parsing (but avocado supports machine readable output)
>

I wholeheartedly disagree with this implementation suggestion. Some
reasons were given in the response to the previous RFC version.

>
> Synchronization
> ===============
>
> Some tests do not need any synchronization, users just need to run
> them. But some multi-tests need to be synchronized or they need to
> exchange data. For synchronization usually "barriers" are used, where a
> barrier requires a "name" and a "number of clients". One requests entry
> into the barrier-guarded section and is blocked until "number of clients"
> are waiting for it (or a timeout is reached).
>
> To do so the test needs an IP address+port where the synchronization
> server is listening. We can start this from the multi-test and only
> support it this way:
>
>      self.sync_server.start(addr=None, port=None)  # start listening
>      self.sync_server.stop()    # stop listening
>      self.sync_server.details   # contact information to be used by workers
>
> Alternatively we might even support this on the command line to allow
> manual execution:
>
>      --sync-server [addr[:port]] - listen on addr:port (pick one by
> default)
>      --sync addr:port - when barrier/data exchange is used, use
> addr:port to contact sync server.
>
> The cmdline arguments would allow manual executions, for example for
> testing purposes or execution inside custom build systems (jenkins,
> beaker, ...) without the multi-test support.
>
> The result is the same, avocado listens on some port and the spawned
> workers connect to this port, identify themselves and ask for
> barriers/data exchange, with the support for re-connection. To do so we
> have various possibilities:
>
> Standard multiprocess API
> -------------------------
>
> The standard python multiprocessing library contains synchronization
> over TCP. The only problem is that "barriers" were introduced in python3,
> so we'd have to backport them, and it does not fit all our needs, so
> we'd have to tweak it a bit.
>
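
For reference, a minimal sketch of what serving a barrier over TCP with
the standard library looks like in Python 3 (threading.Barrier does not
exist in Python 2, hence the backporting concern above); this is just an
illustration, not a proposed implementation:

     # server side (the machine acting as sync server)
     import threading
     from multiprocessing.managers import BaseManager

     barrier = threading.Barrier(2)          # "number of clients" == 2

     class ServerManager(BaseManager):
         pass

     ServerManager.register("get_barrier", callable=lambda: barrier)
     manager = ServerManager(address=("", 6000), authkey=b"avocado")
     manager.get_server().serve_forever()    # blocks, serving wait() calls

     # client side (each worker)
     from multiprocessing.managers import BaseManager

     class ClientManager(BaseManager):
         pass

     ClientManager.register("get_barrier")
     manager = ClientManager(address=("192.168.122.1", 6000),
                             authkey=b"avocado")
     manager.connect()
     manager.get_barrier().wait(timeout=60)  # released when 2 clients arrive
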
>
> Autotest's syncdata
> -------------------
>
> Python 2.4 friendly, supports barriers and data synchronization. On the
> other hand it's quite hackish and full of shortcuts.
>
>
> Custom code
> -----------
>
> We can take inspiration from the above and create a simple human-readable
> (easy to debug or interact with manually) protocol to support barriers
> and data exchange via pickling. IMO that would be easier to maintain than
> backporting and adjusting the multiprocessing library or fixing the
> autotest syncdata. A proof-of-concept can be found here:
>
>      https://github.com/avocado-framework/avocado/pull/1019
>
> It modifies the "passtest" to only be executed when it's run by 2
> workers at the same time. It does not support the multi-tests yet, so
> one has to run "avocado run passtest" twice using the same sync server
> (once with --sync-server and once with --sync).
>
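
Purely to make "human-readable protocol" concrete, a client-side barrier
call could look like the sketch below. The BARRIER/ENTER keywords are
invented for illustration and are not the ones used by the
proof-of-concept above:

     # hypothetical client-side barrier; message format made up purely
     # for illustration purposes
     import socket

     def barrier(addr, port, name, no_clients, timeout=60):
         sock = socket.create_connection((addr, port), timeout=timeout)
         try:
             sock.sendall(("BARRIER %s %d\n" % (name, no_clients)).encode())
             # the server replies only once "no_clients" workers have arrived
             reply = sock.makefile().readline().strip()
             return reply == "ENTER %s" % name
         finally:
             sock.close()
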
>
> Conclusion
> ==========
>
> Given the reasons above I like the idea of "API backed by cmdline", as
> all cmdline options are stable, and the output is machine readable and
> known to users, so it is easy to debug manually.
>
> For synchronization this requires the "--sync" and "--sync-server"
> arguments to be present, though not necessarily used when the user uses
> the multi-test (the multi-test can start the server if not already
> started and add "--sync" for each worker if not provided).
>
> The netperf example from introduction would look like this:
>
> The client tests are ordinary "avocado.Test" tests that can even be
> executed manually without any synchronization (by providing no_clients=1)
>
>      class NetServer(avocado.Test):
>          def setUp(self):
>              process.run("netserver")
>              self.barrier("setup", params.get("no_clients"))
>          def test(self):
>              pass
>          def tearDown(self):
>              self.barrier("finished", params.get("no_clients"))
>              process.run("killall netserver")
>
>      class NetPerf(avocado.Test):
>          def setUp(self):
>              self.barrier("setup", params.get("no_clients"))
>          def test(self):
>              process.run("netperf -H %s -l 60"
>                          % params.get("server_ip"))
>              barrier("finished", params.get("no_clients"))
>
> One would be able to run this manually (or from build systems) using:
>
>      avocado run NetServer --sync-server $IP:12345 &
>      avocado run NetPerf --sync $IP:12345 &
>
> (one would have to hardcode or provide the "no_clients" and "server_ip"
> params on the cmdline)
>
> and the NetPerf would wait till NetServer is initialized, then it'd run
> the test while NetServer would wait till it finishes. For some users
> this is sufficient, but let's add the multi-test to get a single set of
> results (pseudo code):
>
>      class MultiNetperf(avocado.MultiTest):
>          machines = params.get("machines")
>          assert len(machines) > 1
>          for machine in params.get("machines"):
>              self.add_worker(machine, sync=True)     # enable sync server
>          self.workers[0].add_test("NetServer")
>          self.workers[0].set_params({"no_clients": len(self.workers)})
>          for worker in self.workers[1:]:
>              worker.add_test("NetPerf")
>              worker.set_params({"no_clients": len(self.workers),
>                                 "server_ip": machines[0]})
>          self.run()
>
> Running:
>
>      avocado run MultiNetperf
>
> would run a single test which, based on the params given to the test,
> would run on several machines, using the first machine as the server and
> the rest as clients, and all of them would start at the same time.
>
> It'd produce a single set of results with one test id and the following
> structure (example):
>
>
>      $ tree $RESULTDIR
>        └── test-results
>            └── simple.mht

As you pointed out during our chat, the suffix ".mht" was not intended
here.

>                ├── job.log
>                    ...
>                ├── 1
>                │   └── job.log
>                        ...
>                └── 2
>                    └── job.log
>                        ...
>

Getting back to the definitions that were laid out, I revised my 
understanding and now I believe/suggest that we should have a single 
"job.log" per job.

> where 1 and 2 are the results of worker 1 and worker 2. For all of the
> solutions proposed, this would give the user the standard results as they
> know them from normal avocado executions, each with a unique id, which
> should help with analyzing and debugging the results.

[1] - Using "streams" instead of "threads" to reduce confusion with the 
classical multi-processing pattern of threaded programming and the OS 
features that support the same pattern. That being said, "threads" could 
be one type of execution "stream" supported by Avocado, although it's not 
a primary development target for various reasons, including the good 
support for threads already present in the underlying Python standard 
library.

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]



