[Avocado-devel] RFC: Multi tests (previously multi-host test) [v2]

Lukáš Doktor ldoktor at redhat.com
Mon Apr 4 09:59:13 UTC 2016


On 4.4.2016 at 07:14, Cleber Rosa wrote:
> On 03/31/2016 12:55 PM, Lukáš Doktor wrote:
>> Hello guys,
>>
>> This is a v2 of the multi tests RFC, previously known as multi-host RFC.
>>
>> Changes:
>>
>>      v2: Rewritten from scratch
>>      v2: Added examples for the demonstration to avoid confusion
>>      v2: Removed the mht format (which was there to demonstrate manual
>> execution)
>>      v2: Added 2 solutions for multi-tests
>>      v2: Described ways to support synchronization
>>
>> The problem
>> ===========
>>
>
> I believe a formal definition of the problem may help us to keep the
> possible solutions in closer sight. I would describe the problem as:
>
> "Allow tests to have some of its blocks of code run in separate
> stream(s)[1]".
Streams sounds good.

>
> "Blocks of code", for now, is a rather abstract concept, to be discussed
> later.
>
>> A user wants to run netperf on 2 machines, which requires the following
>> manual steps:
>>
>>      machine1: netserver -D
>>      machine1: # Wait till netserver is initialized
>>      machine2: netperf -H $machine1 -l 60
>>      machine2: # Wait till it finishes and store the results
>>      machine1: # stop the netserver and report possible failures
>>
>
> Using the definition given above, all code prefixed with
> "machine1:" would be one execution stream and all code prefixed with
> "machine2:" would be a second stream.
>
> The test itself would be a single entity, composed of its own code in
> addition to the code to be run on machine1 and machine2, as covered before.
yep

>
>> Other use-cases might be:
>>
>
> Using the same definition, these use cases would become:
>
>> 1. triggering several un-related tests in parallel
>
> "Running blocks of code in parallel".
>
>> 2. triggering several tests in parallel with synchronization
>
> "Running blocks of code in parallel with synchronization".
>
>> 3. spreading several tests into multiple machines
>
> "Running blocks of code in multiple external machines". A point here:
> this could mean either sequentially or in parallel.
>
>> 4. triggering several various tests on multiple machines
>
> "Running varied blocks of code in multiple external machines".
>
>>
>> The problem is not only about running tests on multiple machines, but
>> generally about ways to trigger tests/sets of tests in whatever way the
>> user needs to.
>>
>
> Based on the definition given, "running tests on multiple machines" is
> not the *direct* scope of this RFC. Running *tests* on multiple machines
> (either sequentially or in parallel) could be the scope of a "multi host
> *job*" RFC, that is a Job that encompass tests that would be run into
> multiple different machines. In such a (multi host) job, there would be
> a 1:N relationship between a job and machines, and a 1:1 relationship
> between a test and a machine.
>
Probably yes. For clarification, the relation of job:machines:streams 
would be 1:1:N for multi-test tests and 1:N:M (N<=M) for multi-host tests.

>>
>> Running the tests
>> =================
>>
>> In v1 we rejected the idea of running custom code from inside the tests
>> in the background, as it requires implementing the remote-test handling
>> again and again, and we decided that executing full tests or sets of
>> tests with support for remote synchronization/data exchange is the way
>> to go. There were two or three bigger categories, so let's describe each
>> so we can pick the most suitable one (at this moment).
>>
>
> My current understanding is that the approach of implementing the remote
> execution of code based on the execution of multiple "avocado"
> instances, with different command line options to reflect the multiple
> executions was abandoned.
>
I understood we rejected the idea of running functions/methods as the 
executed blocks, because it does not allow easy sharing of those segments.


>> For demonstration purposes I'll be writing a very simple multi-host test
>> which triggers "/usr/bin/wget example.org" on 3 machines to simulate a
>> very basic stress test.
>>
>
> Again revising all statements in light of the definition given before,
> this clearly means one single test, with the same "block of code"
> (/usr/bin/wget example.org) to be executed on 3 machines.
>
>> Synchronization and parametrization will not be covered in this section,
>> as synchronization will be described in the next chapter and is the same
>> for all solutions, and parametrization is a standard avocado feature.
>>
>>
>> Internal API
>> ------------
>>
>> One of the ways to allow people to trigger tests and sets of tests (jobs)
>> from inside a test is to pick the minimal required set of internal API
>> which handles remote job execution, make it public (and supported) and
>> refactor it so it can be realistically called from inside a test.
>>
>> Example (pseudocode)
>>
>>      class WgetExample(avocado.Test):
>>          jobs = []
>>          machines = ["127.0.0.1", "192.168.122.2", "192.168.122.3"]
>>          for i, machine in enumerate(machines):
>>              jobs.append(avocado.Job(urls=["/usr/bin/wget example.org"],
>>                                      remote_machine=machine,
>>                                      logdir=os.path.join(self.logdir,
>>                                                          str(i))))
>
> The example given may lead a reader into thinking the problem that is
> attempted to be solved here is one of remote execution of commands. So
> let's just remind ourselves that the problem at stake, IMHO, is:
>
> "Allow tests to have some of its blocks of code run in separate stream(s)".
>
Yep, we could use ["127.0.0.1", "127.0.0.1", "127.0.0.1"]. This example 
is only trying to be generic enough to describe the minimal API.

>>          for job in jobs:
>>              job.run_background()
>>          errors = []
>>          for i, job in enumerate(jobs):
>>              result = job.wait()     # returns json results
>>              if result["pass"] != result["total"]:
>>                  errors.append("Tests on worker %s (%s) failed"
>>                                % (i, machines[i]))
>>          if errors:
>>              self.fail("Some workers failed:\n%s" % "\n".join(errors))
>>
>
> This example defines a "code block" unit as an Avocado Job. So,
> essentially, using the previous definition I gave, the suggestion would
> be translated to:
>
> "Allow Avocado tests to have Avocado Jobs run in separate stream(s)"
>
> The most striking aspect of this example is of course the use of an
> Avocado Job inside an Avocado Test. An Avocado Job, by definition and
> implementation, is a "logical container" for tests. Having a *test*
> firing *jobs* as part of the official solution crosses the layers we
> defined and designed ourselves.
>
> Given that an Avocado Job includes most of the functionality of Avocado
> (as a whole), too many questions can be raised with regards to what
> aspects of these (intra test) Jobs are to be supported.
>
> To summarize, I'm skeptical that an Avocado Job should be the "code
> block" unit for the problem at hand.
>
Yes, this is correct. As mentioned below, we can avoid using a Job by 
using Loader+RemoteTestRunner+RemoteResults+Multiplexer to achieve this 
for single tests only.

As I wrote, and you supported me in this, solving it this way makes it 
hard to distinguish which of the features avocado supports are also 
supported in the "nested" job.

>> Alternatively, we could even require the user to define the whole workflow:
>>
>> 1. discover test (loader)
>> 2. add params/variants
>> 3. setup remote execution (RemoteTestRunner)
>> 4. setup results (RemoteResults)
>>
>> which would require even more internal API to be turned public.
>>
>> + easy to develop, we simply identify a set of classes and make them public
>> - hard to maintain as the API would have to stay stable, therefore
>> realistically it requires a big cleanup before doing this step
>>
>> Multi-tests API
>> ---------------
>>
>> To avoid the need to make the API which drives testing public, we can
>> also introduce an API to trigger jobs/sets of jobs. It would be a sort of
>> proxy between the internal API, which can and does change more often, and
>> the public multi-host API, which would be supported and kept stable.
>>
>> I see two basic backends supporting this API, but they both share the
>> same public API.
>>
>> Example (pseudocode)
>>
>>      class WgetExample(avocado.MultiTest):
>>          for machine in ["127.0.0.1", "192.168.122.2", "192.168.122.3"]:
>>              self.add_worker(machine)
>>          for worker in self.workers:
>>              worker.add_test("/usr/bin/wget example.org")
>>          #self.start()
>>          #results = self.wait()
>>          #if results["failures"]:
>>          #    self.fail(results["failures"])
>>          self.run()  # does the above
>>
>
> I have hopes, maybe naive ones, that regular Avocado tests can have some
> of their code blocks run on different streams with the aid of some APIs.
> What I mean is that Avocado would not *require* a specialized class
> for that.
>
Currently we're talking about ~(5-10) methods. I don't like polluting 
the `avocado.Test` class, which is why I chose `avocado.MultiTest` 
instead. But we can talk about the options:

* `avocado.MultiTest.*` - inherited from `avocado.Test`, adding some 
helpers to create and feed the streams (my favorite; see the sketch 
below)
* `avocado.Test.avocado.*` - If you remember, I proposed moving 
non-critical methods from `Test` to `Test.avocado`. This could be a 
first-class citizen there.
* `avocado.Test.avocado.multi.*` - the same as above, but the multi-API 
would be inside the `multi` object.
* `avocado.Test.multi` - the multi-API would be part of the main Test, 
but inside the `multi` object.
* `avocado.Test.*` - I'd not support this, as it extends the main 
interface with yet another bunch of methods many people are not 
interested in at all.

Note: imagine whatever keyword you like instead of `multi` (streams, 
workers, multihost, nested, ...).
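
Just to illustrate the first option, a minimal sketch (everything here 
is illustrative, not a working implementation; `Worker`, `start()` and 
`wait()` are the pieces proposed in the API list below):

     import avocado

     # rough sketch: MultiTest inherits from avocado.Test and only adds
     # the stream helpers; start()/wait() are the proposed methods from
     # the API list below, Worker is sketched only by its interface
     class MultiTest(avocado.Test):

         def __init__(self, *args, **kwargs):
             super(MultiTest, self).__init__(*args, **kwargs)
             self.workers = []    # list of defined workers (streams)

         def add_worker(self, machine="localhost"):
             worker = Worker(machine)    # Worker as described below
             self.workers.append(worker)
             return worker

         def run(self, timeout=None):
             # start all workers, wait for them and fail on any failure
             self.start()
             results = self.wait(timeout)
             if results["failures"]:
                 self.fail(results["failures"])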


>> The basic set of API should contain:
>>
>> * MultiTest.workers - list of defined workers
>> * MultiTest.add_worker(machine="localhost") - to add a new sub-job
>> * MultiTest.run(timeout=None) - to start all workers, wait for results
>> and fail the current test if any of the workers reported a failure
>> * MultiTest.start() - start testing in the background (allowing this test
>> to monitor or interact with the workers)
>> * MultiTest.wait(timeout=None) - wait till all workers finish
>> * Worker.add_test(url) - add a test to be executed
>> * Worker.add_tests(urls) - add a list of tests to be executed
>> * Worker.abort() - abort the execution
>>
>> I didn't want to talk about params, but they are essential for
>> multi-tests. I think we should allow passing default params for all
>> tests:
>>
>> * Worker.params(params) - where params should be in any format supported
>> by the Test class (currently AvocadoParams or dict)
>>
>> or per test during "add_test":
>>
>> * Worker.add_test(url, params=None) - again, params should be in any
>> supported format (currently only possible via the internal API, but even
>> without multi-tests I'm fighting for such support on the command line)
>>
>> Another option could be to allow supplying all "test" arguments using
>> **kwargs inside the "add_test":
>>
>> * Worker.add_test(url, **kwargs) -> discover_url and override test
>> arguments if provided (currently only possible via the internal API,
>> probably never possible on the command line, but the arguments are
>> methodName, name, params, base_logdir, tag, job, runner_queue and I
>> don't see a value in overriding any of them but the params)
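Just to illustrate how these variants would combine in a test 
(pseudocode, the addresses and values are made up):

     worker = self.add_worker("192.168.122.2")
     # default params applied to every test added to this worker
     worker.params({"no_clients": 2})
     # per-test params supplied during add_test
     worker.add_test("NetPerf", params={"server_ip": "192.168.122.1"})
     # or overriding other test arguments via **kwargs (e.g. the tag)
     worker.add_test("NetPerf", params={"server_ip": "192.168.122.1"},
                     tag="client2")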
>>
>
> This example now suggests an Avocado Test as the "code block" unit. That
> is, the problem definition would translate roughly to:
>
> "Allow Avocado tests to have Avocado tests run in separate stream(s)"
>
> We now have a smaller "code block" unit, but still one that can make the
> design and definitions a little confusing at first. Questions that
> immediately arise:
>
> Q) Is a test the smallest "code block" unit we could use?
> A) Definitely not. We could use single code statements, a function or a
> ("builtin" object inherited) class.
>
> Q) Is it common to have "existing tests" run as part of "multi host" tests?
> A) Pretty common (think of benchmark tests).
>
> Q) Is there value in letting developers keep the same development flow
> and use the same test APIs for the "code blocks"?
> A) IMHO, yes.
>
> Q) Should a test (which is not a container) hold test results?
> A) IMHO, no.
I disagree with that. In case of failure it's sometimes better to see 
the combined results, but sometimes it's better to dig deeper and see 
each test separately (as people designed some tests to be executed 
alone and later combined them, they are used to their results).

>
> My understanding is that it is still possible to keep the design
> predictable by setting an "Avocado Test" as code block. One more aspect
> that supports this view is that there's consensus and ongoing work to
> make an Avocado Test slimmer, and remove from it the machinery needed to
> make/help tests run.
Yep, I agree with that and, as I mentioned during our mini-meeting, I 
think something like standalone execution could be the way to go (not 
really standalone, but a very stripped-down execution of the test with 
machine-readable results).

>
> This way, from the Avocado design and results perspective we'd still have:
>
> [Job [test #1] [test #2] [...] [test #n]]
>
> And from the developer perspective, we'd have:
>
>      from avocado_misc_tests.perf.stress import Stress
>      class StressVMOnHost(avocado.Test):
>          def test(self):
>              ...
>              worker1.add(Stress)
>              worker2.add(Stress)
>              ...
>              if not require_pass(worker1, worker2):
>                  self.fail(reason)
Worth mentioning what the worker (stream) supports and reports. In my 
head it allows execution of multiple tests (in sequence), so in the same 
way it should report a list of the results. Also I think PASS/FAIL is 
not really a sufficient result; it should report a list of json results 
of all added tests.

The function `require_pass` would then go through the list and check 
that all statuses are `PASS/WARN`, but some users might add custom logic.
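
A minimal sketch of such a helper, assuming each worker exposes the list 
of json results with a "status" field as described above (the attribute 
name is made up):

     def require_pass(*workers):
         # True only when every test in every stream ended PASS or WARN
         for worker in workers:
             for result in worker.results:    # list of json results
                 if result.get("status") not in ("PASS", "WARN"):
                     return False
         return True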

>
> One alternative approach would be to formally introduce the concept of
> "code blocks", and allow Avocado Tests to be used as such:
>
>      from avocado_misc_tests.perf.stress import Stress
>      class StressVMOnHost(avocado.Test):
>          def test(self):
>              code_block = code_block_from_test(Stress)
>              ...
>              worker1.add(code_block)
>              worker2.add(code_block)
>              ...
>              if not require_success(worker1, worker2):
>                  self.fail(reason)
>
I'm not sure what `Stress` is. In my vision we should support test 
`urls` (so the API discovers them using loaders). The problem is if 
users provide a url which spawns multiple tests. Should we run them 
sequentially? Should we fail? What do we report?

>
>>
>> API backed by internal API
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> This would implement the multi-test API using the internal API (from
>> avocado.core).
>>
>> + runs native python
>> + easy interaction and development
>> + easily extensible by either using the internal API (and risking
>> changes) or by inheriting and extending the features
>> - lots of internal API will be involved, thus with almost every change
>> of the internal API we'd have to adjust this code to keep the MultiTest
>> working
>> - fabric/paramiko is not thread/parallel-process safe and fails badly, so
>> first we'd have to rewrite our remote execution code (use autotest's
>> worker, or aexpect+ssh)
>>
>
> Even with the listed challenges (and certainly more to come), this is
> the way to go.
>
I got to this point too, although IMO it requires more work than the 
cmdline-backed API. The deciding factor for me is the support for adding 
tests during execution and more flow control.

>>
>> API backed by cmdline
>> ~~~~~~~~~~~~~~~~~~~~~
>>
>> This would implement the multi-test API by translating it into "avocado
>> run" commands during "self.start()".
>>
>> + easy to debug as users are used to the "avocado run" syntax and its
>> issues
>> + allows a manual mode where users trigger the "avocado run" manually
>> + cmdline args are part of the public API so they should stay stable
>> + no issues with fabric/paramiko as each process is separate
>> + even more easily extensible, as one just needs to implement the feature
>> for "avocado run" and can then use it as extra_params in the worker, or
>> send a PR to support it in the stable environment
>> - only features available on the cmdline can be supported (currently not
>> limiting)
>> - relies on stdout parsing (but avocado supports machine-readable output)
>>
>
> I wholeheartedly disagree with this implementation suggestion. Some
> reasons were given in the response to the previous RFC version.
Yes I know, but I'm still a bit fond of this version (though as mentioned 
earlier I'm more inclined to the internal-backed API). The reasons are 
that __ALL__ cmdline options are supported and should be stable. That 
means users could actually pass any "extra_params" matching their 
custom plugins and have them supported across versions. Doing the same 
for the internal API would require further modifications, as the 
internals of the multi API would be non-public API and therefore could 
be changing all the time.

>
>>
>> Synchronization
>> ===============
>>
>> Some tests do not need any synchronization, users just need to run
>> them. But some multi-tests need to be synchronized or they need to
>> exchange data. For synchronization usually "barriers" are used, where a
>> barrier requires a "name" and a "number of clients". One requests entry
>> into the barrier-guarded section and is blocked until "number of clients"
>> are waiting for it (or a timeout is reached).
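In test code this would look something like the following (illustrative 
only; the real call is shown in the NetServer/NetPerf example in the 
conclusion):

     # both workers call this with the same name and number of clients;
     # each call blocks until 2 clients have arrived (or a timeout hits)
     self.barrier("setup", 2)
     # ... barrier-guarded section ...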
>>
>> To do so the test needs an IP address+port where the synchronization
>> server is listening. We can start this from the multi-test and only
>> support it this way:
>>
>>      self.sync_server.start(addr=None, port=None)  # start listening
>>      self.sync_server.stop()    # stop listening
>>      self.sync_server.details   # contact information to be used by
>> workers
>>
>> Alternatively we might even support this on the command line to allow
>> manual execution:
>>
>>      --sync-server [addr[:port]] - listen on addr:port (pick one by
>> default)
>>      --sync addr:port - when barrier/data exchange is used, use
>> addr:port to contact sync server.
>>
>> The cmdline arguments would allow manual execution, for example for
>> testing purposes or execution inside custom build systems (jenkins,
>> beaker, ...) without the multi-test support.
>>
>> The result is the same: avocado listens on some port and the spawned
>> workers connect to this port, identify themselves and ask for
>> barriers/data exchange, with support for re-connection. To do so we
>> have various possibilities:
>>
>> Standard multiprocess API
>> -------------------------
>>
>> The standard python multiprocessing library supports synchronization
>> over TCP. The only problem is that "barriers" were introduced in
>> python3, so we'd have to backport them, and it does not fit all our
>> needs, so we'd have to tweak it a bit.
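For illustration, a rough Python 3 sketch of the kind of tweak needed, 
with named barriers shared over TCP via a manager (the port, authkey and 
names are made up, and a real implementation would still need the 
backport mentioned above):

     import threading
     from multiprocessing.managers import BaseManager

     _barriers = {}
     _lock = threading.Lock()

     def get_barrier(name, clients):
         # create the named barrier on first request, reuse it afterwards
         with _lock:
             if name not in _barriers:
                 _barriers[name] = threading.Barrier(clients)
             return _barriers[name]

     class BarrierManager(BaseManager):
         pass

     # server side (in the process started by the multi-test/--sync-server)
     BarrierManager.register("barrier", callable=get_barrier)
     server = BarrierManager(address=("", 6001), authkey=b"avocado")
     server.start()

     # client side (in each worker process, pointed at the server by --sync)
     BarrierManager.register("barrier")
     client = BarrierManager(address=("192.168.122.1", 6001),
                             authkey=b"avocado")
     client.connect()
     client.barrier("setup", 2).wait()    # blocks until 2 clients arrive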
>>
>>
>> Autotest's syncdata
>> -------------------
>>
>> Python 2.4 friendly, supports barriers and data synchronization. On the
>> other hand, it's quite hackish and full of shortcuts.
>>
>>
>> Custom code
>> -----------
>>
>> We can take inspiration from the above and create a simple human-readable
>> (easy to debug or interact with manually) protocol to support barriers and
>> data exchange via pickling. IMO that would be easier to maintain than
>> backporting and adjusting the multiprocessing code or fixing the autotest
>> syncdata. A proof-of-concept can be found here:
>>
>>      https://github.com/avocado-framework/avocado/pull/1019
>>
>> It modifies the "passtest" to be only executed when it's executed by 2
>> workers at the same time. It does not support the multi-tests yet, so
>> one has to run "avocado run passtest" twice using the same
>> "--sync-server" (once --sync-server and once --sync).
>>
>>
>> Conclusion
>> ==========
>>
>> Given the reasons above I like the idea of "API backed by cmdline", as
>> all cmdline options are stable and the output is machine-readable and
>> known to users, so it is easy to debug manually.
>>
>> Synchronization requires the "--sync" and "--sync-server" arguments to
>> be present, but they are not necessarily used when the user uses the
>> multi-test (the multi-test can start the server if not already started
>> and add "--sync" for each worker if not provided).
>>
>> The netperf example from the introduction would look like this:
>>
>> The client tests are ordinary "avocado.Test" tests that can even be
>> executed manually without any synchronization (by providing no_clients=1)
>>
>>      class NetServer(avocado.Test):
>>          def setUp(self):
>>              process.run("netserver")
>>              self.barrier("setup", self.params.get("no_clients"))
>>          def test(self):
>>              pass
>>          def tearDown(self):
>>              self.barrier("finished", self.params.get("no_clients"))
>>              process.run("killall netserver")
>>
>>      class NetPerf(avocado.Test):
>>          def setUp(self):
>>              self.barrier("setup", self.params.get("no_clients"))
>>          def test(self):
>>              process.run("netperf -H %s -l 60"
>>                          % self.params.get("server_ip"))
>>              self.barrier("finished", self.params.get("no_clients"))
>>
>> One would be able to run this manually (or from build systems) using:
>>
>>      avocado run NetServer --sync-server $IP:12345 &
>>      avocado run NetPerf --sync $IP:12345 &
>>
>> (one would have to hardcode or provide the "no_clients" and "server_ip"
>> params on the cmdline)
>>
>> and the NetPerf would wait till NetServer is initialized, then it'd run
>> the test while NetServer would wait till it finishes. For some users
>> this is sufficient, but let's add the multi-test to get a single set of
>> results (pseudocode):
>>
>>      class MultiNetperf(avocado.MultiTest):
>>          machines = params.get("machines")
>>          assert len(machines) > 1
>>          for machine in machines:
>>              self.add_worker(machine, sync=True)     # enable sync server
>>          self.workers[0].add_test("NetServer")
>>          self.workers[0].set_params({"no_clients": len(self.workers)})
>>          for worker in self.workers[1:]:
>>              worker.add_test("NetPerf")
>>              worker.set_params({"no_clients": len(self.workers),
>>                                 "server_ip": machines[0]})
>>          self.run()
>>
>> Running:
>>
>>      avocado run MultiNetperf
>>
>> would run a single test which, based on the params given to the test,
>> would run on several machines, using the first machine as the server and
>> the rest as clients, and all of them would start at the same time.
>>
>> It'd produce a single set of results with one test id and the following
>> structure (example):
>>
>>
>>      $ tree $RESULTDIR
>>        └── test-results
>>            └── simple.mht
>
> As you pointed out during our chat, the suffix ".mht" was not intended
> here.
>
I'm sorry, it was a copy&paste mistake. It's just a test name, so imagine 
"MultiNetperf" instead.

>>                ├── job.log
>>                    ...
>>                ├── 1
>>                │   └── job.log
>>                        ...
>>                └── 2
>>                    └── job.log
>>                        ...
>>
>
> Getting back to the definitions that were laid out, I revised my
> understanding and now I believe/suggest that we should have a single
> "job.log" per job.
>
As mentioned earlier, I disagree with this. I think we need to include 
per-stream results too, not necessarily with all the job info. So the 
updated example for the MultiNetperf would be:


     job-2016-04-01T13.19-795dad3
     ├── job.log
     └── test-results
         └── netperf.NetPerf.test
             ├── debug.log
             ├── stream1
             │   ├── 000_SystemInfo
             │   └── 001_NetServer.log
             └── stream2
                 ├── 000_SystemInfo
                 └── 001_NetPerf.log

Where:

* job.log contains the job log
* debug.log contains logs from the MultiNetperf as well as the outputs of 
stream1 and stream2 as they happen (if possible)
* 000_SystemInfo contains the system info of the worker (could be either a 
directory as we know it from tests, or a simplified sys-info)
* \d+_$name - contains the output of each individually executed "code 
block"

We could even allow people to name the streams, but that's just a detail.


>> where 1 and 2 are the results of worker 1 and worker 2. For all of the
>> solutions proposed, those would give the user the standard results as they
>> know them from normal avocado executions, each with a unique id, which
>> should help with analyzing and debugging the results.
>
> [1] - Using "streams" instead of "threads" to reduce confusion with the
> classical multi-processing pattern of threaded programming and the OS
> features that support the same pattern. That being said, "threads" could
> be one type of execution "stream" supported by Avocado, albeit it's not
> a primary development target for various reasons, including the good
> support for threads already present in the underlying Python standard
> library.
>



