[Avocado-devel] RFC: Multi-host tests

Cleber Rosa crosa at redhat.com
Tue Mar 29 18:25:22 UTC 2016



On 03/29/2016 04:11 AM, Lukáš Doktor wrote:
> On 28.3.2016 at 21:49, Cleber Rosa wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Cleber Rosa" <crosa at redhat.com>
>>> To: "Lukáš Doktor" <ldoktor at redhat.com>
>>> Cc: "Amador Pahim" <apahim at redhat.com>, "avocado-devel" 
>>> <avocado-devel at redhat.com>, "Ademar Reis" <areis at redhat.com>
>>> Sent: Monday, March 28, 2016 4:44:15 PM
>>> Subject: Re: [Avocado-devel] RFC: Multi-host tests
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Lukáš Doktor" <ldoktor at redhat.com>
>>>> To: "Ademar Reis" <areis at redhat.com>, "Cleber Rosa" 
>>>> <crosa at redhat.com>,
>>>> "Amador Pahim" <apahim at redhat.com>, "Lucas
>>>> Meneghel Rodrigues" <lookkas at gmail.com>, "avocado-devel"
>>>> <avocado-devel at redhat.com>
>>>> Sent: Saturday, March 26, 2016 4:01:15 PM
>>>> Subject: RFC: Multi-host tests
>>>>
>>>> Hello guys,
>>>>
>>>> Let's open a discussion regarding the multi-host tests for avocado.
>>>>
>>>> The problem
>>>> ===========
>>>>
>>>> A user wants to run netperf on 2 machines. To do it manually he does:
>>>>
>>>>       machine1: netserver -D
>>>>       machine1: # Wait till netserver is initialized
>>>>       machine2: netperf -H $machine1 -l 60
>>>>       machine2: # Wait till it finishes and store the results
>>>>       machine1: # stop the netserver and report possible failures
>>>>
>>>> Now, how do we support this in avocado, ideally as custom tests, and
>>>> ideally even across broken connections/reboots?
>>>>
>>>>
>>>> Super tests
>>>> ===========
>>>>
>>>> We don't need to do anything and can leave everything up to the user.
>>>> He is free to write code like:
>>>>
>>>>       ...
>>>>       machine1 = aexpect.ShellSession("ssh $machine1")
>>>>       machine2 = aexpect.ShellSession("ssh $machine2")
>>>>       machine1.sendline("netserver -D")
>>>>       # wait till the netserver starts
>>>>       machine1.read_until_any_line_matches(["Starting netserver"], 60)
>>>>       output = machine2.cmd_output("netperf -H $machine1 -l $duration")
>>>>       # interrupt the netserver
>>>>       machine1.sendline("\03")
>>>>       # verify netserver finished
>>>>       machine1.cmd("true")
>>>>       ...
>>>>
>>>> The problem is that it requires an active connection and the user
>>>> needs to handle the results manually.
>>>
>>> And of course the biggest problem here is that it doesn't solve the
>>> Avocado problem: providing a framework and tools for tests that span
>>> multiple (Avocado) execution threads, possibly on multiple hosts.
>>>
> Well, it does; each "ShellSession" is a new parallel process. The only
> problem I have with this design is that it does not allow easy code
> reuse and the results strictly depend on the test writer.
>

Yes, *aexpect* allows parallel execution in an asynchronous fashion, but
it is not targeted at tests *at all*. Avocado, as a test framework, should
deliver more. Repeating the previous wording, it should be "providing a
framework and tools for tests that span multiple (Avocado) execution
threads, possibly on multiple hosts."

>>>>
>>>>
>>>> Triggered simple tests
>>>> ======================
>>>>
>>>> Alternatively we can say each machine/worker is nothing but yet another
>>>> test, which occasionally needs synchronization or data exchange. The
>>>> same example would look like this:
>>>>
>>>> machine1.py:
>>>>
>>>>      process.run("netserver")
>>>>      barrier("server-started", 2)
>>>>      barrier("test-finished", 2)
>>>>      process.run("killall netserver")
>>>>
>>>> machine2.py:
>>>>
>>>>       barrier("server-started", 2)
>>>>       self.log.debug(process.run("netperf -H %s -l 60"
>>>>                                  % params.get("server_ip")))
>>>>       barrier("test-finished", 2)
>>>>
>>>> where "barrier(name, no_clients)" is a framework function which makes
>>>> the process wait till the specified number of processes are waiting 
>>>> for
>>>> the same barrier.
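
(Just to make the semantics concrete: below is a minimal sketch of what such
a barrier utility and the "sync server" it talks to could look like. The
names, the wire protocol and the defaults are illustrative assumptions, not
the proposed Avocado API.)

import socket
import threading


def sync_server(host="0.0.0.0", port=6547):
    """Release all clients waiting on a named barrier once the announced
    number of clients has arrived (illustrative sketch only)."""
    waiting = {}   # barrier name -> sockets of clients already waiting
    lock = threading.Lock()
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(16)
    while True:
        conn, _ = srv.accept()
        # each client announces "<barrier-name> <no_clients>"
        name, no_clients = conn.recv(1024).decode().split()
        with lock:
            waiting.setdefault(name, []).append(conn)
            if len(waiting[name]) == int(no_clients):
                for client in waiting.pop(name):
                    client.sendall(b"go")   # release everyone at once
                    client.close()


def barrier(name, no_clients, server=("machine1", 6547), timeout=60):
    """Block until `no_clients` processes have reached the barrier `name`."""
    sock = socket.create_connection(server, timeout=timeout)
    sock.sendall(("%s %s" % (name, no_clients)).encode())
    sock.recv(2)   # blocks until the sync server releases the barrier
    sock.close()
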
>>>
>>> The barrier mechanism looks like an appropriate and useful utility 
>>> for the
>>> example given.  Even though your use case example explicitly 
>>> requires it,
>>> it's worth pointing out and keeping in mind that there may be valid 
>>> use cases
>>> which won't require any kind of synchronization.  This may even be 
>>> true to
>>> the executions of tests that spawn multiple *local* "Avocado runs".
>>>
> Absolutely, this would actually allow Julio to run his "Parallel 
> (clustered) testing".

So, let's try to identify what we're really looking for. For both the 
use case I mentioned and Julio's "Parallel (clustered) testing", we need 
a (the same) test run by multiple *runners*. A runner in this context is 
something that implements the `TestRunner` interface, such as the 
`RemoteTestRunner`:

https://github.com/avocado-framework/avocado/blob/master/avocado/core/remote/runner.py#L37

The following (pseudo) Avocado Test could be written:

from avocado import Test

# These are currently private APIs that could/should be exposed
# under another level. Also, the current API is very different
# from what is used here, so please take it as pseudo code that
# might look like a future implementation
from avocado.core.remote.runner import RemoteTestRunner
from avocado.core.runner import run_multi
from avocado.core.resolver import TestResolver
from avocado.utils.wait import wait_for


class Multi(Test):

    def test(self):
        worker1 = RemoteTestRunner('worker1')
        worker2 = RemoteTestRunner('worker2')

        # Resolve a local test to send it to be run on multiple machines
        test = TestResolver().resolve('bonnie.py')

        # run_multi is asynchronous, and results can be queried about
        # their status
        results = run_multi([worker1, worker2], test)
        wait_for(results.finished, self.timeout)

        # combine remote whiteboards (with performance results),
        # keyed by worker name
        whiteboard = {}
        for worker_result in results:
            whiteboard[worker_result.name] = worker_result.whiteboard
        self.whiteboard = whiteboard


If any kind of synchronization was necessary between workers, the 
barrier utility library could be used, maybe even
transparently as part of "run_multi". Parameter passing to tests is also 
a layered issue. My point is that this seems to
include the features needed to allow "tests that span multiple machines".

Does it look reasonable?

A lot of sugar coating can (and should) be added on top. Creating 
workers automatically, having a superclass for tests
that span multiple machines, plugins that take worker names directly 
from command line options and what not are
likely natural additions.
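
To illustrate the kind of sugar coating I mean, here is one possible shape.
This is again pseudo code: the MultiHostTest superclass is a hypothetical
name and it builds on the same private/future APIs used in the example
above, not on anything that exists today.

from avocado import Test
from avocado.core.remote.runner import RemoteTestRunner
from avocado.core.runner import run_multi
from avocado.core.resolver import TestResolver
from avocado.utils.wait import wait_for


class MultiHostTest(Test):

    def run_on_workers(self, worker_names, test_reference):
        """Run one resolved test on every worker and merge the whiteboards."""
        workers = [RemoteTestRunner(name) for name in worker_names]
        test = TestResolver().resolve(test_reference)
        results = run_multi(workers, test)
        wait_for(results.finished, self.timeout)
        self.whiteboard = dict((r.name, r.whiteboard) for r in results)
        return results


class MultiBonnie(MultiHostTest):

    def test(self):
        # worker names could come from parameters or command line options
        self.run_on_workers(['worker1', 'worker2'], 'bonnie.py')
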

>
>>>>
>>>> The barrier needs to know which server to use for communication so we
>>>> can either create a new service, or simply use one of the 
>>>> executions as
>>>> "server" and make both processes use it for data exchange. So to 
>>>> run the
>>>> above tests the user would have to execute 2 avocado commands:
>>>>
>>>>       avocado run machine1.py --sync-server machine1:6547
>>>>       avocado run machine2.py --remote-hostname machine2 --mux-inject server_ip:machine1 --sync machine1:6547
>>>>
>>>> where:
>>>>       --sync-server tells avocado to listen on ip address machine1 
>>>> port 6547
>>>>       --remote-hostname tells the avocado to run remotely on machine2
>>>>       --mux-inject adds the "server_ip" into params
>>>>       --sync tells the second avocado to connect to machine1:6547 for
>>>> synchronization
>>>
>>> To be honest, apart from the barrier utility, this provides little 
>>> value
>>> from the PoV of a *test framework*, and possibly unintentionally, 
>>> competes
>>> and overlaps with "remote" tools such as fabric.
>>>
>>> Also, given that the multiplexer is an optional Avocado feature, such
>>> a feature should not depend on it.
> It does not; these are only used to demonstrate this particular
> feature. You can hardcode the values in the tests, use environment
> variables or any other feature.
>
> Basically this "mht" format is nothing more than a list of "avocado
> run" commands to be executed in parallel, and its focus was on
> simplicity, maybe even only for demonstration purposes.
>
>>>
>>>>
>>>> Running those two tests has only one benefit compared to the previous
>>>> solution, and that is it gathers the results independently and allows
>>>> one to re-use simple tests. For example you can create a 3rd test,
>>>> which uses different params for netperf, run it on "machine2" and keep
>>>> the same script for "machine1". Or run 2 netperf senders at the same
>>>> time. This would require libraries and more custom code when using the
>>>> "Super test" approach.
>>>>
>>>> There are additional benefits to this solution. When we introduce the
>>>> locking API, tests running on a remote machine will actually be directly
>>>> executed in avocado, therefore the locking API will work for them,
>>>> avoiding problems with multiple tests using the same shared resource.
>>>>
>>>> Another future benefit would be surviving system reboots/lost connections
>>>> once we introduce this support for individual tests. The way it'd work is
>>>> that the user triggers the jobs, the master remembers the test ids and
>>>> polls for results until they finish/timeout.
>>>>
>>>> All of this we get for free thanks to re-using the existing
>>>> infrastructure (or the future infrastructure), so I believe this is 
>>>> the
>>>> right way to go and in this RFC I'm describing details of this 
>>>> approach.
>>>>
>>>
>>> All of the benefits listed are directly based on the fact that tests on
>>> remote systems would be run under the Avocado test runner and would have
>>> its runtime libraries available.  This is a valid point, but again it
>>> doesn't bring a significant change in the user experience wrt running
>>> tests that span multiple "Avocado runs" (possibly on remote machines).
>>>
> Basically this is the key part of this RFC. I like the idea of running
> avocado processes for each test, instead of yet another remote execution
> mechanism. The biggest benefits are the test results in a well-known
> format and the possibility to run/combine all the tests supported by
> avocado.
>
> Actually I have an avocado-in-avocado script in my CI testing; it just
> waits for the long-names fix to be applied, as it generates test names
> that are too long. But I tested it with the fix and the results are very
> nice and easy to analyze, as you simply go through the results you know
> from simple testing.
>
>>>>
>>>> Triggering the jobs
>>>> -------------------
>>>>
>>>> The previous example required the user to run avocado 2 times (once per
>>>> machine), sharing the same sync server. Additionally it resulted in
>>>> 2 separate sets of results. Let's try to eliminate this problem.
>>>>
>>>>
>>>> Basic tests
>>>> ~~~~~~~~~~~
>>>>
>>>> For basic setups, we can come up with a very simple format to describe
>>>> which tests should be triggered, and avocado should take care of
>>>> executing them. What I have in mind is to simply accept a list of
>>>> "avocado run" commands:
>>>>
>>>> simple_multi_host.mht:
>>>>
>>>>       machine1.py
>>>>       machine2.py --remote-hostname machine2 --mux-inject server_ip:machine1
>>>>
>>>> Running this test:
>>>>
>>>>       avocado run simple_multi_host.mht --sync-server 0.0.0.0
>>>>
>>>> avocado would pick a free port and start the sync server on it. 
>>>> Then it
>>>> would prepend "avocado run" and append "--sync $sync-server
>>>> --job-results-dir $this-job-results" to each line in
>>>> "simple_multi_host.mht" and run them in parallel. Afterward it'd wait
>>>> till both processes finish and report pass/fail depending on the 
>>>> status.
>>>>
>>>> This way users get overall results as well as individual ones, and a
>>>> simple way to define static setups.
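
(For clarity, the expansion described above boils down to roughly the sketch
below. This is not an existing Avocado feature: the helper name is made up,
"--sync" comes from this proposal, and only "--job-results-dir" is a real
option today.)

import shlex
import subprocess


def run_mht(mht_path, sync_server, results_dir):
    """Run each line of the .mht file as a parallel "avocado run" process."""
    procs = []
    for number, line in enumerate(open(mht_path), 1):
        line = line.strip()
        if not line:
            continue
        cmd = (["avocado", "run"] + shlex.split(line) +
               ["--sync", sync_server,
                "--job-results-dir", "%s/%s" % (results_dir, number)])
        procs.append(subprocess.Popen(cmd))
    # the wrapper job passes only if every sub-job passed
    return all(proc.wait() == 0 for proc in procs)
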
>>>>
>>>
>>> First, the given usage example would require Avocado to introduce:
>>>
>>>   * A brand new file format
>>>   * A new test type (say MULTI_HOST_TEST, in addition to the SIMPLE,
>>>     INSTRUMENTED, etc).
>>>
>>> Introducing a brand new file format may look like a very simple thing
>>> to do, but it's not.  I can predict that we'd learn very quickly that
>>> our original file format definition is very limited.  Then we'd either
>>> have to live with that, or introduce new file format versions, or just
>>> break the initial definition or compatibility.  These are all problems
>>> related to file formats, not really to your proposed file format.
>>>
>>> Then, analogous to the "remote tools (fabric)" example I gave before,
>>> this looks to be outside of the problem scope of Avocado, in the sense
>>> that "template" tools can do it better.
>>>
>>> Introducing a new test type, and a test resolver/loader, would be a
>>> mandatory step to achieve this design, but it looks like a necessary
>>> action only to make the use of "MHT" file format possible.
>>>
>>> Please note that having a design that allows users to fire multiple
>>> Avocado command line executions in their own scripts is a bad
>>> thing, but as a test framework, I believe we can deliver a better, more
>>> focused experience.
>>
>> I meant "is *not* a bad thing".
>>
> I think you have a point here. My idea was to support a newline-separated
> list of avocado executions as a simple wrapper to run processes in
> parallel, as it's very simple to develop and it doesn't promise anything.
> It simply takes whatever you hand it, spawns multiple processes and gives
> you the results.
>
> Then, to add some value, I added the --sync handling, as that is one
> problematic part. Basically it can be written in a generic way, but I
> see your point about hard-to-debug failures or unexpected behavior.
>
> It was meant to be a very simple and easy to understand way to promote
> multi-host testing, but it could just as well become a very painful thing
> if people start relying on it. So maybe we should only introduce the real
> thing below.
>
>>>
>>>>
>>>> Contrib scripts
>>>> ~~~~~~~~~~~~~~~
>>>>
>>>> The beauty of executing simple lines is that users might create contrib
>>>> scripts to generate the "mht" files, to get even better flexibility.
>>>
>>> Since I don't think a new file format and test type is a good thing, 
>>> this
>>> also becomes a bad idea IMHO.
>>>
>>>>
>>>>
>>>> Advanced tests
>>>> ~~~~~~~~~~~~~~
>>>>
>>>> The above might still not be flexible enough. But the system 
>>>> underneath
>>>> is very simple and flexible. So how about creating instrumented tests,
>>>> which generate the setup? The same simple example as before:
>>>>
>>>> multi_host.py
>>>>
>>>>       runners = ["machine1.py"]
>>>>       runners.append("machine2.py --remote-hostname machine2 "
>>>>                      "--mux-inject server_ip:machine1")
>>>>       self.execute(runners)
>>>>
>>>
>>> A major plus here is that there's no attempt to define new file 
>>> formats,
>>> test types and other items that are necessary only to fulfill a use 
>>> case
>>> requirement.  Since Avocado's primary language of choice is Python, we
>>> should stick to it, given that it's expressive enough and well 
>>> maintained
>>> enough.  This is of course a lesson we learned with Autotest itself, 
>>> let's
>>> not forget it.
>>>
>>> Then, a couple of things I dislike here:
>>>
>>>   1) First runner is special/magical (sync server would be run here)
>>>   2) Interface with runner execution is done by command line parameters
>>>
> Well, the 0th runner is special (the one which executes the
> multi-host instrumented test). It needs to listen on a free port and
> pass this port to all executed tests (if they use barriers/sync).
>
> I'll talk about the 2nd point later....
>
>
>>>> where "self.execute(tests)" would take the list and do the same as
>>>> for basic tests. Optionally it could return the json results for each
>>>> test so the test itself can react and modify the results.
>>>>
>>>> The above was just a direct translation of the previous example, but to
>>>> demonstrate the real power of this, let's try a PingPong multi-host
>>>> test:
>>>>
>>>>       class PingPong(MultiHostTest):
>>>>           def test(self):
>>>>               hosts = self.params.get("hosts", default="").split(";")
>>>>               assert len(hosts) >= 2
>>>>               runners = ["ping_pong --remote-hostname %s" % _
>>>>                          for _ in hosts]
>>>>               # Start creating multiplex tree interactively
>>>>               mux = MuxVariants("variants")
>>>>               # add /run/variants/ping with {} values
>>>>               mux.add("ping", {"url": hosts[1], "direction": "ping",
>>>>                                "barrier": "ping1"})
>>>>               # add /run/variants/pong with {} values
>>>>               mux.add("pong", {"url": hosts[-1], "direction": "pong",
>>>>                                "barrier": "ping%s" % len(hosts)})
>>>>               # Append "--mux-inject mux-tree..." to the first command
>>>>               runners[0] += " --mux-inject %s" % mux.dump()
>>>>               for i in xrange(1, len(hosts)):
>>>>                   mux = MuxVariants("variants")
>>>>                   next_host = hosts[(i + 1) % len(hosts)]
>>>>                   prev_host = hosts[i - 1]
>>>>                   mux.add("pong", {"url": prev_host, "direction": "pong",
>>>>                                    "barrier": "ping%s" % i})
>>>>                   mux.add("ping", {"url": next_host, "direction": "ping",
>>>>                                    "barrier": "ping%s" % (i + 1)})
>>>>                   runners[i] += " --mux-inject %s" % mux.dump()
>>>>               # Now do the same magic as in the basic multi-host test on
>>>>               # the dynamically created scenario
>>>>               self.execute(runners)
>>>>
>>>> The `self.execute` generates the "simple test"-like list of "avocado
>>>> run" commands to be executed. But the test writer can define some
>>>> additional behavior. In this example it generates a
>>>> machine1->machine2->...->machine1 chain of ping-pong tests.
>>>
>>> You mean that this would basically generate a "shell script like" list
>>> of avocado runs?  This looks to be a very strong design decision, and
>>> I fail to see how it would lend itself to be flexible enough and 
>>> deliver
>>> the "test writer can define some additional behavior" requirement.
>>>
> Explanation below...
>
>>>>
>>>> When running "avocado run pingpong --mux-inject 
>>>> hosts:machine1;machine2"
>>>> this generates 2 jobs, both running just a single "ping_pong" test 
>>>> with
>>>> 2 multiplex variants:
>>>>
>>>> machine1:
>>>>
>>>>       variants: !mux
>>>>           ping:
>>>>               url: machine2
>>>>               direction: ping
>>>>               barrier: ping1
>>>>           pong:
>>>>               url: machine2
>>>>               direction: pong
>>>>               barrier: ping2
>>>> machine2:
>>>>
>>>>       variants: !mux
>>>>           pong:
>>>>               url: machine1
>>>>               direction: pong
>>>>               barrier: ping1
>>>>           ping:
>>>>               url: machine1
>>>>               direction: ping
>>>>               barrier: ping2
>>>>
>>>> The first multiplex tree for three machines looks like this:
>>>>
>>>>       variants: !mux
>>>>           ping:
>>>>               url: machine2
>>>>               direction: ping
>>>>               barrier: ping1
>>>>           pong:
>>>>               url: machine3
>>>>               direction: pong
>>>>               barrier: ping3
>>>>
>>>> Btw I simplified the format for the sake of this RFC. I think instead of
>>>> generating the strings we should support an API to specify the test,
>>>> multiplexer, options... and then turn them into the jobs executed in
>>>> parallel (usually remotely). But these are just details to be solved if
>>>> we decide to work on it.
>>>
>>> This statement completely changes what you have proposed up to this 
>>> point.
>>>
>>> IMHO it's far from being just details, because that would define the 
>>> lowest
>>> and commonest level of this feature set that we would advertise and 
>>> support.
>>> The design should really be from this level up, and not from the 
>>> opposite
>>> direction.
>>>
>>> If external users want to define file formats (say your own MHT 
>>> proposal) on
>>> top of our "framework for running tests that span multiple execution 
>>> threads"
>>> at once, they should be able to do so.
>>>
>>> If you ask me, having sound Avocado APIs that users could use to fire
>>> multiple
>>> portions of their *tests* at once and have their *results* coalesced 
>>> into a
>>> single
>>> *test* result is about what Avocado should focus on.
> And this was supposed to be the answer. In the end yes, I think it
> should generate the "avocado run" command with the result dir located
> inside this test's results. The reason is it gives you the results you
> know, per worker, and they can run independently (surviving network
> issues and system reboots when we add the support for it in avocado).
>

This would really be an implementation detail of the chosen runner. The
current remote runner actually runs Avocado, but that's its own
(RemoteTestRunner) internal design decision. It does have a lot of
pluses, but that is not the focus of this conversation. Another runner,
say, ThinTestRunner, could choose to do things differently.

Having said that, I completely agree that we should, unless proven 
wrong, reuse the RemoteTestRunner for multi-host tests.

> The alternative is to create a client worker, which executes code on 
> demand, but that's more complex and it'd double the effort if we 
> decide to support system reboots/connection issues.

Agreed. Having an agent/broker on the remote side does not seem to be 
necessary or beneficial at this point.

>
> What this paragraph was about is that it probably should not directly
> generate the arguments; instead we should define an API which adds
> individual pieces of information and is translated into the command at
> the end.
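
A rough sketch of how such an API could look (hypothetical names, just to
make the idea concrete: collect the pieces of information and only turn
them into an "avocado run" command line at the very end; the flags used are
the ones from this proposal):

class RunnerSpec(object):
    """Collects the pieces describing one worker of a multi-host test."""

    def __init__(self, test_reference):
        self.test_reference = test_reference
        self.remote_hostname = None
        self.params = {}

    def to_command(self):
        """Translate the collected information into an "avocado run" command."""
        cmd = ["avocado", "run", self.test_reference]
        if self.remote_hostname:
            cmd += ["--remote-hostname", self.remote_hostname]
        for key, value in self.params.items():
            cmd += ["--mux-inject", "%s:%s" % (key, value)]
        return cmd

# spec = RunnerSpec("machine2.py")
# spec.remote_hostname = "machine2"
# spec.params["server_ip"] = "machine1"
# spec.to_command()  ->  ['avocado', 'run', 'machine2.py',
#                         '--remote-hostname', 'machine2',
#                         '--mux-inject', 'server_ip:machine1']
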
>
> I decided not to go into details here as I thought it's better to
> focus on part 1 (--sync/--sync-server), which already has a proof of
> concept version out there. Then I wanted to create the "mht" file,
> which would demonstrate how the results could look and how it all goes
> together, and when we have those results and issues, we can introduce
> the instrumented-test API, which would evolve from the real-world
> issues.
>
>>>
>>>>
>>>>
>>>> Results and the UI
>>>> ==================
>>>>
>>>> The idea is that the user is free to run the jobs separately, or to
>>>> define the setup in a "wrapper" job. The benefits of using the "wrapper"
>>>> job are the results in one place and the `--sync` handling.
>>>>
>>>> The difference is that running them individually looks like this:
>>>>
>>>>       1 | avocado run ping_pong --mux-inject url:192.168.1.58:6001
>>>> --sync-server
>>>>       1 | JOB ID     : 6057f4ea2c99c43670fd7d362eaab6801fa06a77
>>>>       1 | JOB LOG    :
>>>> /home/medic/avocado/job-results/job-2016-01-22T05.33-6057f4e/job.log
>>>>       1 | SYNC       : 0.0.0.0:6001
>>>>       1 | TESTS      : 1
>>>>       1 |  (1/1) ping_pong: \
>>>>       2 | avocado run ping_pong --mux-inject :url::6001 direction:pong
>>>> --sync 192.168.1.1:6001 --remote-host 192.168.1.1
>>>>       2 | JOB ID     : 6057f4ea2c99c43670fd7d362eaab6801fa06a77
>>>>       2 | JOB LOG    :
>>>> /home/medic/avocado/job-results/job-2016-01-22T05.33-6057f4e/job.log
>>>>       2 | TESTS      : 1
>>>>       2 |  (1/1) ping_pong: PASS
>>>>       1 |  (1/1) ping_pong: PASS
>>>>
>>>> and you have 2 results directories and 2 statuses. By running them
>>>> wrapped inside the simple.mht test you get:
>>>>
>>>>       avocado run simple.mht --sync-server 192.168.122.1
>>>>       JOB ID     : 6057f4ea2c99c43670fd7d362eaab6801fa06a77
>>>>       JOB LOG    :
>>>> /home/medic/avocado/job-results/job-2016-01-22T05.33-6057f4e/job.log
>>>>       TESTS      : 1
>>>>        (1/1) simple.mht: PASS
>>>>       RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | 
>>>> INTERRUPT 0
>>>>       TIME       : 0.00 s
>>>>
>>>> And single results:
>>>>
>>>>       $ tree $RESULTDIR
>>>>
>>>>       └── test-results
>>>>           └── simple.mht
>>>>               ├── job.log
>>>>                   ...
>>>>               ├── 1
>>>>               │   └── job.log
>>>>                       ...
>>>>               └── 2
>>>>                   └── job.log
>>>>                       ...
>>>>
>>>>       tail -f job.log:
>>>>       running avocado run ping pong ping pong
>>>>       running avocado run pong ping pong ping --remote-hostname
>>>> 192.168.122.53
>>>>       waiting for processes to finish...
>>>>       PASS avocado run ping pong ping pong
>>>>       FAIL avocado run pong ping pong ping --remote-hostname 
>>>> 192.168.122.53
>>>>       this job FAILED
>>>>
>>>
>>> I won't spend much time here, since the UI is bound to follow other 
>>> design
>>> ideas/decisions.
>>>
> Sure, the important part here is the results format.
>
>>>>
>>>> Demonstration
>>>> =============
>>>>
>>>> While considering the design I developed a WIP example. You can 
>>>> find it
>>>> here:
>>>>
>>>>       https://github.com/avocado-framework/avocado/pull/1019
>>>>
>>>> It demonstrates the `Triggered simple tests` chapter without the
>>>> wrapping tests. Hopefully it helps you understand what I had in mind.
>>>> It contains a modified "examples/tests/passtest.py" which requires 2
>>>> concurrent executions (for example if you want to test your server and
>>>> run multiple concurrent "wget" connections). Feel free to play with it,
>>>> change the number of connections, set different barriers, combine
>>>> multiple different tests...
>>>>
>>>>
>>>> Autotest
>>>> ========
>>>>
>>>> Avocado was developed by people familiar with Autotest, so let's just
>>>> mention here that this method is not all that different from the Autotest
>>>> one. The way Autotest supports parallel execution is that it lets users
>>>> create the "control" files inside the multi-host control file and then
>>>> run those in parallel. For synchronization it contains a master->slave
>>>> barrier mechanism extended with SyncData to send pickled data to all
>>>> registered runners.
>>>>
>>>> I considered whether we should re-use the code, but:
>>>>
>>>> 1. we do not support control files, so I just took inspiration from
>>>> passing the params to the remote instances
>>>
>>> One of the wonderful things about Autotest control files is that
>>> they're not a custom file format.  This cannot be overstated.  While
>>> other frameworks have had huge XML based file formats to drive their
>>> jobs, Autotest control files are infinitely more capable and their
>>> readability is a lot more scalable.
>>>
>>> The separation of client and server test types (and control files) is
>>> actually what prevents control files from nearing perfection IMHO.
> Yep
>
>>>
>>> The server API allows you to run client control files on given hosts.
>>> These client control files usually need tweaking for each host.  Then
>>> you're suddenly doing code generation (control files Python code). That
>>> is not nice.
> The tests I saw usually generated a simple "runTest" with different
> params. So what I'm proposing is actually similar: let's run avocado
> and allow params passing.
>
>>>
>>> I believe that, if Avocado provides such an API that allows regular 
>>> Python
>>> code to operate similarly to server control files, while giving more 
>>> control
>>> and granularity to what is run on the individual job executions (say
>>> on remote machines), and helps to coalesce the individual portions
>>> into a
>>> single test result, it would be a very attractive tool.
> I think the multi-host test should only pick existing normal tests and
> run the set of tests it needs to perform the task, using barriers to
> synchronize them.
>
> Actually there is one thing which is significantly limiting the usage,
> and that's the multiplexer. I'd like to run:
>
> "avocado run boot migrate recievemigrate migrate recievemigrate
> shutdown" tests and use different params for each test. Currently
> this is not possible, and it's something I've been proposing all the
> time (mapping params to individual tests).
>
> Anyway, even without this mapping we can do all kinds of setups, and
> when we add such a feature we can always start using it in multi-host
> testing. In terms of this RFC, multi-host testing is just triggering
> avocado jobs, so all features available in avocado are available to
> each worker in multi-host testing.
>
> PS: The multiplexer is not needed for multi-host tests; you're free to
> hard-code the values inside tests or to use whatever way to tell the
> test what it should do. The barriers use the server from the "--sync"
> cmdline argument, so the test is the only component which might need
> to be parametric.

I will, on purpose, not explore the parameter passing problems until we 
are more or less on the same page about the bare bones of "running a 
test that spans multiple machines". Then we can explore this optional but 
very important aspect.

>
>>>
>>>> 2. the barriers and syncdata are a quite hackish master->slave
>>>> communication. I think the described (and demonstrated) approach does
>>>> the same in a less hackish way and is easy to extend
>>>>
>>>> Using this RFC we'd be able to run autotest multi-host tests, but it'd
>>>> require rewriting the control files to "mht" (or contrib) files. It'd
>>>> probably even be possible to write a contrib script to run the control
>>>> file and generate the "mht" file which would run the autotest test.
>>>> Anyway the good thing for us is that this does not affect "avocado-vt",
>>>> because all of the "avocado-vt" multi-host tests are using a single
>>>> "control" file, which only prepares the params for simple avocado-vt
>>>> executions. The only necessary thing is a custom "tests.cfg", as by
>>>> default it disallows multi-host tests (or we can modify the "tests.cfg"
>>>> and include the filter inside the "avocado-vt" loader, but these are
>>>> just details to be sorted when we start running avocado-vt multi-host
>>>> tests).
>>>>
>>>> Conclusion
>>>> ==========
>>>>
>>>> Multi-host testing has been solved many times in history. Some hard-code
>>>> tests with communication, but most frameworks I have seen support
>>>> triggering "normal/ordinary" tests and add some kind of barrier (either
>>>> inside the code or between the tests) mechanism to synchronize the
>>>> execution. I'm for flexibility and easy test sharing, and that is how
>>>> I described it here.
>>>>
>>>> Kind regards,
>>>> Lukáš
>>>>
>>>
>



