[Avocado-devel] RFC: Multi-host tests

Cleber Rosa crosa at redhat.com
Mon Mar 28 19:44:15 UTC 2016



----- Original Message -----
> From: "Lukáš Doktor" <ldoktor at redhat.com>
> To: "Ademar Reis" <areis at redhat.com>, "Cleber Rosa" <crosa at redhat.com>, "Amador Pahim" <apahim at redhat.com>, "Lucas
> Meneghel Rodrigues" <lookkas at gmail.com>, "avocado-devel" <avocado-devel at redhat.com>
> Sent: Saturday, March 26, 2016 4:01:15 PM
> Subject: RFC: Multi-host tests
> 
> Hello guys,
> 
> Let's open a discussion regarding the multi-host tests for avocado.
> 
> The problem
> ===========
> 
> A user wants to run netperf on 2 machines. To do it manually he does:
> 
>      machine1: netserver -D
>      machine1: # Wait till netserver is initialized
>      machine2: netperf -H $machine1 -l 60
>      machine2: # Wait till it finishes and store the results
>      machine1: # stop the netserver and report possible failures
> 
> Now, how do we support this in avocado, ideally as custom tests, and
> ideally even surviving broken connections/reboots?
> 
> 
> Super tests
> ===========
> 
> We don't need to do anything and can leave everything to the user, who
> is free to write code like:
> 
>      ...
>      machine1 = aexpect.ShellSession("ssh $machine1")
>      machine2 = aexpect.ShellSession("ssh $machine2")
>      machine1.sendline("netserver -D")
>      # wait till the netserver starts
>      machine1.read_until_any_line_matches(["Starting netserver"], 60)
>      output = machine2.cmd_output("netperf -H $machine1 -l $duration")
>      # interrupt the netserver
>      machine1.sendline("\03")
>      # verify netserver finished
>      machine1.cmd("true")
>      ...
> 
> the problem is that it requires an active connection and the user needs
> to handle the results manually.

And of course the biggest problem here is that it doesn't solve the
Avocado problem: providing a framework and tools for tests that span
multiple (Avocado) execution threads, possibly on multiple hosts. 

> 
> 
> Triggered simple tests
> ======================
> 
> Alternatively, we can say each machine/worker is just another test,
> which occasionally needs synchronization or data exchange. The same
> example would look like this:
> 
> machine1.py:
> 
>     process.run("netserver")
>     barrier("server-started", 2)
>     barrier("test-finished", 2)
>     process.run("killall netserver")
> 
> machine2.py:
> 
>      barrier("server-started", 2)
>      self.log.debug(process.run("netperf -H %s -l 60"
>                                 % params.get("server_ip")))
>      barrier("test-finished", 2)
> 
> where "barrier(name, no_clients)" is a framework function which makes
> the process wait till the specified number of processes are waiting for
> the same barrier.

The barrier mechanism looks like an appropriate and useful utility for the
example given.  Even though your use case example explicitly requires it,
it's worth pointing out and keeping in mind that there may be valid use cases
which won't require any kind of synchronization.  This may even be true to
the executions of tests that spawn multiple *local* "Avocado runs". 
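
Just so we are all picturing the same thing, here is a minimal sketch of
what the client side of such a barrier could look like on top of a plain
TCP sync server.  The wire protocol (one "name/count" line out, one "go"
line back) is invented for illustration, not an existing Avocado API:

    import socket

    def barrier(name, no_clients, server=("127.0.0.1", 6547)):
        """Block until no_clients processes reach the barrier 'name'."""
        sock = socket.create_connection(server)
        try:
            sock.sendall("%s/%d\n" % (name, no_clients))
            # The sync server only answers once no_clients connections
            # have announced the same barrier name.
            assert sock.recv(16).strip() == "go"
        finally:
            sock.close()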

> 
> The barrier needs to know which server to use for communication so we
> can either create a new service, or simply use one of the executions as
> "server" and make both processes use it for data exchange. So to run the
> above tests the user would have to execute 2 avocado commands:
> 
>      avocado run machine1.py --sync-server machine1:6547
>      avocado run machine2.py --remote-hostname machine2 --mux-inject
> server_ip:machine1 --sync machine1:6547
> 
> where:
>      --sync-server tells avocado to listen on ip address machine1 port 6547
>      --remote-hostname tells avocado to run remotely on machine2
>      --mux-inject adds the "server_ip" into params
>      --sync tells the second avocado to connect to machine1:6547 for
> synchronization

To be honest, apart from the barrier utility, this provides little value
from the PoV of a *test framework*, and possibly unintentionally, competes
and overlaps with "remote" tools such as fabric.

Also, given that the multiplexer is an optional Avocado feature, such
a feature should not depend on it.

> 
> Running those two tests has only one benefit compared to the previous
> solution, and that is that it gathers the results independently and
> allows one to re-use simple tests. For example, you can create a 3rd
> test which uses different params for netperf, run it on "machine2" and
> keep the same script for "machine1"; or run 2 netperf senders at the
> same time. This would require libraries and more custom code when using
> the "Super test" approach.
> 
> There are additional benefits to this solution. When we introduce the
> locking API, tests running on a remote machine will actually be
> executed directly by avocado, therefore the locking API will work for
> them, avoiding problems with multiple tests using the same shared
> resource.
> 
> Another future benefit would be surviving system reboots/lost
> connections, once we introduce this support for individual tests. The
> way it'd work is that the user triggers the jobs, and the master
> remembers the test ids and polls for results until they finish/time
> out.
> 
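
(Just to illustrate the polling idea: a sketch of the master waiting on
remote jobs, where get_job_status() is a hypothetical helper standing in
for whatever transport ends up being used:)

    # Sketch only: get_job_status() is an invented helper that would
    # query a remote job, returning None while it is still running.
    import time

    def wait_for_jobs(job_ids, timeout=3600, step=10):
        deadline = time.time() + timeout
        pending = set(job_ids)
        results = {}
        while pending and time.time() < deadline:
            for job_id in list(pending):
                status = get_job_status(job_id)
                if status is not None:
                    results[job_id] = status
                    pending.remove(job_id)
            time.sleep(step)
        return results, pending  # anything left in pending timed out
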
> All of this we get for free thanks to re-using the existing
> infrastructure (or the future infrastructure), so I believe this is the
> right way to go, and in this RFC I'm describing the details of this
> approach.
>

All of the benefits listed are directly based on the fact that tests on
remote systems would be run under the Avocado test runner and would have
its runtime libraries available.  This is a valid point, but again it
doesn't bring a significant change in the user experience wrt running
tests that span multiple "Avocado runs" (possibly on remote machines).

> 
> Triggering the jobs
> -------------------
> 
> The previous example required the user to run avocado 2 times (once per
> machine) while sharing the same sync server. Additionally, it resulted
> in 2 separate sets of results. Let's try to eliminate this problem.
> 
> 
> Basic tests
> ~~~~~~~~~~~
> 
> For basic setups, we can come up with a very simple format to describe
> which tests should be triggered, and avocado should take care of
> executing them. What I have in mind is to simply accept a list of
> "avocado run" commands:
> 
> simple_multi_host.mht:
> 
>      machine1.py
>      machine2.py --remote-hostname machine2 --mux-inject server_ip:machine1
> 
> Running this test:
> 
>      avocado run simple_multi_host.mht --sync-server 0.0.0.0
> 
> avocado would pick a free port and start the sync server on it. Then it
> would prepend "avocado run" and append "--sync $sync-server
> --job-results-dir $this-job-results" to each line in
> "simple_multi_host.mht" and run them in parallel. Afterward it'd wait
> till both processes finish and report pass/fail depending on the status.
> 
> This way users get overall results as well as individual ones, and a
> simple way to define static setups.
> 
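
If I read this right, the whole mechanism would boil down to something
like the following sketch (purely illustrative, names invented):

    # Rough sketch of the ".mht" expansion described above: prepend
    # "avocado run", append the sync/results options, and run all the
    # resulting commands in parallel.
    import subprocess

    def run_mht(path, sync_server, results_dir):
        suffix = " --sync %s --job-results-dir %s" % (sync_server,
                                                      results_dir)
        procs = [subprocess.Popen("avocado run " + line.strip() + suffix,
                                  shell=True)
                 for line in open(path) if line.strip()]
        # The overall status is PASS only if every runner exited 0
        return all(proc.wait() == 0 for proc in procs)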

First, the given usage example would require Avocado to introduce:

 * A brand new file format
 * A new test type (say MULTI_HOST_TEST, in addition to the SIMPLE,
   INSTRUMENTED, etc).

Introducing a brand new file format may look like a very simple thing
to do, but it's not.  I can predict that we'd learn very quickly that
our original file format definition is very limited.  Then we'd either
have to live with that, or introduce new file format versions, or just
break the initial definition or compatibility.  These are all problems
related to file formats, not really to your proposed file format.

Then, analogous to the "remote tools (fabric)" example I gave before,
this looks to be outside of the problem scope of Avocado, in the sense
that "template" tools can do it better.

Introducing a new test type, and a test resolver/loader, would be a
mandatory step to achieve this design, but it looks like a necessary
action only to make the use of the "MHT" file format possible.

Please note that having a design that allows users to fire multiple
Avocado command line executions in their own scripts is not a bad
thing, but as a test framework, I believe we can deliver a better, more
focused experience.

> 
> Contrib scripts
> ~~~~~~~~~~~~~~~
> 
> The beauty of executing simple lines is that users might create contrib
> scripts to generate the "mht" files to get even better flexibility.

Since I don't think a new file format and test type is a good thing, this
also becomes a bad idea IMHO.

> 
> 
> Advanced tests
> ~~~~~~~~~~~~~~
> 
> The above might still not be flexible enough. But the system underneath
> is very simple and flexible. So how about creating instrumented tests,
> which generate the setup? The same simple example as before:
> 
> multi_host.py
> 
>      runners = ["machine1.py"]
>      runners.append("machine2.py --remote-hostname machine2 "
>                     "--mux-inject server_ip:machine1")
>      self.execute(runners)
> 

A major plus here is that there's no attempt to define new file formats,
test types and other items that are necessary only to fulfill a use case
requirement.  Since Avocado's primary language of choice is Python, we
should stick to it, given that it's expressive enough and well maintained
enough.  This is of course a lesson we learned with Autotest itself, let's
not forget it.

Then, a couple of things I dislike here:

 1) First runner is special/magical (sync server would be run here)
 2) Interface with runner execution is done by command line parameters

> where the "self.execute(tests)" would take the list and does the same as
> for basic tests. Optionally it could return the json results per each
> tests so the test itself can react and modify the results.
> 
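
(As an illustration of "react and modify the results": assuming
execute() returned one parsed JSON result per runner, with invented
keys, the test could do something like:)

    # Invented result keys, for illustration only.
    results = self.execute(runners)
    for result in results:
        if result["failures"]:
            self.fail("runner %s reported failures: %s"
                      % (result["job_id"], result["failures"]))
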
> The above was just a direct translation of the previous example, but to
> demonstrate the real power of this let's try a PingPong multi host test:
> 
>      class PingPong(MultiHostTest):
>          def test(self):
>              hosts = self.params.get("hosts", default="").split(";")
>              assert len(hosts) >= 2
>              runners = ["ping_pong --remote-hostname %s" % host
>                              for host in hosts]
>              # Start creating multiplex tree interactively
>              mux = MuxVariants("variants")
>              # add /run/variants/ping with {} values
>              mux.add("ping", {"url": hosts[1], "direction": "ping",
>                               "barrier": "ping1"})
>              # add /run/variants/pong with {} values
>              mux.add("pong", {"url": hosts[-1], "direction": "pong",
>                               "barrier": "ping%s" % len(hosts)})
>              # Append "--mux-inject mux-tree..." to the first command
>              runners[0] += " --mux-inject %s" % mux.dump()
>              for i in xrange(1, len(hosts)):
>                  mux = MuxVariants("variants")
>                  next_host = hosts[(i + 1) % len(hosts)]
>                  prev_host = hosts[i - 1]
>                  mux.add("pong", {"url": prev_host, "direction": "pong",
>                                   "barrier": "ping%s" % i})
>                  mux.add("ping", {"url": next_host, "direction": "ping",
>                                   "barrier": "ping%s" % (i + 1)})
>                  runners[i] += " --mux-inject %s" % mux.dump()
>              # Now do the same magic as in basic multihost test on
>              # the dynamically created scenario
>              self.execute(runners)
> 
> The `self.execute` generates the "simple test"-like list of "avocado
> run" commands to be executed. But the test writer can define some
> additional behavior. In this example it generates a
> machine1->machine2->...->machine1 chain of ping-pong tests.

You mean that this would basically generate a "shell script like" list
of avocado runs?  This looks to be a very strong design decision, and
I fail to see how it would lend itself to be flexible enough and deliver
the "test writer can define some additional behavior" requirement.

> 
> When running "avocado run pingpong --mux-inject hosts:machine1;machine2"
> this generates 2 jobs, both running just a single "ping_pong" test with
> 2 multiplex variants:
> 
> machine1:
> 
>      variants: !mux
>          ping:
>              url: machine2
>              direction: ping
>              barrier: ping1
>          pong:
>              url: machine2
>              direction: pong
>              barrier: ping2
> machine2:
> 
>      variants: !mux
>          pong:
>              url: machine1
>              direction: pong
>              barrier: ping1
>          ping:
>              url: machine1
>              direction: ping
>              barrier: ping2
> 
> The first multiplex tree for three machines looks like this:
> 
>      variants: !mux
>          ping:
>              url: machine2
>              direction: ping
>              barrier: ping1
>          pong:
>              url: machine3
>              direction: pong
>              barrier: ping3
> 
> Btw I simplified the format for the sake of this RFC. I think instead of
> generating the strings we should support an API to specify the test,
> multiplexer, options... and then turn them into parallel executed
> jobs (usually remote). But these are just details to be solved if we
> decide to work on it.

This statement completely changes what you have proposed up to this point.

IMHO it's far from being just details, because that would define the lowest
and commonest level of this feature set that we would advertise and support.
The design should really be done from this level up, and not from the
opposite direction.

If external users want to define file formats (say your own MHT proposal) on
top of our "framework for running tests that span multiple execution
threads", they should be able to do so.

If you ask me, having sound Avocado APIs that users could use to fire multiple
portions of their *tests* at once and have their *results* coalesced into a single
*test* result is about what Avocado should focus on.
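
Something along these lines, purely to illustrate the shape of such an
API (all names below are invented, not proposed signatures):

    # Invented names: one test fires multiple portions, possibly on
    # remote hosts, and their results coalesce into this single test.
    from avocado import Test

    class MultiHostNetperf(Test):
        def test(self):
            server = self.fork_portion("netserver.py", host="machine1")
            client = self.fork_portion("netperf.py", host="machine2",
                                       params={"server_ip": "machine1"})
            # Wait for both portions; their logs and statuses end up
            # under this single test's results.
            self.wait_portions(server, client)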

> 
> 
> Results and the UI
> ==================
> 
> The idea is that the user is free to run the jobs separately, or to
> define the setup in a "wrapper" job. The benefits of using the "wrapper"
> job are the results in one place and the `--sync` handling.
> 
> The difference is that running them individually looks like this:
> 
>      1 | avocado run ping_pong --mux-inject url:192.168.1.58:6001
> --sync-server
>      1 | JOB ID     : 6057f4ea2c99c43670fd7d362eaab6801fa06a77
>      1 | JOB LOG    :
> /home/medic/avocado/job-results/job-2016-01-22T05.33-6057f4e/job.log
>      1 | SYNC       : 0.0.0.0:6001
>      1 | TESTS      : 1
>      1 |  (1/1) ping_pong: \
>      2 | avocado run ping_pong --mux-inject :url::6001 direction:pong
> --sync 192.168.1.1:6001 --remote-host 192.168.1.1
>      2 | JOB ID     : 6057f4ea2c99c43670fd7d362eaab6801fa06a77
>      2 | JOB LOG    :
> /home/medic/avocado/job-results/job-2016-01-22T05.33-6057f4e/job.log
>      2 | TESTS      : 1
>      2 |  (1/1) ping_pong: PASS
>      1 |  (1/1) ping_pong: PASS
> 
> and you have 2 results directories and 2 statuses. By running them
> wrapped inside the simple.mht test you get:
> 
>      avocado run simple.mht --sync-server 192.168.122.1
>      JOB ID     : 6057f4ea2c99c43670fd7d362eaab6801fa06a77
>      JOB LOG    :
> /home/medic/avocado/job-results/job-2016-01-22T05.33-6057f4e/job.log
>      TESTS      : 1
>       (1/1) simple.mht: PASS
>      RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0
>      TIME       : 0.00 s
> 
> And single results:
> 
>      $ tree $RESULTDIR
> 
>      └── test-results
>          └── simple.mht
>              ├── job.log
>                  ...
>              ├── 1
>              │   └── job.log
>                      ...
>              └── 2
>                  └── job.log
>                      ...
> 
>      tail -f job.log:
>      running avocado run ping pong ping pong
>      running avocado run pong ping pong ping --remote-hostname
> 192.168.122.53
>      waiting for processes to finish...
>      PASS avocado run ping pong ping pong
>      FAIL avocado run pong ping pong ping --remote-hostname 192.168.122.53
>      this job FAILED
> 

I won't spend much time here, since the UI is bound to follow other design
ideas/decisions.

> 
> Demonstration
> =============
> 
> While considering the design I developed a WIP example. You can find it
> here:
> 
>      https://github.com/avocado-framework/avocado/pull/1019
> 
> It demonstrates the `Triggered simple tests` chapter without the
> wrapping tests. Hopefully it helps you understand what I had in mind. It
> contains a modified "examples/tests/passtest.py" which requires 2
> concurrent executions (for example if you want to test your server and
> run multiple concurrent "wget" connections). Feel free to play with it,
> change the number of connections, set different barriers, combine
> multiple different tests...
> 
> 
> Autotest
> ========
> 
> Avocado was developed by people familiar with Autotest, so let's just
> mention here that this method is not all that different from the
> Autotest one. The way Autotest supports parallel execution is that it
> lets users create the "control" files inside the multi-host control
> file and then run those in parallel. For synchronization it contains a
> master->slave barrier mechanism, extended with SyncData to send pickled
> data to all registered runners.
> 
> I considered if we should re-use the code, but:
> 
> 1. we do not support control files, so I just took inspiration from how
> the params are passed to the remote instances

One of the wonderful things about Autotest control files is that
they're not a custom file format.  This cannot be overstated.  While
other frameworks have had huge XML based file formats to drive their
jobs, Autotest control files are infinitely more capable and their
readability is a lot more scalable.
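
For readers who never used Autotest: a client control file is just
Python, executed by the harness with a "job" object in its namespace.  A
complete control file can be as short as:

    # A complete Autotest client control file: plain Python, no custom
    # syntax.
    job.run_test('sleeptest', seconds=1)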

The separation of client and server test types (and control files) is
actually what prevents control files from nearing perfection IMHO.

The server API allows you to run client control files on given hosts.
These client control files usually need tweaking for each host.  Then
you're suddenly doing code generation (control files Python code). That
is not nice.

I believe that, if Avocado provides such an API that allows regular Python
code to operate similarly to server control files, while giving more control
and granularity to what is run on the individual job executions (say
on remote machines), and help to coalesce the individual portions into a
single test result, it would be a very attractive tool.

> 2. the barriers and syncdata are quite hackish master->slave
> communication mechanisms. I think the described (and demonstrated)
> approach does the same in a less hackish way and is easier to extend
> 
> Using this RFC we'd be able to run autotest-multi-host tests, but it'd
> require rewriting the control files to "mht" (or contrib) files. It'd
> probably even be possible to write a contrib script to run the control
> file and generate the "mht" file which would run the autotest test.
> Anyway the good thing for us is that this does not affect "avocado-vt",
> because all of the "avocado-vt" multi-host tests are using a single
> "control" file, which only prepares the params for simple avocado-vt
> executions. The only necessary thing is a custom "tests.cfg", as by
> default it disallows multi-host tests (or we can modify the "tests.cfg"
> and include the filter inside the "avocado-vt" loader, but these are
> just details to be sorted when we start running avocado-vt
> multi-host tests).
> 
> Conclusion
> ==========
> 
> Multi-host testing has been solved many times in history. Some
> frameworks hardcode tests with communication, but most frameworks I
> have seen support triggering "normal/ordinary" tests and add some kind
> of barrier mechanism (either inside the code or between the tests) to
> synchronize the execution. I'm for flexibility and easy test sharing,
> and that is how I described it here.
> 
> Kind regards,
> Lukáš
> 



