[Avocado-devel] Multi Stream Test Support

Wed Apr 5 16:13:08 UTC 2017

On 04/05/2017 03:29 AM, Lukáš Doktor wrote:
> Dne 3.4.2017 v 15:48 Cleber Rosa napsal(a):
>> Note: this document can be viewed in rendered format at:
>>
>> https://github.com/clebergnu/avocado/blob/RFC_multi_stream_v1/docs/source/rfcs/multi_stream.rst
>>
>>
>> ===========================
>>  Multi Stream Test Support
>> ===========================
>>
>> Introduction
>> ============
>>
>> Avocado currently does not provide test writers with standard tools
>> or guidelines for developing tests that spawn multiple machines.
>>
>> Since these days the concept of a "machine" is blurring really
>> quickly, this proposal for Avocado's version of "multi machine" test
>> support is more abstract (that's an early and quick explanation of
>> what a "stream" means).  One of the major goals is to be more flexible
>> and stand the "test" (pun intended) of time.
>>
>> This is a counter proposal to a previous RFC posted and discussed on
>> the Avocado development mailing list.  Many of the concepts detailed
>> here were introduced there:
>>
>> * https://www.redhat.com/archives/avocado-devel/2016-March/msg00025.html
>> * https://www.redhat.com/archives/avocado-devel/2016-March/msg00035.html
>> * https://www.redhat.com/archives/avocado-devel/2016-April/msg00042.html
>> * https://www.redhat.com/archives/avocado-devel/2016-April/msg00072.html
>>
>> Background
>> ==========
>>
>> The prior art that influences Avocado the most is Autotest.  The
>> reason is that many of the Avocado developers worked on Autotest
>> before, and both share various common goals.  Let's use Autotest,
>> which provided support for multiple machine tests, as a basis
>> for comparison.
>>
>> Back in the Autotest days, a test that would spawn multiple machines
>> was a very particular type of test.  To write such a test, one would
>> write a **different** type of "control file" (a server one).  Then, by
>> running a "server control file" with an **also different** command
>> line application (``autoserv``, A.K.A. ``autotest-remote``), the
>> server control file would have access to some special variables, such
>> as the ``machines`` one.  By using an **also different** type of job
>> implementation, the control file could run a given **Python function**
>> on these various ``machines``.
>>
>> An actual sample server control file (``server/samples/reboot.srv``)
>> for Autotest looks like this::
>>
>>    1  def run(machine):
>>    2     host = hosts.create_host(machine)
>>    3     host.reboot()
>>    4
>>    5  job.parallel_simple(run, machines)
>>
>> Line #5 makes use of the different (server) job implementation to run
>> function ``run`` (defined in line #1) in parallel on machines given by
>> the special variable ``machines`` (made available by the also special
>> ``autoserv`` tool).
>>
>> This quick background check shows three important facts:
>>
>> 1) The functionality is not scoped to tests.  It's not easy to understand
>>    where a test begins or ends by looking at such a control file.
>>
>> 2) Users (and most importantly test writers) have to learn about
>>    different tools and APIs when writing "multi machine" code;
>>
>> 3) The machines are defined outside the test itself (in the form of
>>    arguments to the ``autoserv`` command line application);
>>
>> Please keep these Autotest characteristics in mind: Avocado's multi
>> stream test support goals will be presented shortly, and will detail
>> how they contrast with those.
>>
>> Avocado's Multi Stream Test Support Goals
>> =========================================
>>
>> This is a hopefully complete summary of our goals:
>>
>> 1) To not require a different type of test, that is, allow users
>>    to *write* a plain `avocado.Test` while still having access to
>>    multi stream goodies;
>>
>> 2) To allow for clear separation between the test itself and its
>>    execution environment (focus here on the execution streams
>>    environment);
>>
>> 3) To allow increased flexibility by abstracting the "machines"
>>    concept into "execution streams";
>>
>> 4) To allow for even increased flexibility by allowing test writers to
>>    use not only Python functions, but other representations of code to
>>    be executed on those separate streams;
>>
>> Comparison with prior art
>> -------------------------
>>
>> When compared to the Autotest version of multiple machine support for
>> tests, Avocado's version is similar in that it keeps the separation of
>> machine and test definition.  That means that tests written in
>> accordance with the official guidelines will not contain references to
>> the machines ("execution streams") on which they will have portions of
>> themselves executed.
>>
>> But, a major difference from the Autotest version is that this
>> proposal attempts to provide the **same basic tools and test APIs** to
>> the test writers needing the multiple stream support.  Of course,
>> additional tools and APIs will be available, but they will not be
>> incompatible with traditional Avocado INSTRUMENTED tests.
>>
>> Core concepts
>> =============
>>
>> Because the first goal of this RFC is to set the general scope and
>> approach to Multi Stream test support, it's important to properly
>> describe each of the core concepts (usually abstractions) that will be
>> used in later parts of this document.
>>
>> Execution Stream
>> ----------------
>>
>> An *Execution Stream* is defined as a disposable execution environment,
>> different and ideally isolated from the main test execution environment.
>>
>> A simplistic but still valid implementation of an execution
>> environment could be based on an Operating System level process.
>> Another valid implementation would be based on a lightweight
>> container.  Yet another valid example could be based on a remote
>> execution interface (such as a secure shell connection).
>>
>> These examples make it clear that the level of isolation is determined
>> solely by the implementation.
>>
>>  .. note:: Even though the idea is very similar, the term *thread* was
>>            intentionally avoided here, so that readers are not led to
>>            think that the architecture is based on an OS level thread.
>>
>> An execution stream is the *"where"* to execute a "Block Of Code"
>> (which is the *"what"*).
>>
>> Block of Code
>> -------------
>>
>> A *Block of Code* is defined as computer executable code that can run
>> from start to finish under a given environment and is able to report
>> its outcome.
>>
>> For instance, a command such as ``grep -q vmx /proc/cpuinfo; echo $?``
>> is valid computer executable code that can run under various shell
>> implementations.  A Python function or module, a shell command, or
>> even an Avocado INSTRUMENTED test could qualify as a block of code,
>> given that an environment knows how to run them.
>>
>> Again, this is the *what* to be run on an "Execution Stream" (which,
>> in turn, is *"where"* it can be run).
>>
>> Basic interface
>> ===============
>>
>> Without initial implementation attempts, it's unreasonable to document
>> interfaces at this point and not expect them to change.  Still, the
>> already existing understanding of use cases suggests an early view of
>> the interfaces that would be made available.
>>
>> Execution Stream Interface
>> --------------------------
>>
>> One individual execution stream, within the context of a test, should
>> allow its users (test writers) to control it with a clean interface.
>> Actions that an execution stream implementation should provide:
>>
>> * ``run``: Starts the execution of the given block of code (async,
>>   non-blocking).
>> * ``wait``: Block until the execution of the block of code has
>>   finished.  ``run`` can be given a ``wait`` parameter that will
>>   automatically block until the execution of code has finished.
>> * ``terminate``: Terminate the execution stream, interrupting the
>>   execution of the block of code and freeing all resources
>>   associated with this disposable environment
>>
>> The following properties should be provided to let users monitor the
>> progress and outcome of the execution:
>>
>> * ``active``: Signals with True or False whether the block of code
>>   given on ``run`` is still executing.  This will always return
>>   False if ``wait`` is used, but can return either True or False when
>>   running in async mode.
>> * ``success``: A simplistic but precise view of the outcome of the
>>   execution.
>> * ``output``: A dictionary of various outputs that may have been
>>   created by ``run``, keyed by a descriptive name.
>>
>> The following properties could be provided to transport block of code
>> payloads to the execution environment:
>>
>> * ``send``: Sends the given content to the execution stream
>>   environment.
> This is ambitious. Arbitrary code chunks have dependencies, be it
> binary, script, module, whatever... But sure, each implementation (Bash,
> PythonModule, PythonCode) can support various means to allow deps.
> 

Writing many different implementations is indeed ambitious.  The major
goal here, though, is to define a sane interface that works for the
basic and most urgent use cases while still not closing the door on
future needs.  Basically, it's about making sure we don't lose the
investment here and end up rewriting large portions of Avocado just
because we find out that running, say, Ansible playbooks in different
streams is now a very important thing for our tests.

> Anyway for the python script I'd recommend looking at
> `avocado-vt/virttest/remote_commander` which is a generic interface to
> execute tasks remotely to get a sense of the complexity.
> 

Yep, I'm aware of it, but thanks for the pointer since it does indeed
make sense to keep it in mind.
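
To make the Execution Stream interface above a bit more concrete, here
is a rough sketch of what a local process based implementation could
look like.  This is only an illustration and not a proposed
implementation: the ``ProcessStream`` name is made up, and I'm assuming
the block of code is simply an argument list handed to ``subprocess``:

```python
import subprocess


class ProcessStream:
    """Hypothetical execution stream backed by a local OS process."""

    def __init__(self):
        self._proc = None
        self.output = {}

    def run(self, command, wait=False):
        # Start the command (an argument list) without blocking.
        self._proc = subprocess.Popen(command,
                                      stdout=subprocess.PIPE,
                                      stderr=subprocess.PIPE)
        if wait:
            self.wait()

    def wait(self):
        # Block until the command finishes and collect its outputs.
        if self._proc is None:
            return
        stdout, stderr = self._proc.communicate()
        self.output.update({"stdout": stdout, "stderr": stderr})

    def terminate(self):
        # Interrupt the command (if still running) and free resources.
        if self._proc is None:
            return
        if self.active:
            self._proc.kill()
        self.wait()

    @property
    def active(self):
        # True while the command is still running.
        return self._proc is not None and self._proc.poll() is None

    @property
    def success(self):
        # None until the outcome is known, then True for exit status 0.
        if self._proc is None or self._proc.returncode is None:
            return None
        return self._proc.returncode == 0
```

Something like ``stream.run(["sleep", "30"])`` followed by
``stream.terminate()`` would leave ``active`` as False and ``success``
as False, which is the "disposable" behavior discussed further down.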

>>
>> Block of Code Interface for test writers
>> ----------------------------------------
>>
>> When a test writer intends to execute a block of code, he must choose
>> one of the available implementations.  Since the test writer must know
>> what type of code it's executing, the user interface with the
>> implementation can be much more flexible.
>>
>> For instance, suppose a Block Of Code implementation called
>> ``PythonModule`` exists.  This implementation would possibly run
>> something like ``python -m <modulename>`` and collect its outcome.
>>
>> A user of such an implementation could write a test such as::
>>
>>   from avocado import Test
>>   from avocado.streams.code import PythonModule
>>
>>   class ModuleTest(Test):
>>     def test(self):
>>         self.streams[1].run(PythonModule("mymodule",
>>                                          path=["/opt/myproject"]))
>>
>> The ``path`` interface in this example is made available and supported
>> by the ``PythonModule`` implementation alone and will not be used by the
>> execution stream implementations. As a general rule, the "payload"
>> should be the first argument to all block of code implementations.
>> Other arguments can follow.
>>
>> Another possibility related to parameters is to have the Avocado's own
>> test parameters ``self.params`` passed through to the block of code
>> implementations, either all of them, or a subset based on path.  This
>> could allow for example, a parameter signaling a "debug" condition to
>> be passed on to the execution of the block of code.  Example::
>>
>>   from avocado import Test
>>   from avocado.streams.code import PythonModule
>>
>>   class ModuleTest(Test):
>>     def test(self):
>>         self.streams[1].run(PythonModule("mymodule",
>>                                          path=["/opt/myproject"],
>>                                          params=self.params))
>>
>> Block of Code Interface for Execution Stream usage
>> --------------------------------------------------
>>
>> Another type of public interface, in the sense that it's well known
>> and documented, is the interface that Execution Stream implementations
>> will use to interact with Block of Code implementations.  This is not
>> intended to be used by test writers, though.
>>
>> Again, it's too early to define a frozen implementation, but this is
>> what it could look like:
>>
>> * ``send_self``: uses the Execution Stream's ``send`` interface to
>>   properly populate the payload or other necessary assets for its
>>   execution.
>> * ``run``: Starts the execution of the payload, and waits for the outcome
>>   in a synchronous way.  The asynchronous support is handled at the
>>   Execution Stream side.
>> * ``success``: Reports the positive or negative outcome in a
>>   simplistic but precise way.
>> * ``output``: A dictionary of various outputs that may be generated by
>>   the execution of the code.  The Execution Stream implementation may
>>   merge this content with its own ``output`` dictionary, giving a
>>   unified view of the output produced there.
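
A quick inline note: the four items listed for the Block of Code side
could be sketched roughly as below.  Everything here is an assumption
for illustration purposes (the ``Shell`` class name, running the
payload with ``subprocess.run``, and a ``stream`` object that offers a
``send`` method), not a frozen interface:

```python
import subprocess


class Shell:
    """Hypothetical block of code wrapping a shell command line."""

    def __init__(self, payload):
        # As a general rule, the payload is the first argument.
        self.payload = payload
        self.success = None
        self.output = {}

    def send_self(self, stream):
        # A shell command needs no extra assets, only the command itself.
        stream.send(self.payload)

    def run(self):
        # Synchronous on purpose: async handling is left to the stream.
        result = subprocess.run(self.payload, shell=True,
                                capture_output=True, text=True)
        self.success = result.returncode == 0
        self.output = {"stdout": result.stdout, "stderr": result.stderr}
```

An Execution Stream implementation would call ``send_self`` and then
``run`` inside its own environment, and could merge ``output`` into its
own dictionary afterwards.
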
>>
>> Advanced topics and internals
>> =============================
>>
>> Execution Streams
>> -----------------
>>
>> An execution stream was defined as a "disposable execution
>> environment".  A "disposable execution environment", currently in the
>> form of a fresh and separate process, is exactly what the Avocado
>> test runner gives to a test in execution.
>>
>> While there may be similarities between the Avocado Test Process
>> (created by the test runner) and execution streams, please note that
>> the execution streams are created *by* one's test code.  The following
>> diagram may help to make the roles clearer::
>>
>>    +-----------------------------------+
>>    |       Avocado Test Process        |  <= created by the test runner
>>    | +-------------------------------+ |
>>    | | main execution stream         | |  <= executes your `test*()` method
>>    | +-------------------------------+ |
>>    | | execution stream #1           | |  <= initialized on demand by one's
>>    | | ...                           | |     test code.  utilities to do so
>>    | | execution stream #n           | |     are provided by the framework
>>    | +-------------------------------+ |
>>    +-----------------------------------+
>>
>> Even though the proposed mechanism is to let the framework create the
>> execution stream lazily (on demand), the use of the execution stream
>> is the definitive trigger for its creation.  With that in mind, it's
>> accurate to say that the execution streams are created by one's test
>> code (running on the "main execution stream").
>>
>> Synchronous, asynchronous and synchronized execution
>> ----------------------------------------------------
>>
>> As can be seen in the interface proposal for ``run``, the default
>> behavior is to have asynchronous executions, as most observed use
>> cases seem to fit this execution mode.
>>
>> Still, it may be useful to also have synchronous execution.  For that,
>> it'd be a matter of setting the ``wait`` option to ``run``.
>>
>> Another valid execution mode is synchronized execution.  This has been
>> thoroughly documented by the previous RFCs, under sections named
>> "Synchronization".  In theory, both synchronous and asynchronous
>> execution modes could be combined with a synchronized execution, since
>> the synchronization would happen among the execution streams
>> themselves.  The synchronization mechanism, usually called a "barrier",
>> won't be given too much focus here, since in the previous RFCs it was
>> considered a somewhat agreed upon and understood point.
>>
>> Termination
>> -----------
>>
>> By favoring asynchronous execution, execution streams need to also
>> have a default behavior for handling termination of resources.  For
>> instance, for a process based execution stream, if the following code
>> is executed::
>>
>>   from avocado import Test
>>   from avocado.streams.code import shell
>>   import time
>>
>>   class MyTest(Test):
>>       def test(self):
>>           self.streams[0].run(shell("sleep 100"))
>>           time.sleep(10)
>>
>> The process created as part of the execution stream would run for
>> 10 seconds, and not 100 seconds.  This reflects that execution streams
>> are, by definition, **disposable** execution environments.
>>
>> Execution streams are thus limited to the scope of one test, so
>> implementations will need to terminate and clean up all associated
>> resources.
>>
>> .. note:: based on initial experiments, this will usually mean that a
>>           ``__del__`` method will be written to handle the cleanup.
> I'd suggest the runner should after the test execution run:
> 
>     for stream in test_instance.streams:
>         stream.close()
> 
> which should explicitly take care of closing the streams (and
> terminating its processes) as `__del__` might not be executed on all
> occasions and I'd suggest doing this during/after `tearDown`.
> 

Right.  Let's keep that in mind.  At the end of the day, it's an
implementation level detail, and as long as the cleanup is tested and
works, we're good to go.
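
Just to illustrate the explicit cleanup you describe (as opposed to
relying on ``__del__``), a minimal sketch could be the following.  The
``FakeStream`` class and the ``run_test`` helper are made up for this
example and only stand in for whatever the runner ends up doing around
``tearDown``:

```python
class FakeStream:
    """Stand-in for an execution stream; only tracks if it was closed."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        # A real implementation would terminate processes, connections...
        self.closed = True


def run_test(test_method, streams):
    """Run a test callable and always close its streams afterwards."""
    try:
        return test_method()
    finally:
        # Executed on success, failure and error alike.
        for stream in streams:
            stream.close()
```

The point of the ``finally`` clause is that the streams get closed even
when the test method raises, which ``__del__`` alone can't guarantee.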

>>
>> Avocado Utility Libraries
>> -------------------------
>>
>> Based on initial evaluation, it looks like most of the features necessary
>> to implement multi stream execution support can be architected as a set
> to implement multi stream execution support can be __designed__ as a set
> 
>> of utility libraries.
>>
>> One example of pseudo code that could be possible with this design::
>>
>>   from avocado import Test
>>   from avocado.streams import get_implementation
>>   from avocado.streams.code import shell
>>
>>   class Remote(Test):
>>
>>       def test_filtering(self):
>>           klass = get_implementation("remote")
> 
> Well this is not really scalable. Imagine that in one execution you want
> to use local, in next remote and then docker container. In your example
> you'd have to change the source code of the test. How about this:
> 
>     get_implementation(params.get("first_stream"))
>     get_implementation(params.get("second_stream"))
> 

Your suggestion is really a "how to write an Avocado test".  Yes, we
absolutely should use parameters in tests, and that's why the "klass"
parameters come from parameters.  The use of "remote" here is just to
make things clearer with regards to how one would refer to
implementations (by name).

> where:
> 
>     first_stream = "localhost"
>     second_stream = "test:123456 at 192.168.122.10"
>     third_stream = "docker://create=yes,image=fedora,remove=always"
>     fourth_stream = "libvirt://create=no,domain=test_machine,start=yes"
>     ...
> 

These URLs are just painful to read IMO.

> Another option would be to allow `path` of the Avocado Test `params` and
> the class would get the details from the provided params+path. The cons
> would be it'd be hard to use anything but `params` for that:
> 
>     streams:
>         first:
>             type: localhost
>         second:
>             hostname: 192.168.122.10
>             user: test
>             password: 123456
>         third:
>             type: docker
>             create: yes
>         ...
> 

I like this structure better.

>     get_implementation(self.params, "/streams/first/*") => would use
> `params.get(..., "/streams/first/*")` to get all necessary parameters
> 

The idea of `get_implementation` is pretty simple: get an Execution
Stream implementation by its name.  What you're describing here (with
regards to getting the "right" parameters and instantiating/activating
the execution stream easily/automatically) is exactly what I propose later.

>>           if klass is not None:
>>               stream = klass(host=self.params.get("remote_hostname"),
>>                              username=self.params.get("remote_username"),
>>                              password=self.params.get("remote_password"))
>>               cmd = ("ping -c 1 %s" %
>>                      self.params.get("test_host_hostname"))
>>               stream.run(shell(cmd))
> I do like the rest of the example (only the klass would be already
> initialized by the `get_implementation`).
> 

This conflicts with the idea of "utility library to be used by test
developer that wants full control of the individual creation and use of
the execution streams".  I mean, if some implementations are made
available at the `avocado.utils` namespace, then they could even be
referred to directly by module/name.

Again, since we're talking about making whatever makes sense in the
utility namespace, there could be a function such as
"get_stream_parameters(name)" that would return the parameters to an
Execution Stream, that is:

  stream_name = "server"
  path = "/avocado/streams/%s/*" % stream_name
  impl = self.params.get("type", path=path)
  stream = get_implementation(impl)(**get_stream_parameters(path))
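
To show how these pieces could fit together outside of Avocado, here's
a self-contained sketch.  Keep in mind that it's full of assumptions:
the parameters are modeled as a plain nested dict instead of
``self.params``, the registry and the ``LocalStream``/``RemoteStream``
classes are invented, and ``get_stream_parameters`` takes the dict and
the stream name rather than a path:

```python
class LocalStream:
    """Hypothetical default (local process) execution stream."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class RemoteStream:
    """Hypothetical remote execution stream."""

    def __init__(self, host, username=None, password=None):
        self.host = host
        self.username = username
        self.password = password


# Name to implementation mapping; names are how tests refer to them.
IMPLEMENTATIONS = {"local": LocalStream, "remote": RemoteStream}


def get_implementation(name):
    # Returns None for unknown names, as in the RFC example.
    return IMPLEMENTATIONS.get(name)


def get_stream_parameters(params, stream_name):
    # Everything but "type" is handed to the implementation.
    node = params["avocado"]["streams"][stream_name]
    return {key: value for key, value in node.items() if key != "type"}


def create_stream(params, stream_name):
    node = params["avocado"]["streams"][stream_name]
    klass = get_implementation(node.get("type", "local"))
    return klass(**get_stream_parameters(params, stream_name))
```

With ``{"avocado": {"streams": {"server": {"type": "remote", "host":
"foo.example.com"}}}}`` as parameters, ``create_stream(params,
"server")`` would hand back a ``RemoteStream`` for that host.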

>>
>> Please note that this is not the intended end result of this proposal,
>> but a side effect of implementing it using different software layers.
>> Most users should favor the simplified (higher level) interface.
>>
>> Writing a Multi-Stream test
>> ===========================
>>
>> As mentioned before, users have not yet been given tools **and
>> guidelines** for writing multi-host (multi-stream in Avocado lingo)
>> tests.  By setting a standard and supported way to use the available
>> tools, we can certainly expect advanced multi-stream tests to become
>> easier to write and then much more common, robust and better supported
>> by Avocado itself.
>>
>> Mapping from parameters
>> -----------------------
>>
>> The separation of stream definitions and test is a very important goal
>> of this proposal.  Avocado already has a advanced parameter system, in
> of this proposal.  Avocado already has __an__ advanced parameter system, in
> 
>> which a test receives parameters from various sources.  The most common
>> way of passing parameters at this point is by means of YAML files, so
>> these will be used as the example format.
> Well this might be quite hard to understand, how about just saying:
> "Avocado supports test parametrisation via Test Parameters system and
> the most common way is to use a YAML file by using `yaml_to_mux` plugin."
> 
>>
>> Parameters that match a predefined schema (based on paths and node
>> names) will be evaluated by a test's ``streams`` instance
>> (available as ``self.streams`` within a test).
>>
>> For instance, the following snippet of test code::
>>
>>   from avocado import Test
>>   from avocado.streams.code import python
>>
>>   class MyTest(Test):
>>       def test(self):
>>           self.streams[1].run(python("import mylib; mylib.action()"))
>>
>> Together with the following YAML file fed as input to the parameter
>> system::
>>
>>   avocado:
>>      streams:
>>       - 1:
>>           type: remote
>>           host: foo.example.com
> This is currently not supported by our yaml parser as any dictionary is
> mapped to multiplex structure and I'm not sure it'd be possible (in a
> sane manner) to treat dict inside lists differently. Anyway as I
> mentioned earlier we could use:
> 

Oops, I may have used an incorrect syntax or idea (or both).

>     avocado:
>         streams:
>             1: ssh://foo.example.com
> 
> or:
> 
>     avocado:
>         streams:
>             1:
>                 type: remote
>                 host: foo.example.com
> 

This one looks good IMO.  I'm all in for more explicitly naming **when**
you go to the lengths of defining them.

> or:
> 
>     avocado:
>         streams:
>             - type: remote
>               host: foo.example.com
> 
> Another thing is I'd probably prefer names to ints so "1" or "server" or
> "worker1" etc, which goes nicely with the first 2 examples. The last
> example goes well with indexes, but it starts with 0 (which would be my
> recommendation anyway if we decided to go with indexes).
> 

I think I mentioned somewhere "if only integers"...  I actually share
your fondness of names.  I did not say it explicitly, but I think we can
support both, as the slicing examples make a lot of sense to me.

But the slicing examples can be expanded into their own dialect, such as
supporting regexes.  For instance, `self.streams["client-\d+"]` makes a
lot of sense IMO.
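
As a rough idea of how index, name and regex lookups could coexist,
consider the sketch below.  It's deliberately simplistic and all of it
is assumed: the stream objects are replaced by their names, slices and
regexes return plain lists (not a streams-like object), and a string
that is not an exact name is treated as a regex:

```python
import re


class Streams:
    """Hypothetical container looking up streams by index, name or regex."""

    def __init__(self, names):
        # Names stand in for the real stream objects in this sketch.
        self._names = list(names)

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            # Positional access and slicing, list-like.
            return self._names[key]
        if key in self._names:
            # Exact name match wins over regex interpretation.
            return key
        pattern = re.compile(key)
        return [name for name in self._names if pattern.fullmatch(name)]
```

So ``Streams(["server", "client-1", "client-2"])[r"client-\d+"]`` would
select both clients, while ``["server"]`` and ``[0]`` would both select
the server.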

>>
>> Would result in the execution of ``import mylib; mylib.action()``
>> in a Python interpreter on host ``foo.example.com``.
>>
>> If test environments are refered to on a test, but have not been defined
> If test environments are __referred__ to on a test, but have not been
> defined
> 

OK, thanks.

>> in the outlined schema, Avocado's ``streams`` attribute implementation
>> can use a default Execution Stream implementation, such as a local
>> process based one.  This default implementation can, of course, also
>> be configured at the system and user level by means of configuration
>> files, command line arguments and so on.
>>
>> Another possibility is an "execution stream strict mode", in which no
>> default implementation would be used, but an error condition would be
>> generated.  This may be useful on environments or tests that are
>> really tied to their execution stream types.
> I'd solve this by supporting `__len__` where `len(self.streams)` should
> report number of defined streams.
> 
> Note the number of defined streams changes based on how many streams are
> defined __OR__ used by the test. So:
> 
>     avocado:
>         streams:
>             - 0:
>             - 1:
> 
> 
>     len(self.streams)  => 2
>     self.streams[5].run(cmd)
>     len(self.streams)  => 6
> 
> where the streams 2-5 are the default streams.
> 

I have mixed feelings here.  In my understanding, all of the streams are
created on demand, that is, when they're used.  A configuration that
defines thousands of them will not cause an empty test (think of
`passtest.py`) to initialize them.

But, the meaning of `len(self.streams)`, that is, defined or
initialized, is something we can further discuss later.
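
To make the "defined" versus "initialized" distinction easier to
discuss, here's a tiny sketch of on-demand creation.  The
``LazyStreams`` name, the ``factory`` callable and the ``{"type":
"local"}`` default are all assumptions made for the sake of the
example:

```python
class LazyStreams:
    """Hypothetical container that creates streams only when used."""

    def __init__(self, defined, factory):
        self._defined = dict(defined)   # configuration, by stream name
        self._factory = factory         # callable(name, config) -> stream
        self._initialized = {}

    def __getitem__(self, name):
        if name not in self._initialized:
            # Undefined streams fall back to a default implementation.
            config = self._defined.get(name, {"type": "local"})
            self._initialized[name] = self._factory(name, config)
        return self._initialized[name]

    @property
    def defined(self):
        # How many streams the configuration describes.
        return len(self._defined)

    @property
    def initialized(self):
        # How many streams the test has actually touched.
        return len(self._initialized)
```

Exposing both numbers explicitly sidesteps the question of what a bare
``len(self.streams)`` should mean, at least until we settle it.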

>>
>> Intercommunication Test Example
>> -------------------------------
>>
>> This is a simple example that exercises the most important aspects
>> proposed here.  The use case is to check that different hosts can
>> communicate among themselves.  To do that, we define two streams as
>> parameters (using YAML here), backed by a "remote" implementation::
>>
>>   avocado:
>>      streams:
>>       - 1:
>>           type: remote
>>           host: foo.example.com
>>       - 2:
>>           type: remote
>>           host: bar.example.com
>>
>> Then, the following Avocado Test code makes use of them::
>>
>>   from avocado import Test
>>   from avocado.streams.code import shell
>>
>>   class InterCommunication(Test):
>>       def test(self):
>>           self.streams[1].run(shell("ping -c 1 %s" %
>>                                     self.streams[2].host))
>>           self.streams[2].run(shell("ping -c 1 %s" %
>>                                     self.streams[1].host))
>>           self.streams.wait()
>>           self.assertTrue(self.streams.success)
> Brainstorming here, how about letting `wait` raise exception when it
> fails unless we use `wait(ignore_failure)`. The exception would contain
> all the information so it'd be THE exception which failed the test?
> 

Yep, this is a valid question.  I think the answer will depend on how
much we want the **test** result to be bound to what happens on the
streams.  Right now it's obvious that I decided to keep them pretty much
separate.

> As for the `streams.success`, I guess it'd be a property, which would go
> through all streams results, and report `not any(_.failure for _ in
> self.streams)`, right?
> 

Exactly.
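
For the record, both behaviors (the aggregated ``success`` and a
``wait`` that optionally raises) are cheap to sketch.  The class and
exception names below are invented, and the per-stream objects are
trivial stand-ins that already know their outcome:

```python
class StreamsError(Exception):
    """Hypothetical exception listing the streams that failed."""

    def __init__(self, failed):
        super().__init__("streams failed: %s" % ", ".join(failed))
        self.failed = failed


class DoneStream:
    """Stand-in for a stream whose outcome is already known."""

    def __init__(self, name, success):
        self.name = name
        self.success = success

    def wait(self):
        pass  # nothing to wait for in this sketch


class Streams:
    def __init__(self, streams):
        self._streams = list(streams)

    @property
    def success(self):
        # True only if every stream reports a positive outcome.
        return all(stream.success for stream in self._streams)

    def wait(self, ignore_failure=False):
        for stream in self._streams:
            stream.wait()
        failed = [s.name for s in self._streams if not s.success]
        if failed and not ignore_failure:
            raise StreamsError(failed)
```

Whether raising should be the default is exactly the "how bound is the
test result to the streams" question, so the ``ignore_failure`` default
here shouldn't be read as a decision.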

>>
>> The ``streams`` attribute provides an aggregated interface for all the
>> streams.  Calling ``self.streams.wait()`` waits for all execution
>> streams (and their blocks of code) to finish execution.
>>
>> Support for slicing could be added if execution stream names are
>> based on integers only, allowing for writing tests such as::
>>
>>   avocado:
>>      streams:
>>       - 1:
>>           type: remote
>>           host: foo.example.com
>>       - 2:
>>           type: remote
>>           host: bar.example.com
>>       - 3:
>>           type: remote
>>           host: blackhat.example.com
>>       - 4:
>>           type: remote
>>           host: pentest.example.com
>>
>>   from avocado import Test
>>   from avocado.streams.code import shell
>>
>>   class InterCommunication(Test):
>>       def test(self):
>>           self.streams[1].run(shell("ping -c 1 %s" %
>>                                     self.streams[2].host))
>>           self.streams[2].run(shell("ping -c 1 %s" %
>>                                     self.streams[1].host))
>>           self.streams[3].run(shell("ping -c 1 %s" %
>>                                     self.streams[1].host))
>>           self.streams[4].run(shell("ping -c 1 %s" %
>>                                     self.streams[1].host))
>>           self.streams.wait()
>>           self.assertTrue(self.streams[1:2].success)
>>           self.assertFalse(self.streams[3:4].success)
> As mentioned earlier I'd prefer names to indexes, anyway I see the
> indexes useful as well. How about supporting a name or index?
> 

Yep, also thought of that.

> As for the slices, I'd prefer list-like slice to stream-like slices as
> it'd be more natural to me to interact with a list of individual streams
> rather than a Stream object with a limited subset of streams. Anyway
> that's a matter of taste and I can definitely live with this as well.
> 

See my previous comments.

> Now about this example, it's really limited. Again you are hard-coding
> the scenario and changing it is really complicated. I'd prefer something
> like:
> 
>     self.streams[0].run(server_cmd)
>     self.streams[1:].run(contact_server_cmd)
>     self.assertTrue(self.streams.success)
> 

Real tests will probably (hopefully) use better (symbolic) names.  The
goal here is to focus on the mechanisms, which on yours and on my
version are identical.

>>
>> Support for synchronized execution also maps really well to the
>> slicing example.  For instance, consider this::
>>
>>   from avocado import Test
>>   from avocado.streams.code import shell
>>
>>   class InterCommunication(Test):
>>       def test(self):
>>           self.streams[1].run(shell("ping -c 60 %s" %
>>                                     self.streams[2].host))
>>           self.streams[2].run(shell("ping -c 60 %s" %
>>                                     self.streams[1].host))
>>           ddos = shell("ddos --target %s" % self.streams[1].host)
>>           self.streams[3:4].run(ddos, synchronized=True)
>>           self.streams[1:2].wait()
>>           self.assertTrue(self.streams.success)
>>
>> This instructs streams 1 and 2 to start connectivity checks as soon as
>> they **individually** can, while, for a full DDOS effect, streams 3
>> and 4 would start only when they are both ready to do so.
> OK so this is about before-start-synchronisation. Well again, I'm not
> very fond of bundling the streams so I'd prefer allowing to define the
> workload (which returns when the workload is ready), trigger it (which
> triggers it and reports immediately) and then wait for it. The
> difference in usage is:
> 
> 
>     self.streams[3].run(ddos, stopped=True)
>     self.streams[4].run(ddos, stopped=True)
>     self.streams[3].start()
>     self.streams[4].start()
> 
> The result is the same (unless you create processes per each stream and
> in `self.streams[3:4]` you use signals to synchronize the execution) but
> it allows greater flexibility like synchronizing other tasks than just
> streams...
> 
> Or we can create methods `establish(cmd)`, `start()` and `wait()` which
> might better describe the actions.

I think you missed my point here.  Streams #3 and #4, in my example,
wait for *each other*.  Using a mechanism such as barriers.
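
A small sketch of what I mean, using threads and ``threading.Barrier``
from the standard library purely as a stand-in for real execution
streams (the ``run_synchronized`` helper is made up for this example):

```python
import threading


def run_synchronized(names):
    """Start one worker per name; none proceeds until all are ready."""
    events = []
    barrier = threading.Barrier(len(names))

    def worker(name):
        events.append(("ready", name))   # per-stream setup is done
        barrier.wait()                   # wait for *each other*
        events.append(("start", name))   # the workload begins here

    threads = [threading.Thread(target=worker, args=(name,))
               for name in names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events
```

No "start" event can be recorded before every worker has recorded its
"ready" event, which is the effect wanted for streams #3 and #4,
without the test code having to trigger each stream by hand.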

> 
>>
>> Feedback and future versions
>> ============================
>>
>> This being an RFC, feedback is extremely welcome.  Also, expect new
>> versions based on feedback, discussions and further development of the
>> ideas initially exposed here.
>>
> Overall I like it, I don't really see it as a counter-proposal as I
> think they are pretty similar, only you defined the block-of-code to be
> more generic and refined the high-level interface.
> 

I guess it's a good thing that you like it and that this is just an
"only" (small and simple) kind of thing.  I guess my goal was attained
at some level :).

> As for the implementation I think the `self.stream` description and all
> the details about it come a bit too early. I'd start with the low level
> interface, which is a library allowing to create `Streams()`, defining
> `Code()` and `Stream()` objects independently of Avocado, and later when
> we see the usage I'd standardize the usage and embed it into the
> `Test` class. Anyway I know we don't share this vision and I'm fine with
> doing it the other way around as the result is the same, only we might
> find some limits later which might be hard to solve in the current
> schema. But based on what I know from virtualization this (together with
> the barrier synchronization) should be enough to support the tests we
> know from Autotest which is a good start.
> 

Sure.  I mentioned that it's impossible to define a "freeze" of any sort
on the interfaces.  Still, talking about them helps to shape the
features they may have and how they'd be used.

But the most important thing here is your acknowledgment that this seems
to fit the needs we have on virtualization tests.

> Last remark regarding my and your RFC is that I deliberately defined the
> block-of-code like (not as) Test, because executing scripts is possible
> nowadays via `aexpect`, `Remoter`, `remote_commander` or other standard
> python libraries, but combining existing tests and get not just the

Right, but in no standard and supported way.  This is one of the major
goals here.  The very first line in this RFC is:

"Avocado currently does not provide test writers with standard tools
 or guidelines for developing tests that spawn multiple machines."

> executed results but also to gather the remote environment and so on,
> that I see as beneficial. Anyway, if I understand this implementation
> correctly it'd be possible to create `AvocadoTest` inherited from `Code`
> which would allow such executions and the `Streams` could support
> environment gathering (optionally) and that is all I care about. For
> simpler stuff

Right.

> I'd simply use `aexpect` (as I personally like it a lot) and I know of
> people who use `remote_commander` to synchronize and distribute tasks
> across multiple machines in `avocado-vt` so I assume they'll stick to
> their working solution as well. I see the `Streams` as a library to

Our goal is to come up with useful innovations that will motivate users
to adopt them, so this is a bit negative.  Unless you don't really
believe in the value of this proposal, I would expect the opposite attitude.

> support complex tasks, not just a simple command execution, even though
> some abstraction might be useful.
> 

Can you describe complex tasks?  I imagine that the ideas you have are
based on "interacting with a remote shell/machine/console/application"
as aexpect and other tools allow, right?

> Actually now after the summary I noticed that what I'd really need is
> the `AvocadoTest`-like command to be able to combine basic tests
> into complex scenarios and then I'd need a unique way of interacting
> with different streams. So what I'd probably need is an
> `aexpect`-concentrator which would allow me to ask for a session based
> on the description:
> 
>     aexpect.RemoteShellSession(url)
> 
> where `url` is something like:
> 
>     "localhost"
>     "test:123456@192.168.122.10"
>     "docker://create=yes,image=fedora,remove=always"
>     "libvirt://create=no,domain=test_machine,start=yes"
> 
> which would establish the connection (creating the container/vm first if
> asked for) and then I'd interact with it as with other
> `aexpect.ShellSession` (therefore not just a single command, but full
> expect-like behavior)
> 

OK, this matches my previous comment.  Yes, what you're describing is
probably not something this RFC contemplates, but at least parts of
it could be common for both use cases.
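
One part that could plausibly be shared is turning such a description
string into a backend name plus options.  A minimal sketch follows, with
the caveat that `parse_session_description`, the "shell" backend name and
the returned dict layout are all hypothetical, and that the plain
`user:password@host` form is an assumption about what the second example
is meant to express:

```python
def parse_session_description(url):
    """Split a session description into a backend name and its options.

    Hypothetical helper: handles "backend://key=value,key=value" and
    plain "[user:password@]host" descriptions, and nothing else.
    """
    if "://" in url:
        backend, _, rest = url.partition("://")
        # "create=yes,image=fedora" -> {"create": "yes", "image": "fedora"}
        options = dict(item.split("=", 1) for item in rest.split(",") if item)
        return {"backend": backend, "options": options}

    # No scheme: treat it as a plain shell on a host, optionally with
    # credentials in front of the "@".
    credentials, _, host = url.rpartition("@")
    options = {"host": host}
    if credentials:
        user, _, password = credentials.partition(":")
        options["user"] = user
        options["password"] = password
    return {"backend": "shell", "options": options}
```

Whatever ends up creating the container or VM (or just opening the
session) could then dispatch on the "backend" value, for streams and for
interactive sessions alike.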

> This last note is probably outside the scope of multi-stream tests and
> could (hopefully should) be implemented in parallel to serve a different
> purpose. Anyway with this in mind I don't see much point in having
> `Bash` or `PythonModule`-like code-blocks in parallel test execution and
> I'd only focus on the full-blown complex parallel tasks.
> 

Sure, something like `self.interactives` available at the test level,
working similarly to the streams, could be a nice addition here.

> Anyway, hopefully this feedback is understandable, I have been writing
> it for 2 days so feel free to ask for some hints ...
> 
> Lukáš
> 

Yes, I think I understood all your points.  Let me know if my response
was clear enough.

And thanks for the feedback!

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]
[  7ABB 96EB 8B46 B94D 5E0F  E9BB 657E 8D33 A5F2 09F3  ]
