[Avocado-devel] RFC: multi-stream test (previously multi-test) [v3]

Lukáš Doktor ldoktor at redhat.com
Wed Apr 20 18:02:23 UTC 2016


On 19. 4. 2016 at 22:18, Cleber Rosa wrote:
> 
> 
> On 04/15/2016 03:05 AM, Lukáš Doktor wrote:
>> Hello again,
>>
>> There were a couple of changes and the new Job API RFC, which might sound
>> similar to this RFC, but it covers different parts. Let's update the
>> multi-test RFC and fix the terminology, which might have been a bit
>> misleading.
>>
>> Changes:
>>
>>      v2: Rewritten from scratch
>>      v2: Added examples for the demonstration to avoid confusion
>>      v2: Removed the mht format (which was there to demonstrate manual
>>          execution)
>>      v2: Added 2 solutions for multi-tests
>>      v2: Described ways to support synchronization
>>      v3: Renamed to multi-stream as it befits the purpose
>>      v3: Improved introduction
>>      v3: Workers are renamed to streams
>>      v3: Added example which uses library, instead of new test
>>      v3: Multi-test renamed to nested tests
>>      v3: Added section regarding Job API RFC
>>      v3: Better description of the Synchronization section
>>      v3: Improved conclusion
>>      v3: Removed the "Internal API" section (it was a transition between
>>          no support and "nested test API", not a "real" solution)
>>      v3: Using per-test granularity in nested tests (requires plugins
>>          refactor from Job API, but allows greater flexibility)
>>
>>
>> The problem
>> ===========
>>
>> Allow tests to have some of their blocks of code run in separate stream(s).
>> We'll discuss the scope of a "block of code" further in the text.
>>
> 
> I believe it's also important to define what "stream" means.  The reason
> is that it's used both as an abstraction, and as a more concrete
> component in the code examples that follow.
> 
OK, I'll add this to v4

>> One example could be a user who wants to run netperf on 2 machines,
>> which requires the following manual steps:
>>
>>
>>      machine1: netserver -D
>>      machine1: # Wait till netserver is initialized
>>      machine2: netperf -H $machine1 -l 60
>>      machine2: # Wait till it finishes and report the results
>>      machine1: # stop the netserver and report possible failures
>>
>> The test would have to contain the code for both machine1 and machine2
>> and execute it in two separate streams, which might or might not be
>> executed on the same machine.
>>
> 
> I can understand what you mean here just fine, but it's rather confusing
> to say "machine1 and machine2" and at the same time "might or might not be
> executed on the same machine".
> 
> This brings us back to the stream concept.  I see the streams as the
> running, isolated, execution of "code blocks".  This execution may be on
> the same machine or not.
> 
> With those statements in mind, I'd ask you to give your formal
> definition and vision of the stream concept.
> 
I hope we share the same view, I'll try to put it on paper (keyboard)
while writing the v4.

>> You can see that each stream is valid even without the other, so an
>> additional requirement would be to allow easy sharing of those blocks of
>> code among other tests. Splitting the problem in two could also
>> sometimes help in analyzing the failures.
>>
> 
> Here you say that streams are isolated from each other.  This matches my
> understanding of streams as "running, isolated execution of code blocks".
> 
> But "help in analyzing failures" should not be a core part or reason for
> this architecture.  It can be a bonus point.  Still, let's try to focus
> on the very core components of the architecture and drop the discussion
> about the less important aspects.
> 
Yep, I'm sorry for the confusion, I meant it as another possible benefit,
not a requirement.

>> Some other examples might be:
>>
>> 1. A simple stress routine being executed in parallel (on the same or
>> different hosts)
>> 2. Several code blocks being combined into complex scenario(s)
>> 3. Running the same test along with a stress test in the background
>>
>> For demonstration purposes this RFC uses a very simple example fitting
>> into category (1). It downloads the main page from the "example.org"
>> location using "wget" (almost) concurrently from several machines.
>>
>>
>> Standard python libraries
>> -------------------------
>>
>> One can run pieces of python code directly using python's
>> multiprocessing library, without any need for avocado-framework
>> support. But there are quite a lot of cons:
>>
>> + no need for a framework API
>> - lots of boilerplate code in each test
>> - each solution would be unique, therefore the logs would be hard to analyze
>> - no decent way of sharing the code with other tests
>>
> 
> IMHO you can drop the reasons on why *not* to use lower level or just
> different code.  If, during research we came to find some other external
> project/framework/library, we should just have used it and documented
> it.  Since this is not the case, let's just avoid getting distracted on
> this RFC.
> 
Yep, as I only got a response from you, I wanted to keep the variants
here. I'll remove them in the next version.

>> Yes, it's possible to share the code by writing libraries, but that does
>> not scale as well as the other solutions...
>>
> 
> This is a justification of why we should do it the Avocado way.  Again,
> if there were better non-Avocado ways, we shouldn't be discussing
> anything else than those other possible solutions.  I'm being repetitive
> here because I really believe we should focus on our proposed
> architecture key points.
> 
>> Example (simplified):
>>
>>      from avocado.core.remoter import Remote
>>      from threading import Thread
>>      ...
>>      class Wget(Thread):
>>
>>          def __init__(self, machine, url):
>>              Thread.__init__(self)
>>              self.remoter = Remote(machine)
>>              self.url = url
>>              self.status = None
>>
>>          def run(self):
>>              ret = self.remoter.run("wget %s" % self.url,
>>                                     ignore_status=True)
>>              self.status = ret.exit_status
>>      ...
>>
>>      threads = []
>>      for machine in machines:
>>          threads.append(Wget(machine, "example.org"))
>>      for thread in threads:
>>          thread.start()
>>      for thread in threads:
>>          thread.join()
>>          self.failif(thread.status != 0, ...)
>>      ...
>>
> 
> Where is the Avocado test here?
> 
Please take a look at the response to Ademar. I forgot to describe what
was only in my head...
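
For completeness, something along these lines is what I had in mind,
reusing the Wget thread from the snippet above (only a sketch; the class
name and the "machines" parameter are made up):

    import avocado

    class WgetThreaded(avocado.Test):   # hypothetical name

        def test(self):
            # "machines" taken from test parameters (hypothetical param)
            machines = self.params.get("machines", default="").split()
            threads = [Wget(machine, "example.org") for machine in machines]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()
                self.failif(thread.status != 0,
                            "wget failed in one of the threads")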

>>
>> This should serve the purpose, but to be able to understand failures,
>> one would have to add a lot of additional debug information, and if one
>> wanted to re-use the Wget class in other tests, they'd have to make it a
>> library shared with all the tests.
>>
>>
> 
> Making debugging easy should not be the reason to settle on a given
> architecture.  I understand this is *not* the solution you're proposing,
> so let's focus on the architecture of the proposed solution.
> 
The point is that the example is very simple, and in reality it'd contain
lots of boilerplate code just to know what was happening there. The other
solutions do not suffer from this.

>> Nested tests API
>> ----------------
>>
>> Another approach would be to say the "block of code" is a full avocado
>> test. The main benefits here are that each avocado test provides
>> additional debug information in a well-established format people are
>> used to from normal tests, that it allows one to split a complex problem
>> into separate parts (including separate development), and that it makes
>> it easy to share existing tests (eg. stress test, server setup, ...) and
>> put them together like Lego bricks into complex scenarios.
>>
> 
> Here you're proposing that a "block of code" could be an Avocado test
> (avocado.Test).  Right, this is pretty clear and I should have no
> questions about it since I mentioned that on v2.
> 
> Now, on an updated RFC version like this, you should focus on why this
> is a good idea. It's obviously not a one-size-fits-all solution, but you
> should defend why it's an appropriate choice/compromise.  Focusing on
> the proposed choice will make the sight of the overall architecture
> clearer.
> 
I listed the pros and cons, I'm not sure what else to fit in. I'll think
about it.

>> On the negative side, an avocado test is not the smallest piece of code
>> and it adds quite a bit of overhead. But for simpler code, one can execute
>> the code directly (threads, remoter) without framework support.
>>
> 
> And THB, one universal way to defend it as a choice is that, for
> less common use cases, users can always use their own code, as you put
> it here.
> 
Not sure what THB is, google did not help.

>> Example (simplified):
>>
>>      import avocado
>>
>>      class WgetExample(avocado.Test):
>>          def setUp(self):
>>              self.streams = avocado.Streams(self)
>>              for machine in machines:
>>                  self.streams.add_stream(machine)
>>          def test(self):
>>              for stream in self.streams:
>>                  stream.run_bg("/usr/bin/wget example.org")
>>              self.streams.wait(ignore_errors=False)
>>
>> where an `avocado.Stream` represents a worker (local or remote) which
>> allows running avocado tests in it (foreground or background). This
>> should provide enough flexibility to combine existing tests into complex
>> tests.
>>
> 
> This definition should have been given earlier, and without code to
> support it.  Then, when it appears on a code example such as this, it
> would be trivial to understand your proposal.
> 
I'm not sure, I like the reverse approach: people need to think about it
first and then they discover whether they thought about it the same way. At
least that's how my brain sees it. Maybe it'll be better when I focus only
on a single variant.

>> Instead of using a plugin library for streams, we can develop it as
>> another test variant (not a new test type, only an avocado.Test with some
>> additional initialization), called `avocado.MultiTest` or
>> `avocado.NestedTest`:
> 
> I miss what you mean by "plugin library for streams".  Also, by focusing
> on (how) "we can develop it" you miss the architecture, which, by the
> lack of complete definitions, is still unclear.
> 
Well, this was inspired by your comments. One way to implement it would
be a "tooled" test (basically the solution below). Because you did not
like it, I added the library way, which means we'd develop a `Streams`
class which would accept the current `Test` as a parameter and take care
of everything else.

Anyway I agree it's probably too early for optimization, so let's stick
to the library way for now. I'll change the examples accordingly in the v4.

>>
>>      import avocado
>>
>>      class WgetExample(avocado.NestedTest):
>>          # Machines are defined via params and initialized
>>          # in NestedTest.setUp
>>          def test(self):
>>              for stream in self.streams:
>>                  stream.run("/usr/bin/wget example.org")
>>              self.wait(ignore_errors=False)
>>
> 
> So `avocado.NestedTest` is introduced, and has default code that sets up
> `self.streams`.  Then, all of a sudden, a stream becomes the executor of
> commands?  You had just a few paragraphs earlier defined the proposal
> for "blocks of code" was an `avocado.Test`.  This makes this example
> confusing.
> 
Please take a look at my response to Ademar, where I wrote the `NestedTest`
class definition. Basically it's the same example as before; NestedTest
inherits from `avocado.Test` and it helps prepare the execution so one does
not need to write the additional code.

At first I wanted to avoid writing it as a library, as that requires storing
the running test in it, but we'd probably better start with it. We can
always add the Test class based on it later.

>>
>> API backed by internal API
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> _supported by cleber in v2 and I agree now_
>>
>> This would implement the nested test API using the internal API (from
>> avocado.core).
>>
>> + runs native python
>> + easy interaction and development
>> + easily extensible, either by using the internal API (and risking
>> changes) or by inheriting and extending the features
>> - lots of internal API will be involved, thus with almost every change
>> of the internal API we'd have to adjust this code to keep the NestedTest
>> working
>> - fabric/paramiko is not thread/parallel-process safe and fails badly, so
>> first we'd have to rewrite our remote execution code (use autotest's
>> worker, or aexpect+ssh)
>>
>>
>> API backed by cmdline
>> ~~~~~~~~~~~~~~~~~~~~~
>>
>> _liked by me in v2, hated by others, rejected in v3_
>>
>> This would implement the nested test API by translating it into "avocado
>> run" commands.
>>
>> + easy to debug as users are used to the "avocado run" syntax and its issues
>> + allows a manual mode where users trigger the "avocado run" manually
>> + cmdline args are part of the public API so they should stay stable
>> + no issues with fabric/paramiko as each process is separate
>> + even more easily extensible, as one just needs to implement the feature
>> for "avocado run" and then can use it as extra_params in the worker, or
>> send a PR to support it in the stable environment
>> - would require additional features to be available on the cmdline, like a
>> streamlined way of triggering tests
>> - only features available on the cmdline can be supported (currently not
>> limiting)
>> - relies on stdout parsing (but avocado supports machine-readable output)
>>
>>
> 
> If this is really rejected (including by you), go ahead and drop it from
> the proposal.
> 
OK, the same reason as before, I'll remove it in v4 and focus only on
the single variant.

>> Synchronization
>> ===============
>>
>> Some tests do not need any synchronization, users just need to run them.
>> But some multi-stream tests need to be precisely synchronized or they
>> need to exchange data.
>>
>> For synchronization purposes, usually "barriers" are used, where a barrier
>> guards the entry into a section identified by a "name" and a "number of
>> clients". All parties asking for entry into the section will be delayed
>> until the "number of clients" reach the section (or a timeout occurs).
>> Then they are resumed and can enter the section. Any failure while waiting
>> for a barrier propagates to the other waiting parties.
>>
>> This can all be automated inside `avocado.Streams`, which could
>> start listening on a free port and pass this information to the executed
>> code blocks. In the code blocks one simply imports `Sync`, initializes
>> it with the address+port and can use it for synchronization (or later
>> for data exchange).
>>
>>      from avocado.plugins.sync import Sync
>>      # Connect to the sync server on the address stored in params
>>      # which could be injected by the multi-stream test
>>      # or set manually.
>>      sync = Sync(self, params.get("sync_server", "/plugins/sync_server"))
>>      # wait until 2 tests ask to enter "setup" barrier (60s timeout)
>>      sync.barrier("setup", 2, 60)
>>
>> As before, it can be part of the "NestedTest" test, initialized based on
>> params without the need for boilerplate code. The result would be the
>> same: avocado listens on some port and the tests can connect to this
>> port and ask for a barrier/data exchange, with support for
>> re-connection.
>>
>> For debugging purposes it might be useful to allow starting the sync
>> server as an avocado plugin, eg. by `--sync-server ...` (or having another
>> command just to start listening, eg `avocado syncserver`). With that one
>> could spawn the multiple processes manually, without the need to run the
>> main multi-stream test, and communicate over this manually started
>> server, or even just debug the behavior of one existing piece of the
>> bigger test and fake the other components by sending the messages
>> manually instead (eg. to see how it handles errors, timing issues and
>> unexpected situations).
>>
>> Again, there are several ways to implement this:
>>
>> Standard multiprocess API
>> -------------------------
>>
>> Python's standard multiprocessing library supports synchronization over
>> TCP. The only problem is that "barriers" were introduced in python3, so
>> we'd have to backport them. Additionally it does not fit our needs 100%,
>> so we'd have to adjust it a bit (eg. to allow manual interaction).
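
For illustration, a rough sketch of what the stdlib already gives us over
TCP (assuming python3 or a backported Barrier; the class names, port and
authkey below are made up, this is not a proposed implementation):

    import threading
    from multiprocessing.managers import BaseManager

    # --- server process (eg. the main multi-stream test) ---

    _barriers = {}
    _lock = threading.Lock()

    def get_barrier(name, clients, timeout=None):
        # hand out the same Barrier object for the same name
        with _lock:
            if name not in _barriers:
                _barriers[name] = threading.Barrier(clients, timeout=timeout)
            return _barriers[name]

    class ServerManager(BaseManager):
        pass

    ServerManager.register("barrier", callable=get_barrier)
    server = ServerManager(address=("", 6001), authkey=b"avocado")
    server.start()

    # --- client process (eg. inside a code block / nested test) ---

    class ClientManager(BaseManager):
        pass

    ClientManager.register("barrier")
    client = ClientManager(address=("localhost", 6001), authkey=b"avocado")
    client.connect()
    # blocks until 2 parties have called wait() on the "setup" barrier
    client.barrier("setup", 2, 60).wait()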
>>
>>
>> Autotest's syncdata
>> -------------------
>>
>> Python 2.4 friendly, supports barriers and data synchronization. On the
>> other hand, it's quite hackish and full of shortcuts.
>>
>>
>> Custom code
>> -----------
>>
>> We can take inspiration from the above and create a simple human-readable
>> (easy to debug or interact with manually) protocol to support barriers and
>> data exchange via pickling. IMO that would be easier to maintain than
>> backporting and adjusting multiprocessing or fixing the autotest
>> syncdata. A proof-of-concept can be found here:
>>
>>      https://github.com/avocado-framework/avocado/pull/1019
>>
>> It modifies the "passtest" so that it only proceeds when it's executed by
>> 2 tests at the same time. The proof-of-concept does not support
>> multi-stream tests, so one has to run "avocado run passtest" twice using
>> the same "--sync-server" (once with --sync-server and once with --sync).
>>
>>
> 
> Having sync support is part of the core concept.  Choosing one is
> simpler, IMHO.  I trust and agree with your choice so far.  At implementation
> time, if limitations become clearer, we can revisit this.
> 
OK, I'll just describe it in the v4.
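
Just to illustrate what I mean by a human-readable protocol (a sketch only,
not necessarily what the proof-of-concept above actually implements; the
command format and port are made up):

    import socket

    # connect to the manually started sync server
    sock = socket.create_connection(("localhost", 9999))
    # ask to enter the "setup" barrier shared by 2 clients, 60s timeout
    sock.sendall(b"barrier setup 2 60\n")
    # the server answers with a single readable line once all clients have
    # arrived (or with an error/timeout message), which also makes it easy
    # to fake one side with "nc localhost 9999" while debugging
    print(sock.makefile().readline().strip())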

>> Job API RFC
>> ===========
>>
>> The recently introduced Job API RFC covers a very similar topic to "nested
>> tests", but it's not the same. The Job API enables users to modify
>> the job execution, and eventually even to write a runner which suits them
>> for running groups of tests. In contrast, this RFC covers a way to combine
>> code-blocks/tests and reuse them within a single test. In a hackish way,
>> they can supplement each other, but the purpose is different.
>>
> 
> I think we should give the message about what a user of the Job API
> gets, and what the user of multi-stream test gets.  Let's just state
> what is the goal of each one.  If a user wants to hack their way into a
> "Frankenstein" approach, it's not our issue.
> 
>> One of the most obvious differences is that a failed "nested" test can
>> be intentional (eg. reusing the NetPerf test to check that supposedly
>> unreachable machines really cannot talk to each other), while in the Job
>> API it's always a failure.
>>
> 
> To me the difference is that by using the Job API the user would get to
> trigger one or more tests, with Job-level control benefits.  With
> multi-stream, a given test can have parts of it run on separate
> execution streams.  Each one is intended for a different scenario.  Our
> goal should be to make the stated goals nice and easy.  If user chooses
> to hammer them down to achieve different goals, it's their problem.  If
> user finds it easy to solve the stated goals with the different tool,
> it's our (design) problem.
> 
I think we are on the same boat, hopefully the examples help.

>> I hope you see the pattern. They are similar, but on a different layer.
>> Internally, though, they can share some pieces, like executing the
>> individual tests concurrently with different params/plugins
>> (locally/remotely). All the needed plugin modifications would also be
>> useful for both of these RFCs.
>>
>> Some examples:
>>
>> User1 wants to run the "compile_kernel" test on a machine, followed by
>> "install_compiled_kernel passtest failtest warntest" on "machine1
>> machine2". These depend on the status of the previous test, but they
>> don't create a scenario. So the user should use the Job API (or execute 3
>> jobs manually).
>>
> 
> User 1 writes a custom job, that optionally runs 3 tests.  Looks fine.
> 
>> User2 wants to create a migration test, which starts a migration from
>> machine1 and receives the migration on machine2. It requires cooperation
>> and together it creates one complex use case, so the user should use a
>> multi-stream test.
> 
> Yes, one single test here (migration), some pieces executed as different
> streams (on different machines).
> 
>>
>>
>> Conclusion
>> ==========
>>
>> Given the reasons above, I like the idea of "nested tests" using the "API
>> backed by internal API", as it is simple to start with, allows test reuse
>> (which gives us a well-known test result format), and the internal API
>> allows greater flexibility for the future.
>>
>> The netperf example from the introduction would look like this:
>>
>> Machine1:
>>
>>      import avocado
>>      from avocado.utils import process
>>
>>      class NetServer(avocado.NestedTest):
>>          def setUp(self):
>>              process.run("netserver")
>>              self.barrier("setup", self.params.get("no_clients"))
>>          def test(self):
>>              pass
>>          def tearDown(self):
>>              self.barrier("finished", self.params.get("no_clients"))
>>              process.run("killall netserver")
>>
>> Machine2:
>>
>>      import avocado
>>      from avocado.utils import process
>>
>>      class NetPerf(avocado.NestedTest):
>>          def setUp(self):
>>              self.barrier("setup", self.params.get("no_clients"))
>>          def test(self):
>>              process.run("netperf -H %s -l 60"
>>                          % self.params.get("server_ip"))
>>              self.barrier("finished", self.params.get("no_clients"))
>>
> 
> It's unclear why those are `avocado.NestedTest`, since the previous
> example suggested a specialized `avocado.Test` that would have
> `self.streams` already setup.
> 
Yep, `avocado.NestedTest` inherits from `avocado.Test`, only it
additionally sets up the streams and adds some convenience methods to
handle this kind of test. I'll use the library approach in the next
version, as this idea was not welcomed warmly...

>> One would be able to run this manually (or from build systems) using:
>>
>>      avocado syncserver &
>>      avocado run NetServer --mux-inject /plugins/sync_server:sync-server $SYNCSERVER &
>>      avocado run NetPerf --mux-inject /plugins/sync_server:sync-server $SYNCSERVER &
>>
>> (where the --mux-inject passes the address of the "syncserver" into test
>> params)
>>
>> When the code is stable one would write this multi-stream test (or
>> multiple variants of them) to do the above automatically:
>>
>>      class MultiNetperf(avocado.NestedTest):
>>          def setUp(self):
>>              self.failif(len(self.streams) < 2)
>>          def test(self):
>>              self.streams[0].run_bg("NetServer",
>>                                     {"no_clients": len(self.streams)})
>>              for stream in self.streams[1:]:
>>                  stream.add_test("NetPerf",
>>                                  {"no_clients": len(self.streams),
>>                                   "server_ip": machines[0]})
>>              self.wait(ignore_errors=False)
>>
> 
> How would a user specify where a given stream is going to be run?
> 
Please take a look at my response to Ademar. It's via test parameters.

>> Executing the complex example would become:
>>
>>      avocado run MultiNetperf
>>
>> You can see that the test allows running several NetPerf tests
>> simultaneously, either locally or distributed across multiple machines
>> (or a combination), just by changing parameters. Additionally, by adding
>> features to the nested tests, one can use different NetPerf commands or
>> add other tests to be executed together.
>>
>> The results could look like this:
>>
>>
>>      $ tree $RESULTDIR
>>        └── test-results
>>            └── MultiNetperf
>>                ├── job.log
>>                    ...
>>                ├── 1
>>                │   └── job.log
>>                        ...
>>                └── 2
>>                    └── job.log
>>                        ...
>>
> 
> The multiple `job.log` files here makes things confusing... do we have a
> single job that ran a single test?
> 
copy&paste error. Please take a look at my response to Ademar.

>> Where MultiNetperf/job.log contains the combined logs of the "master"
>> test, all the "nested" tests and the sync server.
>>
>> Directories "1" and "2" contain the results of the created (possibly even
>> named) streams. I think they should be in the form of a standard avocado
>> Job to keep the well-known structure.
> 
> To keep the Avocado Job structure, they'd either have to be Avocado
> Jobs, or we'd have to fake them...  Then, all of a sudden, we have things
> that look like jobs, but are not jobs.  How would users of the Job API
> react when they find out that their custom jobs have a single `job.log`
> and users of multi-stream tests have multiple `job.log`s?
> 
> I'd not trade the familiarity of the job log format for the structure of
> the architecture we've been struggling to define.
I tried to describe this in the response to Ademar. It's not that simple
(in the same way the Job API is not simple, as one test can be resolved
into multiple tests).

> 
> My final suggestion: define all the core concepts and let us know how
> they all fit. In text form.  Then, when we get to code examples, they
> should all be obvious.  Refrain from implementation details at this point.
> 

I can try. Naturally I see it the other way around (we're from different
hemispheres ;-) ), but let's give it a try next time.

Thank you for the suggestions,
Lukáš
