[Avocado-devel] RFC: multi-stream test (previously multi-test) [v3]

Cleber Rosa crosa at redhat.com
Wed Apr 20 19:53:32 UTC 2016



On 04/20/2016 03:02 PM, Lukáš Doktor wrote:
> Dne 19.4.2016 v 22:18 Cleber Rosa napsal(a):
>>
>>
>> On 04/15/2016 03:05 AM, Lukáš Doktor wrote:
>>> Hello again,
>>>
>>> There were a couple of changes and the new Job API RFC, which might sound
>>> similar to this RFC, but it covers different parts. Let's update the
>>> multi-test RFC and fix the terminology, which might have been a bit
>>> misleading.
>>>
>>> Changes:
>>>
>>>       v2: Rewritten from scratch
>>>       v2: Added examples for the demonstration to avoid confusion
>>>       v2: Removed the mht format (which was there to demonstrate manual
>>>           execution)
>>>       v2: Added 2 solutions for multi-tests
>>>       v2: Described ways to support synchronization
>>>       v3: Renamed to multi-stream as it befits the purpose
>>>       v3: Improved introduction
>>>       v3: Workers are renamed to streams
>>>       v3: Added example which uses library, instead of new test
>>>       v3: Multi-test renamed to nested tests
>>>       v3: Added section regarding Job API RFC
>>>       v3: Better description of the Synchronization section
>>>       v3: Improved conclusion
>>>       v3: Removed the "Internal API" section (it was a transition between
>>>           no support and "nested test API", not a "real" solution)
>>>       v3: Using per-test granularity in nested tests (requires plugins
>>>           refactor from Job API, but allows greater flexibility)
>>>
>>>
>>> The problem
>>> ===========
>>>
>>> Allow tests to have some of their blocks of code run in separate stream(s).
>>> We'll discuss the range of "block of code" further in the text.
>>>
>>
>> I believe it's also important to define what "stream" means.  The reason
>> is that it's used both as an abstraction, and as a more concrete
>> component in the code examples that follow.
>>
> OK, I'll add this to v4
>
>>> One example could be a user who wants to run netperf on 2 machines,
>>> which requires the following manual steps:
>>>
>>>
>>>       machine1: netserver -D
>>>       machine1: # Wait till netserver is initialized
>>>       machine2: netperf -H $machine1 -l 60
>>>       machine2: # Wait till it finishes and report the results
>>>       machine1: # stop the netserver and report possible failures
>>>
>>> The test would have to contain the code for both machine1 and machine2
>>> and execute it in two separate streams, which might or might not be
>>> executed on the same machine.
>>>
>>
>> I can understand what you mean here just fine, but it's rather confusing
>> to say "machine1 and machine2" and at the same time "might or might not be
>> executed on the same machine".
>>
>> This brings us back to the stream concept.  I see the streams as the
>> running, isolated, execution of "code blocks".  This execution may be on
>> the same machine or not.
>>
>> With those statements in mind, I'd ask you to give your formal
>> definition and vision of the stream concept.
>>
> I hope we share the same view, I'll try to put it on paper (keyboard)
> while writing the v4.
>
>>> You can see that each stream is valid even without the other, so an
>>> additional requirement would be to allow easy sharing of those blocks of
>>> code among other tests. Splitting the problem in two could also
>>> sometimes help in analyzing the failures.
>>>
>>
>> Here you say that the streams are isolated from each other.  This matches my
>> understanding of streams as "running, isolated execution of code blocks".
>>
>> But "help in analyzing failures" should not be a core part or reason for
>> this architecture.  It can be a bonus point.  Still, let's try to focus
>> on the very core components of the architecture and drop the discussion
>> about the less important aspects.
>>
> Yep, I'm sorry for the confusion, I meant it as another possible benefit,
> but not a requirement.
>
>>> Some other examples might be:
>>>
>>> 1. A simple stress routine being executed in parallel (on the same or
>>> different hosts)
>>> 2. Several code blocks being combined into complex scenario(s)
>>> 3. Running the same test along with a stress test in the background
>>>
>>> For demonstration purposes this RFC uses a very simple example fitting
>>> into category (1). It downloads the main page from the "example.org"
>>> location using "wget" (almost) concurrently from several machines.
>>>
>>>
>>> Standard python libraries
>>> -------------------------
>>>
>>> One can run pieces of python code directly using python's
>>> multiprocessing library, without any need for avocado-framework
>>> support. But there are quite a lot of cons:
>>>
>>> + no need for framework API
>>> - lots of boilerplate code in each test
>>> - each solution would be unique and therefore the logs would be hard to analyze
>>> - no decent way of sharing the code with other tests
>>>
>>
>> IMHO you can drop the reasons on why *not* to use lower level or just
>> different code.  If, during research, we had found some other external
>> project/framework/library, we would just have used it and documented
>> it.  Since this is not the case, let's just avoid getting distracted in
>> this RFC.
>>
> Yep, as I only got a response from you, I wanted to keep the variants
> here. I'll remove them in the next version.
>
>>> Yes, it's possible to share the code by writing libraries, but that does
>>> not scale as well as other solutions...
>>>
>>
>> This is a justification of why we should do it the Avocado way.  Again,
>> if there were better non-Avocado ways, we shouldn't be discussing
>> anything other than those possible solutions.  I'm being repetitive
>> here because I really believe we should focus on our proposed
>> architecture's key points.
>>
>>> Example (simplified):
>>>
>>>       from avocado.core.remoter import Remote
>>>       from threading import Thread
>>>       ...
>>>       class Wget(Thread):
>>>
>>>           def __init__(self, machine, url):
>>>               super(Wget, self).__init__()
>>>               self.remoter = Remote(machine)
>>>               self.url = url
>>>               self.status = None
>>>
>>>           def run(self):
>>>               ret = self.remoter.run("wget %s" % self.url,
>>>                                      ignore_status=True)
>>>               self.status = ret.exit_status
>>>       ...
>>>
>>>       threads = []
>>>       for machine in machines:
>>>           threads.append(Wget(machine, "example.org"))
>>>       for thread in threads:
>>>           thread.start()
>>>       for thread in threads:
>>>           thread.join()
>>>           self.failif(thread.status != 0, ...)
>>>       ...
>>>
>>
>> Where is the Avocado test here?
>>
> Please take a look at the response to Ademar. I forgot to describe what
> was only in my head...
>
>>>
>>> This should serve the purpose, but to be able to understand failures,
>>> one would have to add a lot of additional debug information, and if one
>>> wanted to re-use the Wget class in other tests, it would have to become a
>>> library shared with all the tests.
>>>
>>>
>>
>> Making debugging easy should not be the reason to settle on a given
>> architecture.  I understand this is *not* the solution you're proposing,
>> so let's focus on the architecture of the proposed solution.
>>
> The point of this is that the example is very simple and in reality it'd
> contain lots of boilerplate code to know what was happening there. Other
> solutions do not suffer from this.
>
>>> Nested tests API
>>> ----------------
>>>
>>> Another approach would be to say the "block of code" is a full avocado
>>> test. The main benefits here are that each avocado test provides
>>> additional debug information in a well-established format people are
>>> used to from normal tests, allows one to split the complex problem into
>>> separate parts (including separate development), and allows easy sharing of
>>> existing tests (eg. stress test, server setup, ...), putting them
>>> together like Lego bricks into complex scenarios.
>>>
>>
>> Here you're proposing that a "block of code" could be an Avocado test
>> (avocado.Test).  Right, this is pretty clear and I should have no
>> questions about it since I mentioned that in v2.
>>
>> Now, on an updated RFC version like this, you should focus on why this
>> is a good idea. It's obviously not a one-size-fits-all solution, but you
>> should defend why it's an appropriate choice/compromise.  Focusing on
>> the proposed choice will make the view of the overall architecture
>> clearer.
>>
> I listed the pros and cons, I'm not sure what else to fit in. I'll think
> about it.
>
>>> On the negative side, an avocado test is not the smallest piece of code and
>>> it adds quite a bit of overhead. But for simpler code, one can execute
>>> the code directly (threads, remoter) without framework support.
>>>
>>
>> And THB, one universal way to defend it as a choice is that, for
>> less common use cases, users can always use their own code, as you put
>> it here.
>>
> Not sure what THB is, google did not help.
>

Sorry, this is a typo. Should be TBH, that is, to be honest.

>>> Example (simplified):
>>>
>>>       import avocado
>>>
>>>       class WgetExample(avocado.Test):
>>>           def setUp(self):
>>>               self.streams = avocado.Streams(self)
>>>               for machine in machines:
>>>                   self.streams.add_stream(machine)
>>>           def test(self):
>>>               for stream in self.streams:
>>>                   stream.run_bg("/usr/bin/wget example.org")
>>>               self.streams.wait(ignore_errors=False)
>>>
>>> where the `avocado.Stream` represents a worker (local or remote) which
>>> allows running avocado tests in it (foreground or background). This
>>> should provide enough flexibility to combine existing tests into complex
>>> tests.
>>>
>>
>> This definition should have been given earlier, and without code to
>> support it.  Then, when it appears in a code example such as this, it
>> would be trivial to understand your proposal.
>>
> I'm not sure, I like the reverse approach: people need to think about it
> and then they discover whether they thought of it the same way. At least that's
> how my brain sees it. Maybe it'll be better when I focus only on the
> single variant.
>

Right, obviously we have different brains, which is good.  My suggestion 
then is to make sure there's nothing implicit (which can be harder with code, 
IMHO).

>>> Instead of using a plugin library for streams, we can develop it as
>>> another test variant (not a new test type, only an avocado.Test with some
>>> additional initialization), called `avocado.MultiTest` or
>>> `avocado.NestedTest`:
>>
>> I don't follow what you mean by "plugin library for streams".  Also, by focusing
>> on (how) "we can develop it" you miss the architecture, which, due to the
>> lack of complete definitions, is still unclear.
>>
> Well this was inspired by your comments. One way to implement it would
> be a "tooled" test (basically the solution below). Because you did not
> like it, I added the library way, which means we'd develop a `Streams`
> class which would accept the current `Test` as a parameter and take care
> of everything else.
>
> Anyway I agree it's probably too early for optimization, so let's stick
> to the library way for now. I'll change the examples accordingly in the v4.
>

OK, waiting for v4.

>>>
>>>       import avocado
>>>
>>>       class WgetExample(avocado.NestedTest):
>>>           # Machines are defined via params and initialized
>>>           # in NestedTest.setUp
>>>           def test(self):
>>>               for stream in self.streams:
>>>                   stream.run("/usr/bin/wget example.org")
>>>               self.wait(ignore_errors=False)
>>>
>>
>> So `avocado.NestedTest` is introduced, and has default code that sets up
>> `self.streams`.  Then, all of a sudden, a stream becomes the executor of
>> commands?  You had, just a few paragraphs earlier, defined that the proposal
>> for "blocks of code" was an `avocado.Test`.  This makes this example
>> confusing.
>>
> Please take a look at my response to Ademar. I wrote the `NestedTest`
> class definition. Basically it's the same example as before, the
> NestedTest inherits from `avocado.Test` and it helps prepare the
> execution so one does not need to write the additional code.
>
> First I wanted to avoid writing it as a library as it requires storing
> the running test in it, but we'd probably better start with it. We can
> always add the Test class based on this later.
>

I think I understand what you mean now... still, v4 should help to make 
sure I get it.

>>>
>>> API backed by internal API
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> _supported by cleber in v2 and I agree now_
>>>
>>> This would implement the nested test API using the internal API (from
>>> avocado.core).
>>>
>>> + runs native python
>>> + easy interaction and development
>>> + easily extensible, either by using the internal API (and risking changes) or
>>> by inheriting and extending the features
>>> - lots of internal API will be involved, thus with almost every change
>>> of the internal API we'd have to adjust this code to keep the NestedTest
>>> working
>>> - fabric/paramiko is not thread/parallel-process safe and fails badly, so
>>> first we'd have to rewrite our remote execution code (use autotest's
>>> worker, or aexpect+ssh; a process-based sketch follows below)
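>>>
>>> As a rough, purely illustrative sketch (plain ssh and made-up host names,
>>> not the avocado remoter), the thread-safety problem can be sidestepped by
>>> running each remote command in its own process:
>>>
>>>       import subprocess
>>>       from multiprocessing import Pool
>>>
>>>       def run_remote(args):
>>>           machine, command = args
>>>           # each call runs in a separate process, so no client state
>>>           # is shared between streams
>>>           return subprocess.call(["ssh", "-o", "BatchMode=yes",
>>>                                   machine, command])
>>>
>>>       if __name__ == "__main__":
>>>           machines = ["machine1", "machine2"]   # hypothetical hosts
>>>           with Pool(len(machines)) as pool:
>>>               statuses = pool.map(run_remote,
>>>                                   [(m, "wget example.org")
>>>                                    for m in machines])
>>>           assert all(status == 0 for status in statuses)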
>>>
>>>
>>> API backed by cmdline
>>> ~~~~~~~~~~~~~~~~~~~~~
>>>
>>> _liked by me in v2, hated by others, rejected in v3_
>>>
>>> This would implement the nested test API by translating it into "avocado
>>> run" commands.
>>>
>>> + easy to debug as users are used to the "avocado run" syntax and its issues
>>> + allows a manual mode where users trigger the "avocado run" manually
>>> + cmdline args are part of the public API so they should stay stable
>>> + no issues with fabric/paramiko as each process is separate
>>> + even more easily extensible, as one just needs to implement the feature for
>>> "avocado run" and can then use it as extra_params in the worker, or send a
>>> PR to support it in the stable environment
>>> - would require additional features to be available on the cmdline, like a
>>> streamlined way of triggering tests
>>> - only features available on the cmdline can be supported (currently not
>>> limiting)
>>> - relies on stdout parsing (but avocado supports machine-readable output)
>>>
>>>
>>
>> If this is really rejected (including by you), go ahead and drop it from
>> the proposal.
>>
> OK, the same reason as before, I'll remove it in v4 and focus only on
> the single variant.
>

OK.

>>> Synchronization
>>> ===============
>>>
>>> Some tests do not need any synchronization, users just need to run them.
>>> But some multi-stream tests need to be precisely synchronized or they
>>> need to exchange data.
>>>
>>> For synchronization purposes "barriers" are usually used, where a barrier
>>> guards the entry into a section identified by a "name" and a "number of
>>> clients". All parties asking for entry into the section will be delayed
>>> until the "number of clients" reaches the section (or a timeout expires).
>>> Then they are resumed and can enter the section. Any failure while waiting
>>> for a barrier propagates to the other waiting parties.
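>>>
>>> To illustrate these semantics only (a local sketch using Python 3's
>>> threading.Barrier; the names and timeout are made up, the real thing
>>> would work over the network):
>>>
>>>       from threading import Barrier, BrokenBarrierError, Thread
>>>
>>>       # "number of clients" is 2, with a 60s timeout on waiting
>>>       barrier = Barrier(2, timeout=60)
>>>
>>>       def party(name):
>>>           try:
>>>               barrier.wait()    # delayed until both parties arrive
>>>               print("%s entered the 'setup' section" % name)
>>>           except BrokenBarrierError:
>>>               # a peer timed out or aborted; the failure propagates
>>>               print("%s: barrier broken" % name)
>>>
>>>       threads = [Thread(target=party, args=(n,))
>>>                  for n in ("client1", "client2")]
>>>       for thread in threads:
>>>           thread.start()
>>>       for thread in threads:
>>>           thread.join()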
>>>
>>> This can all be automated inside `avocado.Streams`, which could
>>> start listening on a free port and pass this information to the executed
>>> code blocks. In the code blocks one simply imports `Sync`, initializes
>>> it with the address+port and can use it for synchronization (or later
>>> for data exchange).
>>>
>>>       from avocado.plugins.sync import Sync
>>>       # Connect the sync server on address stored in params
>>>       # which could be injected by the multi-stream test
>>>       # or set manually.
>>>       sync = Sync(self, params.get("sync_server", "/plugins/sync_server"))
>>>       # wait until 2 tests ask to enter "setup" barrier (60s timeout)
>>>       sync.barrier("setup", 2, 60)
>>>
>>> As before, it can be part of the "NestedTest" test, initialized based on
>>> params without the need for boilerplate code. The result would be the
>>> same: avocado listens on some port and the tests can connect to this
>>> port and ask for a barrier/data exchange, with support for
>>> re-connection.
>>>
>>> For debugging purposes it might be useful to allow starting the sync
>>> server as an avocado plugin, eg. by `--sync-server ...` (or having another
>>> command just to start listening, eg. `avocado syncserver`). With that one
>>> could spawn the multiple processes manually, without the need to run the
>>> main multi-stream test, and communicate over this manually started
>>> server, or even just debug the behavior of one existing piece of the
>>> bigger test and fake the other components by sending the messages
>>> manually instead (eg. to see how it handles errors, timing issues and
>>> unexpected situations).
>>>
>>> Again, there are several ways to implement this:
>>>
>>> Standard multiprocess API
>>> -------------------------
>>>
>>> Python's standard multiprocessing library supports synchronization over
>>> TCP. The only problem is that "barriers" were introduced in
>>> python3, so we'd have to backport them. Additionally it does not fit 100%
>>> of our needs, so we'd have to adjust it a bit (eg. to allow manual
>>> interaction).
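>>>
>>> A rough sketch of what this could look like on Python 3 (where Barrier
>>> already exists; the "setup" name, port and authkey below are made up for
>>> the example):
>>>
>>>       from multiprocessing import Process
>>>       from multiprocessing.managers import BaseManager
>>>       from threading import Barrier
>>>
>>>       PARTIES = 2                        # streams that must meet
>>>       _SETUP_BARRIER = Barrier(PARTIES)
>>>
>>>       def _setup_barrier():
>>>           # runs in the manager's server process and always hands out
>>>           # the same Barrier, so every client blocks on the same object
>>>           return _SETUP_BARRIER
>>>
>>>       class SyncManager(BaseManager):
>>>           pass
>>>
>>>       SyncManager.register("setup", callable=_setup_barrier)
>>>
>>>       def client(port):
>>>           manager = SyncManager(address=("127.0.0.1", port),
>>>                                 authkey=b"avocado")
>>>           manager.connect()
>>>           manager.setup().wait(timeout=60)   # resumes once PARTIES arrive
>>>
>>>       if __name__ == "__main__":
>>>           server = SyncManager(address=("127.0.0.1", 6001),
>>>                                authkey=b"avocado")
>>>           server.start()                     # serves the barrier over TCP
>>>           workers = [Process(target=client, args=(6001,))
>>>                      for _ in range(PARTIES)]
>>>           for worker in workers:
>>>               worker.start()
>>>           for worker in workers:
>>>               worker.join()
>>>           server.shutdown()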
>>>
>>>
>>> Autotest's syncdata
>>> -------------------
>>>
>>> Python 2.4 friendly, supports barriers and data synchronization. On the
>>> other hand, it's quite hackish and full of shortcuts.
>>>
>>>
>>> Custom code
>>> -----------
>>>
>>> We can take inspiration from the above and create a simple, human-readable
>>> (easy to debug or interact with manually) protocol to support barriers and
>>> data exchange via pickling. IMO that would be easier to maintain than
>>> backporting and adjusting multiprocessing or fixing the autotest
>>> syncdata. A proof-of-concept can be found here:
>>>
>>>       https://github.com/avocado-framework/avocado/pull/1019
>>>
>>> It modifies the "passtest" to only proceed when it's executed by 2
>>> tests at the same time. The proof-of-concept does not support
>>> multi-stream tests, so one has to run "avocado run passtest" twice using
>>> the same sync server (once with --sync-server and once with --sync).
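>>>
>>> Just to illustrate the kind of protocol meant here (a made-up sketch, not
>>> the protocol implemented by the proof-of-concept above), a line-based
>>> barrier server could look roughly like this:
>>>
>>>       import socketserver
>>>       import threading
>>>
>>>       _lock = threading.Lock()
>>>       _barriers = {}  # name -> {"needed", "count", "event"}
>>>
>>>       class BarrierHandler(socketserver.StreamRequestHandler):
>>>           def handle(self):
>>>               # one human-readable request per connection:
>>>               #   BARRIER <name> <no_clients>
>>>               parts = self.rfile.readline().decode().split()
>>>               if len(parts) != 3 or parts[0] != "BARRIER":
>>>                   self.wfile.write(b"ERROR malformed request\n")
>>>                   return
>>>               name, needed = parts[1], int(parts[2])
>>>               with _lock:
>>>                   entry = _barriers.setdefault(
>>>                       name, {"needed": needed, "count": 0,
>>>                              "event": threading.Event()})
>>>                   entry["count"] += 1
>>>                   if entry["count"] >= entry["needed"]:
>>>                       entry["event"].set()  # last client releases everyone
>>>               entry["event"].wait()         # block until enough arrive
>>>               self.wfile.write(("GO %s\n" % name).encode())
>>>
>>>       if __name__ == "__main__":
>>>           server = socketserver.ThreadingTCPServer(("", 6001),
>>>                                                    BarrierHandler)
>>>           server.serve_forever()
>>>
>>> A client would then simply send "BARRIER setup 2" over a TCP connection
>>> and wait for the "GO setup" reply, which is also easy to do by hand when
>>> debugging.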
>>>
>>>
>>
>> Having sync support is part of the core concept.  Choosing one is
>> simpler, IMHO.  I trust and agree with your choice so far.  At implementation
>> time, if limitations become clearer, we can revisit this.
>>
> OK, I'll just describe it in the v4.
>
>>> Job API RFC
>>> ===========
>>>
>>> The recently introduced Job API RFC covers a very similar topic to "nested
>>> tests", but it's not the same. The Job API enables users to modify
>>> the job execution, eventually even write a runner which suits them
>>> for running groups of tests. On the contrary, this RFC covers a way to
>>> combine code-blocks/tests and reuse them in a single test. In a hackish
>>> way, they can supplement each other, but the purpose is different.
>>>
>>
>> I think we should give the message about what a user of the Job API
>> gets, and what the user of a multi-stream test gets.  Let's just state
>> the goal of each one.  If a user wants to hack their way into a
>> "Frankenstein" approach, it's not our issue.
>>
>>> One of the most obvious differences is that a failed "nested" test can
>>> be intentional (eg. reusing the NetPerf test to check if unreachable
>>> machines can talk to each other), while in the Job API it's always a failure.
>>>
>>
>> To me the difference is that by using the Job API the user would get to
>> trigger one or more tests, with Job-level control benefits.  With
>> multi-stream, a given test can have parts of it run on separate
>> execution streams.  Each one is intended for a different scenario.  Our
>> goal should be to make the stated goals nice and easy.  If a user chooses
>> to hammer them down to achieve different goals, it's their problem.  If a
>> user finds it easy to solve the stated goals with a different tool,
>> it's our (design) problem.
>>
> I think we are in the same boat, hopefully the examples will help.
>
>>> I hope you see the pattern. They are similar, but on a different layer.
>>> Internally, though, they can share some pieces, like executing the
>>> individual tests concurrently with different params/plugins
>>> (locally/remotely). All the needed plugin modifications would also be
>>> useful for both of these RFCs.
>>>
>>> Some examples:
>>>
>>> User1 wants to run the "compile_kernel" test on a machine followed by
>>> "install_compiled_kernel passtest failtest warntest" on "machine1
>>> machine2". The tests depend on the status of the previous test, but they
>>> don't create a scenario. So the user should use the Job API (or execute 3
>>> jobs manually).
>>>
>>
>> User 1 writes a custom job that optionally runs 3 tests.  Looks fine.
>>
>>> User2 wants to create a migration test, which starts a migration from
>>> machine1 and receives the migration on machine2. It requires cooperation
>>> and together it creates one complex use case, so the user should use a
>>> multi-stream test.
>>
>> Yes, one single test here (migration), some pieces executed as different
>> streams (on different machines).
>>
>>>
>>>
>>> Conclusion
>>> ==========
>>>
>>> Given the reasons above, I like the idea of "nested tests" using the "API
>>> backed by internal API", as it is simple to start with, allows test reuse
>>> which gives us the well-known test result format, and the internal API
>>> allows greater flexibility for the future.
>>>
>>> The netperf example from the introduction would look like this:
>>>
>>> Machine1:
>>>
>>>       class NetServer(avocado.NestedTest):
>>>           def setUp(self):
>>>               process.run("netserver")
>>>               self.barrier("setup", self.params.get("no_clients"))
>>>           def test(self):
>>>               pass
>>>           def tearDown(self):
>>>               self.barrier("finished", self.params.get("no_clients"))
>>>               process.run("killall netserver")
>>>
>>> Machine2:
>>>
>>>       class NetPerf(avocado.NestedTest):
>>>           def setUp(self):
>>>               self.barrier("setup", self.params.get("no_clients"))
>>>           def test(self):
>>>               process.run("netperf -H %s -l 60"
>>>                           % self.params.get("server_ip"))
>>>               self.barrier("finished", self.params.get("no_clients"))
>>>
>>
>> It's unclear why those are `avocado.NestedTest`, since the previous
>> example suggested a specialized `avocado.Test` that would have
>> `self.streams` already set up.
>>
> Yep, the `avocado.NestedTest` inherits from `avocado.Test`, only it
> additionally sets up the streams and adds some convenient methods to
> handle this kind of test. I'll use the library approach in the next
> version as this idea was not welcomed warmly...
>
>>> One would be able to run this manually (or from build systems) using:
>>>
>>>       avocado syncserver &
>>>       avocado run NetServer --mux-inject /plugins/sync_server:sync-server $SYNCSERVER &
>>>       avocado run NetPerf --mux-inject /plugins/sync_server:sync-server $SYNCSERVER &
>>>
>>> (where the --mux-inject passes the address of the "syncserver" into test
>>> params)
>>>
>>> When the code is stable, one would write this multi-stream test (or
>>> multiple variants of it) to do the above automatically:
>>>
>>>       class MultiNetperf(avocado.NestedTest):
>>>           def setUp(self):
>>>               self.failif(len(self.streams) < 2)
>>>           def test(self):
>>>               self.streams[0].run_bg("NetServer",
>>>                                      {"no_clients": len(self.streams)})
>>>               for stream in self.streams[1:]:
>>>                   stream.add_test("NetPerf",
>>>                                   {"no_clients": len(self.streams),
>>>                                    "server_ip": machines[0]})
>>>               self.wait(ignore_failures=False)
>>>
>>
>> How would a user specify where a given stream is going to be run?
>>
> Please take a look at my response to Ademar. It's via test parameters.
>
>>> Executing the complex example would become:
>>>
>>>       avocado run MultiNetperf
>>>
>>> You can see that the test allows running several NetPerf tests
>>> simultaneously, either locally, or distributed across multiple machines
>>> (or combinations) just by changing parameters. Additionally, by adding
>>> features to the nested tests, one can use different NetPerf commands, or
>>> add other tests to be executed together.
>>>
>>> The results could look like this:
>>>
>>>
>>>       $ tree $RESULTDIR
>>>         └── test-results
>>>             └── MultiNetperf
>>>                 ├── job.log
>>>                     ...
>>>                 ├── 1
>>>                 │   └── job.log
>>>                         ...
>>>                 └── 2
>>>                     └── job.log
>>>                         ...
>>>
>>
>> The multiple `job.log` files here make things confusing... do we have a
>> single job that ran a single test?
>>
> copy&paste error. Please take a look at my response to Ademar.
>
>>> Where the MultiNetperf/job.log contains the combined logs of the "master"
>>> test, all the "nested" tests and the sync server.
>>>
>>> Directories [12] contain the results of the created (possibly even named)
>>> streams. I think they should be in the form of a standard avocado Job to
>>> keep the well-known structure.
>>
>> To keep the Avocado Job structure, they'd either have to be Avocado
>> Jobs, or we'd have to fake them...  Then, all of a sudden, we have things
>> that look like jobs, but are not jobs.  How would users of the Job API
>> react when they find out that their custom jobs have a single `job.log`
>> and users of multi-stream tests have multiple `job.log`s?
>>
>> I'd not trade the familiarity of the job log format for the structure of
>> the architecture we've been struggling to define.
> I tried to describe this in the response to Ademar. It's not that simple
> (the same way the Job API is not simple, as one test can be resolved into
> multiple tests).
>
>>
>> My final suggestion: define all the core concepts and let us know how
>> they all fit. In text form.  Then, when we get to code examples, they
>> should all be obvious.  Refrain from implementation details at this point.
>>
>
> I can try. Naturally I see it the other way around (we're from
> different hemispheres ;-) ), but let's give it a try next time.
>

Sure thing, looking forward to the v4.

Thanks!
- Cleber.

> Thank you for the suggestions,
> Lukáš
>

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]



