[Avocado-devel] Running tests in parallel

Wed Nov 23 14:46:07 UTC 2016

Hi,

On 11/23/2016 02:28 PM, Cleber Rosa wrote:
>
> On 11/23/2016 07:07 AM, Zubair Lutfullah Kakakhel wrote:
>> Hi,
>>
>> Thank you for your comprehensive reply!
>>
>> Comments inline.
>>
>> On 11/22/2016 02:11 PM, Cleber Rosa wrote:
>>> On 11/22/2016 07:53 AM, Zubair Lutfullah Kakakhel wrote:
>>>> Hi,
>>>>
>>>
>>> Hi Zubair,
>>>
>>>> There are quite a few threads about this and a trello card
>>>> https://trello.com/c/xNeR2slj/255-support-running-tests-in-parallel
>>>>
>>>> And the discussion leads to a complex multi-host RFC.
>>>> https://www.redhat.com/archives/avocado-devel/2016-March/msg00025.html
>>>>
>>>> Our requirement is simpler.
>>>> All we wanted to do is run disjoint simple (c executables) tests in
>>>> parallel.
>>>>
>>>
>>> Sounds fair enough.
>>>
>>>> I was wondering if somebody has a WIP branch that has some level of
>>>> implementation for this?
>>>
>>> I'm not familiar with a WiP or PoC on this (yet).  If anyone has
>>> experimented with it, I'd be happy to hear about it.
>>>
>>>> Or If somebody is familiar with the code base, I'd appreciate some
>>>> direction on how to implement this.
>>>>
>>>
>>> Avocado already runs every single test in a fresh new process.  This is,
>>> at least theoretically, a good start.  Also, the test process is
>>> managed using the standard Python multiprocessing module:
>>>
>>> https://github.com/avocado-framework/avocado/blob/master/avocado/core/runner.py#L363
>>>
>>>
>>> The first experiment I'd try would be to use multiprocessing.Pool,
>>> which is also part of the Python standard library:
>>>
>>> https://docs.python.org/2.7/library/multiprocessing.html#using-a-pool-of-workers
>>>
>>
>> In this case, there would be a separate python thread for each test
>> being run in parallel.
>> Each python thread would actually call the test executable using a
>> sub-process?
>>
>
> Ideally, the Avocado test runner would remain a single process, that is,
> without one additional thread (or process) to manage each *test* process.
>
>> That can be OK for desktops but won't scale well when using Avocado on
>> memory-constrained embedded devices.
>>
>
> I must admit I haven't attempted to run Avocado in resource-constrained
> environments.  Can you explain what your biggest concern is?

In our case, primarily memory, even for dormant processes, although CPU usage
is also a concern.

Imagine running Avocado on a slightly beefy WiFi router with 128 MB of RAM.
Even a single Python process is slow and difficult to run there. Run a few
Python processes in parallel and the kernel's Out of Memory (OOM) killer
starts killing processes.

>
> Do you feel that Avocado (as a single process test *runner*) plus one
> process for each *test* is not suitable to those environments?

Ideally, Avocado itself should be running only one process,
and each test should be running only its own process.

I think we've confused the dialogue with terminology
(threads/processes/subprocesses/multiprocessing).
I'll attempt to clarify.

My current understanding of Avocado:

Avocado-runner parent process
runs -> Avocado test process using multiprocessing.Process here [1]
          runs -> Actual test executable using subprocess here [2]

Is this correct?
Is there a particular reason the runner starts a separate process which
actually calls the test executable?
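
To make the layering described above concrete, here is a minimal sketch of it. This is not Avocado's actual code: the names run_test and run_one are hypothetical, Python 3.4+ is assumed, and the "fork" start method is assumed because the targets discussed here are Linux devices. The runner starts an intermediate Python process with multiprocessing.Process, and that process in turn starts the real test executable with subprocess.

```python
import multiprocessing
import subprocess
import sys


def run_test(command):
    # Runs in the intermediate Python process: launch the real test
    # executable, wait for it, and exit with the same status.
    sys.exit(subprocess.call(command))


def run_one(command):
    # Runs in the runner (parent) process.
    # Assumption: "fork" start method, which is only available on Unix.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=run_test, args=(command,))
    proc.start()
    proc.join()  # the runner waits serially for this one test
    return proc.exitcode


if __name__ == "__main__":
    # A tiny stand-in for a C test executable that exits with status 3.
    print(run_one([sys.executable, "-c", "raise SystemExit(3)"]))
```

While the test executable runs, three processes exist for this one test: the runner, the intermediate Python process, and the executable itself.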

Now coming back to running tests in parallel.

You mentioned using multiprocessing.Pool. In that case, there could be
a potential issue for constrained devices,
e.g. when running 4 tests in parallel:

Avocado-runner parent process
runs -> Avocado test process using multiprocessing.Process here [1]
          runs -> Actual test process using subprocess here [2]
runs -> Avocado test process using multiprocessing.Process here [1]
          runs -> Actual test process using subprocess here [2]
runs -> Avocado test process using multiprocessing.Process here [1]
          runs -> Actual test process using subprocess here [2]
runs -> Avocado test process using multiprocessing.Process here [1]
          runs -> Actual test process using subprocess here [2]

So 4 tests would actually result in 9 processes being created:
1 runner in Python
4 mostly dormant Python multiprocessing processes (their purpose is a bit unclear)
4 actual test executables
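
The count above generalizes to any number of parallel tests. The sketch below only does the arithmetic; it does not start any processes, and process_count and wrapper_per_test are hypothetical names used for illustration.

```python
def process_count(parallel_tests, wrapper_per_test=True):
    """Peak number of processes alive while parallel_tests tests run.

    wrapper_per_test=True models one intermediate Python process per
    test sitting between the runner and the test executable.
    """
    runner = 1
    wrappers = parallel_tests if wrapper_per_test else 0
    executables = parallel_tests
    return runner + wrappers + executables


if __name__ == "__main__":
    print(process_count(4))  # 9
```

On a memory-constrained device the wrapper term is the costly one, since each wrapper is a full Python interpreter.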

Ideally, that number should be 5 for running 4 tests:
Avocado-runner parent process
runs -> Actual test process using subprocess here [2]
runs -> Actual test process using subprocess here [2]
runs -> Actual test process using subprocess here [2]
runs -> Actual test process using subprocess here [2]
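
A minimal sketch of this flatter layout, assuming the runner can start the test executables directly with subprocess.Popen and poll them from a single process. This is not how Avocado does it today; run_parallel and poll_interval are hypothetical names.

```python
import subprocess
import sys
import time


def run_parallel(commands, poll_interval=0.05):
    """Start all commands at once; return (index, returncode) pairs
    in the order in which the tests finished."""
    # One child process per test, started directly by the runner.
    running = {i: subprocess.Popen(cmd) for i, cmd in enumerate(commands)}
    finished = []
    while running:
        for index, proc in list(running.items()):
            returncode = proc.poll()  # non-blocking status check
            if returncode is not None:
                finished.append((index, returncode))
                del running[index]
        if running:
            time.sleep(poll_interval)  # avoid busy-waiting on a small CPU
    return finished


if __name__ == "__main__":
    # Four stand-ins for C test executables, exiting with status 0..3.
    tests = [[sys.executable, "-c", "raise SystemExit(%d)" % i]
             for i in range(4)]
    for index, status in run_parallel(tests):
        print("test %d exited with %d" % (index, status))
```

With 4 tests this gives 1 runner plus 4 test processes. The results come back in completion order rather than start order, so anything that consumes them has to cope with that.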

I hope this doesn't look even more confusing :)

Regards,
ZubairLK

[1] https://github.com/avocado-framework/avocado/blob/master/avocado/core/runner.py#L363
[2] https://github.com/avocado-framework/avocado/blob/master/avocado/utils/process.py#L273

>
> - Cleber.
>
>> Please correct me if I am reading this incorrectly.
>>
>> Regards,
>> ZubairLK
>>
>>>
>>> This would most certainly lead to changes in how Avocado currently
>>> serially waits for the test status:
>>>
>>> https://github.com/avocado-framework/avocado/blob/master/avocado/core/runner.py#L403
>>>
>>>
>>> Which ultimately is added to the (Job wide) results:
>>>
>>> https://github.com/avocado-framework/avocado/blob/master/avocado/core/runner.py#L455
>>>
>>>
>>> Since the results for many tests will now be acquired in unpredictable
>>> order, this will require changes to the ResultEvent based plugins (such
>>> as the UI).
>>>
>>>> Thanks
>>>>
>>>> Regards,
>>>> ZubairLK
>>>>
>>>
>>> I hope this is a good initial set of pointers.  If you feel adventurous
>>> and want to start hacking on this, you're more than welcome.
>>>
>>> BTW: we've had quite a number of features that started as
>>> experiments/ideas/not-really-perfect-pull-requests from the community
>>> that Avocado "core team" members embraced and pushed all the way to
>>> completeness.
>>>
>