[Pulp-dev] Coverage report second round

Elyezer Rezende erezende at redhat.com
Tue Aug 30 20:00:23 UTC 2016


I have created this gist [1] which shows a simple reproducer and also
provides coverage data when using prefork and solo pool cls.

That shows that when using prefork the coverage data is simply not captured.

[1] https://gist.github.com/elyezer/2d8225843451364cbb19af08f4f52de3

On Tue, Aug 30, 2016 at 4:03 PM, Brian Bouterse <bbouters at redhat.com> wrote:

> Using the solo thread pool is good to get a one-off coverage report. The
> one you linked to [3] can be used to talk about the state of pulp smash at
> that point in time. I spot checked the report and the areas that I knew
> were incorrectly reported as uncovered before are shown as covered in [3]
> so that is great.
>
> For a permanent fix to the coverage problem, I recommend not putting the
> workaround in place on Jenkins. The test environment needs to run exactly
> the same as the production infrastructure. As one example, there could be
> meaningful problems that occur in forking celery environments which would
> not be discovered in the test environment which would not fork.
>
> In talking with @asksol he indicates that this is not a known issue in
> celery. Can an issue with a simple reproducer be filed against celery and
> we can pick up the technical conversation on that bug? Maybe reply on this
> thread with a link?
>
> Thank you for all the work that you've done on this issue. This is not an
> easy problem. I really appreciate your effort here.
>
> -Brian
>
>
>
> On 08/29/2016 08:40 PM, Elyezer Rezende wrote:
>
>> All,
>>
>> During this month of August I've been working on making the coverage
>> report we have for Pulp Automation better. I got stuck many times during
>> the process but I've finally found something that helped improving it.
>>
>> The current coverage report is not reliable since any of the Celery
>> tasks have its coverage data being captured. Because that we had some
>> guesses on what could be causing that, we suspected that for some reason
>> the coverage hook was not being installed and so the data not being
>> captured, also we suspected that coverage was not playing good on how
>> Celery forks its processes.
>>
>> With those two guesses in mind, I've read Celery and Coverage source
>> code and also got some help from the community. This allowed me to
>> discover that Celery uses a forked multiprocessing library called
>> Billiard [1], knowing that then I started checking if coverage.py was
>> able to capture coverage data multiprocessing processes [2] and created
>> a sample Django + Celery application to test some assumptions. The first
>> one I tested was if coverage.py was really able to capture coverage data
>> from decorated functions (since celery tasks are decorated functions)
>> ran by the celery workers, the first try failed since I had to make a
>> call from a celery task and celery tasks was not being captured at all
>> which made also the simple decorated function data not be captured. So
>> my first assumption have failed and I got stuck one more time on the
>> coverage data not being captured issue. Continuing the research I found
>> the POOL_CLS configuration for the celery workers, that allowed me to
>> change how celery deals with concurrency and it happened that using
>> "solo" or "threads" the coverage data was finally captured.
>>
>> Next I checked how coverage.py deals with the multiprocessing built-in
>> library and discovered that it does a patch [3]. That made me realize
>> that maybe something similar for the Billiard library will be necessary.
>> Since just changing the POOL_CLS config works and is easier than
>> creating a patch for the Billiard plugin to capture coverage data I
>> dropped this and will proceed only if necessary.
>>
>> I checked Pulp code and updated the services that spawn celery processes
>> to use threads as the POOL_CLS (this also required to install the
>> threadpool Python package), then I tried to run the Pulp Smash
>> test_sync_publish for RPM over API, and it was passing before the change
>> and started failing due to celery workers not being able to proper
>> handle the process' signals since they are only received by the parent
>> process. I did not investigate more about why it was failing and gave
>> "solo" a try, it happened that it worked and I was able to proper
>> capture coverage data for the pulp.server.async.tasks module which was
>> not being captured before, finally!
>>
>> I went ahead and ran all the Pulp Smash tests which produced the
>> coverage report [4], that has more captured data from the one available
>> on Jenkins. Now I need your help on identifying if there is any missing
>> data to be captured still. Please let me know if you think any module
>> should have more coverage data than presented by the updated report.
>>
>> Finally I am about to send a PR to the pulp repository in order to
>> update the coverage hook with the changes needed in order to set the
>> POOL_CLS to solo. The pulp worker template [5] and the
>> pulp_resource_manager.service [6] need to be patched, again, please let
>> me know if I am missing something and should patch any other file.
>>
>> PS.: I would like to thank all the help I received from Pulp developers
>> with all my newbie questions about Pulp design. A special thanks to Sean
>> who created the first version of the coverage data and provided some
>> initial input about what was missing, Brian who spent about an hour with
>> me explaining how Pulp workers and resource manager work and also to
>> provide some insights about what could be the cause for the coverage
>> data not being captured and Jeremy who listened me all the times I was
>> frustrated by being stuck. I hope this improved coverage report will
>> bring us value and the information we need in order to ensure Pulp is
>> well tested by Pulp Smash.
>>
>> [1] https://github.com/celery/billiard
>> [2] http://coverage.readthedocs.io/en/coverage-4.2/subprocess.html
>> [3]
>> https://bitbucket.org/ned/coveragepy/src/257e52793fb0f28853d
>> dca679f67b158107262bf/coverage/multiproc.py
>> [4] https://dl.dropboxusercontent.com/u/109792/pulp-2-11-coverag
>> e-report-20160829/index.html
>> [5] https://github.com/pulp/pulp/blob/master/server/pulp/server/
>> async/manage_workers.py#L23-L25
>> [6] https://github.com/pulp/pulp/blob/master/server/usr/lib/syst
>> emd/system/pulp_resource_manager.service#L10-L12
>>
>> --
>> Elyézer Rezende
>> Senior Quality Engineer
>> irc: elyezer
>>
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>



-- 
Elyézer Rezende
Senior Quality Engineer
irc: elyezer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20160830/d04fbf0f/attachment.htm>


More information about the Pulp-dev mailing list