[Pulp-dev] Fwd: Re: Changesets Challenges

Jeff Ortel jortel at redhat.com
Wed Apr 11 16:25:20 UTC 2018



On 04/11/2018 10:59 AM, Brian Bouterse wrote:
>
>
> On Tue, Apr 10, 2018 at 10:43 AM, Jeff Ortel <jortel at redhat.com 
> <mailto:jortel at redhat.com>> wrote:
>
>
>
>
>
>     	
>
>     	
>
>     	
>
>     	
>
>
>     On 04/06/2018 09:15 AM, Brian Bouterse wrote:
>>     Several plugins have started using the Changesets including
>>     pulp_ansible, pulp_python, pulp_file, and perhaps others. The
>>     Changesets provide several distinct points of value which are
>>     great, but there are two challenges I want to bring up. I want to
>>     focus only on the problem statements first.
>>
>>     1. There is redundant "differencing" code in all plugins. The
>>     Changeset interface requires the plugin writer to determine what
>>     units need to be added and those to be removed. This requires all
>>     plugin writers to write the same non-trivial differencing code
>>     over and over. For example, you can see the same non-trivial
>>     differencing code present in pulp_ansible
>>     <https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
>>     pulp_file
>>     <https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
>>     and pulp_python
>>     <https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
>>     Line-wise, this "differencing" code makes up a large portion
>>     (maybe 50%) of the sync code itself in each plugin.
>
>     Ten lines of trivial set logic hardly seems like a big deal but
>     any duplication is worth exploring.
>
> It's more than ten lines. Take pulp_ansible for example. By my count 
> (the linked to section) it's 89 lines, which out of 306 lines of 
> plugin code for sync is 29% of extra redundant code. The other plugins 
> have similar numbers. So with those numbers in mind, what do you think?

I was counting the lines (w/o comments) in find_delta() based on the 
linked code.  Which functions are you counting?

>
>>
>>     2. Plugins can't do end-to-end stream processing. The Changesets
>>     themselves do stream processing, but when you call into
>>     changeset.apply_and_drain() you have to have fully parsed the
>>     metadata already. Currently when fetching all metadata from
>>     Galaxy, pulp_ansible takes about 380 seconds (6+ min). This means
>>     that the actual Changeset content downloading starts 380 seconds
>>     later than it could. At the heart of the problem, the
>>     fetching+parsing of the metadata is not part of the stream
>>     processing.
>
>     The additions/removals can be any interable (like generator) and
>     by using ChangeSet.apply() and iterating the returned object, the
>     pluign can "turn the crank" while downloading and processing the
>     metadata.  The ChangeSet.apply_and_drain() is just a convenience
>     method.  I don't see how this is a limitation of the ChangeSet.
>
>
> That is new info for me (and maybe everyone). OK so Changesets have 
> two interfaces. apply() and apply_and_drain(). Why do we have two 
> interfaces when apply() can support all existing use cases (that I 
> know of) and do end-to-end stream processing but apply_and_drain() 
> cannot? I see all of our examples (and all of our new plugins) using 
> apply_and_drain().

The ChangeSet.apply() was how I designed (and documented) it.  Not sure 
when/who added the apply_and_drain().  +1 for removing it.

>
>>
>>     Do you see the same challenges I do? Are these the right problem
>>     statements? I think with clear problem statements a solution will
>>     be easy to see and agree on.
>
>     I'm not convinced that these are actual problems/challenges that
>     need to be addressed in the near term.
>
>>
>>     Thanks!
>>     Brian
>>
>>
>>     _______________________________________________
>>     Pulp-dev mailing list
>>     Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
>>     https://www.redhat.com/mailman/listinfo/pulp-dev
>>     <https://www.redhat.com/mailman/listinfo/pulp-dev>
>
>
>     _______________________________________________
>     Pulp-dev mailing list
>     Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
>     https://www.redhat.com/mailman/listinfo/pulp-dev
>     <https://www.redhat.com/mailman/listinfo/pulp-dev>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20180411/ec041eec/attachment.htm>


More information about the Pulp-dev mailing list