[Pulp-dev] Fwd: Re: Changesets Challenges

Jeff Ortel jortel at redhat.com
Tue Apr 10 14:43:54 UTC 2018





On 04/06/2018 09:15 AM, Brian Bouterse wrote:
> Several plugins have started using the Changesets including 
> pulp_ansible, pulp_python, pulp_file, and perhaps others. The 
> Changesets provide several distinct points of value which are great, 
> but there are two challenges I want to bring up. I want to focus only 
> on the problem statements first.
> 1. There is redundant "differencing" code in all plugins. The 
> Changeset interface requires the plugin writer to determine what units 
> need to be added and those to be removed. This requires all plugin 
> writers to write the same non-trivial differencing code over and over. 
> For example, you can see the same non-trivial differencing code 
> present in pulp_ansible 
> <https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>, 
> pulp_file 
> <https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>, 
> and pulp_python 
> <https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>. 
> Line-wise, this "differencing" code makes up a large portion (maybe 
> 50%) of the sync code itself in each plugin.

Ten lines of trivial set logic hardly seem like a big deal, but any 
duplication is worth exploring.
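
For illustration, the set logic in question is roughly the following. This 
is a minimal sketch, not code taken from any of the plugins: diff_units() 
and the string keys are hypothetical stand-ins for whatever natural key a 
plugin uses to identify its content units.

```python
def diff_units(remote_keys, local_keys):
    """Return (additions, removals) as sets of unit keys.

    remote_keys: keys found in the remote metadata (hypothetical input).
    local_keys:  keys already present in the repository (hypothetical input).
    """
    remote = set(remote_keys)
    local = set(local_keys)
    additions = remote - local   # in the remote but not yet in the repository
    removals = local - remote    # in the repository but gone from the remote
    return additions, removals


additions, removals = diff_units(["a", "b", "c"], ["b", "c", "d"])
```

Here "a" would be added and "d" removed; "b" and "c" are untouched.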

> 2. Plugins can't do end-to-end stream processing. The Changesets 
> themselves do stream processing, but when you call into 
> changeset.apply_and_drain() you have to have fully parsed the metadata 
> already. Currently when fetching all metadata from Galaxy, 
> pulp_ansible takes about 380 seconds (6+ min). This means that the 
> actual Changeset content downloading starts 380 seconds later than it 
> could. At the heart of the problem, the fetching+parsing of the 
> metadata is not part of the stream processing.

The additions/removals can be any iterable (such as a generator), and by 
using ChangeSet.apply() and iterating the returned object, the plugin 
can "turn the crank" while downloading and processing the metadata. 
ChangeSet.apply_and_drain() is just a convenience method. I don't see 
how this is a limitation of the ChangeSet.
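
A rough sketch of the idea, using plain Python generators only. Nothing 
here is the real ChangeSet API: fetch_metadata_pages(), additions() and 
apply() are hypothetical stand-ins, and apply() merely imitates an object 
that yields one result per unit as the caller iterates it.

```python
events = []  # records the order in which work actually happens


def fetch_metadata_pages():
    """Hypothetical stand-in for fetching remote metadata one page at a time."""
    for page in (["role-a", "role-b"], ["role-c"]):
        events.append("fetched page")
        yield page


def additions(local_keys):
    """Lazily yield keys that are not in the repository yet."""
    for page in fetch_metadata_pages():
        for key in page:
            if key not in local_keys:
                yield key


def apply(units):
    """Hypothetical stand-in for an apply() that yields a report per unit."""
    for key in units:
        events.append(f"downloaded {key}")
        yield key


# Iterating the returned object is what "turns the crank": each step pulls
# just enough metadata to produce the next unit, then processes that unit.
for report in apply(additions({"role-b"})):
    pass
```

After the loop, events shows the first download happening before the 
second metadata page is fetched, which is the interleaving described above.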

> Do you see the same challenges I do? Are these the right problem 
> statements? I think with clear problem statements a solution will be 
> easy to see and agree on.

I'm not convinced that these are actual problems/challenges that need to 
be addressed in the near term.

> Thanks!
> Brian
