[Pulp-dev] Fwd: Re: Changesets Challenges
Jeff Ortel
jortel at redhat.com
Tue Apr 10 14:43:54 UTC 2018
On 04/06/2018 09:15 AM, Brian Bouterse wrote:
> Several plugins have started using the Changesets including
> pulp_ansible, pulp_python, pulp_file, and perhaps others. The
> Changesets provide several distinct points of value which are great,
> but there are two challenges I want to bring up. I want to focus only
> on the problem statements first.
>
> 1. There is redundant "differencing" code in all plugins. The
> Changeset interface requires the plugin writer to determine what units
> need to be added and those to be removed. This requires all plugin
> writers to write the same non-trivial differencing code over and over.
> For example, you can see the same non-trivial differencing code
> present in pulp_ansible
> <https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
> pulp_file
> <https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
> and pulp_python
> <https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
> Line-wise, this "differencing" code makes up a large portion (maybe
> 50%) of the sync code itself in each plugin.
Ten lines of trivial set logic hardly seems like a big deal but any
duplication is worth exploring.
>
> 2. Plugins can't do end-to-end stream processing. The Changesets
> themselves do stream processing, but when you call into
> changeset.apply_and_drain() you have to have fully parsed the metadata
> already. Currently when fetching all metadata from Galaxy,
> pulp_ansible takes about 380 seconds (6+ min). This means that the
> actual Changeset content downloading starts 380 seconds later than it
> could. At the heart of the problem, the fetching+parsing of the
> metadata is not part of the stream processing.
The additions/removals can be any interable (like generator) and by
using ChangeSet.apply() and iterating the returned object, the pluign
can "turn the crank" while downloading and processing the metadata. The
ChangeSet.apply_and_drain() is just a convenience method. I don't see
how this is a limitation of the ChangeSet.
>
> Do you see the same challenges I do? Are these the right problem
> statements? I think with clear problem statements a solution will be
> easy to see and agree on.
I'm not convinced that these are actual problems/challenges that need to
be addressed in the near term.
>
> Thanks!
> Brian
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20180410/081f2f99/attachment.htm>
More information about the Pulp-dev
mailing list