[Pulp-dev] Fwd: Re: Changesets Challenges

Brian Bouterse bbouters at redhat.com
Wed Apr 11 15:59:50 UTC 2018


On Tue, Apr 10, 2018 at 10:43 AM, Jeff Ortel <jortel at redhat.com> wrote:

> On 04/06/2018 09:15 AM, Brian Bouterse wrote:
>
> Several plugins have started using the Changesets including pulp_ansible,
> pulp_python, pulp_file, and perhaps others. The Changesets provide several
> distinct points of value which are great, but there are two challenges I
> want to bring up. I want to focus only on the problem statements first.
>
> 1. There is redundant "differencing" code in all plugins. The Changeset
> interface requires the plugin writer to determine what units need to be
> added and those to be removed. This requires all plugin writers to write
> the same non-trivial differencing code over and over. For example, you can
> see the same non-trivial differencing code present in pulp_ansible
> <https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
> pulp_file
> <https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
> and pulp_python
> <https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
> Line-wise, this "differencing" code makes up a large portion (maybe 50%) of
> the sync code itself in each plugin.
>
>
> Ten lines of trivial set logic hardly seems like a big deal, but any
> duplication is worth exploring.
>
It's more than ten lines. Take pulp_ansible as an example. By my count, the
linked-to section is 89 lines, which out of 306 lines of plugin sync code is
29% redundant code. The other plugins have similar numbers. With those
numbers in mind, what do you think?
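For context, the core of the "differencing" pattern each plugin re-implements is set logic over natural keys. This is only an illustrative sketch (the names find_delta, remote_keys, and repo_keys are made up here, not pulpcore API); the real plugin code also builds the querysets and PendingContent objects around it, which is where most of the line count comes from.

```python
# Hypothetical sketch of the differencing step each plugin repeats.
# Keys are natural-key tuples; the names here are illustrative only.

def find_delta(remote_keys, repo_keys):
    """Return (additions, removals) between remote metadata and the repo."""
    additions = remote_keys - repo_keys   # in the remote, not yet in the repo
    removals = repo_keys - remote_keys    # in the repo, gone from the remote
    return additions, removals

remote = {("pkg-a", "1.0"), ("pkg-b", "2.1")}
repo = {("pkg-b", "2.1"), ("pkg-c", "0.9")}
additions, removals = find_delta(remote, repo)
# additions == {("pkg-a", "1.0")}, removals == {("pkg-c", "0.9")}
```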

>
>
> 2. Plugins can't do end-to-end stream processing. The Changesets
> themselves do stream processing, but when you call into
> changeset.apply_and_drain() you have to have fully parsed the metadata
> already. Currently when fetching all metadata from Galaxy, pulp_ansible
> takes about 380 seconds (6+ min). This means that the actual Changeset
> content downloading starts 380 seconds later than it could. At the heart of
> the problem, the fetching+parsing of the metadata is not part of the stream
> processing.
>
>
> The additions/removals can be any iterable (like a generator), and by using
> ChangeSet.apply() and iterating the returned object, the plugin can "turn
> the crank" while downloading and processing the metadata.  The
> ChangeSet.apply_and_drain() is just a convenience method.  I don't see how
> this is a limitation of the ChangeSet.
>

That is new info for me (and maybe everyone). OK, so Changesets have two
interfaces: apply() and apply_and_drain(). Why do we have two interfaces
when apply() can support all existing use cases (that I know of) and do
end-to-end stream processing, but apply_and_drain() cannot? I see all of our
examples (and all of our new plugins) using apply_and_drain().
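The streaming contract being discussed can be sketched roughly like this. Note this is a minimal stand-in, not real pulpcore code: the ChangeSet class and parse_metadata_pages function below are simplified stubs invented to show how a generator of additions lets apply() start work before all metadata is parsed.

```python
# Illustrative sketch (not pulpcore API) of end-to-end stream processing:
# because `additions` may be a generator, ChangeSet.apply() can begin
# processing content before every metadata page has been fetched and parsed.

def parse_metadata_pages(pages):
    """Lazily yield one content unit per metadata entry as pages arrive."""
    for page in pages:          # each page could be a separate HTTP fetch
        for entry in page:
            yield entry         # downstream work can start immediately

class ChangeSet:
    """Minimal stand-in: pulls from `additions` one item at a time."""
    def __init__(self, additions):
        self.additions = additions

    def apply(self):
        for item in self.additions:
            yield ("downloaded", item)   # real code would download here

pages = [["unit-1", "unit-2"], ["unit-3"]]
changeset = ChangeSet(parse_metadata_pages(pages))
for report in changeset.apply():    # the plugin "turns the crank"
    print(report)
```

With apply_and_drain(), by contrast, the caller typically materializes the full additions list before the Changeset runs, which is where the 380-second metadata delay comes from.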


>
>
> Do you see the same challenges I do? Are these the right problem
> statements? I think with clear problem statements a solution will be easy
> to see and agree on.
>
>
> I'm not convinced that these are actual problems/challenges that need to
> be addressed in the near term.
>
>
> Thanks!
> Brian
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev