[Pulp-dev] Changesets Challenges

Sat Apr 7 18:08:54 UTC 2018

+1

On Sat, Apr 7, 2018 at 8:13 AM, David Davis <daviddavis at redhat.com> wrote:

> +1
>
>
> David
>
> On Fri, Apr 6, 2018 at 10:39 AM, Dennis Kliban <dkliban at redhat.com> wrote:
>
>> On Fri, Apr 6, 2018 at 10:15 AM, Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>> Several plugins have started using the Changesets including
>>> pulp_ansible, pulp_python, pulp_file, and perhaps others. The Changesets
>>> provide several distinct points of value which are great, but there are two
>>> challenges I want to bring up. I want to focus only on the problem
>>> statements first.
>>>
>>> 1. There is redundant "differencing" code in all plugins. The Changeset
>>> interface requires the plugin writer to determine which units need to be
>>> added and which need to be removed, so every plugin writer ends up writing
>>> the same non-trivial differencing code over and over. For example, you can
>>> see the same differencing code in pulp_ansible
>>> <https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
>>> pulp_file
>>> <https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
>>> and pulp_python
>>> <https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
>>> Line-wise, this "differencing" code makes up a large portion (maybe 50%) of
>>> the sync code itself in each plugin.
>>>
>>>
>> That is definitely a problem. We should address this.
>>
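To make the first problem concrete, here is a minimal sketch of the kind of
differencing each plugin currently reimplements. It is not taken from any of
the linked plugins and does not use the Changeset API; diff_units, the
"natural key" strings, and the sample data are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def diff_units(
    remote: Dict[Hashable, T], local_keys: Iterable[Hashable]
) -> Tuple[List[T], List[Hashable]]:
    """Return (units to add, keys to remove) for one sync.

    `remote` maps a natural key to the unit described by upstream metadata.
    `local_keys` are the natural keys already present in the repository.
    Both names are assumptions made for this sketch.
    """
    local = list(local_keys)
    local_set = set(local)
    # Units that upstream has but the repository does not.
    additions = [unit for key, unit in remote.items() if key not in local_set]
    # Keys the repository has but upstream no longer lists.
    removals = [key for key in local if key not in remote]
    return additions, removals


# Hypothetical sample data.
remote_units = {"a-1.0": "unit a", "b-2.0": "unit b"}
repository_keys = ["b-2.0", "c-3.0"]
additions, removals = diff_units(remote_units, repository_keys)
```

If something like this lived in one shared place, a plugin would only need to
describe what the remote contains.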
>>
>>> 2. Plugins can't do end-to-end stream processing. The Changesets
>>> themselves do stream processing, but by the time you call
>>> changeset.apply_and_drain() you must already have fully parsed the
>>> metadata. Currently, fetching all metadata from Galaxy takes pulp_ansible
>>> about 380 seconds (6+ min), so the actual Changeset content downloading
>>> starts 380 seconds later than it could. At the heart of the problem, the
>>> fetching and parsing of the metadata is not part of the stream processing.
>>>
>>>
>> This is the same problem we currently have in Pulp 2. We should address
>> this.
>>
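A rough sketch of what "end-to-end" could mean for the second problem: if
metadata parsing is a generator, downloading can begin as soon as the first
page has been parsed instead of after the last one. This is only an
illustration with plain Python generators, not the Changeset API;
parse_pages, download_all, and the fake fetch/download functions are
hypothetical, and a real version would presumably run the two stages
concurrently rather than interleaving them in one thread.

```python
from typing import Callable, Iterable, Iterator, List


def parse_pages(fetch_page: Callable[[int], List[str]]) -> Iterator[str]:
    """Yield unit names page by page, stopping at the first empty page."""
    page = 1
    while True:
        names = fetch_page(page)
        if not names:
            return
        yield from names
        page += 1


def download_all(names: Iterable[str], download: Callable[[str], None]) -> int:
    """Download each unit as soon as its name is available; return the count."""
    count = 0
    for name in names:
        download(name)
        count += 1
    return count


# Hypothetical stand-ins for the remote metadata and the downloader.
events: List[str] = []
PAGES = {1: ["a", "b"], 2: ["c"]}


def fake_fetch(page: int) -> List[str]:
    events.append(f"fetch page {page}")
    return PAGES.get(page, [])


def fake_download(name: str) -> None:
    events.append(f"download {name}")


total = download_all(parse_pages(fake_fetch), fake_download)
```

Inspecting events afterwards shows the downloads for page 1 happening before
page 2 is fetched, which is the ordering the current interface rules out.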
>>
>>> Do you see the same challenges I do? Are these the right problem
>>> statements? I think with clear problem statements a solution will be easy
>>> to see and agree on.
>>>
>>>
>> Yes, I do. You described the problems very well.
>>
>>
>>> Thanks!
>>> Brian
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>>
>>
>