[Pulp-dev] Changesets Challenges

Brian Bouterse bbouters at redhat.com
Fri Apr 6 14:15:29 UTC 2018

Several plugins have started using the Changesets, including pulp_ansible,
pulp_python, pulp_file, and perhaps others. The Changesets provide several
distinct points of value, which is great, but there are two challenges I
want to bring up. I want to focus only on the problem statements first.

1. There is redundant "differencing" code in every plugin. The Changeset
interface requires the plugin writer to determine which units need to be
added and which need to be removed. As a result, every plugin writer has to
write the same non-trivial differencing code over and over; you can see
essentially the same code in pulp_ansible and pulp_python.
Line-wise, this "differencing" code makes up a large portion (maybe 50%) of
the sync code itself in each plugin.
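To illustrate, the repeated logic boils down to something like the sketch
below. This is not the actual plugin API; compute_changes and the dict-based
unit representation are hypothetical stand-ins for what each plugin currently
reimplements by hand:

```python
def compute_changes(remote_units, existing_units, key=lambda u: u["name"]):
    """Return (additions, removals) between remote metadata and a repo.

    remote_units:   units described by the remote source's metadata
    existing_units: units currently in the repository version
    key:            callable producing a unit's natural-key identity

    Illustrative only; the real unit types and natural keys vary per plugin.
    """
    remote_by_key = {key(u): u for u in remote_units}
    existing_by_key = {key(u): u for u in existing_units}

    # Units present remotely but not locally need to be added.
    additions = [u for k, u in remote_by_key.items() if k not in existing_by_key]
    # Units present locally but no longer remote need to be removed.
    removals = [u for k, u in existing_by_key.items() if k not in remote_by_key]
    return additions, removals
```

Every plugin writes some variant of this before it can hand additions and
removals to the Changeset.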

2. Plugins can't do end-to-end stream processing. The Changesets themselves
do stream processing, but by the time you call changeset.apply_and_drain()
you must have already fully parsed the metadata. Currently, fetching all
metadata from Galaxy takes pulp_ansible about 380 seconds (6+ minutes),
which means the actual Changeset content downloading starts 380 seconds
later than it could. At the heart of the problem, the fetching and parsing
of the metadata is not part of the stream processing.
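A minimal sketch of the difference, using plain generators rather than the
Changeset API (parse_pages and sync are hypothetical names): if metadata were
parsed lazily, units could flow to the downloader as each page arrives,
instead of only after every page has been fetched and parsed:

```python
def parse_pages(pages):
    """Lazily yield content units one metadata page at a time.

    Because this is a generator, downstream consumers can start work
    on the first page's units before later pages are even fetched.
    """
    for page in pages:
        for unit in page:
            yield unit


def sync(pages):
    """Consume units as they stream out of the metadata parser."""
    downloaded = []
    for unit in parse_pages(pages):
        # In a real plugin, downloading would start here, per unit,
        # rather than after a full up-front metadata parse.
        downloaded.append(f"downloaded {unit}")
    return downloaded
```

With the current interface, the equivalent of parse_pages must run to
completion before apply_and_drain() can be called, which is where the
380-second delay comes from.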

Do you see the same challenges I do? Are these the right problem
statements? I think with clear problem statements a solution will be easy
to see and agree on.
