[Pulp-dev] Composed Repositories

Tue May 15 14:48:56 UTC 2018

On 05/15/2018 09:29 AM, Milan Kovacik wrote:
> Hi,
>
> On Tue, May 15, 2018 at 3:22 PM, Dennis Kliban <dkliban at redhat.com> wrote:
>> On Mon, May 14, 2018 at 3:44 PM, Jeff Ortel <jortel at redhat.com> wrote:
>>> Let's brainstorm on something.
>>>
>>> Pulp needs to deal with remote repositories that are composed of multiple
>>> content types which may span the domain of a single plugin.  Here are a few
>>> examples.  Some Red Hat RPM repositories are composed of: RPMs, DRPMs,
>>> ISOs and Kickstart Trees.  Some OSTree repositories are composed of OSTrees
>>> & Kickstart Trees. This raises a question:
>>>
>>> How can pulp3 best support syncing with remote repositories that are
>>> composed of multiple (unrelated) content types in a way that doesn't result
>>> in plugins duplicating support for content types?
>>>
>>> A few approaches come to mind:
>>>
>>> 1. Multiple plugins (Remotes) participate in the sync flow to produce a
>>> new repository version.
>>> 2. Multiple plugins (Remotes) are sync'd successively each producing a new
>>> version of a repository.  Only the last version contains the fully sync'd
>>> composition.
>>> 3. Plugins share code.
>>> 4. Other?
>>>
>>>
>>> Option #1: Sync would be orchestrated by core or the user so that multiple
>>> plugins (Remotes) participate in populating a new repository version.  For
>>> example: the RPM plugin (Remote) and the Kickstart Tree plugin (Remote)
>>> would both be sync'd against the same remote repository that is composed of
>>> both types.  The new repository version would be composed of the result of
>>> both plugin (Remote) syncs.  To support this, we'd need to provide a way for
>>> each plugin to operate seamlessly on the same (new) repository version.
>>> Perhaps something internal to the RepositoryVersion.  The repository version
>>> would not be marked "complete" until the last plugin (Remote) sync has
>>> succeeded.  More complicated than #2 but results in only creating truly
>>> complete versions or nothing.  No idea how this would work with current REST
>>> API whereby plugins provide sync endpoints.
>>>
>> I like this approach because it allows the user to perform a single call to
>> the REST API and specify multiple "sync methods" to use to create a single
>> new repository version.
> Same here, esp. if the goal is all-or-nothing behavior w.r.t. the
> mix-in remotes; i.e., an atomic sync.
> This has a benefit of a clear start and end of the sync procedure,
> that the user might want to refer to.
>
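A minimal sketch of the all-or-nothing behavior described for option #1,
assuming each plugin (Remote) can be modeled as a callable that returns the
content it would add.  Repository, Version, Syncer and composed_sync are
hypothetical names for illustration only; none of them is pulpcore API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# Hypothetical stand-ins for illustration; not pulpcore API.
ContentUnit = Tuple[str, str]            # (content type, unit name)
Syncer = Callable[[], Set[ContentUnit]]  # what one plugin (Remote) would add


@dataclass
class Version:
    number: int
    content: Set[ContentUnit]
    complete: bool = False


@dataclass
class Repository:
    versions: List[Version] = field(default_factory=list)


def composed_sync(repo: Repository, syncers: List[Syncer]) -> Version:
    """Run every syncer against one new version; keep it only if all succeed."""
    base = repo.versions[-1].content if repo.versions else set()
    new = Version(number=len(repo.versions) + 1, content=set(base))
    for sync in syncers:
        # An exception raised here propagates before the version is attached
        # to the repository, so a partial version never becomes visible.
        new.content |= sync()
    new.complete = True
    repo.versions.append(new)
    return new
```

If any syncer raises, the repository keeps exactly the versions it had
before the call, which is the "truly complete versions or nothing" property.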
>>> Option #2: Sync would be orchestrated by core or the user so that multiple
>>> plugins (Remotes) create successive repository versions.  For example: the
>>> RPM plugin (Remote) and the Kickstart Tree plugin (Remote) would both be
>>> sync'd against the same remote repository that is a composition including
>>> both types.  The intermediate versions would be incomplete.  Only the last
>>> version contains the fully sync'd composition.  This approach can be
>>> supported by core today :) but will produce incomplete repository versions
>>> that are marked complete=True.  This /seems/ undesirable, right?  This may
>>> not be a problem for distribution since I would imagine that only the last
>>> (fully composed) version would be published.  But what about other usages of
>>> the repository's "latest" version?
> I'm afraid I don't see the use of an intermediate version, esp. in case of
> failures; e.g., ostree failed to sync while rpm and kickstart both
> succeeded; is the sync OK as a whole? What to do with the versions
> created? Should I merge the successes into one and retry the failure?
> How many versions would this introduce?

(option 2) The partial versions would be created in both normal and
failure scenarios.  In the normal scenario, each plugin (Remote) creates
a new version and only the last one is fully composed.  The intermediate
versions are always partial.
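
To make the concern concrete, here is a rough sketch of option 2, again
modeling each plugin (Remote) as a callable.  Version, Syncer and
successive_sync are hypothetical names for illustration, not pulpcore API.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical stand-ins for illustration; not pulpcore API.
ContentUnit = Tuple[str, str]                  # (content type, unit name)
Syncer = Callable[[], FrozenSet[ContentUnit]]  # what one plugin would add


@dataclass(frozen=True)
class Version:
    number: int
    content: FrozenSet[ContentUnit]
    complete: bool


def successive_sync(versions: List[Version], syncers: List[Syncer]) -> List[Version]:
    """Each syncer creates its own complete=True version on top of the last."""
    created = []
    for sync in syncers:
        base = versions[-1].content if versions else frozenset()
        # The version is marked complete even though later syncers
        # have not yet contributed their content types.
        version = Version(len(versions) + 1, base | sync(), complete=True)
        versions.append(version)
        created.append(version)
    return created
```

With an RPM syncer followed by a Kickstart Tree syncer, the first version
created is marked complete but has no kickstart content.  If the second
syncer fails, that partial version is left as the repository's "latest".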

>
>>> Option #3: requires a plugin to be aware of specific repository
>>> composition(s) and of other plugins, and creates a code dependency
>>> between plugins.
>>> For example, the RPM plugin could delegate ISOs to the File plugin and
>>> Kickstart Trees to the KickStart Tree plugin.
> Do you mean that the RPM plug-in would directly call into the File plug-in?
> If that's the case then I don't like it much; it would be a pain every
> time a new plug-in is introduced (O(len(plugin)^2) updates)
> or whenever the API of a plug-in changes (O(len(plugin)) updates).
> Esp. keeping the plugin code aware of other plugin updates would be ugly.

Agreed.  The plugins could install libs into site-packages which would 
at least mitigate the complexity of calling into each other through the 
pulp plugin framework but I don't think it helps much. Even the rpm 
dependency is undesirable.

>
>>> For all options, plugins (Remotes) need to limit sync to affect only those
>>> content types within their domain.  For example, the RPM (Remote) sync
>>> cannot add/remove ISO or KS Trees.
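
One way to picture the domain restriction above, as a sketch only: content
is modeled as (type, name) pairs and a plugin's domain as a set of type
names.  sync_within_domain is a hypothetical helper, not pulpcore API.

```python
from typing import Set, Tuple

# Hypothetical model for illustration; not pulpcore API.
ContentUnit = Tuple[str, str]  # (content type, unit name)


def sync_within_domain(
    current: Set[ContentUnit],
    remote: Set[ContentUnit],
    domain: Set[str],
) -> Set[ContentUnit]:
    """Mirror the remote for types in `domain`; leave all other types alone."""
    # Units outside the plugin's domain are carried over unchanged.
    untouched = {unit for unit in current if unit[0] not in domain}
    # Units inside the domain are replaced by what the remote offers.
    mirrored = {unit for unit in remote if unit[0] in domain}
    return untouched | mirrored
```

For example, with domain {"rpm"} the result takes its RPMs from the remote
while any ISO units already in the repository are neither added to nor
removed.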
>>>
>>> I am an advocate of some form of option #1 or #2.  Combining plugins
>>> (Remotes) as needed to deal with arbitrary combinations within remote
>>> repositories seems very powerful; does not impose complexity on plugin
>>> writers; and does not introduce code dependencies between plugins.
>>>
>>> Thoughts?
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
> Cheers,
> milan