[Pulp-dev] uniqueness constraints within a repository version
David Davis
daviddavis at redhat.com
Mon Jun 3 13:11:07 UTC 2019
Thanks for raising this issue. pulp_file also suffers from this problem:
files with duplicate names can be added to a repo version, but they
probably shouldn't be:
https://pulp.plan.io/issues/4028
@Simon I like the idea behind the repo_key solution you came up with. Can
you be more specific about the cases you think it couldn't handle? I
imagine that plugin writers could use properties or denormalization (i.e.
additional database columns) to solve cases where they need uniqueness
across data that isn't in the database. In the worst case, they can't use
the pulpcore solution and have to roll their own.
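
A minimal sketch of the idea, in plain Python rather than Django models, of
how a repo_key computed by a property could be checked when content is added
to a repo version. Everything here is hypothetical and not pulpcore API: the
Content class, DuplicateRepoKeyError, check_repo_key_uniqueness, and the
case-insensitive name rule (chosen only as an example of a key derived from
data that is not stored in that form).

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


class DuplicateRepoKeyError(ValueError):
    """Two different content units in one repo version share a repo_key."""


@dataclass(frozen=True)
class Content:
    # Hypothetical stand-in for a plugin's content model.
    name: str
    version: str
    digest: str

    @property
    def repo_key(self) -> Tuple[str, ...]:
        # Assumption for illustration: names must be unique within a repo
        # version regardless of case, so the key is derived, not stored.
        return (self.name.lower(),)


def check_repo_key_uniqueness(
    existing: Iterable[Content], to_add: Iterable[Content]
) -> List[Content]:
    """Return the combined content, or raise if a repo_key would clash."""
    seen = {unit.repo_key: unit for unit in existing}
    for unit in to_add:
        clash = seen.get(unit.repo_key)
        # Re-adding the identical unit is harmless; a different unit with
        # the same repo_key is rejected.
        if clash is not None and clash != unit:
            raise DuplicateRepoKeyError(
                f"{unit!r} conflicts with {clash!r} on repo_key {unit.repo_key!r}"
            )
        seen[unit.repo_key] = unit
    return list(seen.values())
```

In a real plugin the derived value would more likely be denormalized into its
own column so the database can enforce the constraint, but the check at
association time would have the same shape.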
David
On Fri, May 31, 2019 at 3:27 PM Simon Baatz <gmbnomis at gmail.com> wrote:
> On Fri, May 31, 2019 at 01:12:58PM +0200, Tatiana Tereshchenko wrote:
> > A while ago RemoveDuplicates stage [0] was introduced to solve the
> > problem of enforcing uniqueness constraints within a repository
> > version at sync time.
> > The same problem ought to be solved when content which already exists
> > in Pulp is added to a repository. E.g. Content was uploaded, or content
> > was synced as a part of other repo. And now you want to add/copy it to
> > your repo.
> > RPM plugin has to solve this problem (specific examples can be seen in
> > this issue [1]).
> > It would be great if other plugins can share if the same problem
> > exists for them and if it's valuable to add some mechanism to the
> > pulpcore.
> > I believe, if you use RemoveDuplicates stage during sync, then your
> > plugin is impacted by the described problem.
>
> Yes, the problem exists also for pulp_cookbook (although it does not
> use the RemoveDuplicates stage). Currently, the implementation to
> avoid duplicates in pulp_cookbook has the following components:
>
> - Content defines a 'repo_key' [0] similar to a unit_key. This key
> must be unique within a repo version (and not globally like the
> unit_key)
>
> - Cookbook metadata obtained during a sync does not contain
> digests. Therefore pulp_cookbook uses a custom stage
> QueryExistingRepoContentAndArtifacts [1] to identify existing
> content within the repo version the sync is based on. Content is
> queried using the repo key in the base repo version (so duplicates
> need not be removed after the fact).
>
> (However, something like repo_key might be useful in the
> RemoveDuplicates stage for other plugins.)
>
> - As I found no way to ensure repo_key uniqueness on content
> association, it is done at publication time [2] based on the repo_key.
> However, this feels like a workaround. I think it should be
> enforced on repo version creation.
>
> > My personal opinion: if RemoveDuplicates stage was worth adding to the
> > pulpcore (stages API in pulpcore-plugin), a mechanism to ensure
> > uniqueness constraints within a repo version at association time makes
> > sense to add as well.
>
> I fully agree. I don't think the repo_key approach used by
> pulp_cookbook is general enough. It works well with Cookbooks, but
> other content types might have uniqueness constraints that
> can't be mapped directly to a composite key on repo versions.
>
>
> [0]
> https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/models.py#L70
> [1]
> https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/tasks/synchronizing.py#L61
> [2]
> https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/tasks/publishing.py#L63
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>