[Pulp-dev] uniqueness constraints within a repository version

David Davis daviddavis at redhat.com
Mon Jun 3 13:11:07 UTC 2019

Thanks for raising this issue. The pulp_file plugin also suffers from this
problem, in that files with duplicate names can be added to repo versions
even though they probably shouldn't be:


@Simon I like the idea behind the repo_key solution you came up with. Can
you be more specific about the cases you think it couldn't handle? I
imagine that plugin writers could use properties or denormalization (i.e.
additional database columns) to solve cases where they need uniqueness
across data that isn't in the database. In the worst case, they
can't use the pulpcore solution and just have to roll their own.
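
To make the idea concrete, here is a rough sketch in plain Python (no Django,
no pulpcore) of a repo_key exposed as a property, plus a check that could run
when a repo version is created. Everything in it is an assumption for
illustration only: FileContent, find_repo_key_conflicts and add_to_version
are hypothetical names, not the pulp_file model or any pulpcore API, and the
choice of relative_path as the repo_key is just one plausible rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileContent:
    """Stand-in for a plugin content model (hypothetical, not the pulp_file model)."""

    relative_path: str
    digest: str

    @property
    def repo_key(self):
        # Assumption: a path must be unique within one repo version, while the
        # digest only matters for global (unit_key) uniqueness.
        return (self.relative_path,)


def find_repo_key_conflicts(content_units):
    """Return {repo_key: [units]} for every repo_key shared by more than one unit."""
    by_key = defaultdict(list)
    for unit in content_units:
        by_key[unit.repo_key].append(unit)
    return {key: units for key, units in by_key.items() if len(units) > 1}


def add_to_version(current_units, new_units):
    """Build the content of a new version; new units replace units with the same repo_key."""
    merged = {unit.repo_key: unit for unit in current_units}
    merged.update({unit.repo_key: unit for unit in new_units})
    return list(merged.values())
```

With two units that share a path but differ in digest,
find_repo_key_conflicts reports both under the same key, and add_to_version
keeps only the newer one. Whether replacing the old unit or rejecting the
request is the right behavior is a policy question that this sketch does not
settle.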


On Fri, May 31, 2019 at 3:27 PM Simon Baatz <gmbnomis at gmail.com> wrote:

> On Fri, May 31, 2019 at 01:12:58PM +0200, Tatiana Tereshchenko wrote:
> >    A while ago RemoveDuplicates stage [0] was introduced to solve the
> >    problem of enforcing uniqueness constraints within a repository
> >    version at sync time.
> >    The same problem ought to be solved when content which already exists
> >    in Pulp is added to a repository. E.g. Content was uploaded, or
> >    content was synced as a part of other repo. And now you want to
> >    add/copy it to your repo.
> >    RPM plugin has to solve this problem (specific examples can be seen in
> >    this issue [1]).
> >    It would be great if other plugins can share if the same problem
> >    exists for them and if it's valuable to add some mechanism to the
> >    pulpcore.
> >    I believe, if you use RemoveDuplicates stage during sync, then your
> >    plugin is impacted by the described problem.
> Yes, the problem exists also for pulp_cookbook (although it does not
> use the RemoveDuplicates stage). Currently, the implementation to
> avoid duplicates in pulp_cookbook has the following components:
> - Content defines a 'repo_key' [0] similar to a unit_key. This key
>   must be unique within a repo version (and not globally like the
>   unit_key)
> - Cookbook metadata obtained during a sync does not contain
>   digests.  Therefore pulp_cookbook uses a custom stage
>   QueryExistingRepoContentAndArtifacts [1] to identify existing
>   content within the repo version the sync is based on.  Content is
>   queried using the repo key in the base repo version (and duplicates
>   need not to be removed after the fact).
>   (However, something like repo_key might be useful in the
>   RemoveDuplicates stage for other plugins.)
> - As I found no way to ensure repo_key uniqueness on content
>   association, it is done at publication time [2] based on the repo_key.
>   However, this feels like a workaround.  I think it should be
>   enforced on repo version creation.
> >    My personal opinion: if RemoveDuplicates stage was worth adding to the
> >    pulpcore (stages API in pulpcore-plugin), a mechanism to ensure
> >    uniqueness constraints within a repo version at association time makes
> >    sense to add as well.
> I fully agree. I don't think the repo_key approach used by
> pulp_cookbook is general enough. It works well with Cookbooks, but
> other content types might have uniqueness constraints that
> can't be mapped directly to a composite key on repo versions.
> [0]
> https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/models.py#L70
> [1]
> https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/tasks/synchronizing.py#L61
> [2]
> https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/tasks/publishing.py#L63
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev