[Pulp-dev] uniqueness constraints within a repository version

Simon Baatz gmbnomis at gmail.com
Fri May 31 19:26:38 UTC 2019


On Fri, May 31, 2019 at 01:12:58PM +0200, Tatiana Tereshchenko wrote:
>    A while ago RemoveDuplicates stage [0] was introduced to solve the
>    problem of enforcing uniqueness constraints within a repository version
>    at sync time.
>    The same problem ought to be solved when content which already exists
>    in Pulp is added to a repository. E.g. Content was uploaded, or content
>    was synced as a part of other repo. And now you want to add/copy it to
>    your repo.
>    RPM plugin has to solve this problem (specific examples can be seen in
>    this issue [1]).
>    It would be great if other plugins can share if the same problem exists
>    for them and if it's valuable to add some mechanism to the pulpcore.
>    I believe, if you use RemoveDuplicates stage during sync, then your
>    plugin is impacted by the described problem.

Yes, the problem also exists for pulp_cookbook (although it does not
use the RemoveDuplicates stage). Currently, the implementation to
avoid duplicates in pulp_cookbook has the following components:

- Content defines a 'repo_key' [0] similar to a unit_key. This key
  must be unique within a repo version (and not globally, like the
  unit_key).

- Cookbook metadata obtained during a sync does not contain
  digests.  Therefore pulp_cookbook uses a custom stage
  QueryExistingRepoContentAndArtifacts [1] to identify existing
  content within the repo version the sync is based on.  Content is
  queried using the repo key in the base repo version (and duplicates
  need not be removed after the fact).

  (However, something like repo_key might be useful in the
  RemoveDuplicates stage for other plugins.)

- As I found no way to ensure repo_key uniqueness on content
  association, the uniqueness check is done at publication time [2],
  based on the repo_key.  However, this feels like a workaround.  I
  think uniqueness should be enforced on repo version creation.
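
To illustrate the repo_key idea from the list above, here is a minimal
sketch of such a uniqueness check.  It is not the pulp_cookbook code:
the Cookbook class, the (name, version) repo key and the function
find_repo_key_conflicts are hypothetical names chosen for this example,
and plain Python objects stand in for the Django models.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Cookbook:
    # Hypothetical stand-in for a content unit (not the real model).
    name: str
    version: str
    digest: str

    def repo_key(self) -> Tuple[str, str]:
        # Assumption: (name, version) must be unique within one repo
        # version, while the digest distinguishes units globally.
        return (self.name, self.version)


def find_repo_key_conflicts(
    content: Iterable[Cookbook],
) -> Dict[Tuple[str, str], List[Cookbook]]:
    """Return every repo_key shared by more than one content unit."""
    by_key = defaultdict(list)
    for unit in content:
        by_key[unit.repo_key()].append(unit)
    # Keep only the keys that violate the uniqueness constraint.
    return {key: units for key, units in by_key.items() if len(units) > 1}
```

A check like this could run on the content set of a new repo version
and reject the version if the returned dict is non-empty, instead of
waiting until publication time.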

>    My personal opinion: if RemoveDuplicates stage was worth adding to the
>    pulpcore (stages API in pulpcore-plugin), a mechanism to ensure
>    uniqueness constraints within a repo version at association time makes
>    sense to add as well.

I fully agree. I don't think the repo_key approach used by
pulp_cookbook is general enough. It works well with Cookbooks, but
other content types might have uniqueness constraints that
can't be mapped directly to a composite key on repo versions.


[0] https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/models.py#L70
[1] https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/tasks/synchronizing.py#L61
[2] https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/tasks/publishing.py#L63



