[Pulp-dev] uniqueness constraints within a repository version

Brian Bouterse bbouters at redhat.com
Mon Jul 8 19:08:58 UTC 2019


I want to retell Simon's proposal to have "Content defines a 'repo_key'
similar to a unit_key. This key must be unique within a repo version (and
not globally like the unit_key."

We could adopt his proposal to have the repo_key tuple defined on Content
in pulpcore. If we left the add/remove APIs in core and adopt for both sync
and add/remove a "keep newest to associate" functionality described earlier
in the thread. This "keep newest to associate" code would be used by sync
in the form of a core stage that is a generalized version of the
RemoveDuplicates stage. This would become part of the default pipeline for
all users of Stages API. I think this would be better than plugin writers
implementing it over and over and also less effort for plugin wrtiers. This
design would meet the current needs of pulp_cookbook, pulp_file, and
pulp_rpm which are the only 3 places I know we have this problem so far,
but I believe more content types are susceptible to this.

What do you think we should do?

Thanks!
Brian





On Thu, Jun 27, 2019 at 4:03 AM Tatiana Tereshchenko <ttereshc at redhat.com>
wrote:

> Sure, the code can be de-duplicated.
> My main worry is that it's a responsibility of a plugin writer not to
> forget to ensure uniqueness constraints within a repo version for every
> workflow (sync, copy, anything else) where a repo version is created.
> Every time before RepositoryVersion.create() is called, there should be a
> check that there is no colliding content in a repo version.
> It would be much more reliable and friendly, in my opinion, if plugin
> writer could define rules/callbacks/whatever for each content and it would
> be applied to any repository creation.
> At the same time this eliminates the flexibility to define different logic
> for content collision for different workflows, however I'm not sure if such
> a use case exists or is desired.
>
> Tanya
>
>
> On Wed, Jun 26, 2019 at 6:49 PM Austin Macdonald <amacdona at redhat.com>
> wrote:
>
>> @Tanya Tereshchenko <ttereshc at redhat.com>
>>
>>> Do I understand correctly that it doesn't cover the sync case and it's
>>> only about explicit repo version creation?
>>>
>>
>> I don't mean that add/remove could not share code with remove duplicate
>> stage. I wanted to point out that we have a problem here (how to remove
>> duplicates) that has similar patterns to other problems with add remove
>> (recursive, copy, deciding which content to keep with a collision, etc.) I
>> don't doubt that pulpcore could help solve these problems, but I think that
>> as we approach our GA, we should consider solving this problem (for now) by
>> getting out of the way of plugin writers rather than by implementing code
>> that is supposed to work for all plugins. I suspect that plenty of the
>> plugins will be implementing their own add/remove anyway.
>>
>> On Tue, Jun 25, 2019 at 12:56 PM David Davis <daviddavis at redhat.com>
>> wrote:
>>
>>> I don't think this solution would work in the case of creating a new
>>> repository version. Suppose for example you had two content units that
>>> collide, one in a repo version and one older unit that a user explicitly
>>> wants to add to the repo version. If the latter one is older, then what
>>> would happen?
>>>
>>> David
>>>
>>>
>>> On Tue, Jun 25, 2019 at 12:48 PM Brian Bouterse <bbouters at redhat.com>
>>> wrote:
>>>
>>>> Having a way for units to express their uniqueness per repo sounds good
>>>> because then more areas of Pulp's code could answer the question: "will I
>>>> have a duplicate if I add content X to repo_version Y".
>>>>
>>>> Let's assume we know that situation is about to occur during sync for
>>>> example, what do we do about it? In the errata case we know the "new" one
>>>> should replace the existing one. Maybe we start to 'order' the units with
>>>> colliding repo keys and keep the newest one always? Would this work for
>>>> pulp_cookbook and pulp_rpm? Would it generalize? Is this what you imagined?
>>>>
>>>> On Tue, Jun 25, 2019 at 5:30 AM Tatiana Tereshchenko <
>>>> ttereshc at redhat.com> wrote:
>>>>
>>>>> Do I understand correctly that it doesn't cover the sync case and it's
>>>>> only about explicit repo version creation?
>>>>> So the suggestion is to implement the same logic twice: for sync case
>>>>> - RemoveDuplicates stage and/or maybe some custom stage (e.g. to disallow
>>>>> overlapping paths), and for direct repo version creation - your proposal.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 24, 2019 at 3:13 PM Austin Macdonald <amacdona at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> I have a design in mind for solving this problem:
>>>>>>
>>>>>> 1. Remove POST to RepositoryVersion (no general add/remove endpoint).
>>>>>> 2. Add an endpoint to kick off an add/remove task, namespaced by
>>>>>> plugin. ie `POST pulp/api/v3/docker/add-remove/`
>>>>>>    This view can be provided to all plugins by the plugin template,
>>>>>> and will be based on the current RepositoryVersionCreate:
>>>>>>
>>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/viewsets/repository.py#L221-L258
>>>>>>    Note: the main purpose of this view is to kick off the general
>>>>>> add/remove task, which will be unchanged:
>>>>>>
>>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/tasks/repository.py#L70
>>>>>> 3. Add an add/remove serializer to the plugin API.
>>>>>> 3. Plugins needing further customization can provide their own task
>>>>>> and subclassed serializer.
>>>>>>
>>>>>> This gives the plugin writer full control over the endpoint
>>>>>> (customizable arguments and validation), and full control over the flow
>>>>>> (extra logic, depsolving, enforced uniqueness). It only uses the existing
>>>>>> patterns (and existing required knowledge), but requires no work (other
>>>>>> than using the template) for the simple case.
>>>>>>
>>>>>> On Mon, Jun 3, 2019 at 2:56 PM Simon Baatz <gmbnomis at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Mon, Jun 03, 2019 at 09:11:07AM -0400, David Davis wrote:
>>>>>>> >    @Simon I like the idea behind the repo_key solution you came up
>>>>>>> with.
>>>>>>> >    Can you be more specific around cases you think that it couldn't
>>>>>>> >    handle? I imagine that plugin writers could use properties or
>>>>>>> >    denormailzation (ie additional database columns) to solve cases
>>>>>>> where
>>>>>>> >    they need uniqueness across data that isn't in the database. In
>>>>>>> a worst
>>>>>>> >    case scenario, they can't use the pulpcore solution and just
>>>>>>> have to
>>>>>>> >    roll their own.
>>>>>>>
>>>>>>>
>>>>>>> What I wrote probably sounded too pessimistic. You are right, in
>>>>>>> most cases that should be doable.
>>>>>>>
>>>>>>> I agree that we could have a simple default solution that just
>>>>>>> requires to specify a couple of field names in the easiest case.  As
>>>>>>> you
>>>>>>> say, it should be possible use custom logic in a plugin if required.
>>>>>>>
>>>>>>> Here is the case I was thinking of that it can't handle:
>>>>>>>
>>>>>>> In pulp_file, a uniqueness constraint on "relative_path" would allow
>>>>>>> content units "a" and "a/b" to be in a repo version.
>>>>>>>
>>>>>>> However, we may want file repos to be representable on an actual file
>>>>>>> system (e.g. when exporting them as tar files).  For the repo above,
>>>>>>> this does not work, as "a" can't be a file and a directory at the
>>>>>>> same time on a standard Unix file system.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pulp-dev mailing list
>>>>>>> Pulp-dev at redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Pulp-dev mailing list
>>>>>> Pulp-dev at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>
>>>>> _______________________________________________
>>>>> Pulp-dev mailing list
>>>>> Pulp-dev at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20190708/2fdcd494/attachment.htm>


More information about the Pulp-dev mailing list