[Pulp-dev] uniqueness constraints within a repository version

Tatiana Tereshchenko ttereshc at redhat.com
Thu Jun 27 08:02:25 UTC 2019


Sure, the code can be de-duplicated.
My main worry is that it's a responsibility of a plugin writer not to
forget to ensure uniqueness constraints within a repo version for every
workflow (sync, copy, anything else) where a repo version is created.
Every time before RepositoryVersion.create() is called, there should be a
check that there is no colliding content in a repo version.
It would be much more reliable and friendly, in my opinion, if plugin
writer could define rules/callbacks/whatever for each content and it would
be applied to any repository creation.
At the same time this eliminates the flexibility to define different logic
for content collision for different workflows, however I'm not sure if such
a use case exists or is desired.

Tanya


On Wed, Jun 26, 2019 at 6:49 PM Austin Macdonald <amacdona at redhat.com>
wrote:

> @Tanya Tereshchenko <ttereshc at redhat.com>
>
>> Do I understand correctly that it doesn't cover the sync case and it's
>> only about explicit repo version creation?
>>
>
> I don't mean that add/remove could not share code with remove duplicate
> stage. I wanted to point out that we have a problem here (how to remove
> duplicates) that has similar patterns to other problems with add remove
> (recursive, copy, deciding which content to keep with a collision, etc.) I
> don't doubt that pulpcore could help solve these problems, but I think that
> as we approach our GA, we should consider solving this problem (for now) by
> getting out of the way of plugin writers rather than by implementing code
> that is supposed to work for all plugins. I suspect that plenty of the
> plugins will be implementing their own add/remove anyway.
>
> On Tue, Jun 25, 2019 at 12:56 PM David Davis <daviddavis at redhat.com>
> wrote:
>
>> I don't think this solution would work in the case of creating a new
>> repository version. Suppose for example you had two content units that
>> collide, one in a repo version and one older unit that a user explicitly
>> wants to add to the repo version. If the latter one is older, then what
>> would happen?
>>
>> David
>>
>>
>> On Tue, Jun 25, 2019 at 12:48 PM Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>> Having a way for units to express their uniqueness per repo sounds good
>>> because then more areas of Pulp's code could answer the question: "will I
>>> have a duplicate if I add content X to repo_version Y".
>>>
>>> Let's assume we know that situation is about to occur during sync for
>>> example, what do we do about it? In the errata case we know the "new" one
>>> should replace the existing one. Maybe we start to 'order' the units with
>>> colliding repo keys and keep the newest one always? Would this work for
>>> pulp_cookbook and pulp_rpm? Would it generalize? Is this what you imagined?
>>>
>>> On Tue, Jun 25, 2019 at 5:30 AM Tatiana Tereshchenko <
>>> ttereshc at redhat.com> wrote:
>>>
>>>> Do I understand correctly that it doesn't cover the sync case and it's
>>>> only about explicit repo version creation?
>>>> So the suggestion is to implement the same logic twice: for sync case -
>>>> RemoveDuplicates stage and/or maybe some custom stage (e.g. to disallow
>>>> overlapping paths), and for direct repo version creation - your proposal.
>>>>
>>>>
>>>>
>>>> On Mon, Jun 24, 2019 at 3:13 PM Austin Macdonald <amacdona at redhat.com>
>>>> wrote:
>>>>
>>>>> I have a design in mind for solving this problem:
>>>>>
>>>>> 1. Remove POST to RepositoryVersion (no general add/remove endpoint).
>>>>> 2. Add an endpoint to kick off an add/remove task, namespaced by
>>>>> plugin. ie `POST pulp/api/v3/docker/add-remove/`
>>>>>    This view can be provided to all plugins by the plugin template,
>>>>> and will be based on the current RepositoryVersionCreate:
>>>>>
>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/viewsets/repository.py#L221-L258
>>>>>    Note: the main purpose of this view is to kick off the general
>>>>> add/remove task, which will be unchanged:
>>>>>
>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/tasks/repository.py#L70
>>>>> 3. Add an add/remove serializer to the plugin API.
>>>>> 3. Plugins needing further customization can provide their own task
>>>>> and subclassed serializer.
>>>>>
>>>>> This gives the plugin writer full control over the endpoint
>>>>> (customizable arguments and validation), and full control over the flow
>>>>> (extra logic, depsolving, enforced uniqueness). It only uses the existing
>>>>> patterns (and existing required knowledge), but requires no work (other
>>>>> than using the template) for the simple case.
>>>>>
>>>>> On Mon, Jun 3, 2019 at 2:56 PM Simon Baatz <gmbnomis at gmail.com> wrote:
>>>>>
>>>>>> On Mon, Jun 03, 2019 at 09:11:07AM -0400, David Davis wrote:
>>>>>> >    @Simon I like the idea behind the repo_key solution you came up
>>>>>> with.
>>>>>> >    Can you be more specific around cases you think that it couldn't
>>>>>> >    handle? I imagine that plugin writers could use properties or
>>>>>> >    denormailzation (ie additional database columns) to solve cases
>>>>>> where
>>>>>> >    they need uniqueness across data that isn't in the database. In
>>>>>> a worst
>>>>>> >    case scenario, they can't use the pulpcore solution and just
>>>>>> have to
>>>>>> >    roll their own.
>>>>>>
>>>>>>
>>>>>> What I wrote probably sounded too pessimistic. You are right, in
>>>>>> most cases that should be doable.
>>>>>>
>>>>>> I agree that we could have a simple default solution that just
>>>>>> requires to specify a couple of field names in the easiest case.  As
>>>>>> you
>>>>>> say, it should be possible use custom logic in a plugin if required.
>>>>>>
>>>>>> Here is the case I was thinking of that it can't handle:
>>>>>>
>>>>>> In pulp_file, a uniqueness constraint on "relative_path" would allow
>>>>>> content units "a" and "a/b" to be in a repo version.
>>>>>>
>>>>>> However, we may want file repos to be representable on an actual file
>>>>>> system (e.g. when exporting them as tar files).  For the repo above,
>>>>>> this does not work, as "a" can't be a file and a directory at the
>>>>>> same time on a standard Unix file system.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pulp-dev mailing list
>>>>>> Pulp-dev at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>
>>>>> _______________________________________________
>>>>> Pulp-dev mailing list
>>>>> Pulp-dev at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20190627/a3150f66/attachment.htm>


More information about the Pulp-dev mailing list