[Pulp-dev] uniqueness constraints within a repository version

Brian Bouterse bbouters at redhat.com
Sun Jul 21 13:00:15 UTC 2019


On Sun, Jul 21, 2019 at 6:23 AM Tatiana Tereshchenko <ttereshc at redhat.com>
wrote:

> +1 to the idea of a repo_key.
>
> Should we also add the ability to apply custom validation of the content
> being added?
> Similar to a repo_key, Content model can optionally provide an additional
> validator.
> Use cases:
>  - for pulp_file to avoid relative path overlap - e.g. 'a/b' and 'a'
>
In thinking this over more, I'm unsure that pulp_file has the use case. Two
different Artifacts having relative paths 'a' and 'a/b' in one repo version
doesn't seem problematic. This problem statement is similar to the
Distribution.base_path overlap problem statement where it's unavoidably
ambiguous which Distribution should be matched when base_paths are allowed
to overlap. In this case for pulp_file, it's not ambiguous in the same way,
the relative_path I expect to match to exactly 1 content unit either 'a',
or 'a/b', but not both. What do you think about this?

 - for pulp_rpm to filter by signature/signing key
>
Can we expand on this use case a bit? Is it that the repo version should
only contains units signed or unsigned rpms? Or is it that we are ok with a
mixture as long as each NEVRA is unique? I suspect the former, but I want
to be sure.


> Plugins can solve it by defining their own stage but it seems like almost
> any plugin needs to ensure absence of collisions specific to it, even the
> simple pulp_file.
> It means that our default pipeline becomes less useful and will be hardly
> ever used by any [currently known] plugins.
>
> Any thoughts?
>
> Tanya
>
>
> On Mon, Jul 8, 2019 at 9:09 PM Brian Bouterse <bbouters at redhat.com> wrote:
>
>> I want to retell Simon's proposal to have "Content defines a 'repo_key'
>> similar to a unit_key. This key must be unique within a repo version (and
>> not globally like the unit_key."
>>
>> We could adopt his proposal to have the repo_key tuple defined on Content
>> in pulpcore. If we left the add/remove APIs in core and adopt for both sync
>> and add/remove a "keep newest to associate" functionality described earlier
>> in the thread. This "keep newest to associate" code would be used by sync
>> in the form of a core stage that is a generalized version of the
>> RemoveDuplicates stage. This would become part of the default pipeline for
>> all users of Stages API. I think this would be better than plugin writers
>> implementing it over and over and also less effort for plugin wrtiers. This
>> design would meet the current needs of pulp_cookbook, pulp_file, and
>> pulp_rpm which are the only 3 places I know we have this problem so far,
>> but I believe more content types are susceptible to this.
>>
>> What do you think we should do?
>>
>> Thanks!
>> Brian
>>
>>
>>
>>
>>
>> On Thu, Jun 27, 2019 at 4:03 AM Tatiana Tereshchenko <ttereshc at redhat.com>
>> wrote:
>>
>>> Sure, the code can be de-duplicated.
>>> My main worry is that it's a responsibility of a plugin writer not to
>>> forget to ensure uniqueness constraints within a repo version for every
>>> workflow (sync, copy, anything else) where a repo version is created.
>>> Every time before RepositoryVersion.create() is called, there should be
>>> a check that there is no colliding content in a repo version.
>>> It would be much more reliable and friendly, in my opinion, if plugin
>>> writer could define rules/callbacks/whatever for each content and it would
>>> be applied to any repository creation.
>>> At the same time this eliminates the flexibility to define different
>>> logic for content collision for different workflows, however I'm not sure
>>> if such a use case exists or is desired.
>>>
>>> Tanya
>>>
>>>
>>> On Wed, Jun 26, 2019 at 6:49 PM Austin Macdonald <amacdona at redhat.com>
>>> wrote:
>>>
>>>> @Tanya Tereshchenko <ttereshc at redhat.com>
>>>>
>>>>> Do I understand correctly that it doesn't cover the sync case and it's
>>>>> only about explicit repo version creation?
>>>>>
>>>>
>>>> I don't mean that add/remove could not share code with remove duplicate
>>>> stage. I wanted to point out that we have a problem here (how to remove
>>>> duplicates) that has similar patterns to other problems with add remove
>>>> (recursive, copy, deciding which content to keep with a collision, etc.) I
>>>> don't doubt that pulpcore could help solve these problems, but I think that
>>>> as we approach our GA, we should consider solving this problem (for now) by
>>>> getting out of the way of plugin writers rather than by implementing code
>>>> that is supposed to work for all plugins. I suspect that plenty of the
>>>> plugins will be implementing their own add/remove anyway.
>>>>
>>>> On Tue, Jun 25, 2019 at 12:56 PM David Davis <daviddavis at redhat.com>
>>>> wrote:
>>>>
>>>>> I don't think this solution would work in the case of creating a new
>>>>> repository version. Suppose for example you had two content units that
>>>>> collide, one in a repo version and one older unit that a user explicitly
>>>>> wants to add to the repo version. If the latter one is older, then what
>>>>> would happen?
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Tue, Jun 25, 2019 at 12:48 PM Brian Bouterse <bbouters at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Having a way for units to express their uniqueness per repo sounds
>>>>>> good because then more areas of Pulp's code could answer the question:
>>>>>> "will I have a duplicate if I add content X to repo_version Y".
>>>>>>
>>>>>> Let's assume we know that situation is about to occur during sync for
>>>>>> example, what do we do about it? In the errata case we know the "new" one
>>>>>> should replace the existing one. Maybe we start to 'order' the units with
>>>>>> colliding repo keys and keep the newest one always? Would this work for
>>>>>> pulp_cookbook and pulp_rpm? Would it generalize? Is this what you imagined?
>>>>>>
>>>>>> On Tue, Jun 25, 2019 at 5:30 AM Tatiana Tereshchenko <
>>>>>> ttereshc at redhat.com> wrote:
>>>>>>
>>>>>>> Do I understand correctly that it doesn't cover the sync case and
>>>>>>> it's only about explicit repo version creation?
>>>>>>> So the suggestion is to implement the same logic twice: for sync
>>>>>>> case - RemoveDuplicates stage and/or maybe some custom stage (e.g. to
>>>>>>> disallow overlapping paths), and for direct repo version creation - your
>>>>>>> proposal.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 24, 2019 at 3:13 PM Austin Macdonald <
>>>>>>> amacdona at redhat.com> wrote:
>>>>>>>
>>>>>>>> I have a design in mind for solving this problem:
>>>>>>>>
>>>>>>>> 1. Remove POST to RepositoryVersion (no general add/remove
>>>>>>>> endpoint).
>>>>>>>> 2. Add an endpoint to kick off an add/remove task, namespaced by
>>>>>>>> plugin. ie `POST pulp/api/v3/docker/add-remove/`
>>>>>>>>    This view can be provided to all plugins by the plugin template,
>>>>>>>> and will be based on the current RepositoryVersionCreate:
>>>>>>>>
>>>>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/viewsets/repository.py#L221-L258
>>>>>>>>    Note: the main purpose of this view is to kick off the general
>>>>>>>> add/remove task, which will be unchanged:
>>>>>>>>
>>>>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/tasks/repository.py#L70
>>>>>>>> 3. Add an add/remove serializer to the plugin API.
>>>>>>>> 3. Plugins needing further customization can provide their own task
>>>>>>>> and subclassed serializer.
>>>>>>>>
>>>>>>>> This gives the plugin writer full control over the endpoint
>>>>>>>> (customizable arguments and validation), and full control over the flow
>>>>>>>> (extra logic, depsolving, enforced uniqueness). It only uses the existing
>>>>>>>> patterns (and existing required knowledge), but requires no work (other
>>>>>>>> than using the template) for the simple case.
>>>>>>>>
>>>>>>>> On Mon, Jun 3, 2019 at 2:56 PM Simon Baatz <gmbnomis at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Mon, Jun 03, 2019 at 09:11:07AM -0400, David Davis wrote:
>>>>>>>>> >    @Simon I like the idea behind the repo_key solution you came
>>>>>>>>> up with.
>>>>>>>>> >    Can you be more specific around cases you think that it
>>>>>>>>> couldn't
>>>>>>>>> >    handle? I imagine that plugin writers could use properties or
>>>>>>>>> >    denormailzation (ie additional database columns) to solve
>>>>>>>>> cases where
>>>>>>>>> >    they need uniqueness across data that isn't in the database.
>>>>>>>>> In a worst
>>>>>>>>> >    case scenario, they can't use the pulpcore solution and just
>>>>>>>>> have to
>>>>>>>>> >    roll their own.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What I wrote probably sounded too pessimistic. You are right, in
>>>>>>>>> most cases that should be doable.
>>>>>>>>>
>>>>>>>>> I agree that we could have a simple default solution that just
>>>>>>>>> requires to specify a couple of field names in the easiest case.
>>>>>>>>> As you
>>>>>>>>> say, it should be possible use custom logic in a plugin if
>>>>>>>>> required.
>>>>>>>>>
>>>>>>>>> Here is the case I was thinking of that it can't handle:
>>>>>>>>>
>>>>>>>>> In pulp_file, a uniqueness constraint on "relative_path" would
>>>>>>>>> allow
>>>>>>>>> content units "a" and "a/b" to be in a repo version.
>>>>>>>>>
>>>>>>>>> However, we may want file repos to be representable on an actual
>>>>>>>>> file
>>>>>>>>> system (e.g. when exporting them as tar files).  For the repo
>>>>>>>>> above,
>>>>>>>>> this does not work, as "a" can't be a file and a directory at the
>>>>>>>>> same time on a standard Unix file system.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Pulp-dev mailing list
>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pulp-dev mailing list
>>>>>>>> Pulp-dev at redhat.com
>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pulp-dev mailing list
>>>>>>> Pulp-dev at redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Pulp-dev mailing list
>>>>>> Pulp-dev at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>
>>>>> _______________________________________________
>>>>> Pulp-dev mailing list
>>>>> Pulp-dev at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>
>>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20190721/f31118a3/attachment.htm>


More information about the Pulp-dev mailing list