[Pulp-dev] uniqueness constraints within a repository version

Tatiana Tereshchenko ttereshc at redhat.com
Mon Jul 22 08:47:22 UTC 2019


On Sun, Jul 21, 2019 at 3:00 PM Brian Bouterse <bbouters at redhat.com> wrote:

>
>
> On Sun, Jul 21, 2019 at 6:23 AM Tatiana Tereshchenko <ttereshc at redhat.com>
> wrote:
>
>> +1 to the idea of a repo_key.
>>
>> Should we also add the ability to apply custom validation of the content
>> being added?
>> Similar to a repo_key, Content model can optionally provide an additional
>> validator.
>> Use cases:
>>  - for pulp_file to avoid relative path overlap - e.g. 'a/b' and 'a'
>>
> In thinking this over more, I'm unsure that pulp_file has the use case.
> Two different Artifacts having relative paths 'a' and 'a/b' in one repo
> version doesn't seem problematic. This problem statement is similar to the
> Distribution.base_path overlap problem statement where it's unavoidably
> ambiguous which Distribution should be matched when base_paths are allowed
> to overlap. In this case for pulp_file, it's not ambiguous in the same way:
> I expect the relative_path to match exactly 1 content unit, either 'a'
> or 'a/b', but not both. What do you think about this?
>

I agree that the problem is similar to the Distribution.base_path overlap.
If I understand you correctly, yes, it's not a problem if you query content
one by one.
What about use cases where we want to have a repo version on a filesystem?
E.g. browsable repositories (this feature has already been asked for by
our stakeholders) or export (e.g. rsync).
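
To make the filesystem concern concrete, here is a minimal sketch of the kind
of check a validator could run. It is only an illustration: find_path_overlaps
is a hypothetical helper, not part of pulpcore or pulp_file, and it assumes
relative paths are plain '/'-separated strings.

```python
from typing import Iterable, List, Tuple


def find_path_overlaps(relative_paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (file_path, nested_path) pairs that cannot coexist on disk.

    Hypothetical helper for illustration only; not a pulpcore API.
    A pair is reported when file_path is itself a content unit and also
    has to be a directory in order to hold nested_path.
    """
    paths = sorted(set(relative_paths))
    path_set = set(paths)
    overlaps = []
    for path in paths:
        parts = path.split("/")
        # Every proper prefix of the path must be a directory on disk.
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in path_set:
                overlaps.append((parent, path))
    return overlaps


if __name__ == "__main__":
    # 'a' would have to be both a file and a directory.
    print(find_path_overlaps(["a", "a/b", "c/d"]))
```

Querying 'a' or 'a/b' one at a time stays unambiguous; the sketch only flags
sets of paths that could not be written out to one directory tree.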

>
>>  - for pulp_rpm to filter by signature/signing key
>>
> Can we expand on this use case a bit? Is it that the repo version should
> contain only signed or only unsigned rpms? Or is it that we are ok with a
> mixture as long as each NEVRA is unique? I suspect the former, but I want
> to be sure.
>

I think it should contain only signed units, optionally only those signed by
specific keys. See the Pulp 2 feature description:
https://docs.pulpproject.org/plugins/pulp_rpm/user-guide/features.html#package-signatures-and-gpg-key-id-filtering
Another use case which comes to mind is keeping the last N versions of a
unit within a repo version.
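
As a rough sketch of the "last N versions" idea (hypothetical names only, not
an existing pulpcore or pulp_rpm API), assuming each unit has a name and a
version that can be compared as a tuple of integers. Real RPM version
comparison (epoch, release, etc.) is more involved than this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Unit(NamedTuple):
    """Illustrative stand-in for a content unit; not a Pulp model."""
    name: str
    version: Tuple[int, ...]


def keep_last_n(units: Iterable[Unit], n: int) -> List[Unit]:
    """Keep only the n newest versions of each unit name."""
    if n < 1:
        raise ValueError("n must be at least 1")
    by_name: Dict[str, List[Unit]] = defaultdict(list)
    for unit in units:
        by_name[unit.name].append(unit)
    kept: List[Unit] = []
    for name in sorted(by_name):
        # Newest first, assuming tuple comparison orders versions correctly.
        newest_first = sorted(by_name[name], key=lambda u: u.version, reverse=True)
        kept.extend(newest_first[:n])
    return kept


if __name__ == "__main__":
    units = [Unit("foo", (1, 0)), Unit("foo", (1, 2)), Unit("foo", (1, 1)), Unit("bar", (2, 0))]
    print(keep_last_n(units, 2))
```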

Tanya


>
>> Plugins can solve it by defining their own stage but it seems like almost
>> any plugin needs to ensure absence of collisions specific to it, even the
>> simple pulp_file.
>> It means that our default pipeline becomes less useful and will hardly
>> ever be used by any [currently known] plugins.
>>
>> Any thoughts?
>>
>> Tanya
>>
>>
>> On Mon, Jul 8, 2019 at 9:09 PM Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>> I want to retell Simon's proposal to have "Content defines a 'repo_key'
>>> similar to a unit_key. This key must be unique within a repo version (and
>>> not globally like the unit_key)."
>>>
>>> We could adopt his proposal to have the repo_key tuple defined on
>>> Content in pulpcore. We could leave the add/remove APIs in core and adopt,
>>> for both sync and add/remove, the "keep newest to associate" functionality
>>> described earlier in the thread. This "keep newest to associate" code would
>>> be used by sync in the form of a core stage that is a generalized version
>>> of the RemoveDuplicates stage. This would become part of the default
>>> pipeline for all users of Stages API. I think this would be better than
>>> plugin writers implementing it over and over and also less effort for
>>> plugin writers. This design would meet the current needs of pulp_cookbook,
>>> pulp_file, and pulp_rpm which are the only 3 places I know we have this
>>> problem so far, but I believe more content types are susceptible to this.
>>>
>>> What do you think we should do?
>>>
>>> Thanks!
>>> Brian
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jun 27, 2019 at 4:03 AM Tatiana Tereshchenko <
>>> ttereshc at redhat.com> wrote:
>>>
>>>> Sure, the code can be de-duplicated.
>>>> My main worry is that it's a responsibility of a plugin writer not to
>>>> forget to ensure uniqueness constraints within a repo version for every
>>>> workflow (sync, copy, anything else) where a repo version is created.
>>>> Every time before RepositoryVersion.create() is called, there should be
>>>> a check that there is no colliding content in a repo version.
>>>> It would be much more reliable and friendly, in my opinion, if a plugin
>>>> writer could define rules/callbacks/whatever for each content type and they
>>>> would be applied to any repository version creation.
>>>> At the same time this eliminates the flexibility to define different
>>>> logic for content collision for different workflows, however I'm not sure
>>>> if such a use case exists or is desired.
>>>>
>>>> Tanya
>>>>
>>>>
>>>> On Wed, Jun 26, 2019 at 6:49 PM Austin Macdonald <amacdona at redhat.com>
>>>> wrote:
>>>>
>>>>> @Tanya Tereshchenko <ttereshc at redhat.com>
>>>>>
>>>>>> Do I understand correctly that it doesn't cover the sync case and
>>>>>> it's only about explicit repo version creation?
>>>>>>
>>>>>
>>>>> I don't mean that add/remove could not share code with the remove
>>>>> duplicates stage. I wanted to point out that we have a problem here (how to
>>>>> remove duplicates) that has similar patterns to other problems with
>>>>> add/remove (recursive, copy, deciding which content to keep on a collision,
>>>>> etc.). I don't doubt that pulpcore could help solve these problems, but I
>>>>> think that as we approach our GA, we should consider solving this problem
>>>>> (for now) by getting out of the way of plugin writers rather than by
>>>>> implementing code that is supposed to work for all plugins. I suspect that
>>>>> plenty of the plugins will be implementing their own add/remove anyway.
>>>>>
>>>>> On Tue, Jun 25, 2019 at 12:56 PM David Davis <daviddavis at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> I don't think this solution would work in the case of creating a new
>>>>>> repository version. Suppose for example you had two content units that
>>>>>> collide, one in a repo version and one older unit that a user explicitly
>>>>>> wants to add to the repo version. If the latter one is older, then what
>>>>>> would happen?
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 25, 2019 at 12:48 PM Brian Bouterse <bbouters at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Having a way for units to express their uniqueness per repo sounds
>>>>>>> good because then more areas of Pulp's code could answer the question:
>>>>>>> "will I have a duplicate if I add content X to repo_version Y".
>>>>>>>
>>>>>>> Let's assume we know that situation is about to occur during sync
>>>>>>> for example, what do we do about it? In the errata case we know the "new"
>>>>>>> one should replace the existing one. Maybe we start to 'order' the units
>>>>>>> with colliding repo keys and keep the newest one always? Would this work
>>>>>>> for pulp_cookbook and pulp_rpm? Would it generalize? Is this what you
>>>>>>> imagined?
>>>>>>>
>>>>>>> On Tue, Jun 25, 2019 at 5:30 AM Tatiana Tereshchenko <
>>>>>>> ttereshc at redhat.com> wrote:
>>>>>>>
>>>>>>>> Do I understand correctly that it doesn't cover the sync case and
>>>>>>>> it's only about explicit repo version creation?
>>>>>>>> So the suggestion is to implement the same logic twice: for sync
>>>>>>>> case - RemoveDuplicates stage and/or maybe some custom stage (e.g. to
>>>>>>>> disallow overlapping paths), and for direct repo version creation - your
>>>>>>>> proposal.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 24, 2019 at 3:13 PM Austin Macdonald <
>>>>>>>> amacdona at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> I have a design in mind for solving this problem:
>>>>>>>>>
>>>>>>>>> 1. Remove POST to RepositoryVersion (no general add/remove
>>>>>>>>> endpoint).
>>>>>>>>> 2. Add an endpoint to kick off an add/remove task, namespaced by
>>>>>>>>> plugin. ie `POST pulp/api/v3/docker/add-remove/`
>>>>>>>>>    This view can be provided to all plugins by the plugin
>>>>>>>>> template, and will be based on the current RepositoryVersionCreate:
>>>>>>>>>
>>>>>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/viewsets/repository.py#L221-L258
>>>>>>>>>    Note: the main purpose of this view is to kick off the general
>>>>>>>>> add/remove task, which will be unchanged:
>>>>>>>>>
>>>>>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/tasks/repository.py#L70
>>>>>>>>> 3. Add an add/remove serializer to the plugin API.
>>>>>>>>> 4. Plugins needing further customization can provide their own
>>>>>>>>> task and subclassed serializer.
>>>>>>>>>
>>>>>>>>> This gives the plugin writer full control over the endpoint
>>>>>>>>> (customizable arguments and validation), and full control over the flow
>>>>>>>>> (extra logic, depsolving, enforced uniqueness). It only uses the existing
>>>>>>>>> patterns (and existing required knowledge), but requires no work (other
>>>>>>>>> than using the template) for the simple case.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 3, 2019 at 2:56 PM Simon Baatz <gmbnomis at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, Jun 03, 2019 at 09:11:07AM -0400, David Davis wrote:
>>>>>>>>>> >    @Simon I like the idea behind the repo_key solution you came up with.
>>>>>>>>>> >    Can you be more specific around cases you think that it couldn't
>>>>>>>>>> >    handle? I imagine that plugin writers could use properties or
>>>>>>>>>> >    denormalization (i.e. additional database columns) to solve cases where
>>>>>>>>>> >    they need uniqueness across data that isn't in the database. In a worst
>>>>>>>>>> >    case scenario, they can't use the pulpcore solution and just have to
>>>>>>>>>> >    roll their own.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What I wrote probably sounded too pessimistic. You are right, in
>>>>>>>>>> most cases that should be doable.
>>>>>>>>>>
>>>>>>>>>> I agree that we could have a simple default solution that just
>>>>>>>>>> requires specifying a couple of field names in the easiest case.
>>>>>>>>>> As you say, it should be possible to use custom logic in a plugin if
>>>>>>>>>> required.
>>>>>>>>>>
>>>>>>>>>> Here is the case I was thinking of that it can't handle:
>>>>>>>>>>
>>>>>>>>>> In pulp_file, a uniqueness constraint on "relative_path" would allow
>>>>>>>>>> content units "a" and "a/b" to be in a repo version.
>>>>>>>>>>
>>>>>>>>>> However, we may want file repos to be representable on an actual file
>>>>>>>>>> system (e.g. when exporting them as tar files).  For the repo above,
>>>>>>>>>> this does not work, as "a" can't be a file and a directory at the
>>>>>>>>>> same time on a standard Unix file system.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Pulp-dev mailing list
>>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>>