[Pulp-dev] the "relative path" problem

Daniel Alley dalley at redhat.com
Tue Apr 28 14:08:35 UTC 2020


During the discussion we realized that the original proposal described in my
email will not work: "relative_path" ultimately describes the path of the
published *artifacts* (not the content unit), and for content types with
multiple artifacts a single field on RepositoryContent cannot hold one
path per artifact.
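
A minimal sketch of the limitation, using plain Python dataclasses as
simplified stand-ins for the models. These are not pulpcore's actual Django
models: every field other than relative_path, both helper functions, and the
example file names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentArtifact:
    # Stand-in: one row per artifact of a content unit, each with its own path.
    relative_path: str


@dataclass
class Content:
    # Stand-in for a content unit that may own several artifacts.
    name: str
    content_artifacts: List[ContentArtifact] = field(default_factory=list)


@dataclass
class RepositoryContent:
    # Stand-in: one row per (repository, content unit) pair, so a field
    # here can hold at most one path for the whole unit.
    repository: str
    content: Content
    relative_path: str = ""  # the single field from the original proposal


def published_paths(content: Content) -> List[str]:
    """Return the path of every artifact that would be published."""
    return [ca.relative_path for ca in content.content_artifacts]


def fits_single_field(content: Content) -> bool:
    """True only if one RepositoryContent.relative_path could describe the unit."""
    return len(published_paths(content)) <= 1


# Hypothetical example data: a single-artifact unit and a two-artifact unit.
single = Content("pkg", [ContentArtifact("Packages/pkg.rpm")])
multi = Content(
    "bundle",
    [ContentArtifact("bundle/manifest.json"), ContentArtifact("bundle/data.bin")],
)

print(fits_single_field(single))
print(fits_single_field(multi))
```

In this sketch the single-artifact unit fits the proposed field, while the
two-artifact unit has two published paths and only one slot to store them.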

On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley <dalley at redhat.com> wrote:

> There is a video call scheduled to discuss this issue tomorrow (Tuesday
> April 28th) at 13:30 UTC (please convert to your local time).
> https://meet.google.com/scy-csbx-qiu
>
> On Sat, Apr 25, 2020 at 7:02 AM David Davis <daviddavis at redhat.com> wrote:
>
>> I had a chance to think about this some more yesterday and wanted to
>> email out my thoughts. I also think that this change sounds scary and will
>> have a big impact on plugin writers, so I thought of a couple of alternatives:
>>
>> First, we could add a relative_path field to RepositoryContent instead of
>> moving it there. This would be an optional field. It would be up to plugins
>> to manage this field and they would still need to populate the
>> relative_path field on ContentArtifact. But plugins could use this optional
>> field to store relative paths per repository and then use this field when
>> generating publications.
>>
>> The second alternative is already laid out in the original email, but to
>> call it out again: we would not solve this in pulpcore. Instead, RPM would
>> create its own object that maps content in a repository to relative_paths.
>>
>> David
>>
>>
>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp <pamp at atix.de> wrote:
>>
>>> Hi,
>>>
>>>
>>> I am not currently very well versed in the classes involved, but moving
>>> relative_path around sounds slightly scary with the potential to break
>>> things.
>>>
>>>
>>> As such, I would be interested to be kept in the loop as this moves
>>> forward. (Mailing list once there is some movement is entirely sufficient
>>> 😉)
>>>
>>>
>>> Thanks,
>>>
>>> Quirin Pamp
>>> ------------------------------
>>> *From:* pulp-dev-bounces at redhat.com <pulp-dev-bounces at redhat.com> on
>>> behalf of Ina Panova <ipanova at redhat.com>
>>> *Sent:* 21 April 2020 14:07:13
>>> *To:* Daniel Alley <dalley at redhat.com>
>>> *Cc:* Pulp-dev <pulp-dev at redhat.com>
>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>
>>> Daniel,
>>>
>>> how about setting up a meeting and brainstorming the alternatives and
>>> their pros/cons there?
>>>
>>>
>>> --------
>>> Regards,
>>>
>>> Ina Panova
>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>
>>> "Do not go where the path may lead,
>>>  go instead where there is no path and leave a trail."
>>>
>>>
>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley <dalley at redhat.com> wrote:
>>>
>>> Bump, this item needs to move forwards soon.  Does anyone have any
>>> thoughts?
>>>
>>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka <ppicka at redhat.com> wrote:
>>>
>>> Hi,
>>> I'd like to add one more question to this topic. Do you think it is a
>>> blocker for PRs [0] & [1]? While testing [2] these features, I haven't run
>>> into a real-world example where two packages with exactly the same name
>>> appear. I think this is a 'must have' feature, but until we decide on a
>>> solution we could have the two features working, maybe with a warning in
>>> the docs that this can happen in some 'special' repositories.
>>>
>>> To address the topic directly: I like the proposed move to
>>> 'RepositoryContent' and adding it to its uniqueness constraint (if I
>>> understand correctly).
>>>
>>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>>> [2] tested with centos 7, 8, opensuse and SLE repositories
>>>
>>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley <dalley at redhat.com> wrote:
>>>
>>> We'd like to start a discussion on the "relative path problem"
>>> identified recently.
>>>
>>> Problem:
>>>
>>> Currently, a relative_path is tied to content in Pulp. This means that
>>> if a content unit exists in two places within a repository or across
>>> repositories, it has to be stored as two separate content units. This
>>> creates redundant data and potential confusion for users.
>>>
>>> As a specific example, we need to support mirroring content in pulp_rpm
>>> <https://pulp.plan.io/issues/6353>. Currently, for each location at
>>> which a single package is stored, we’ll need to create a content unit. We
>>> could end up with several records representing a single package. Users may
>>> be confused about why they see multiple records for a package, and they may
>>> have trouble deciding, for example, which content unit to copy.
>>>
>>> Proposed Solution:
>>>
>>> Move “relative_path” from its current location on ContentArtifact to
>>> RepositoryContent. This will require a sizable data migration. In rare
>>> cases, repository versions may change slightly due to deduplication.
>>>
>>> A repository-version-wide uniqueness constraint will be present on
>>> “relative_path”, independently of any other repository uniqueness
>>> constraints (repo_key_fields) defined by the plugin writer.
>>>
>>> Modify the Stages API so that the relative_path can be processed in the
>>> correct location: instead of “DeclarativeArtifact”, it will likely need to
>>> go on “DeclarativeContent”.
>>>
>>> Remove “location_href” from the RPM Package content model. It was never
>>> a true part of the RPM (file) metadata; it is derived from the repository
>>> metadata, so storing it as part of the Content unit doesn’t entirely make
>>> sense.
>>>
>>> Alternatives
>>>
>>> In most cases, a content unit will have a single relative path. Creating
>>> a general solution to solve a one-off problem is usually not a good idea.
>>> As an alternative, we could look at another solution for mirroring
>>> content. One example might be to create a new object
>>> (e.g. RpmRepoMirrorContentMapping) that maps content to specific paths
>>> within a repo or repo version.
>>>
>>> Questions
>>>
>>>    - How do we handle this in pulp_file? How are content units
>>>    identified in pulp_file without relative_path?
>>>       - Checksum?
>>>       - How was this problem handled in Pulp 2?
>>>
>>>
>>> Please weigh in if you have any input on potential problems with the
>>> proposal, potential alternate solutions, or other insights or questions!
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>>
>>>
>>> --
>>> Pavel Picka
>>> Red Hat
>>>
>>