[Pulp-dev] the "relative path" problem

Daniel Alley dalley at redhat.com
Thu Apr 30 16:32:48 UTC 2020


@David Davis <daviddavis at redhat.com>, so this proposal would go something
like this, correct?

* For the signed metadata / exact mirror use-case we need to store the
repository metadata itself as a content unit inside the RepositoryVersion
anyway (because the hash must be equal)
* Because we have this metadata lying around, we can reference it at
publish time to discover the appropriate PublishedArtifact.relative_path
   * Create a map of "filename" -> "location_href" and look up the filename
of each RPM package to find the appropriate path
   * This should be pretty fast for the RPM plugin since createrepo_c is
doing all the hard work
* Data migration to ensure ContentArtifact.relative_path stores only the
filename (and I would suggest we also rename the field to "filename")
* If metadata isn't present in the RepositoryVersion, then just set
PublishedArtifact.relative_path according to whatever our default repo
layout is
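
As a rough sketch of the lookup and fallback described above: this is plain
Python with no Pulp or createrepo_c imports. The function names and the
"Packages/<first letter>/" default layout are assumptions made up for
illustration, not existing code.

```python
import posixpath


def build_location_map(location_hrefs):
    """Map each package filename to the location_href from the stored repo metadata."""
    location_map = {}
    for href in location_hrefs:
        filename = posixpath.basename(href)
        # Assumption: a filename appears only once per repository's metadata.
        # If it appears twice, the last location_href seen wins.
        location_map[filename] = href
    return location_map


def default_relative_path(filename):
    """Hypothetical default layout: Packages/<first letter>/<filename>."""
    return posixpath.join("Packages", filename[0].lower(), filename)


def published_relative_path(filename, location_map=None):
    """Use the path from the metadata when available, else the default layout."""
    if location_map and filename in location_map:
        return location_map[filename]
    return default_relative_path(filename)
```

With metadata present, a package keeps its upstream location; without it,
the same call falls back to the default layout.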

On Tue, Apr 28, 2020 at 11:41 AM David Davis <daviddavis at redhat.com> wrote:

> Yes, that's correct. During our meeting we discussed two options. The
> first was to extend RepositoryContent to store a relative path per
> ContentArtifact, since storing a relative_path per Content won't work for
> multi-Artifact Content units.
>
> An alternative that I pitched was to have plugins (or maybe even core
> someday) store this information outside RepositoryContent and then use it
> during publishing to set relative_path on PublishedArtifacts. We'd have to
> modify the content app if we wanted to support passthrough publications,
> but I think asking plugins to use published artifacts in this case is
> warranted. That said, I don't think anyone else was keen on this idea.
>
> David
>
>
> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg <mdellweg at redhat.com>
> wrote:
>
>> That is only used for passthrough publication, afaik. If you publish each
>> content unit "by hand", you create a new relative path for each published
>> artifact. That is why it can be empty and the content can still be
>> published.
>>
>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley <dalley at redhat.com> wrote:
>>
>>> We realized in our discussion that the original proposal described in my
>>> email will not work, because "relative_path" ultimately describes the path
>>> of the published *artifacts* (not content), and for content types with
>>> multiple artifacts, storing this information in a field on
>>> RepositoryContent would not be possible.
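
To illustrate the multi-artifact point with a toy model: the dictionaries,
keys, and paths below are made up for this sketch and are not pulpcore
fields or real data.

```python
# Toy model, not pulpcore code: one content unit that carries two artifacts.
content_artifacts = {"content-1": ["artifact-a", "artifact-b"]}

# A single relative_path on a (repository, content) row can name only one file.
per_content = {("repo-1", "content-1"): "some/dir/file-a"}

# Publishing needs one path per artifact, so the key has to include the artifact.
per_content_artifact = {
    ("repo-1", "content-1", "artifact-a"): "some/dir/file-a",
    ("repo-1", "content-1", "artifact-b"): "some/dir/file-b",
}


def paths_needed(content_artifacts):
    """Count the published paths required: one per artifact, not per content unit."""
    return sum(len(artifacts) for artifacts in content_artifacts.values())
```

Here two paths are needed, but the per-content mapping has room for only
one.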
>>>
>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley <dalley at redhat.com> wrote:
>>>
>>>> There is a video call scheduled to discuss this issue tomorrow (Tuesday
>>>> April 28th) at 13:30 UTC (please convert to your local time).
>>>> https://meet.google.com/scy-csbx-qiu
>>>>
>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis <daviddavis at redhat.com>
>>>> wrote:
>>>>
>>>>> I had a chance to think about this some more yesterday and wanted to
>>>>> email out my thoughts. I also think that this change sounds scary and will
>>>>> have a big impact on plugin writers so I thought of a couple alternatives:
>>>>>
>>>>> First, we could add a relative_path field to RepositoryContent instead
>>>>> of moving it there. This would be an optional field. It would be up to
>>>>> plugins to manage this field and they would still need to populate the
>>>>> relative_path field on ContentArtifact. But plugins could use this optional
>>>>> field to store relative paths per repository and then use this field when
>>>>> generating publications.
>>>>>
>>>>> The second alternative is one that is already laid out in the original
>>>>> email but to call it out again: it would be to not solve this in pulpcore.
>>>>> RPM would create its own object that would map content in a repository to
>>>>> relative_paths.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp <pamp at atix.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> I am not currently very well versed in the classes involved, but
>>>>>> moving relative_path around sounds slightly scary with the potential to
>>>>>> break things.
>>>>>>
>>>>>>
>>>>>> As such, I would be interested to be kept in the loop as this moves
>>>>>> forward. (Mailing list once there is some movement is entirely sufficient
>>>>>> 😉)
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Quirin Pamp
>>>>>> ------------------------------
>>>>>> *From:* pulp-dev-bounces at redhat.com <pulp-dev-bounces at redhat.com> on
>>>>>> behalf of Ina Panova <ipanova at redhat.com>
>>>>>> *Sent:* 21 April 2020 14:07:13
>>>>>> *To:* Daniel Alley <dalley at redhat.com>
>>>>>> *Cc:* Pulp-dev <pulp-dev at redhat.com>
>>>>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>>>>
>>>>>> Daniel,
>>>>>>
>>>>>> how about setting up a meeting to brainstorm the alternatives and
>>>>>> their pros/cons there?
>>>>>>
>>>>>>
>>>>>> --------
>>>>>> Regards,
>>>>>>
>>>>>> Ina Panova
>>>>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>>>>
>>>>>> "Do not go where the path may lead,
>>>>>>  go instead where there is no path and leave a trail."
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley <dalley at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>> Bump: this item needs to move forward soon.  Does anyone have any
>>>>>> thoughts?
>>>>>>
>>>>>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka <ppicka at redhat.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I'd like to add one more question to this topic. Do you think it is a
>>>>>> blocker for PRs [0] & [1]? While testing [2] these features I haven't
>>>>>> run into a real-world example where two packages with exactly the same
>>>>>> name appear. I think this is a 'must have' feature, but until we
>>>>>> solve/decide it we could have the two features working, maybe with a
>>>>>> warning in the docs for users that this can happen in some 'special'
>>>>>> repositories.
>>>>>>
>>>>>> To address the topic directly: I like the proposed move to
>>>>>> 'RepositoryContent' and adding it to its uniqueness constraint (if I
>>>>>> understand correctly).
>>>>>>
>>>>>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>>>>>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>>>>>> [2] tested with centos 7, 8, opensuse and SLE repositories
>>>>>>
>>>>>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley <dalley at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>> We'd like to start a discussion on the "relative path problem"
>>>>>> identified recently.
>>>>>>
>>>>>> Problem:
>>>>>>
>>>>>> Currently, a relative_path is tied to content in Pulp. This means
>>>>>> that if a content unit exists in two places within a repository or across
>>>>>> repositories, it has to be stored as two separate content units. This
>>>>>> creates redundant data and potential confusion for users.
>>>>>>
>>>>>> As a specific example, we need to support mirroring content in
>>>>>> pulp_rpm <https://pulp.plan.io/issues/6353>. Currently, for each
>>>>>> location at which a single package is stored, we’ll need to create a
>>>>>> content unit. We could end up with several records representing a single
>>>>>> package. Users may be confused about why they see multiple records for a
>>>>>> package, and they may have trouble, for example, deciding which content
>>>>>> unit to copy.
>>>>>>
>>>>>> Proposed Solution:
>>>>>>
>>>>>> Move “relative_path” from its current location on ContentArtifact to
>>>>>> RepositoryContent. This will require a sizable data migration. In rare
>>>>>> cases, repository versions may change slightly due to deduplication.
>>>>>>
>>>>>> A repository-version-wide uniqueness constraint will be present on
>>>>>> “relative_path”, independently of any other repository uniqueness
>>>>>> constraints (repo_key_fields) defined by the plugin writer.
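
In effect, the constraint amounts to a check like the one below. This is a
plain-Python sketch only: `duplicate_relative_paths` is a made-up helper,
and where or how the real check would be enforced is not decided here.

```python
from collections import Counter


def duplicate_relative_paths(relative_paths):
    """Return the relative paths that occur more than once in one repository version."""
    counts = Counter(relative_paths)
    return sorted(path for path, count in counts.items() if count > 1)
```

A repository version would satisfy the constraint only when this returns
an empty list for all of its relative paths.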
>>>>>>
>>>>>> Modify the Stages API so that the relative_path can be processed in
>>>>>> the correct location – instead of “DeclarativeArtifact” it will likely need
>>>>>> to go on “DeclarativeContent”
>>>>>>
>>>>>> Remove “location_href” from the RPM Package content model – it was
>>>>>> never a true part of the RPM (file) metadata, it is derived from the
>>>>>> repository metadata. So storing it as a part of the Content unit doesn’t
>>>>>> entirely make sense.
>>>>>>
>>>>>> Alternatives:
>>>>>>
>>>>>> In most cases, a content unit will have a single relative path.
>>>>>> Creating a general solution to solve a one-off problem is
>>>>>> usually not a good idea. As an alternative, we could look at another
>>>>>> solution for mirroring content. One example might be to create a new object
>>>>>> (e.g. RpmRepoMirrorContentMapping) that maps content to specific paths
>>>>>> within a repo or repo version.
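
A minimal sketch of what such a mapping object could look like, assuming
string identifiers in place of real model references. The field names and
the `paths_for` helper are hypothetical; only the class name comes from the
example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RpmRepoMirrorContentMapping:
    """Hypothetical record: one row per location of a content unit in a repo version."""

    repository_version: str
    content: str
    relative_path: str


def paths_for(mappings, repository_version, content):
    """List every path at which one content unit should appear in a repo version."""
    return sorted(
        m.relative_path
        for m in mappings
        if m.repository_version == repository_version and m.content == content
    )
```

The point of the design is that a package found at two locations stays a
single content unit and simply gets two mapping rows.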
>>>>>>
>>>>>> Questions:
>>>>>>
>>>>>>    - How do we handle this in pulp_file? How are content units
>>>>>>    identified in pulp_file without relative_path?
>>>>>>       - Checksum?
>>>>>>       - How was this problem handled in Pulp 2?
>>>>>>
>>>>>>
>>>>>> Please weigh in if you have any input on potential problems with the
>>>>>> proposal, potential alternate solutions, or other insights or questions!
>>>>>> _______________________________________________
>>>>>> Pulp-dev mailing list
>>>>>> Pulp-dev at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Pavel Picka
>>>>>> Red Hat
>>>>>>