[Pulp-dev] the "relative path" problem

Pavel Picka ppicka at redhat.com
Wed Apr 1 13:40:12 UTC 2020


Hi,
I'd like to add one more question to this topic. Do you think it is a
blocker for PRs [0] & [1]? While testing [2] these features, I haven't run
into a real-world example where two packages with exactly the same name appear.
I think this is a 'must have' feature, but until we solve/decide it, we can
have the two features working, maybe with a warning in the docs for users
that this can happen in some 'special' repositories.

To follow the topic directly, I like the proposed move to 'RepositoryContent'
and adding it to its uniqueness constraint (if I understand correctly).

[0] https://github.com/pulp/pulp_rpm/pull/1657
[1] https://github.com/pulp/pulp_rpm/pull/1642
[2] tested with centos 7, 8, opensuse and SLE repositories

On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley <dalley at redhat.com> wrote:

> We'd like to start a discussion on the "relative path problem" identified
> recently.
> Problem:
>
> Currently, a relative_path is tied to content in Pulp. This means that if
> a content unit exists in two places within a repository or across
> repositories, it has to be stored as two separate content units. This
> creates redundant data and potential confusion for users.
>
> As a specific example, we need to support mirroring content in pulp_rpm
> <https://pulp.plan.io/issues/6353>. Currently, for each location at which
> a single package is stored, we’ll need to create a content unit. We could
> end up with several records representing a single package. Users may be
> confused about why they see multiple records for a package, and they may
> have trouble, for example, deciding which content unit to copy.
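To make the duplication concrete, here is a minimal sketch assuming a hypothetical, simplified model (not pulpcore's actual classes) in which the relative path is part of a package's identity. The package name, paths, and checksum value are placeholders.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the current situation: the relative path is
# part of what distinguishes one content unit from another.
@dataclass(frozen=True)
class PackageWithPath:
    name: str
    version: str
    sha256: str
    relative_path: str


def records_needed(name, version, sha256, locations):
    """Count the distinct content units one package needs when stored at several paths."""
    units = {PackageWithPath(name, version, sha256, path) for path in locations}
    return len(units)


locations = ["Packages/f/foo-1.0-1.noarch.rpm", "extras/foo-1.0-1.noarch.rpm"]
print(records_needed("foo", "1.0-1", "abc123", locations))  # 2
```

The same bytes, with the same checksum, end up as two records purely because the path differs.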
> Proposed Solution:
>
> Move “relative_path” from its current location on ContentArtifact to
> RepositoryContent. This will require a sizable data migration. In rare
> cases, repository versions may change slightly due to deduplication.
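A minimal sketch of the proposed shape, again using hypothetical stand-ins rather than the real pulpcore models: the content unit carries no path, and each membership entry (playing the role of a RepositoryContent row) records where the unit lives in the repository version.

```python
from dataclasses import dataclass, field


# Hypothetical content unit: its identity no longer includes a path.
@dataclass(frozen=True)
class Package:
    name: str
    version: str
    sha256: str


# Hypothetical stand-in for a repository version; each dict entry plays the
# role of a RepositoryContent row that now carries the relative_path.
@dataclass
class RepoVersionSketch:
    members: dict = field(default_factory=dict)  # relative_path -> Package

    def add(self, content, relative_path):
        existing = self.members.get(relative_path)
        if existing is not None and existing != content:
            raise ValueError(f"{relative_path} already holds different content")
        self.members[relative_path] = content

    def unique_content(self):
        return set(self.members.values())


pkg = Package("foo", "1.0-1", "abc123")
rv = RepoVersionSketch()
rv.add(pkg, "Packages/f/foo-1.0-1.noarch.rpm")
rv.add(pkg, "extras/foo-1.0-1.noarch.rpm")
print(len(rv.members), len(rv.unique_content()))  # 2 1
```

Two paths now point at one content record, which is the deduplication the data migration would have to perform on existing data.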
>
> A repository-version-wide uniqueness constraint will be present on
> “relative_path”, independently of any other repository uniqueness
> constraints (repo_key_fields) defined by the plugin writer.
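One way to read "independently" is that two separate checks run on a repository version: one on paths, one on the plugin's key fields. This is a hypothetical sketch of that reading, with content represented as plain dicts of string fields; it is not how pulpcore validates repository versions.

```python
from collections import Counter


def find_violations(members, repo_key_fields):
    """members: list of (relative_path, content) pairs; content is a dict of str fields.

    Returns (duplicate_paths, duplicate_keys); the two checks do not depend on each other.
    """
    # Check 1: no relative_path may appear twice in the repository version.
    path_counts = Counter(path for path, _ in members)
    duplicate_paths = sorted(p for p, n in path_counts.items() if n > 1)

    # Check 2: the plugin's repo_key_fields, applied to distinct content units
    # (the same unit listed under two paths counts once here).
    unique_units = {tuple(sorted(content.items())) for _, content in members}
    key_counts = Counter(
        tuple(dict(unit)[f] for f in repo_key_fields) for unit in unique_units
    )
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    return duplicate_paths, duplicate_keys


foo = {"name": "foo", "version": "1.0-1", "sha256": "abc123"}
foo_rebuilt = {"name": "foo", "version": "1.0-1", "sha256": "def456"}
members = [
    ("Packages/f/foo.rpm", foo),
    ("extras/foo.rpm", foo),          # same unit at a second path: allowed
    ("extras/foo.rpm", foo_rebuilt),  # path clash and key clash
]
print(find_violations(members, ["name", "version"]))
# (['extras/foo.rpm'], [('foo', '1.0-1')])
```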
>
> Modify the Stages API so that the relative_path can be processed in the
> correct location: instead of “DeclarativeArtifact”, it will likely need to
> go on “DeclarativeContent”.
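As a rough illustration of where the field would move, here is a sketch with heavily simplified, hypothetical stand-ins for the Stages API objects (the real pulpcore classes take more arguments and behave differently). The URL is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified stand-ins; not the real pulpcore Stages classes.
@dataclass
class DeclarativeArtifactSketch:
    url: str  # where to download the file; no repository path lives here


@dataclass
class DeclarativeContentSketch:
    content: dict
    relative_path: str  # moved here from the artifact, per the proposal
    d_artifacts: List[DeclarativeArtifactSketch] = field(default_factory=list)


def to_associations(batch):
    """Turn a batch of declarative content into (content checksum, relative_path) pairs."""
    return [(dc.content["sha256"], dc.relative_path) for dc in batch]


foo = {"name": "foo", "sha256": "abc123"}
url = "https://example.com/repo/Packages/f/foo-1.0-1.noarch.rpm"
batch = [
    DeclarativeContentSketch(foo, "Packages/f/foo-1.0-1.noarch.rpm", [DeclarativeArtifactSketch(url)]),
    DeclarativeContentSketch(foo, "extras/foo-1.0-1.noarch.rpm", [DeclarativeArtifactSketch(url)]),
]
print(to_associations(batch))
```

Both entries share one download and one content identity; only the association differs, which is the information a final association stage would need.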
>
> Remove “location_href” from the RPM Package content model. It was never a
> true part of the RPM (file) metadata; it is derived from the repository
> metadata, so storing it as part of the Content unit doesn’t entirely make
> sense.
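The location is visible in a repository's primary.xml, where each `<package>` entry carries a `<location href=...>` element alongside the package's own fields. A minimal sketch using Python's standard `xml.etree.ElementTree` on a trimmed, hand-written primary.xml fragment (the checksum is a placeholder, and real files carry many more elements and namespaces):

```python
import xml.etree.ElementTree as ET

NS = {"common": "http://linux.duke.edu/metadata/common"}

# Trimmed, illustrative fragment; not a complete primary.xml.
PRIMARY_SNIPPET = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="1">
  <package type="rpm">
    <name>foo</name>
    <version epoch="0" ver="1.0" rel="1"/>
    <checksum type="sha256" pkgid="YES">abc123</checksum>
    <location href="Packages/f/foo-1.0-1.noarch.rpm"/>
  </package>
</metadata>"""


def locations(primary_xml):
    """Map package name -> location href as declared by this repository's metadata."""
    root = ET.fromstring(primary_xml)
    result = {}
    for pkg in root.findall("common:package", NS):
        name = pkg.findtext("common:name", namespaces=NS)
        result[name] = pkg.find("common:location", NS).get("href")
    return result


print(locations(PRIMARY_SNIPPET))
```

A second repository could list the identical package under a different href, which is why the value describes the repository rather than the package.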
> Alternatives:
>
> In most cases, a content unit will have a single relative path. Creating a
> general solution to solve a one-off problem is usually not a good idea. As
> an alternative, we could look at another solution for mirroring content.
> One example might be to create a new object (e.g.
> RpmRepoMirrorContentMapping) that maps content to specific paths within a
> repo or repo version.
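A minimal sketch of that alternative, assuming content keeps a single default relative path and the hypothetical RpmRepoMirrorContentMapping (name from the proposal; fields and methods invented here for illustration) records extra paths per repository version:

```python
from collections import defaultdict


class RpmRepoMirrorContentMapping:
    """Hypothetical sketch: extra paths are kept outside the content model."""

    def __init__(self):
        # (repo_version, content_id) -> set of relative paths
        self._paths = defaultdict(set)

    def add_path(self, repo_version, content_id, relative_path):
        self._paths[(repo_version, content_id)].add(relative_path)

    def paths_for(self, repo_version, content_id, default_path):
        # When a mapping exists it lists every path; otherwise fall back to
        # the unit's own relative path. .get() avoids creating empty entries.
        extra = self._paths.get((repo_version, content_id))
        return sorted(extra) if extra else [default_path]


mapping = RpmRepoMirrorContentMapping()
mapping.add_path(3, "pkg-1", "Packages/f/foo.rpm")
mapping.add_path(3, "pkg-1", "extras/foo.rpm")
print(mapping.paths_for(3, "pkg-1", "Packages/f/foo.rpm"))
print(mapping.paths_for(4, "pkg-1", "Packages/f/foo.rpm"))
```

The trade-off is that ordinary content keeps its current shape, while only mirrored repositories pay for the extra lookup.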
> Questions:
>
>    - How do we handle this in pulp_file? How are content units identified
>    in pulp_file without relative_path?
>       - Checksum?
>       - How was this problem handled in Pulp 2?
>
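To illustrate what the "Checksum?" option would imply for pulp_file, here is a hypothetical sketch where a file unit is identified by its SHA-256 digest alone and the paths become membership data. It does not describe how pulp_file currently identifies content.

```python
import hashlib


def content_key(data: bytes) -> str:
    # Hypothetical identity: the SHA-256 digest of the file's bytes, no path.
    return hashlib.sha256(data).hexdigest()


def dedupe(files):
    """files: dict relative_path -> bytes. Returns digest -> sorted list of paths."""
    units = {}
    for path, data in files.items():
        units.setdefault(content_key(data), []).append(path)
    return {key: sorted(paths) for key, paths in units.items()}


files = {
    "a/readme.txt": b"hello",
    "b/readme.txt": b"hello",
    "c/other.txt": b"world",
}
units = dedupe(files)
print(len(units))  # 2
```

Under this identity, a file's name is no longer part of what it is, so the path has to be stored on the repository membership, consistent with the proposal.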
>
> Please weigh in if you have any input on potential problems with the
> proposal, potential alternate solutions, or other insights or questions!
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>


-- 
Pavel Picka
Red Hat