[Pulp-dev] the "relative path" problem

Daniel Alley dalley at redhat.com
Wed Apr 1 13:20:49 UTC 2020


We'd like to start a discussion on the "relative path problem" identified
recently.

Problem:

Currently, a relative_path is tied to content in Pulp. This means that if a
content unit exists in two places within a repository or across
repositories, it has to be stored as two separate content units. This
creates redundant data and potential confusion for users.

As a specific example, we need to support mirroring content in pulp_rpm
<https://pulp.plan.io/issues/6353>. Currently, for each location at which a
single package is stored, we’ll need to create a content unit. We could end
up with several records representing a single package. Users may be
confused about why they see multiple records for a package, and they may
have trouble, for example, deciding which content unit to copy.

Proposed Solution:

Move “relative_path” from its current location on ContentArtifact to
RepositoryContent. This will require a sizable data migration. In rare
cases, repository versions may change slightly due to deduplication.

A repository-version-wide uniqueness constraint will be present on
“relative_path”, independently of any other repository uniqueness
constraints (repo_key_fields) defined by the plugin writer.
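
To make the two paragraphs above concrete, here is a minimal plain-Python sketch of the idea. It is not Pulp code: `Content`, `RepositoryVersion`, and `DuplicatePathError` are hypothetical stand-ins, and the dict entries only play the role that RepositoryContent rows would play. The assumption is that the path lives on the association between a repository version and a content unit, and that a path may be used only once per version.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Content:
    """Hypothetical content unit, identified without any relative_path."""
    sha256: str


class DuplicatePathError(Exception):
    """Raised when a path is already taken by different content."""


@dataclass
class RepositoryVersion:
    """Hypothetical stand-in; each dict entry acts like a RepositoryContent row."""
    paths: dict = field(default_factory=dict)  # relative_path -> Content

    def add(self, content, relative_path):
        # Version-wide uniqueness: one path can point at only one unit.
        existing = self.paths.get(relative_path)
        if existing is not None and existing != content:
            raise DuplicatePathError(relative_path)
        self.paths[relative_path] = content


pkg = Content(sha256="abc123")
version = RepositoryVersion()
version.add(pkg, "Packages/f/foo-1.0-1.noarch.rpm")
version.add(pkg, "mirror/foo-1.0-1.noarch.rpm")

distinct_units = set(version.paths.values())
print(len(version.paths), len(distinct_units))  # 2 1
```

The same package is reachable at two paths but exists as one unit, which is the deduplication the proposal is after.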

Modify the Stages API so that the relative_path can be processed in the
correct location; instead of “DeclarativeArtifact”, it will likely need to
go on “DeclarativeContent”.
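
As a rough illustration of that change, the sketch below uses simplified stand-ins rather than the real Stages API classes. `SketchArtifact`, `SketchContent`, and `lift_relative_path` are hypothetical names, and the sketch assumes content with exactly one artifact, which the proposal itself does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchArtifact:
    """Stand-in for a declarative artifact: where the file is fetched from."""
    url: str
    relative_path: Optional[str] = None  # where the path is declared today


@dataclass
class SketchContent:
    """Stand-in for declarative content: the unit plus its artifacts."""
    unit_key: str
    artifacts: List[SketchArtifact] = field(default_factory=list)
    relative_path: Optional[str] = None  # where the path would be declared


def lift_relative_path(d_content):
    """Move the path from the single artifact up to the content declaration."""
    if len(d_content.artifacts) != 1:
        raise ValueError("sketch only handles single-artifact content")
    artifact = d_content.artifacts[0]
    d_content.relative_path = artifact.relative_path
    artifact.relative_path = None
    return d_content


declared = SketchContent(
    unit_key="foo-1.0-1.noarch",
    artifacts=[
        SketchArtifact(
            url="https://example.com/repo/Packages/f/foo-1.0-1.noarch.rpm",
            relative_path="Packages/f/foo-1.0-1.noarch.rpm",
        )
    ],
)
lifted = lift_relative_path(declared)
print(lifted.relative_path)
```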

Remove “location_href” from the RPM Package content model. It was never a
true part of the RPM (file) metadata; it is derived from the repository
metadata, so storing it as part of the content unit doesn’t entirely make
sense.

Alternatives

In most cases, a content unit will have a single relative path.
Creating a general solution to solve a one-off problem is
usually not a good idea. As an alternative, we could look at another
solution for mirroring content. One example might be to create a new object
(e.g. RpmRepoMirrorContentMapping) that maps content to specific paths
within a repo or repo version.
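
One possible reading of that alternative, sketched in plain Python: the content unit keeps its own path, and a separate mapping records any extra paths per repo version. `MirrorPathMapping` and `paths_for` are hypothetical names for illustration only; nothing like them exists in pulp_rpm as far as this proposal states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorPathMapping:
    """Hypothetical row: one extra path for a content unit in one repo version."""
    repo_version: int
    content_id: str
    relative_path: str


def paths_for(mappings, repo_version, content_id, default_path):
    """Return the unit's own path followed by any mirror-only extra paths."""
    extras = sorted(
        m.relative_path
        for m in mappings
        if m.repo_version == repo_version and m.content_id == content_id
    )
    return [default_path] + extras


mappings = [
    MirrorPathMapping(1, "foo", "mirror/foo-1.0-1.noarch.rpm"),
    MirrorPathMapping(2, "foo", "other/foo-1.0-1.noarch.rpm"),
]
published = paths_for(mappings, 1, "foo", "Packages/f/foo-1.0-1.noarch.rpm")
print(published)
```

With this shape, only plugins that need mirroring pay for the extra table, and the core content model stays as it is.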

Questions

   - How do we handle this in pulp_file? How are content units identified
   in pulp_file without relative_path?
      - Checksum?
      - How was this problem handled in Pulp 2?


Please weigh in if you have any input on potential problems with the
proposal, potential alternate solutions, or other insights or questions!