<div dir="ltr"><h3 class="gmail-part" id="gmail-Problem"><font size="2"><span style="font-weight:normal">We'd like to start a discussion on the "relative path problem" identified recently.</span></font><br></h3><h3 class="gmail-part" id="gmail-Problem">Problem:</h3><p class="gmail-part">Currently,
a relative_path is tied to content in Pulp. This means that if a
content unit exists in two places within a repository or across
repositories, it has to be stored as two separate content units. This
creates redundant data and potential confusion for users.</p><p class="gmail-part">As a specific example, we need <a href="https://pulp.plan.io/issues/6353" target="_blank" rel="noopener">to support mirroring content in pulp_rpm</a>.
Currently, for each location at which a single package is stored, we’ll
need to create a content unit. We could end up with several records
representing a single package. Users may be confused about why they see
multiple records for a package, and they may have trouble, for example,
deciding which content unit to copy.</p><h3 class="gmail-part" id="gmail-Proposed-Solution">Proposed Solution:</h3><p class="gmail-part">Move
“relative_path” from its current location on ContentArtifact to
RepositoryContent. This will require a sizable data migration. In rare
cases, repository versions may change slightly due to
deduplication.</p><p class="gmail-part">A
repository-version-wide uniqueness constraint will be present on
“relative_path”, independently of any other repository uniqueness
constraints (repo_key_fields) defined by the plugin writer.</p><p class="gmail-part">Modify
the Stages API so that the relative_path can be processed in the
correct location: instead of “DeclarativeArtifact”, it will likely need
to go on “DeclarativeContent”.</p><p class="gmail-part">Remove
“location_href” from the RPM Package content model. It was never a
true part of the RPM (file) metadata; it is derived from the repository
metadata, so storing it as part of the content unit doesn’t entirely
make sense.</p><h3 class="gmail-part" id="gmail-Alternatives">Alternatives</h3><p class="gmail-part">In
most cases, a content unit will have a single relative path.
Creating a general solution to solve a one-off problem is
usually not a good idea. As an alternative, we could look at another
solution for mirroring content. One example might be to create a new
object (e.g. RpmRepoMirrorContentMapping) that maps content to specific
paths within a repo or repo version.</p><h3 class="gmail-part" id="gmail-Questions">Questions</h3><ul class="gmail-part"><li class="gmail-">How do we handle this in pulp_file? How are content units identified in pulp_file without relative_path?</li><ul><li class="gmail-">Checksum?<br></li></ul><li class="gmail-">How was this problem handled in Pulp 2?</li></ul><div><br></div><div>Please weigh in if you have any input on potential problems with the proposal, potential alternate solutions, or other insights or questions!<br></div></div>
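<div dir="ltr"><p class="gmail-part">To make the proposed solution a bit more concrete, here is a rough, plain-Python sketch of the idea (not actual Pulp code). The class names "Content" and "RepositoryVersion" below are simplified, hypothetical stand-ins rather than Pulp's real Django models, and identifying content by checksum alone is an assumption borrowed from the pulp_file question above. The point is only to show relative_path living on the repository/content association, with uniqueness enforced per repository version.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Content:
    """Hypothetical content unit, identified only by its checksum."""
    sha256: str


@dataclass
class RepositoryVersion:
    """Toy stand-in for a repository version.

    relative_path is stored on the association between the repository
    version and the content, not on the content unit itself.
    """
    paths: dict = field(default_factory=dict)

    def add(self, content, relative_path):
        # Repository-version-wide uniqueness: one path maps to one unit.
        existing = self.paths.get(relative_path)
        if existing is not None and existing != content:
            raise ValueError(f"relative_path already in use: {relative_path}")
        self.paths[relative_path] = content

    def content_units(self):
        # Distinct content units, regardless of how many paths expose them.
        return set(self.paths.values())


# The same package published at two locations is still one content unit.
pkg = Content(sha256="abc123")
version = RepositoryVersion()
version.add(pkg, "Packages/f/foo-1.0-1.noarch.rpm")
version.add(pkg, "mirror/foo-1.0-1.noarch.rpm")
print(len(version.content_units()))
```
</pre>
<p class="gmail-part">In this sketch, adding a different content unit at a path that is already taken raises an error, which mirrors the proposed uniqueness constraint on “relative_path” within a repository version.</p></div>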