[Pulp-dev] proposing changes to pulp 3 upload API

Jeff Ortel jortel at redhat.com
Fri Jun 30 11:53:47 UTC 2017


Perhaps I missed something... when was the "random path" proposed? In Pulp 2, and currently in Pulp 3, the
artifact's path is deterministic:

MEDIA_ROOT/content/units/<type>/digest[0:2]/digest[2:]/<relative-path>

where the digest is the sha256 hex digest of the content unit's natural key. For example, the natural key for
RPM content is the NEVREA and checksum.
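To make the deterministic layout concrete, here is a minimal Python sketch. The helper name `unit_storage_path` is hypothetical, and the way the natural key fields are serialized before hashing (joined with "|" and UTF-8 encoded) is an assumption; this thread only says that the digest is the sha256 of the natural key.

```python
import hashlib
import os


def unit_storage_path(media_root, unit_type, natural_key, relative_path):
    """Hypothetical helper building
    MEDIA_ROOT/content/units/<type>/digest[0:2]/digest[2:]/<relative-path>.
    """
    # Assumption: natural key fields are joined with '|' and UTF-8 encoded;
    # the real serialization Pulp uses is not specified here.
    serialized = "|".join(str(field) for field in natural_key).encode("utf-8")
    digest = hashlib.sha256(serialized).hexdigest()
    # The first two hex characters form a fan-out directory so that no single
    # directory accumulates every unit of a given type.
    return os.path.join(media_root, "content", "units", unit_type,
                        digest[0:2], digest[2:], relative_path)


# Illustrative RPM natural key (NEVREA plus checksum); the values are made up.
key = ("bash", "0", "4.4.12", "3.fc26", "x86_64", "abc123")
print(unit_storage_path("/var/lib/pulp", "rpm", key, "bash-4.4.12-3.fc26.x86_64.rpm"))
```

Because the path depends only on the unit type, the natural key, and the relative path, the same unit always maps to the same location.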

On 06/29/2017 03:51 PM, Brian Bouterse wrote:
> There is really one practical issue driving this convo (I think): Django's file upload handling wants
> to save a file when we receive it. We also don't want to move files around. Therefore we must save the
> file in the right place on the first save().
> 
> So given ^, the question reduces to: "Where do we want to save a file that backs an Artifact?" We can do that
> one of two ways: randomly or orderly. Randomly would mean inventing a uuid for each file and using it to make
> the path to the file unique. An orderly way would be to use a digest instead of a uuid.
> Here are some path examples:
> 
> random_path_example (random uuid):    MEDIA_ROOT/artifact/uuid[0:2]/uuid[2:]
> orderly_path_example (sha256 is the binary's digest):    MEDIA_ROOT/artifact/digest[0:2]/digest[2:]
> 
> Random assignment is straightforward, and it also allows one Artifact to serve exactly one content unit,
> allowing CASCADE deletes to handle cleanup easily. The problem with random assignment is that it prevents an
> important down-the-road use case: "as a user who has a file backup but not a database backup, I can recover
> my data without having to re-download all of my content from remotes". Specifically, if Artifacts' paths are
> randomly chosen at upload time, then if someone hands you a disk of Artifacts and asks you to sync EPEL, there
> is no way Pulp can reasonably recognize the content it already has on disk.
> 
> This is where content addressable storage comes in. If the remoteArtifact has the sha256 hash value set from
> the remote metadata that was fetched, Pulp's changesets could recognize data on disk as already downloaded. A
> random layout can never do that. A tertiary outcome of using content addressable storage is that each file
> backing an Artifact can be stored only once on the filesystem. I say "tertiary outcome" and not "downside"
> because, even though it's harder for us to implement, users would definitely see it as a benefit that Pulp
> can't duplicate content at an architectural level.
> 
> Please send thoughts/ideas.
> 
> -Brian
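A minimal Python sketch contrasting the two layouts Brian describes, assuming the digest is the sha256 of the file's bytes. All function names (`random_path`, `orderly_path`, `store_artifact`, `find_existing`) are hypothetical. Note that `store_artifact` hashes the bytes in memory before writing, so it sidesteps, rather than solves, the problem of Django saving the upload on receipt.

```python
import hashlib
import os
import uuid


def random_path(media_root):
    """Random layout: MEDIA_ROOT/artifact/uuid[0:2]/uuid[2:]."""
    u = uuid.uuid4().hex  # 32 hex characters, unrelated to the file's contents
    return os.path.join(media_root, "artifact", u[0:2], u[2:])


def orderly_path(media_root, sha256_hex):
    """Content-addressable layout: MEDIA_ROOT/artifact/digest[0:2]/digest[2:]."""
    return os.path.join(media_root, "artifact", sha256_hex[0:2], sha256_hex[2:])


def store_artifact(media_root, data):
    """Hypothetical: write bytes at their content-addressed path.

    Identical bytes always map to the same path, so a second store of the
    same content reuses the existing file instead of duplicating it.
    """
    digest = hashlib.sha256(data).hexdigest()
    path = orderly_path(media_root, digest)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return path


def find_existing(media_root, expected_sha256):
    """Hypothetical: given a sha256 taken from remote metadata, return the
    on-disk path if that file is present and still hashes to the expected
    value, otherwise None."""
    path = orderly_path(media_root, expected_sha256)
    if not os.path.isfile(path):
        return None
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large artifacts are not read into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return path if h.hexdigest() == expected_sha256 else None
```

With the orderly layout, a sha256 from remote metadata is enough to locate and verify a file already on disk, and storing identical bytes twice yields a single file. `random_path` offers no such lookup, because nothing about the file's contents appears in its path.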
