[Pulp-dev] proposing changes to pulp 3 upload API

Michael Hrivnak mhrivnak at redhat.com
Fri Jun 30 15:15:02 UTC 2017


I was also going to bring up that content is currently addressable based on
the unit. That's an option we could keep in Pulp 3. It may or may not be
best, but it's a viable option. The one thing I like about it is the
simplicity of ownership and cleanup. There is no race condition around file
management nor a need to manage orphaned files.

I also want to clarify again that moving a file within the same filesystem
is crazy cheap. I don't think we need to worry about having something
transient like FileUpload and also an Artifact model as long as they are
being stored on the same filesystem, which /var/lib/pulp/ should always be.
Object storage will be even easier I think, because we'd just reference the
same remote object from both models.
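
To illustrate the "cheap move" point, here is a minimal Python sketch, not Pulp code. The function name promote_upload and the directory layout are hypothetical. It assumes both paths live on the same filesystem, where os.replace is a rename that only touches directory entries and copies no file data.

```python
import os


def promote_upload(upload_path, artifact_path):
    """Move a finished upload to its permanent location (hypothetical helper).

    On a single filesystem os.replace() is a rename: no file data is copied,
    so the cost does not depend on file size. Across filesystems it raises
    OSError; shutil.move() would fall back to a copy in that case.
    """
    os.makedirs(os.path.dirname(artifact_path), exist_ok=True)
    os.replace(upload_path, artifact_path)
    return artifact_path
```

If the upload area and the artifact area ever ended up on different filesystems, the rename would fail rather than silently copy, which is arguably the safer behavior for this use.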

All that said, I do like the idea of tracking Artifact uniqueness
separately. I just want to figure out exactly what we gain, vs. the expense.
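
For reference, the digest-based layout in Jeff's message quoted below could be computed roughly like this. This is only a sketch: the function name is hypothetical, MEDIA_ROOT is assumed to be /var/lib/pulp, and the trailing <name> component is left out because it is still an open question in the proposal.

```python
import hashlib
import os

# Assumption: MEDIA_ROOT points at the Pulp storage directory.
MEDIA_ROOT = "/var/lib/pulp"


def content_addressed_path(data, media_root=MEDIA_ROOT):
    """Return a storage path derived only from the file's sha256 digest.

    Identical bytes always map to the same path, so each file is stored
    exactly once no matter how many content units reference it.
    """
    digest = hashlib.sha256(data).hexdigest()
    return os.path.join(media_root, "files", digest[0:2], digest[2:])
```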

Jeff, earlier in the thread we talked about using the through table to hold
the path. I think that's the right place, because the path would be a
property of the relationship between an artifact and a content unit. It
also occurred to me that the file name could be different for different
content, so maybe the path would need to include the filename. That seems a
bit weird, but I think it has to be the case if we use a many-to-many
relationship.
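
A rough sketch of what I mean by putting the path on the through table, using an in-memory SQLite database rather than real Pulp models. All table, column, and function names here are made up for illustration, as are the sample rows. The same artifact is linked to two content units under two different relative paths (filename included).

```python
import sqlite3

# Hypothetical schema: the relative path lives on the relationship row,
# not on the artifact itself.
SCHEMA = """
CREATE TABLE artifact (
    id INTEGER PRIMARY KEY,
    sha256 TEXT NOT NULL UNIQUE
);
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    natural_key TEXT NOT NULL UNIQUE
);
CREATE TABLE content_artifact (
    content_id INTEGER NOT NULL REFERENCES content(id),
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    relative_path TEXT NOT NULL,
    UNIQUE (content_id, relative_path)
);
"""


def build_demo():
    """Create the schema and link one artifact to two content units."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO artifact (id, sha256) VALUES (1, 'abc123')")
    db.executemany(
        "INSERT INTO content (id, natural_key) VALUES (?, ?)",
        [(1, "rpm:foo"), (2, "distribution:el7")],
    )
    db.executemany(
        "INSERT INTO content_artifact VALUES (?, ?, ?)",
        [
            (1, 1, "foo-1.0-1.el7.x86_64.rpm"),
            (2, 1, "Packages/foo-1.0-1.el7.x86_64.rpm"),
        ],
    )
    return db


def paths_for_artifact(db, artifact_id):
    """List (content_id, relative_path) pairs that reference one artifact."""
    rows = db.execute(
        "SELECT content_id, relative_path FROM content_artifact "
        "WHERE artifact_id = ? ORDER BY content_id",
        (artifact_id,),
    )
    return rows.fetchall()
```

The uniqueness constraint on (content_id, relative_path) keeps the old "unique within its content unit by relative path" rule, while the artifact row itself stays unique by digest.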

On Fri, Jun 30, 2017 at 8:53 AM, Jeff Ortel <jortel at redhat.com> wrote:

> It's my understanding that the advantage of the proposed change, which
> calculates the storage path from the sha256 digest of the artifact's file
> itself, is that uploads can be stored directly as Artifacts.  In other
> words, an artifact can be created independently of a content unit and
> stored at a deterministic location.  This is effectively CAS (content
> addressed storage).  As a result, each artifact is absolutely unique and
> stored exactly once.  In pulp2 and pulp3 (currently), each artifact is
> unique within its content unit by relative path.  The binary uniqueness
> introduces a requirement of a many-to-many relationship between
> Artifact <-> Content.  For example: an EL7 RPM that is associated with
> both an RPM content unit and an EL7 distribution.
>
> There is a lot I like about this approach.  However, one thing that needs
> to be accounted for is the 'relative_path' field on the Artifact.  It
> represents the artifact's location within the content unit.  This
> information is vital to publishing multi-artifact content such as
> distributions.  Perhaps a modeling change would resolve this.  Just
> thinking out loud:
>
> File 1--n Artifact n--1 Content.
>
> File
> -----------------
> [pk] id
> path <--- FileField:  MEDIA_ROOT/files/digest[0:2]/digest[2:]/<name>  # not sure about <name>
> downloaded
> sha1
> sha224
> sha256
> ...
>
>
> Artifact
> ------------------
> [pk] id
> [fk] file_id
> [fk] content_id
> relative_path
>
> Files can be uploaded directly to `File`.  I think this would support the
> proposed upload API.
>
> Thoughts?
>
>
> On 06/30/2017 06:53 AM, Jeff Ortel wrote:
> > Perhaps I missed something .. when did the "random path" get proposed?
> > In pulp2 and in pulp3 (currently), the artifact's path is deterministic.
> >
> > MEDIA_ROOT/content/units/<type>/digest[0:2]/digest[2:]/<relative-path>
> >
> > where the digest is the sha256 hex-digest of the content unit's natural
> > key.  For example, the natural key for RPM content is the NEVRA and
> > checksum.


-- 

Michael Hrivnak

Principal Software Engineer, RHCE

Red Hat