[Pulp-dev] proposing changes to pulp 3 upload API

Michael Hrivnak mhrivnak at redhat.com
Thu Jun 29 13:16:56 UTC 2017

Thanks for that explanation. That makes sense. I would describe this as
saying there is a many-to-many relationship between Content and Artifact,
and the ContentArtifact is the "glue" or "through" table.
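
To make the "through" table idea concrete, here is a minimal sketch using the standard-library sqlite3 module. The table and column names (content, artifact, content_artifact, relative_path) are assumptions for illustration only and are not taken from Pulp's actual models.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Pulp's real models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
    -- The "through" table: one row per (content, artifact) pairing.
    CREATE TABLE content_artifact (
        content_id INTEGER NOT NULL REFERENCES content(id),
        artifact_id INTEGER NOT NULL REFERENCES artifact(id),
        relative_path TEXT NOT NULL,
        PRIMARY KEY (content_id, relative_path)
    );
""")
conn.executemany("INSERT INTO content VALUES (?, ?)",
                 [(1, "unit-a"), (2, "unit-b")])
conn.executemany("INSERT INTO artifact VALUES (?, ?)",
                 [(1, "/storage/aa"), (2, "/storage/bb")])
# Artifact 1 is shared by both content units; artifact 2 belongs only to unit-b.
conn.executemany("INSERT INTO content_artifact VALUES (?, ?, ?)",
                 [(1, 1, "shared.bin"), (2, 1, "shared.bin"), (2, 2, "extra.bin")])


def artifacts_for(content_id):
    """Return the artifact ids linked to one content unit."""
    rows = conn.execute(
        "SELECT artifact_id FROM content_artifact "
        "WHERE content_id = ? ORDER BY artifact_id", (content_id,))
    return [row[0] for row in rows]


def contents_for(artifact_id):
    """Return the content ids that share one artifact."""
    rows = conn.execute(
        "SELECT content_id FROM content_artifact "
        "WHERE artifact_id = ? ORDER BY content_id", (artifact_id,))
    return [row[0] for row in rows]
```

With a plain foreign key from artifact to content, contents_for() could never return more than one id; the extra table is what allows sharing, and it is also where the added complexity lives.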

And again to just understand why... are we deliberately trying to
prioritize a use case where one artifact is shared by multiple Content
units? Can someone talk about the pros and cons of that within the context
of this proposal? I expect it could save a small portion of disk space, but
maybe not very much. Pulp does a pretty good job of de-duplicating at the
Content level. Changing to a m2m relationship would definitely add more
complexity though, and that's the aspect I'm interested in comparing to
what value we are seeking.

Separate from that relationship question, we have a use case that the
direct-to-Artifact workflow does not cover. There are multiple unit types
where a user wants to upload a single file that represents multiple content
units, and let Pulp create one or more content units based on that file.
For example, a docker manifest and its blobs all get saved to disk together
(by a separate tool) and then uploaded as a tarball that Pulp can receive
and process together. We could ask the upload client to open up the tarball
and upload files individually I suppose. That puts more burden on the
client though.
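
As a rough sketch of what "receive and process together" could look like on the server side, the following uses only the standard-library tarfile module. The file layout (manifest.json plus a blobs/ directory), the ".json means manifest" rule, and the function names are all assumptions made for illustration; they do not describe the actual docker plugin.

```python
import io
import tarfile


def build_tarball(files):
    """Pack {name: bytes} into an in-memory tarball (stands in for the upload)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def units_from_tarball(payload):
    """Return one (unit_type, name, size) tuple per regular file in the tarball."""
    units = []
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            # Hypothetical rule: JSON files are manifests, everything else is a blob.
            unit_type = "manifest" if member.name.endswith(".json") else "blob"
            units.append((unit_type, member.name, len(data)))
    return units


# One uploaded file yields several content units.
upload = build_tarball({"manifest.json": b'{"layers": []}', "blobs/abc": b"\x00" * 4})
units = units_from_tarball(upload)
```

The point of the sketch is that the client makes a single upload and the plugin decides how many units come out of it.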

Another example: a user can upload a comps.xml file, and Pulp will parse it
to create as many units as it finds in the XML. Pulp does not keep that
comps.xml file, so in the proposed workflow, it would need to delete the
Artifact at the end. It seems unexpected to utilize an Artifact as
temporary storage in this way.
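
A minimal sketch of that comps.xml case, using the standard-library xml.etree.ElementTree module. The sample document is heavily trimmed, and the function name and the dict shape of a "unit" are assumptions for illustration rather than the RPM plugin's real code.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical comps.xml content; real files carry many more fields.
COMPS = """<comps>
  <group><id>core</id><name>Core</name></group>
  <group><id>web-server</id><name>Web Server</name></group>
  <category><id>servers</id><name>Servers</name></category>
</comps>"""


def units_from_comps(xml_text):
    """Create one unit per <group> or <category> element found in the XML."""
    root = ET.fromstring(xml_text)
    units = []
    for element in root:
        if element.tag in ("group", "category"):
            units.append({"type": element.tag, "id": element.findtext("id")})
    return units


# The XML text is only an input here: nothing from it is kept as a file afterwards.
units = units_from_comps(COMPS)
```

Three units come out of one upload, and none of them needs the original file to be retained once parsing is done.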

I suspect we'll find more use cases like this. Thoughts? Is the FileUpload
really worth eliminating? What I like about the current upload workflow,
and the FileUpload workflow, is that it allows the plugin to receive any
file or set of files that make sense within its domain, and then use that
set of files to create units however it sees fit. It is difficult to get
more prescriptive than that at the platform/core level.

On Thu, Jun 29, 2017 at 8:47 AM, Dennis Kliban <dkliban at redhat.com> wrote:

> On Thu, Jun 29, 2017 at 7:40 AM, Michael Hrivnak <mhrivnak at redhat.com>
> wrote:
>> On Thu, Jun 29, 2017 at 7:22 AM, Dennis Kliban <dkliban at redhat.com>
>> wrote:
>>> The many-to-many relationship is between Artifact and ContentArtifact.
>>> This allows a content unit to have multiple Artifacts associated with it.
>> Could you elaborate on this? A content unit can have multiple artifacts
>> just by artifact having a foreign key to a content unit. That's the
>> one-to-many relationship we have on the model now in 3.0-dev.
>> Also, what is a ContentArtifact?
> Here are some definitions for the new proposal:
>    - Artifact - a file stored in pulp
>    - Content - a named collection of 0 or more Artifacts that can be
>    associated with a repository as a single unit
>    - ContentArtifact - a relationship between an Artifact and Content.
>    There are 0 or more ContentArtifacts for each Content.
>    - Repository - A named collection of content.
>    - RepositoryContent - a relationship between Content and Repository.
> In the proposal we have in the MVP we have the following:
>    - FileUpload - Uploaded file that is used to create Artifacts and is
>    then removed (definition for this is not present in the glossary of MVP)
>    - Artifact - A file associated with one content (unit). Artifacts are
>    not shared between content (units). Create a content unit using an uploaded
>    file ID as the source for its metadata. Create Artifacts associated with
>    the content unit using an uploaded file ID for each; commit as a single
>    transaction.
>    - Content (unit) - A single piece of content managed by Pulp. Each file
>    associated with a content (unit) is called an Artifact. Each content (unit)
>    may have zero or many Artifacts.
>    - Repository - A named collection of content.
>    - RepositoryContent - a relationship between Content and Repository
>    (also not in the glossary of the MVP)
> In the MVP in order to add a unit to a repository, a user would:
>    1. Create a FileUpload by uploading a file
>    2. Create an Artifact and a Content with one API call
>    3. Associate a Content with a Repository
>    4. Delete the FileUpload (or some cleanup job would do that for the
>    user)
> The newly proposed workflow:
>    1. Create an Artifact by uploading a file
>    2. Create a Content by specifying which Artifact(s) belong to the
>    Content and their relative paths inside the unit. This creates
>    ContentArtifacts for each relationship.
>    3. Associate a Content with a repository.
> In the MVP workflow, once a FileUpload is deleted, it's hard to create
> another Content from that file. I am sure we can come up with a way to do
> it, but it won't be as straightforward as the above workflow.
>> --
>> Michael Hrivnak
>> Principal Software Engineer, RHCE
>> Red Hat


Michael Hrivnak
Principal Software Engineer, RHCE
Red Hat