[Pulp-dev] proposing changes to pulp 3 upload API
Jeff Ortel
jortel at redhat.com
Wed Jun 28 17:10:47 UTC 2017
On 06/28/2017 11:44 AM, Brian Bouterse wrote:
> For a file to be received and saved in the right place once, we need the view saving the file to have all the
> info to form the complete path. After talking w/ @jortel, I think we should store Artifacts at the following path:
>
> MEDIA_ROOT/content/units/digest[0:2]/digest[2:]/<rel_path>
Consider:
MEDIA_ROOT/artifact/digest[0:2]/digest[2:]/<rel_path>
This fits better because an Artifact would have only an optional association with content and, given the many-to-many
relationship, the content_id FK would no longer exist in the Artifact table. Also, I have more plans for Artifacts in a
"Publishing" proposal I'm writing to pulp-dev (spoiler alert).
We would also want to enforce the same CAS (content-addressed storage) uniqueness in the DB using a unique
constraint on the Artifact, e.g. unique (sha256, rel_path). This ensures that each unique artifact (file) has
exactly one DB record.
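A minimal sketch of the path layout and the unique constraint together, under some assumptions: MEDIA_ROOT is set to /var/lib/pulp only for illustration, sqlite3 stands in for the real database, and the names artifact_path, make_db and record_artifact are hypothetical helpers, not existing Pulp code.

```python
import hashlib
import sqlite3
from pathlib import PurePosixPath

MEDIA_ROOT = "/var/lib/pulp"  # assumption: the thread does not fix a value for MEDIA_ROOT


def artifact_path(sha256: str, rel_path: str, media_root: str = MEDIA_ROOT) -> str:
    """Build MEDIA_ROOT/artifact/digest[0:2]/digest[2:]/<rel_path>."""
    rel = PurePosixPath(rel_path)
    # Reject paths that would escape the per-digest directory.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("rel_path must be relative and must not contain '..'")
    return str(PurePosixPath(media_root, "artifact", sha256[0:2], sha256[2:], rel))


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the Artifact table with the unique constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artifact ("
        " id INTEGER PRIMARY KEY,"
        " sha256 TEXT NOT NULL,"
        " rel_path TEXT NOT NULL,"
        " UNIQUE (sha256, rel_path))"
    )
    return conn


def record_artifact(conn: sqlite3.Connection, data: bytes, rel_path: str):
    """Return (path, created); created is False if the (sha256, rel_path) row already exists."""
    digest = hashlib.sha256(data).hexdigest()
    path = artifact_path(digest, rel_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO artifact (sha256, rel_path) VALUES (?, ?)",
                (digest, rel_path),
            )
        return path, True
    except sqlite3.IntegrityError:
        return path, False
```

Recording the same bytes under the same rel_path twice yields the same path and leaves a single row, which is the "exactly one DB record" property described above.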
>
> Note that digest is the Artifact's sha256 digest. This is different from pulp2 which used the digest of the
> content unit. Note that <rel_path> would be provided by the user along with <size> and/or <checksum_digest>.
>
> Note that this will cause an Artifact to live in exactly one place which means Artifacts are now unique by
> digest and would need to be able to be associated with multiple content units. I'm not sure why we didn't do
> this before, so I'm interested in exploring issues associated with this.
>
> This would make for a good workflow. For a single-file content unit (e.g. an rpm), upload would be a two-step process.
>
> 1. POST/PUT the file's binary data and the <relative_path> and <size> and/or <checksum_digest> as GET parameters
> 2. Create a content unit with the unit metadata, and 0 .. n Artifacts referred to by ID. This could optionally
> associate the new unit with one repository as part of the atomic unit creation.
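To make the two steps concrete, here is a rough in-memory sketch. Everything in it is an assumption for illustration: the dictionaries stand in for the Artifact and content unit tables, the function names are hypothetical, and requiring at least one of size or digest is my reading of "and/or".

```python
import hashlib
from typing import Dict, List, Optional

artifacts: Dict[int, dict] = {}  # artifact id -> stored fields
units: Dict[int, dict] = {}      # unit id -> metadata plus artifact ids


def upload_artifact(data: bytes, rel_path: str,
                    size: Optional[int] = None,
                    sha256: Optional[str] = None) -> int:
    """Step 1: validate the bytes against the declared values, then record an Artifact."""
    if size is None and sha256 is None:
        raise ValueError("provide size and/or sha256")
    actual = hashlib.sha256(data).hexdigest()
    if size is not None and size != len(data):
        raise ValueError("size mismatch")
    if sha256 is not None and sha256 != actual:
        raise ValueError("digest mismatch")
    # Only reached when validation passed, so a bad upload never gets a record.
    artifact_id = len(artifacts) + 1
    artifacts[artifact_id] = {"sha256": actual, "rel_path": rel_path, "size": len(data)}
    return artifact_id


def create_unit(metadata: dict, artifact_ids: List[int]) -> int:
    """Step 2: create a content unit that refers to 0..n existing Artifacts by ID."""
    missing = [a for a in artifact_ids if a not in artifacts]
    if missing:
        raise KeyError(f"unknown artifact ids: {missing}")
    unit_id = len(units) + 1
    units[unit_id] = {"metadata": dict(metadata), "artifact_ids": list(artifact_ids)}
    return unit_id
```

A caller would first call upload_artifact with the file bytes and then pass the returned ID to create_unit along with the unit metadata.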
>
> Thoughts/Ideas?
>
> -Brian
>
>
> On Tue, Jun 27, 2017 at 4:16 PM, Dennis Kliban <dkliban at redhat.com> wrote:
>
> On Tue, Jun 27, 2017 at 3:31 PM, Michael Hrivnak <mhrivnak at redhat.com> wrote:
>
> Could you re-summarize what problem would be solved by not having a FileUpload model, and giving the
> Artifact model the ability to have partial data and no Content foreign key?
>
> I understand the concern about where on the filesystem the data gets written and how many times, but
> I'm not seeing how that's related to whether we have a FileUpload model or not. Are we discussing two
> separate issues? 1) filesystem locations and copy efficiency, and 2) API design? Or is this discussion
> trying to connect them in a way I'm not seeing?
>
>
> There were two concerns: 1) Filesystem location and copy efficiency 2) API design
>
> The first one has been addressed. Thank you for pointing out that a second write will be a move operation.
>
> However, I am still concerned about the complexity of the API. A relatively small file should not require
> an upload session to be uploaded. A single API call to the Artifacts API should be enough to upload a file
> and create an Artifact from it. In Pulp 3.1+ we can introduce the FileUpload model to support chunked
> uploads. At the same time we would extend the Artifact API to accept a FileUpload id for creating an
> Artifact.
>
>
> On Tue, Jun 27, 2017 at 3:20 PM, Dennis Kliban <dkliban at redhat.com> wrote:
>
> On Tue, Jun 27, 2017 at 2:56 PM, Brian Bouterse <bbouters at redhat.com> wrote:
>
> Picking up from @jortel's observations...
>
> +1 to allowing Artifacts to have an optional FK.
>
> If we have an Artifacts endpoint then we can allow for the deleting of a single artifact if it
> has no FK. I think we want to disallow the removal of an Artifact that has a foreign key. Also
> filtering should allow a single operation to clean up all unassociated artifacts by searching
> for FK=None or similar.
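The delete rule and the FK=None cleanup might look roughly like this. It is a sketch only: the Artifact dataclass and both function names are hypothetical, and a plain dict stands in for the table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Artifact:
    id: int
    content_id: Optional[int] = None  # the optional FK; None means unassociated


def delete_artifact(store: Dict[int, Artifact], artifact_id: int) -> None:
    """Delete one artifact, but only if it is not associated with a content unit."""
    artifact = store[artifact_id]
    if artifact.content_id is not None:
        raise ValueError("artifact is associated with a content unit")
    del store[artifact_id]


def delete_unassociated(store: Dict[int, Artifact]) -> int:
    """Remove every artifact whose FK is None; return how many were removed."""
    orphans = [a.id for a in store.values() if a.content_id is None]
    for artifact_id in orphans:
        del store[artifact_id]
    return len(orphans)
```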
>
> Yes, we will need to allow the single call delivering a file to also specify the relative
> path, size, checksums etc. Since the POST body contains binary data we either need to accept
> this data as GET style params or use a multi-part MIME upload [0]. Note that this creation of
> an Artifact does not change the repository contents and therefore can be handled synchronously
> outside of the tasking system.
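For the GET-style option, a client request could be built along these lines. This is a sketch with assumptions: the endpoint URL and the parameter names (relative_path, size, sha256) are made up for illustration, and nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_upload_request(base_url: str, data: bytes, rel_path: str,
                         size: int, sha256: str) -> Request:
    """Put the metadata in the query string and the raw file bytes in the body."""
    query = urlencode({"relative_path": rel_path, "size": size, "sha256": sha256})
    return Request(
        f"{base_url}?{query}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


# Hypothetical endpoint; the request is only constructed here, not sent.
req = build_upload_request(
    "https://pulp.example.com/api/v3/artifacts/",
    b"data", "pkg/foo.rpm", 4, "0" * 64,
)
```

The body stays a plain byte stream, which is what makes this simpler than multi-part MIME for the MVP.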
>
> +1 to the saving of an Artifact to perform validation
>
> [0]: https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
>
>
>
> -Brian
>
>
> I also support this optional FK for Artifacts and validation on save. We should probably stick
> with accepting GET parameters for the MVP. Though multi-part MIME support would be good to
> consider for 3.1+.
>
>
>
> On Tue, Jun 27, 2017 at 2:44 PM, Dennis Kliban <dkliban at redhat.com> wrote:
>
> On Tue, Jun 27, 2017 at 1:24 PM, Michael Hrivnak <mhrivnak at redhat.com> wrote:
>
>
> On Tue, Jun 27, 2017 at 11:27 AM, Jeff Ortel <jortel at redhat.com> wrote:
>
>
> - The artifact FK to a content unit would need to become optional.
>
> - Need to add use cases for cleaning up artifacts not associated with a content unit.
>
> - The upload API would need additional information needed to create an artifact.
> Like relative path, size,
> checksums etc.
>
> - Since (I assume) you are proposing uploading/writing directly to artifact
> storage (not staging in a working
> dir), the flow would need to involve (optional) validation. If validation fails,
> the artifact must not be
> inserted into the DB.
>
>
> Perhaps a decent middle ground would be to stick with the plan of keeping uploaded (or
> partially uploaded) files as a separate model until they are ready to be turned into a
> Content instance plus artifacts, and save their file data directly to somewhere within
> /var/lib/pulp/. It would be some path distinct from where Artifacts are stored. That's
> what I had imagined we would do anyway. Then as Dennis pointed out, turning that into
> an Artifact would only require a move operation on the same filesystem, which is
> super-cheap.
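The move step could be as small as the following sketch. The function name promote_upload is hypothetical, and it assumes both paths are on the same filesystem; os.replace is a rename, so it raises OSError rather than copying if they are not.

```python
import os


def promote_upload(upload_path: str, artifact_path: str) -> None:
    """Move a completed upload into artifact storage without rewriting its data."""
    os.makedirs(os.path.dirname(artifact_path), exist_ok=True)
    # A rename within one filesystem; no file contents are copied.
    os.replace(upload_path, artifact_path)
```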
>
>
> Would that address all the concerns? We'd write the data just once, and then move it
> once on the same filesystem. I haven't looked at Django's support for this recently,
> but it seems like it should be doable.
>
> I was just looking at the Dropbox API and noticed that they provide two separate API
> endpoints for regular file uploads[0] (< 150 MB) and large file uploads[1]. It is the
> latter that supports chunking and requires using an upload id. For the most common case
> they support uploading a file with one API call. Our original proposal requires two for the
> same use case. Pulp API users would appreciate having to make only one API call to upload
> a file.
>
> [0] https://www.dropbox.com/developers-v1/core/docs#files_put
> [1] https://www.dropbox.com/developers-v1/core/docs#chunked-upload
>
>
>
> --
>
> Michael Hrivnak
>
> Principal Software Engineer, RHCE
>
> Red Hat
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>