[Pulp-dev] proposing changes to pulp 3 upload API

Brian Bouterse bbouters at redhat.com
Tue Jun 27 13:36:09 UTC 2017


I thought that we had pulled chunked uploads out of the MVP. IIRC,
@jortel and I thought that since that use case was for high-performing
(parallel) uploads, it should be on the 3.1+ page.

+1 to just sending data without having a file handle. If the entire file is
delivered in one request then having a file ID to upload to in a second
request is just cumbersome.
+1 to having the handler that receives the file just make it an Artifact()
right away. This will work better with how Django handles file uploads.
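
To make the "Artifact right away" idea concrete, here is a minimal sketch in
plain Python (no Django), assuming the handler can read the request body as a
stream. The function name, the returned fields, and the storage layout are all
hypothetical and are not Pulp's actual API. The point it illustrates is that
the data is written once, into the artifact directory, and then renamed.

```python
import hashlib
import io
import json
import os
import tempfile
import uuid


def save_upload_as_artifact(stream, artifact_dir, chunk_size=64 * 1024):
    """Stream an upload straight into artifact storage, hashing as we go.

    Every name here is hypothetical; this is not Pulp's actual API.
    """
    os.makedirs(artifact_dir, exist_ok=True)
    sha256 = hashlib.sha256()
    size = 0
    # Write to a temp file inside the destination directory so the final
    # rename stays on one filesystem and no second copy of the data is made.
    fd, tmp_path = tempfile.mkstemp(dir=artifact_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                sha256.update(chunk)
                out.write(chunk)
                size += len(chunk)
        final_path = os.path.join(artifact_dir, sha256.hexdigest())
        os.replace(tmp_path, final_path)
    except BaseException:
        # Leave nothing behind if the upload fails part-way through.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # This dict stands in for the JSON representation returned to the user.
    return {
        "id": str(uuid.uuid4()),
        "sha256": sha256.hexdigest(),
        "size": size,
        "path": final_path,
    }


if __name__ == "__main__":
    body = io.BytesIO(b"example package contents")
    with tempfile.TemporaryDirectory() as storage:
        print(json.dumps(save_upload_as_artifact(body, storage), indent=2))
```

In a real Django handler the stream would presumably be the uploaded file
object, which may already have been written to /tmp by Django before our code
sees it, so this sketch only removes the intermediate FileUpload write.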

I also think we can skip making one Artifact from another. I don't think
that is going to be a commonly used use case. So with that use case and
chunking removed, the use cases would be:

   - As an authenticated user, I can upload a file which becomes an
   Artifact. At the end of the upload, the server returns the JSON
   representation of the created Artifact.
   - As an authenticated user, I can create a content unit by providing the
   content type, its Artifacts using IDs for each Artifact, and the metadata
   supplied in the POST body. This call is atomic: the content unit is
   created in the database and on the filesystem, or not at all.
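
As a rough illustration of the "in the database and on the filesystem, or not
at all" requirement, the sketch below uses sqlite3 and a scratch directory.
The table names, the function names, and the on-disk layout are assumptions
made up for this example, not Pulp's schema. It only covers failures raised
during the call (for example a missing artifact file), not a crash between the
commit and the return.

```python
import os
import shutil
import sqlite3
import tempfile


def make_db():
    """Create a throwaway in-memory database with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content_unit "
        "(id INTEGER PRIMARY KEY, type TEXT, metadata TEXT)"
    )
    db.execute("CREATE TABLE unit_artifact (unit_id INTEGER, artifact_id TEXT)")
    return db


def create_content_unit(db, content_type, metadata, artifacts, content_dir):
    """Create a content unit in the database and on disk, or not at all.

    `artifacts` maps a hypothetical artifact ID to its stored file path.
    """
    unit_dir = None
    try:
        with db:  # commits on success, rolls back if an exception escapes
            cur = db.execute(
                "INSERT INTO content_unit (type, metadata) VALUES (?, ?)",
                (content_type, metadata),
            )
            unit_id = cur.lastrowid
            unit_dir = os.path.join(content_dir, str(unit_id))
            os.makedirs(unit_dir)
            for artifact_id, path in artifacts.items():
                # A missing artifact raises here and aborts the whole call.
                shutil.copy(path, os.path.join(unit_dir, artifact_id))
                db.execute(
                    "INSERT INTO unit_artifact (unit_id, artifact_id) "
                    "VALUES (?, ?)",
                    (unit_id, artifact_id),
                )
        return unit_id
    except Exception:
        # The database has been rolled back; undo the filesystem side too.
        if unit_dir is not None:
            shutil.rmtree(unit_dir, ignore_errors=True)
        raise
```

If any artifact cannot be copied, the row inserts are rolled back and the
partially populated directory is removed, so the caller sees either a complete
content unit or none.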

The biggest reason to make this adjustment, I think, is that it aligns with
users' desire to have uploads take fewer calls. This removes at least two
calls from the workflow. It also avoids having to save the data multiple
times, which I don't think we can do practically.

Thoughts or ideas?

-Brian

On Tue, Jun 27, 2017 at 8:55 AM, Dennis Kliban <dkliban at redhat.com> wrote:

> My motivations for writing this email include: recent discussion about
> pulp 2 upload API in #pulp and django's documentation on file uploads.
>
> Files uploaded to Django are initially stored in memory (if under 2.5 MB);
> otherwise Python's tempfile module is used to write them to the /tmp/
> directory. The file created in /tmp is deleted when and if the last file
> handle is closed.
>
> If we implement the upload API as described in the MVP doc[0], then
> according to Django docs[1] we will be performing a write to disk 2 or 3
> times for each upload. When a file is bigger than 2.5 MB,
> it will be first written to /tmp. The same file will then be written to
> /var/lib/pulp/uploads (or similar location) when the FileUpload model is
> saved. A third write will occur when an artifact is created using the
> FileUpload. This third write will likely be a move though.
>
> I propose that we eliminate writing the uploaded file to
> /var/lib/pulp/uploads and go directly to creating an artifact. The use cases
> can then be rewritten as the following:
>
>
>    - As an authenticated user, I can upload a file with an optional chunk
>    size, and an optional offset. At the end of the upload, the server
>    returns the JSON representation of the artifact.
>    - As an authenticated user, I can create a new artifact by specifying
>    an existing artifact id.
>    - As an authenticated user, I can create a content unit by providing
>    the content type, its Artifacts using IDs for each Artifact, and the
>    metadata supplied in the POST body. This call is atomic: the content
>    unit is created in the database and on the filesystem, or not at all.
>
> [0] https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Minimum_Viable_Product#Upload-amp-Copy
> [1] https://docs.djangoproject.com/en/1.9/topics/http/file-uploads/#handling-uploaded-files-with-a-model