[Pulp-dev] proposing changes to pulp 3 upload API

Dennis Kliban dkliban at redhat.com
Tue Jun 27 20:16:08 UTC 2017


On Tue, Jun 27, 2017 at 3:31 PM, Michael Hrivnak <mhrivnak at redhat.com>
wrote:

> Could you re-summarize what problem would be solved by not having a
> FileUpload model, and giving the Artifact model the ability to have partial
> data and no Content foreign key?
>
> I understand the concern about where on the filesystem the data gets
> written and how many times, but I'm not seeing how that's related to
> whether we have a FileUpload model or not. Are we discussing two separate
> issues? 1) filesystem locations and copy efficiency, and 2) API design? Or
> is this discussion trying to connect them in a way I'm not seeing?
>

There were two concerns: 1) filesystem location and copy efficiency, and
2) API design.

The first one has been addressed. Thank you for pointing out that the second
write will just be a move operation on the same filesystem.

However, I am still concerned about the complexity of the API. Uploading a
relatively small file should not require an upload session. A single API call
to the Artifacts API should be enough to upload a file and create an Artifact
from it. In Pulp 3.1+ we can introduce the FileUpload model to support chunked
uploads and, at the same time, extend the Artifact API to accept a FileUpload
id for creating an Artifact.
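
To make that concrete, here is a rough sketch of what the single call could
look like from a client's point of view. The endpoint path, parameter names,
and response shape are placeholders, not a settled design; it combines the two
options Brian mentions below by sending the metadata as query-style params and
the binary data as a multi-part body.

    # Hypothetical single-call upload to the Artifacts API.
    # URL and field names are illustrative only.
    import hashlib
    import requests

    path = 'foo-1.0.tar.gz'
    with open(path, 'rb') as f:
        data = f.read()

    response = requests.post(
        'http://localhost/api/v3/artifacts/',
        params={                         # metadata as query-style params
            'relative_path': path,
            'size': len(data),
            'sha256': hashlib.sha256(data).hexdigest(),
        },
        files={'file': (path, data)},    # binary data as a multi-part MIME body
    )
    response.raise_for_status()          # validation failure -> no Artifact created
    artifact = response.json()           # the new Artifact's representation

The server side could handle this synchronously: validate the size and
checksums, write the file into artifact storage, and only insert the Artifact
into the DB if validation passes, in line with the points Jeff and Brian raise
below.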


> On Tue, Jun 27, 2017 at 3:20 PM, Dennis Kliban <dkliban at redhat.com> wrote:
>
>> On Tue, Jun 27, 2017 at 2:56 PM, Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>> Picking up from @jortel's observations...
>>>
>>> +1 to allowing Artifacts to have an optional FK.
>>>
>>> If we have an Artifacts endpoint, then we can allow deleting a single
>>> Artifact if it has no FK. I think we want to disallow removing an
>>> Artifact that has a foreign key. Filtering should also allow a single
>>> operation to clean up all unassociated Artifacts by searching for
>>> FK=None or similar.
>>>
>>> Yes, we will need to allow the single call delivering a file to also
>>> specify the relative path, size, checksums, etc. Since the POST body
>>> contains binary data, we either need to accept this data as GET-style
>>> params or use a multi-part MIME upload [0]. Note that creating an
>>> Artifact this way does not change the repository contents and can
>>> therefore be handled synchronously, outside of the tasking system.
>>>
>>> +1 to having the save of an Artifact perform validation
>>>
>>> [0]: https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
>>>
>>>
>>
>>> -Brian
>>>
>>
>> I also support this optional FK for Artifacts and validation on save. We
>> should probably stick with accepting GET parameters for the MVP, though
>> multi-part MIME support would be good to consider for 3.1+.
>>
>>
>>>
>>> On Tue, Jun 27, 2017 at 2:44 PM, Dennis Kliban <dkliban at redhat.com>
>>> wrote:
>>>
>>>> On Tue, Jun 27, 2017 at 1:24 PM, Michael Hrivnak <mhrivnak at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Tue, Jun 27, 2017 at 11:27 AM, Jeff Ortel <jortel at redhat.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> - The artifact FK to a content unit would need to become optional.
>>>>>>
>>>>>> - Need to add use cases for cleaning up artifacts not associated with
>>>>>> a content unit.
>>>>>>
>>>>>> - The upload API would need additional information to create an
>>>>>> artifact, like relative path, size, checksums, etc.
>>>>>>
>>>>>> - Since (I assume) you are proposing uploading/writing directly to
>>>>>> artifact storage (not staging in a working dir), the flow would need
>>>>>> to involve (optional) validation. If validation fails, the artifact
>>>>>> must not be inserted into the DB.
>>>>>
>>>>>
>>>>> Perhaps a decent middle ground would be to stick with the plan of
>>>>> keeping uploaded (or partially uploaded) files as a separate model until
>>>>> they are ready to be turned into a Content instance plus artifacts, and
>>>>> save their file data directly to somewhere within /var/lib/pulp/. It would
>>>>> be some path distinct from where Artifacts are stored. That's what I had
>>>>> imagined we would do anyway. Then as Dennis pointed out, turning that into
>>>>> an Artifact would only require a move operation on the same filesystem,
>>>>> which is super-cheap.
>>>>>
>>>>>
>>>>> Would that address all the concerns? We'd write the data just once, and
>>>>> then move it once on the same filesystem. I haven't looked at Django's
>>>>> support for this recently, but it seems like it should be doable.
>>>>>
>>>> I was just looking at the Dropbox API and noticed that they provide
>>>> two separate API endpoints for regular file uploads[0] (< 150 MB) and
>>>> large file uploads[1]. It is the latter that supports chunking and
>>>> requires using an upload id. For the most common case they support
>>>> uploading a file with one API call. Our original proposal requires two
>>>> for the same use case. Pulp API users would appreciate only having to
>>>> make one API call to upload a file.
>>>>
>>>> [0] https://www.dropbox.com/developers-v1/core/docs#files_put
>>>> [1] https://www.dropbox.com/developers-v1/core/docs#chunked-upload
>>>>
>>>>
>>>>
>>>>> --
>>>>>
>>>>> Michael Hrivnak
>>>>>
>>>>> Principal Software Engineer, RHCE
>>>>>
>>>>> Red Hat
>>>>>
>>>
>>
>
>
> --
>
> Michael Hrivnak
>
> Principal Software Engineer, RHCE
>
> Red Hat
>

