[Pulp-dev] proposing changes to pulp 3 upload API

Michael Hrivnak mhrivnak at redhat.com
Tue Jun 27 19:31:55 UTC 2017


Could you re-summarize what problem would be solved by not having a
FileUpload model, and giving the Artifact model the ability to have partial
data and no Content foreign key?

I understand the concern about where on the filesystem the data gets
written and how many times, but I'm not seeing how that's related to
whether we have a FileUpload model or not. Are we discussing two separate
issues? 1) filesystem locations and copy efficiency, and 2) API design? Or
is this discussion trying to connect them in a way I'm not seeing?
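
For reference, the write-once, move-once flow discussed in the quoted thread
below might look roughly like this. It is a minimal sketch, not Pulp code:
the "uploads" and "artifacts" directory names and both helper functions are
hypothetical, and it assumes both directories live on the same filesystem so
that the final step is a rename rather than a copy.

```python
import os
import tempfile


def stage_upload(root, name, chunks):
    """Write the uploaded chunks exactly once, into a staging directory."""
    staging_dir = os.path.join(root, "uploads")  # hypothetical layout
    os.makedirs(staging_dir, exist_ok=True)
    staged_path = os.path.join(staging_dir, name)
    with open(staged_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    return staged_path


def promote_to_artifact(root, staged_path, relative_path):
    """Move a finished upload into artifact storage without copying bytes."""
    final_path = os.path.join(root, "artifacts", relative_path)
    os.makedirs(os.path.dirname(final_path), exist_ok=True)
    # os.replace is a rename; on the same filesystem it does not copy data.
    os.replace(staged_path, final_path)
    return final_path


# Usage against a throwaway directory standing in for /var/lib/pulp/.
root = tempfile.mkdtemp()
staged = stage_upload(root, "upload-1", [b"hello ", b"world"])
final = promote_to_artifact(root, staged, "ab/cdef/foo.rpm")
```

Whether the staged file is tracked by a FileUpload model or by an Artifact
with no Content FK, the filesystem steps would be the same, which is why the
two questions look separable to me.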

On Tue, Jun 27, 2017 at 3:20 PM, Dennis Kliban <dkliban at redhat.com> wrote:

> On Tue, Jun 27, 2017 at 2:56 PM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>
>> Picking up from @jortel's observations...
>>
>> +1 to allowing Artifacts to have an optional FK.
>>
>> If we have an Artifacts endpoint then we can allow for the deleting of a
>> single artifact if it has no FK. I think we want to disallow the removal of
>> an Artifact that has a foreign key. Also filtering should allow a single
>> operation to clean up all unassociated artifacts by searching for FK=None
>> or similar.
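
The delete rule and the FK=None cleanup described above can be sketched in
plain Python. This is only an illustration under stated assumptions: the
Artifact class, the content_id field, and the in-memory list standing in for
the database are all hypothetical, not the actual Pulp 3 models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Artifact:
    relative_path: str
    content_id: Optional[int] = None  # None means "no Content FK"


class ArtifactInUse(Exception):
    """Raised when deleting an artifact that still belongs to Content."""


def delete_artifact(store: List[Artifact], artifact: Artifact) -> None:
    # Only unassociated artifacts may be removed individually.
    if artifact.content_id is not None:
        raise ArtifactInUse(artifact.relative_path)
    store.remove(artifact)


def cleanup_unassociated(store: List[Artifact]) -> List[Artifact]:
    # Equivalent of filtering on FK=None and deleting every match.
    orphans = [a for a in store if a.content_id is None]
    for orphan in orphans:
        store.remove(orphan)
    return orphans


store = [
    Artifact("a.rpm", content_id=1),
    Artifact("b.rpm"),
    Artifact("c.rpm"),
]
removed = cleanup_unassociated(store)
```
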
>>
>> Yes, we will need to allow the single call delivering a file to also
>> specify the relative path, size, checksums etc. Since the POST body
>> contains binary data we either need to accept this data as GET style params
>> or use a multi-part MIME upload [0]. Note that this creation of an Artifact
>> does not change the repository contents and therefore can be handled
>> synchronously outside of the tasking system.
>>
>> +1 to the saving of an Artifact to perform validation
>>
>> [0]: https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
>>
>> -Brian
>>
>
> I also support this optional FK for Artifacts and validation on save. We
> should probably stick with accepting the metadata as GET-style query
> parameters for the MVP, though multi-part MIME support would be good to
> consider for 3.1+.
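
To make the GET-style option concrete, here is a rough sketch of what a
single-call upload request could look like from the client side: the file
bytes go in the POST body and the metadata rides in the query string. The
endpoint URL and the parameter names (relative_path, size, sha256) are
assumptions for illustration only; nothing here is an agreed Pulp 3 API. The
request is built but never sent.

```python
import hashlib
from urllib.parse import urlencode
from urllib.request import Request


def build_upload_request(base_url, data, relative_path):
    """Build (but do not send) a single-call upload request."""
    params = {
        "relative_path": relative_path,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    url = f"{base_url}?{urlencode(params)}"
    return Request(
        url,
        data=data,  # raw binary body, no multipart encoding
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


req = build_upload_request(
    "https://pulp.example.com/api/v3/artifacts/",  # hypothetical endpoint
    b"example file contents",
    "foo/bar.txt",
)
```

A multi-part MIME upload would instead carry the same fields as separate
parts of the body, alongside the file part.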
>
>
>>
>> On Tue, Jun 27, 2017 at 2:44 PM, Dennis Kliban <dkliban at redhat.com>
>> wrote:
>>
>>> On Tue, Jun 27, 2017 at 1:24 PM, Michael Hrivnak <mhrivnak at redhat.com>
>>> wrote:
>>>
>>>>
>>>> On Tue, Jun 27, 2017 at 11:27 AM, Jeff Ortel <jortel at redhat.com> wrote:
>>>>
>>>>>
>>>>> - The artifact FK to a content unit would need to become optional.
>>>>>
>>>>> - Need to add use cases for cleaning up artifacts not associated with
>>>>> a content unit.
>>>>>
>>>>> - The upload API would need the additional information required to
>>>>> create an artifact, such as relative path, size, checksums, etc.
>>>>>
>>>>> - Since (I assume) you are proposing uploading/writing directly to
>>>>> artifact storage (not staging in a working dir), the flow would need to
>>>>> involve (optional) validation. If validation fails, the artifact must
>>>>> not be inserted into the DB.
>>>>
>>>>
>>>> Perhaps a decent middle ground would be to stick with the plan of
>>>> keeping uploaded (or partially uploaded) files as a separate model until
>>>> they are ready to be turned into a Content instance plus artifacts, and
>>>> save their file data directly to somewhere within /var/lib/pulp/. It would
>>>> be some path distinct from where Artifacts are stored. That's what I had
>>>> imagined we would do anyway. Then as Dennis pointed out, turning that into
>>>> an Artifact would only require a move operation on the same filesystem,
>>>> which is super-cheap.
>>>>
>>>>
>>>> Would that address all the concerns? We'd write the data just once, and
>>>> then move it once on the same filesystem. I haven't looked at Django's
>>>> support for this recently, but it seems like it should be doable.
>>>>
>>> I was just looking at the Dropbox API and noticed that they provide two
>>> separate API endpoints for regular file uploads[0] (< 150 MB) and large file
>>> uploads[1]. It is the latter that supports chunking and requires using an
>>> upload id. For the most common case they support uploading a file with one
>>> API call. Our original proposal requires two for the same use case. Pulp API
>>> users would appreciate having to make only one API call to upload a file.
>>>
>>> [0] https://www.dropbox.com/developers-v1/core/docs#files_put
>>> [1] https://www.dropbox.com/developers-v1/core/docs#chunked-upload
>>>
>>>
>>>
>>>> --
>>>>
>>>> Michael Hrivnak
>>>>
>>>> Principal Software Engineer, RHCE
>>>>
>>>> Red Hat
>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>
>>>>
>>>
>>>
>>>
>>
>


-- 

Michael Hrivnak

Principal Software Engineer, RHCE

Red Hat