[Pulp-dev] Changes in the Pulp 3 Upload story

Brian Bouterse bbouters at redhat.com
Fri Feb 22 19:01:58 UTC 2019


I'm so glad we're talking it through on the list. Thank you for writing
this up!

I wrote some replies inline, but overall I'm wondering:

How much do users value reducing the 3 calls into 1?
How much do users value the server inspecting the asset to figure out the
content's metadata?
How much do users want to upload many artifacts at once? (Can we do this?)
How much do users value association at the same time they are uploading an
artifact?
How much do users value association at the same time they are uploading
many artifacts?

Thank you,
Brian

On Fri, Feb 22, 2019 at 1:05 PM Justin Sherrill <jsherril at redhat.com> wrote:

>
> On 2/22/19 12:07 PM, Brian Bouterse wrote:
>
>
>
> On Fri, Feb 22, 2019 at 9:36 AM Justin Sherrill <jsherril at redhat.com>
> wrote:
>
>>
>> On 2/18/19 2:41 PM, Austin Macdonald wrote:
>>
>> Originally, our upload story was as follows:
>> 1. The user will upload a new file to Pulp via POST to /artifacts/
>>    (provided by core)
>> 2. The user will create a new plugin specific Content via POST to
>>    /path/to/plugin/content/, referencing whatever artifacts are
>>    contained, and whatever fields are expected for the new content.
>> 3. The user will add the new content to a repository via POST to
>>    /repositories/1/versions/
>>
>> However, this is somewhat cumbersome for the user: 3 API calls to
>> accomplish something that only took one call in Pulp 2.
>>
>> How would you do this with one call in pulp2?
>> https://docs.pulpproject.org/dev-guide/integration/rest-api/content/upload.html
>> seems to suggest 3-4 calls.
>>
> Some plugins implemented the pulp2 equivalent of a one-shot uploader.
> Those docs are for pulp2's core, which doesn't include the plugins' docs.
>
>>
>> There are a couple of different paths plugins have taken to improve the
>> user experience:
>> The Python plugin follows the above workflow, but reads the Artifact file
>> to determine the values for the fields. The RPM plugin has gone even
>> farther and created a new endpoint for "one shot" upload that performs all
>> of this in a single call. I think it is likely that the Python plugin will
>> move more in the "one shot" direction, and other plugins will probably
>> follow.
>>
>> How does the RPM one-shot api work? Will it be compatible with whatever
>> solution https://pulp.plan.io/issues/4196 arrives at?
>>
> You would upload the Artifact as binary data along with what content type
> it is and what relative path it uses, and Pulp creates the Artifact, the
> Content unit, and the ContentArtifact. It should be compatible with issue
> 4196 because django's binary form data should allow for parallel uploading
> before calling the view handler. It may take 2 calls though. To me the
> issue isn't so much the number of calls as the client data payload
> complexity.
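
From the client's side, the one-shot call looks roughly like this (a
sketch only; the endpoint path and field names here are illustrative,
not necessarily what pulp_rpm ships):

    # Hypothetical client for a one-shot upload: the file, its type,
    # and its relative path all travel in a single request.
    import requests

    with open('foo-1.0.noarch.rpm', 'rb') as f:
        requests.post(
            'http://pulp.example.com/pulp/api/v3/rpm/upload/',
            files={'file': f},
            data={'relative_path': 'foo-1.0.noarch.rpm'},
            auth=('admin', 'password'),
        )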
>
> If I'm having to chunk up data, I already have quite a bit of client data
> payload complexity. In pulp 2 this was most of the complexity!
>
>> I would hate for all our plugins to move to one-shot methods which users
>> can't even rely on.
>>
> I don't think we're taking the "generic" uploading away. You can always
> rely on that. The issue with one-shot is that it's literally not possible
> for many content types, e.g. Artifact-less content. It's also hard for
> multi-artifact Content, so that would probably still be something plugin
> writers provide as a custom thing for their content type. Regardless,
> it's just not possible to have consistency in this area.
>
> Why is it not possible to create a one-shot upload for artifact-less
> content? (Maybe we're defining what a one-shot upload actually is
> differently; I'm reading it as something that combines multiple steps
> into one.)
>
I'm expecting the default viewset for Content to accept kwargs and try to
save them onto the Content unit it creates as attributes. Plugin writers
can always override that though, so it could be inconsistent.
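
A minimal sketch of that idea (not actual Pulp code; the class and the
field handling are hypothetical):

    # Hypothetical default Content create(): treat every POSTed field
    # as a kwarg and set it as an attribute on the new unit.
    from rest_framework import status, viewsets
    from rest_framework.response import Response

    class DefaultContentViewSet(viewsets.ViewSet):
        model = None  # a plugin's Content subclass, e.g. FileContent

        def create(self, request):
            content = self.model()
            for name, value in request.data.items():
                setattr(content, name, value)
            content.save()
            return Response({'pk': str(content.pk)},
                            status=status.HTTP_201_CREATED)

Because a plugin can override create() entirely, there's no guarantee
two plugins behave the same way here.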

> Why is consistency not possible? I guess I don't see a huge variation of
> upload scenarios beyond:
>
The main reason is that the plugin writer provides the viewset for
creating the Content unit. Even if we make something in core that provides
"generic" functionality, it will need plugin code provided somehow to
parse the Artifact. For example, the artifact-specific parsing code is
here:
https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/viewsets.py#L92
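
The shape of that plugin-provided piece is roughly this (illustrative
only, not the code at that link; inspect_pkg is a stand-in name for
whatever RPM-aware parser the plugin uses):

    # Only the plugin knows how to pull content fields out of the
    # uploaded file, so core can't do this part generically.
    def parse_artifact(artifact_path):
        pkg_info = inspect_pkg(artifact_path)  # plugin-specific parser
        return {
            'name': pkg_info.name,
            'version': pkg_info.version,
            'release': pkg_info.release,
        }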

> 1. Upload zero to many files as artifacts
>
You get this generically. Post to the core-provided Artifact URL.
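
For example (the path follows the Pulp 3 REST API layout; exact field
names can vary by release):

    # Step 1, done generically against core's Artifact endpoint.
    import requests

    with open('foo-1.0.noarch.rpm', 'rb') as f:
        response = requests.post(
            'http://pulp.example.com/pulp/api/v3/artifacts/',
            files={'file': f},
            auth=('admin', 'password'),
        )
    artifact_href = response.json()['_href']  # field name may vary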

> 2. Provide some metadata about the zero or more artifacts, or let the
> plugin parse it out itself (or maybe even a combination of the two)
>
Either the client provides 100% of the metadata or the plugin has to be
involved somehow. Not all plugins will do this, and we can't require them
to. This is the inconsistent part even if we introduce a core-based
one-shot uploader.
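
When the client does provide everything itself, step 2 stays generic: it
just POSTs the fields plus the artifact href to the plugin's content
endpoint. A sketch (endpoint path and field names illustrative):

    import requests

    # artifact_href comes from the earlier POST to /artifacts/.
    requests.post(
        'http://pulp.example.com/pulp/api/v3/content/file/files/',
        json={
            '_artifact': artifact_href,
            'relative_path': 'foo/bar.txt',  # plugin-specific field
        },
        auth=('admin', 'password'),
    )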

> 3.  Import that unit into a repository.
>
The issue is that creating a new repo version for each uploaded unit is
problematic. I think users want to associate several units at once, so
association is probably more useful if we could have it accept many
Artifacts at a time. Is that what people imagined out of this feature?
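
That would keep step 3 to a single call and a single new version, e.g.
(parameter name as in the Pulp 3 API at the time; the rest is
illustrative):

    import requests

    # Add many content units in one request, producing one new version.
    requests.post(
        'http://pulp.example.com/pulp/api/v3/repositories/1/versions/',
        json={'add_content_units': [content_href_1, content_href_2]},
        auth=('admin', 'password'),
    )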

> I can see it being difficult as a user to go through all of those steps
> (even if 2 & 3 were combined into one), and the desire is to simplify the
> process, but uploading arbitrary files is not simple. Why do I need to
> give up the plugin's ability to parse the unit's details because I'm
> using the consistent api?
>
> Keep in mind all my questions are coming from a very ignorant perspective
> with respect to pulp3 internals, and more from a user perspective.
>
>> My problem with single api calls to upload files is that we cannot
>> reliably use them due to limitations on request sizes. We have to be
>> prepared to use multiple calls to upload files regardless. Maybe if a user
>> is using some plugin that never has super large files (ansible?) you could
>> be confident you would never hit a request size limitation. But file,
>> docker, and yum would all require multiple calls to get the physical data
>> to the server.
>>
> I believe arbitrarily large files can be uploaded either through
> multi-part form data or through the django-chunked interface. We'll see
> what happens with 4196, but I expect arbitrary payload sizes to be a
> requirement for Pulp users.
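
Whatever shape 4196 ends up with, the client side of a chunked upload
would look roughly like this (endpoint and header handling hypothetical):

    import os
    import requests

    CHUNK = 10 * 1024 * 1024  # stay under the server's size limit

    def upload_in_chunks(path, url):
        size = os.path.getsize(path)
        with open(path, 'rb') as f:
            offset = 0
            while offset < size:
                data = f.read(CHUNK)
                end = offset + len(data) - 1
                requests.put(
                    url,
                    data=data,
                    headers={'Content-Range':
                             'bytes %d-%d/%d' % (offset, end, size)},
                    auth=('admin', 'password'),
                )
                offset += len(data)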
>
>> I care more about having a consistent method for uploading files than
>> having fewer api calls. If we need some content-specific api, that's
>> fine, but please make it a consistent part of the process.
>>
> It sounds like the 4-call interface is the only choice then, if
> consistency is a must. There isn't a way to offer consistency for
> one-shot uploaders. Is it ok that Katello will have to fill out all of
> the field data when it posts the content type? What could be better?
>
> I'll reserve my comments here based on the discussion above.
>
> Thanks!
>
> Justin
>
>
> I feel like we may be chasing the wrong goal here (fewer calls vs a more
>> consistent experience).
>>
>>
>> That said, I think we should discuss this as a community to encourage
>> plugins to behave similarly, and because there may also be a possibility
>> of sharing some code. It is my hope that a "one shot upload" could do 2
>> things: 1) upload and create Content, and 2) optionally add that content
>> to repositories.
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>