[Pulp-dev] [pulp 3] proposed change to publishing REST api

David Davis daviddavis at redhat.com
Tue Oct 31 17:39:35 UTC 2017


Personally, I am not opposed to the URL endpoint you suggest.

It also seems like there is some consensus around adding a ‘created
resources’ relationship to Task or at least prototyping that out to see
what it would look like.
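
To make the prototype idea a bit more concrete, here is a minimal sketch of what a 'created resources' relationship on a task might look like. Everything in it is an assumption for illustration: the `Task` and `CreatedResource` names, their fields, the state strings, and the example href are hypothetical and are not Pulp's actual models or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CreatedResource:
    # Hypothetical: a typed link from a task to something it created.
    resource_type: str  # e.g. "publication"
    href: str


@dataclass
class Task:
    # Hypothetical task record; field names and states are illustrative only.
    id: str
    state: str  # e.g. "running", "completed" or "failed"
    created_resources: List[CreatedResource] = field(default_factory=list)


def publication_for(task: Task) -> Optional[str]:
    """Return the href of the publication a publish task created, if any."""
    for resource in task.created_resources:
        if resource.resource_type == "publication":
            return resource.href
    return None


# A publish task ends up with 1 publication, or 0 if the publish failed.
# The href below is a made-up placeholder.
ok = Task("t1", "completed", [CreatedResource("publication", "/api/v3/publications/p1/")])
failed = Task("t2", "failed")
print(publication_for(ok))      # /api/v3/publications/p1/
print(publication_for(failed))  # None
```

Carrying a type alongside each link is what lets a client pick out the publication without guessing what kind of resource a task produced.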

If no one disagrees, should I update issue #3033 with those two items?


David

On Wed, Oct 25, 2017 at 1:23 PM, Dennis Kliban <dkliban at redhat.com> wrote:

> On Wed, Oct 25, 2017 at 11:24 AM, David Davis <daviddavis at redhat.com>
> wrote:
>
>> I don’t know that the ambiguity around whether a task has a publication
>> or not is a big deal. If I call the publication endpoint, I’d expect a
>> publication task which either has 1 publication or 0 (if the publication
>> failed) attached to it.
>>
>> In terms of ambiguity, I see a worse problem around adding a task_id
>> field to publications. As a user, I don’t know if a publication failed or
>> not when I get back a publication object. Instead, I have to look up the
>> task to see if it is a real (or successful) publication. Moreover, since we
>> allow users to remove/clean up tasks, that task may not even exist anymore.
>>
>>
> I agree that the ephemeral nature of tasks makes the originally proposed
> solution non-deterministic. I am open to associating 'resources created'
> with a task instead.
>
> However, I still think there is value in changing the REST API endpoint
> for starting a publish task to
> POST /api/v3/repositories/<repo-id>/publishers/<type>/<name>/publications/.
> I will start a separate thread for that discussion.
>
>  - Dennis
>
>
>>
>> David
>>
>> On Wed, Oct 25, 2017 at 11:03 AM, Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Oct 24, 2017 at 10:00 PM, Michael Hrivnak <mhrivnak at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Oct 24, 2017 at 2:11 PM, Brian Bouterse <bbouters at redhat.com>
>>>> wrote:
>>>>
>>>>> Thanks everyone for all the discussion! I'll try to recap the problem
>>>>> and some of the solutions I've heard. I'll also share some of my
>>>>> perspective on them too.
>>>>>
>>>>> What problem are we solving?
>>>>> When a user calls "publish" (the action API endpoint) they get a 202
>>>>> w/ a link to the task. That task will produce a publication. How can the
>>>>> user find the publication that was produced by the task? How can the user
>>>>> be sure the publication is fully complete?
>>>>>
>>>>>
>>>>> What are our options?
>>>>> 1) Start linking to created objects from task status. I believe it's
>>>>> been clearly stated why we can't do this. If it's not clear, or if
>>>>> there are other things we should consider, let's talk about it.
>>>>> Acknowledging or establishing agreement on this is crucial because a change
>>>>> like this would bring back a lot of the user pain from pulp2. I believe the
>>>>> HAL suggestion falls into this area.
>>>>>
>>>>
>>>> I may have missed something, but I do not think this is clear. I know
>>>> that Pulp 2's API included a lot of unstructured data, but that is not at
>>>> all what I'm suggesting here.
>>>>
>>>> It is standard and recommended practice for REST API responses to
>>>> include links to resources along with information about what type of
>>>> resource each link references. We could include a reference to the created
>>>> resource and an identifier for what type of resource it is, and that would
>>>> be well within the bounds of good REST API design. HAL is just one of
>>>> several ways to accomplish that, and I'm not pitching any particular
>>>> solution there. In any case, I'm not sure what the problem would be with
>>>> this approach.
>>>>
>>>
>>> I agree it is a standard practice for a resource to include links to
>>> other resources, but the proposal to include "generic" links is
>>> different and creates a different user experience. I believe referencing
>>> the task from the publication will be easier for users and clients. When a
>>> user looks up a publication, they will always know they'll get between 0
>>> and 1 links to a task. You can use that to check the state of the
>>> publication. If we link to "generic" resources (like a publication) from a
>>> task, then if I ask a user "do you expect task
>>> ede3af3e-d5cf-4e18-8c57-69ac4d4e4de6 to contain a link to a publication
>>> or not?" they can't know until they query it. I think that ambiguity was a
>>> pain point in Pulp2. I don't totally reject this solution, but this is an
>>> undesirable property (I think).
>>>
>>>
>>>>
>>>>>
>>>>> 2) Have the user find the publication via query that sorts on time and
>>>>> filters only for a specific publisher. This could be fragile: with a
>>>>> multi-user system and no hard references between publications and tasks,
>>>>> answering the question "which publication is mine?" is hard because
>>>>> another user could have submitted a publish too. While not perfect,
>>>>> this could work.
>>>>>
>>>>
>>>> In theory if a user queried for a publication from a specific publisher
>>>> that was created between the start and end times of the task, that should
>>>> unambiguously identify the correct publication. But depending on timestamps
>>>> is not a particularly robust nor confidence-inspiring way to reference a
>>>> resource.
>>>>
>>> Agreed and Agreed
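
For reference, the time-window lookup discussed above can be sketched as follows. This is only an illustration under stated assumptions: the `Publication` and `Task` records, their field names, and the publisher name "yum-main" are hypothetical stand-ins, not Pulp's real models or query API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Publication:
    # Hypothetical record: which publisher made it and when.
    id: str
    publisher: str
    created: datetime


@dataclass
class Task:
    # Hypothetical record: a publish task with start and end times.
    id: str
    publisher: str
    started: datetime
    finished: datetime


def candidates(task: Task, publications: List[Publication]) -> List[Publication]:
    """Publications from the task's publisher created while the task ran."""
    return [
        p for p in publications
        if p.publisher == task.publisher and task.started <= p.created <= task.finished
    ]


def at(minute: int) -> datetime:
    # Helper for readable example timestamps.
    return datetime(2017, 10, 24, 14, minute)


pubs = [Publication("p1", "yum-main", at(5)), Publication("p2", "yum-main", at(20))]
first = Task("t1", "yum-main", started=at(0), finished=at(10))
overlapping = Task("t2", "yum-main", started=at(3), finished=at(25))

print([p.id for p in candidates(first, pubs)])        # ['p1']
print([p.id for p in candidates(overlapping, pubs)])  # ['p1', 'p2']
```

The second call shows why this is not confidence-inspiring: if two publish tasks for the same publisher could ever overlap in time, the window no longer identifies a single publication.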
>>>
>>>
>>>>
>>>>>
>>>>> 3) Have the user create a publication directly like any other REST
>>>>> resource, and help the user understand the state of that resource over
>>>>> time. I believe the proposal at the start of this thread is recommending
>>>>> this solution. I'm also +1 on this solution.
>>>>>
>>>>
>>>> I think the problem with this is that a user cannot create a
>>>> publication. A user can only ask a plugin to create a publication. Until
>>>> the plugin creates the publication, there is no publication.
>>>>
>>>
>>> Note a publication is an object, but really we mean a publication and
>>> its related PublishedArtifact, PublishedMetadata, etc. objects. It would be
>>> straightforward for a user to create a publication using the viewset and
>>> have the task associated with it call the publisher to build out the
>>> associated PublishedArtifact, PublishedContent, PublishedMetadata, etc. We
>>> should explore if this is good or not, but it is possible.
>>>
>>> As an aside, this is related to a problem everyone should be aware of:
>>> the existence of a publication does not guarantee that publication is
>>> finished publishing. Even with option 1, where the task creates the
>>> publication and links to it in the task status, while the publisher is
>>> running it must save the Publication so that the PublishedArtifact, etc. can
>>> link to it. So for any given publication, in order to know if it's "fully
>>> finished and consistent" you must be able to check the status of the
>>> associated task that produced it.
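
A rough sketch of that check, assuming a publication carries the id of the task that produced it. The function name, the state strings, and the idea of a plain dict of task states are all hypothetical and only meant to illustrate the reasoning, not how Pulp stores tasks.

```python
from typing import Dict, Optional


def publication_status(task_id: Optional[str], task_states: Dict[str, str]) -> str:
    """Classify a publication by the state of the task that produced it.

    task_states maps task id -> state. A task that has been cleaned up is
    simply absent from the mapping.
    """
    if task_id is None or task_id not in task_states:
        # No task to consult, so the publication's state cannot be determined.
        return "unknown"
    state = task_states[task_id]
    if state == "completed":
        return "complete"
    if state == "failed":
        return "failed"
    return "in progress"


states = {"t1": "completed", "t2": "running", "t3": "failed"}
print(publication_status("t1", states))  # complete
print(publication_status("t2", states))  # in progress
print(publication_status("t9", states))  # unknown
```

The "unknown" branch is the case raised elsewhere in this thread: since users can remove tasks, a publication whose task is gone cannot be verified this way.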
>>>
>>>
>>>>
>>>>> As an aside, I don't think considering versioned repos as a possible
>>>>> solution is helping us with this problem. The scope of the current problem
>>>>> is relatively small and the scope of planning for versioned repos is large.
>>>>>
>>>>>
>>>> Versioned repos is a potential solution. In that scenario, a user would
>>>> request publication of a specific repo version (perhaps defaulting to the
>>>> latest), the publication would be linked to that version, and that is an
>>>> easy mechanism for the user to find the publication they want. Ultimately
>>>> the user is interested in working with a specific content set anyway. They
>>>> get a repo to a state where it has the content they want, and then they
>>>> publish that content set. No matter what we do with publications, users
>>>> will think of them in terms of related content sets. A repo version is that
>>>> immutable content set they can work with confidently.
>>>>
>>>
>>> It's neat to me that versions are snapshots of content and
>>> publications are snapshots of content. Publications already deliver much of
>>> the value proposition of versioned repos. They allow you
>>> to work with specific content sets like you describe. Also they allow for
>>> rollback. So that is all great for our users. For this thread, I want to
>>> bring the conversation back to where it started, solving a small problem
>>> about linking two resources that already exist.
>>>
>>>
>>>> It helps the rollback scenario a lot as well. Versioning repos allows a
>>>> user to see what the differences are between two content sets, and thus two
>>>> different publications, which informs them about when and how far back they
>>>> should roll back a distribution.
>>>>
>>>>
>>>> - user discovers a horrible flaw in a piece of content
>>>> - user queries for which version of the repo introduced that piece of
>>>> content
>>>> - user updates the distribution to serve the publication that came
>>>> before the one which introduced the piece of content, optionally
>>>> re-publishing that version in case its publication was deleted or had never
>>>> been made in the first place.
>>>>
>>>> --
>>>>
>>>> Michael Hrivnak
>>>>
>>>> Principal Software Engineer, RHCE
>>>>
>>>> Red Hat
>>>>
>>>
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>>
>>
>