[Pulp-dev] [pulp 3] proposed change to publishing REST api

Wed Oct 25 17:23:41 UTC 2017

On Wed, Oct 25, 2017 at 11:24 AM, David Davis <daviddavis at redhat.com> wrote:

> I don’t know that the ambiguity around whether a task has a publication or
> not is a big deal. If I call the publication endpoint, I’d expect a
> publication task which either has 1 publication or 0 (if the publication
> failed) attached to it.
>
> In terms of ambiguity, I see a worse problem around adding a task_id field
> to publications. As a user, I don’t know if a publication failed or not
> when I get back a publication object. Instead, I have to look up the task
> to see if it is a real (or successful) publication. Moreover, since we
> allow users to remove/clean up tasks, that task may not even exist anymore.
>
>
I agree that the ephemeral nature of tasks makes the originally proposed
solution non-deterministic. I am open to associating 'resources created'
with a task instead.
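
To make that concrete, here is a minimal sketch of what "resources created" on a task could look like from a client's point of view. The class and field names (Task, CreatedResource, created_resources, state) and the state values are assumptions for illustration only, not the actual Pulp 3 schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CreatedResource:
    # Hypothetical typed link: what kind of resource was created and where it lives.
    resource_type: str
    href: str


@dataclass
class Task:
    # Hypothetical task record; field names are illustrative, not Pulp's schema.
    state: str  # assumed values: "waiting", "running", "completed", "failed"
    created_resources: List[CreatedResource] = field(default_factory=list)


def finished_publication(task: Task) -> Optional[str]:
    """Return the publication href only if the task completed successfully."""
    if task.state != "completed":
        return None
    for resource in task.created_resources:
        if resource.resource_type == "publication":
            return resource.href
    return None
```

With this shape, a publish task carries either 0 or 1 publication links, and a client only trusts the link once the task reports a completed state.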

However, I still think there is value in changing the REST API endpoint for
starting a publish task to POST
/api/v3/repositories/<repo-id>/publishers/<type>/<name>/publications/.
However, I will start a separate thread for that discussion.
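
For reference, a small sketch of how a client might build that path. The helper name and the decision to percent-encode each segment are my own assumptions; only the URL layout comes from the proposal above.

```python
from urllib.parse import quote


def publication_collection_path(repo_id: str, publisher_type: str, publisher_name: str) -> str:
    """Build the proposed publications path for one publisher (hypothetical helper)."""
    # Percent-encode every segment so names with spaces or slashes stay in one segment.
    repo, ptype, name = (
        quote(part, safe="") for part in (repo_id, publisher_type, publisher_name)
    )
    return f"/api/v3/repositories/{repo}/publishers/{ptype}/{name}/publications/"
```

A POST to the returned path would then start the publish task.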

 - Dennis

>
> David
>
> On Wed, Oct 25, 2017 at 11:03 AM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Oct 24, 2017 at 10:00 PM, Michael Hrivnak <mhrivnak at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Oct 24, 2017 at 2:11 PM, Brian Bouterse <bbouters at redhat.com>
>>> wrote:
>>>
>>>> Thanks everyone for all the discussion! I'll try to recap the problem
>>>> and some of the solutions I've heard. I'll also share some of my
>>>> perspective on them too.
>>>>
>>>> What problem are we solving?
>>>> When a user calls "publish" (the action API endpoint) they get a 202 w/
>>>> a link to the task. That task will produce a publication. How can the user
>>>> find the publication that was produced by the task? How can the user be
>>>> sure the publication is fully complete?
>>>>
>>>>
>>>> What are our options?
>>>> 1) Start linking to created objects from task status. I believe it's
>>>> been clearly stated why we can't do this. If it's not clear, or if
>>>> there are other things we should consider, let's talk about it.
>>>> Acknowledging or establishing agreement on this is crucial because a change
>>>> like this would bring back a lot of the user pain from pulp2. I believe the
>>>> HAL suggestion falls into this area.
>>>>
>>>
>>> I may have missed something, but I do not think this is clear. I know
>>> that Pulp 2's API included a lot of unstructured data, but that is not at
>>> all what I'm suggesting here.
>>>
>>> It is standard and recommended practice for REST API responses to
>>> include links to resources along with information about what type of
>>> resource each link references. We could include a reference to the created
>>> resource and an identifier for what type of resource it is, and that would
>>> be well within the bounds of good REST API design. HAL is just one of
>>> several ways to accomplish that, and I'm not pitching any particular
>>> solution there. In any case, I'm not sure what the problem would be with
>>> this approach.
>>>
>>
>> I agree it is a standard practice for a resource to include links to
>> other resources, but the proposal to include "generic" links is
>> different and creates a different user experience. I believe referencing
>> the task from the publication will be easier for users and clients. When a
>> user looks up a publication, they will always know they'll get between 0
>> and 1 links to a task. You can use that to check the state of the
>> publication. If we link to "generic" resources (like a publication) from a
>> task, then if I ask a user "do you expect task
>> ede3af3e-d5cf-4e18-8c57-69ac4d4e4de6 to contain a link to a publication
>> or not?" they can't know until they query it. I think that ambiguity was a
>> pain point in Pulp2. I don't totally reject this solution, but this is an
>> undesirable property (I think).
>>
>>
>>>
>>>>
>>>> 2) Have the user find the publication via query that sorts on time and
>>>> filters only for a specific publisher. This could be fragile because with a
>>>> multi-user system and no hard references between publications and tasks,
>>>> answering the question "which is the publication for me" is hard because
>>>> another user could have submitted a publish too. While not totally perfect,
>>>> this could work.
>>>>
>>>
>>> In theory if a user queried for a publication from a specific publisher
>>> that was created between the start and end times of the task, that should
>>> unambiguously identify the correct publication. But depending on timestamps
>>> is not a particularly robust nor confidence-inspiring way to reference a
>>> resource.
>>>
>> Agreed and Agreed
>>
>>
>>>
>>>>
>>>> 3) Have the user create a publication directly like any other REST
>>>> resource, and help the user understand the state of that resource over
>>>> time. I believe the proposal at the start of this thread is recommending
>>>> this solution. I'm also +1 on this solution.
>>>>
>>>
>>> I think the problem with this is that a user cannot create a
>>> publication. A user can only ask a plugin to create a publication. Until
>>> the plugin creates the publication, there is no publication.
>>>
>>
>> Note a publication is an object, but really we mean a publication and
>> its related PublishedArtifact, PublishedMetadata, etc. objects. It would be
>> straightforward for a user to create a publication using the viewset and
>> have the task associated with it call the publisher to build out the
>> associated PublishedArtifact, PublishedContent, PublishedMetadata, etc. We
>> should explore if this is good or not, but it is possible.
>>
>> As an aside, this is related to a problem everyone should be aware of:
>> the existence of a publication does not guarantee that publication is
>> finished publishing. Even with option 1, where the task creates the
>> publication and links to it in the task status, while the publisher is
>> running it must save the Publication so that the PublishedArtifact, etc. can
>> link to it. So for any given publication, in order to know if it's "fully
>> finished and consistent" you must be able to check the status of the
>> associated task that produced it.
>>
>>
>>>
>>>> As an aside, I don't think considering versioned repos as a possible
>>>> solution is helping us with this problem. The scope of the current problem
>>>> is relatively small and the scope of planning for versioned repos is large.
>>>>
>>>>
>>> Versioned repos is a potential solution. In that scenario, a user would
>>> request publication of a specific repo version (perhaps defaulting to the
>>> latest), the publication would be linked to that version, and that is an
>>> easy mechanism for the user to find the publication they want. Ultimately
>>> the user is interested in working with a specific content set anyway. They
>>> get a repo to a state where it has the content they want, and then they
>>> publish that content set. No matter what we do with publications, users
>>> will think of them in terms of related content sets. A repo version is that
>>> immutable content set they can work with confidently.
>>>
>>
>> It's neat to me that versions are snapshots of content and
>> publications are snapshots of content. Publications already provide much of
>> the value proposition of versioned repos. They allow you
>> to work with specific content sets like you describe. Also they allow for
>> rollback. So that is all great for our users. For this thread, I want to
>> bring the conversation back to where it started, solving a small problem
>> about linking two resources that already exist.
>>
>>
>>> It helps the rollback scenario a lot as well. Versioning repos allows a
>>> user to see what the differences are between two content sets, and thus two
>>> different publications, which informs them about when and how far back they
>>> should roll back a distribution.
>>>
>>
>>> - user discovers a horrible flaw in a piece of content
>>> - user queries for which version of the repo introduced that piece of
>>> content
>>> - user updates the distribution to serve the publication that came
>>> before the one which introduced the piece of content, optionally
>>> re-publishing that version in case its publication was deleted or had never
>>> been made in the first place.
>>>
>>> --
>>>
>>> Michael Hrivnak
>>>
>>> Principal Software Engineer, RHCE
>>>
>>> Red Hat
>>>
>>
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>>