[Pulp-dev] [pulp 3] proposed change to publishing REST api

Wed Oct 25 15:50:30 UTC 2017

Perhaps the ambiguity problem isn't a big deal. Let's continue to explore
adding "generic" links to the task status. I wonder what a link or a HAL
document would look like w/ the swagger coreAPI. A proof of concept of this
would be great. What is the user experience like for coreAPI?
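
To make the "generic" links idea concrete for discussion, here is a rough sketch of what a HAL-style task status could look like. This is only an illustration, not a coreAPI prototype: the URL layout, the "state" field, and the "publication" link relation are all assumptions, and only the "_links"/"href" shape comes from HAL itself.

```python
import json


def task_status_with_links(task_id, state, created=None):
    """Build a HAL-style task status document (hypothetical shape).

    `created` is a list of (resource_type, href) pairs for the objects
    the task produced. The URL layout below is an assumption.
    """
    links = {"self": {"href": f"/api/v3/tasks/{task_id}/"}}
    for resource_type, href in created or []:
        # Group created resources by type so a client can look them up by name.
        links.setdefault(resource_type, []).append({"href": href})
    return {"state": state, "_links": links}


# A publish task that succeeded links to its publication...
done_task = task_status_with_links(
    "ede3af3e-d5cf-4e18-8c57-69ac4d4e4de6",
    "completed",
    created=[("publication", "/api/v3/publications/1234/")],
)
# ...while a failed one carries no such link.
failed_task = task_status_with_links("another-task-id", "failed")

print(json.dumps(done_task, indent=2))
print("publication" in failed_task["_links"])
```

Note that the client has to check whether a "publication" relation is present at all, which is the ambiguity discussed in the quoted thread.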

I agree w/ the issue of users not knowing if a publication has failed or
not. I agree that can be resolved by having failed publications and their
associated publishedArtifacts, etc deleted. Is this the behavior we want?

Note though the problem from the earlier email. During publishing, if a
user queries for a publication how can the user know the publication is
done? They would have to look at the associated task status to be sure. One
possible solution is to have a 'user_visible' or 'done' field, with the
viewset filtering out publications where done=False. The last thing the
publisher code in core could do would be to set this field in the db. In
combination w/ the auto-delete behavior from above I think this would be a
good user experience.
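
A minimal sketch of that combination, using plain in-memory Python rather than the real models and viewsets. Everything here (PublicationStore, publish, the "rpm" publisher name) is hypothetical and only meant to show the ordering: create the publication, build it, set done last, and delete it on failure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Publication:
    id: int
    publisher: str
    done: bool = False  # set only after publishing has fully finished
    artifacts: List[str] = field(default_factory=list)


class PublicationStore:
    """Stand-in for the db table plus the viewset's default filter."""

    def __init__(self):
        self._rows: Dict[int, Publication] = {}
        self._next_id = 1

    def create(self, publisher):
        pub = Publication(id=self._next_id, publisher=publisher)
        self._rows[pub.id] = pub
        self._next_id += 1
        return pub

    def delete(self, pub):
        self._rows.pop(pub.id, None)

    def visible(self):
        # What the viewset would return: publications where done=False are hidden.
        return [p for p in self._rows.values() if p.done]


def publish(store, publisher, build):
    """Run a publish 'task'; `build` stands in for the plugin's publisher code."""
    pub = store.create(publisher)
    try:
        build(pub)
    except Exception:
        # Auto-delete the failed publication (and with it, its artifacts).
        store.delete(pub)
        raise
    pub.done = True  # the last thing the core publishing code does
    return pub


store = PublicationStore()
ok = publish(store, "rpm", lambda p: p.artifacts.append("repodata/repomd.xml"))


def broken(pub):
    pub.artifacts.append("partial")
    raise RuntimeError("publish failed")


try:
    publish(store, "rpm", broken)
except RuntimeError:
    pass

print([p.id for p in store.visible()])  # [1]
```

With this ordering a user who lists publications never sees one that is in progress or failed, so they would not need to look up the task to trust a publication.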

If we had a coreAPI proof of concept w/ this idea I would be more willing
to +1 it. Can we have a prototype?

On Wed, Oct 25, 2017 at 11:24 AM, David Davis <daviddavis at redhat.com> wrote:

> I don’t know that the ambiguity around whether a task has a publication or
> not is a big deal. If I call the publication endpoint, I’d expect a
> publication task which either has 1 publication or 0 (if the publication
> failed) attached to it.
>
> In terms of ambiguity, I see a worse problem around adding a task_id field
> to publications. As a user, I don’t know if a publication failed or not
> when I get back a publication object. Instead, I have to look up the task
> to see if it is a real (or successful) publication. Moreover, since we
> allow users to remove/clean up tasks, that task may not even exist anymore.
>
>
> David
>
> On Wed, Oct 25, 2017 at 11:03 AM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Oct 24, 2017 at 10:00 PM, Michael Hrivnak <mhrivnak at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Oct 24, 2017 at 2:11 PM, Brian Bouterse <bbouters at redhat.com>
>>> wrote:
>>>
>>>> Thanks everyone for all the discussion! I'll try to recap the problem
>>>> and some of the solutions I've heard. I'll also share some of my
>>>> perspective on them too.
>>>>
>>>> What problem are we solving?
>>>> When a user calls "publish" (the action API endpoint) they get a 202 w/
>>>> a link to the task. That task will produce a publication. How can the user
>>>> find the publication that was produced by the task? How can the user be
>>>> sure the publication is fully complete?
>>>>
>>>>
>>>> What are our options?
>>>> 1) Start linking to created objects from task status. I believe it's
>>>> been clearly stated why we can't do this. If it's not clear, or if
>>>> there are other things we should consider, let's talk about it.
>>>> Acknowledging or establishing agreement on this is crucial because a change
>>>> like this would bring back a lot of the user pain from pulp2. I believe the
>>>> HAL suggestion falls into this area.
>>>>
>>>
>>> I may have missed something, but I do not think this is clear. I know
>>> that Pulp 2's API included a lot of unstructured data, but that is not at
>>> all what I'm suggesting here.
>>>
>>> It is standard and recommended practice for REST API responses to
>>> include links to resources along with information about what type of
>>> resource each link references. We could include a reference to the created
>>> resource and an identifier for what type of resource it is, and that would
>>> be well within the bounds of good REST API design. HAL is just one of
>>> several ways to accomplish that, and I'm not pitching any particular
>>> solution there. In any case, I'm not sure what the problem would be with
>>> this approach.
>>>
>>
>> I agree it is a standard practice for a resource to include links to
>> other resources, but the proposal to include "generic" links is
>> different and creates a different user experience. I believe referencing
>> the task from the publication will be easier for users and clients. When a
>> user looks up a publication, they will always know they'll get between 0
>> and 1 links to a task. You can use that to check the state of the
>> publication. If we link to "generic" resources (like a publication) from a
>> task, then if I ask a user "do you expect task
>> ede3af3e-d5cf-4e18-8c57-69ac4d4e4de6 to contain a link to a publication
>> or not?" you can't know until you query it. I think that ambiguity was a
>> pain point in Pulp2. I don't totally reject this solution, but this is an
>> undesirable property (I think).
>>
>>
>>>
>>>>
>>>> 2) Have the user find the publication via query that sorts on time and
>>>> filters only for a specific publisher. This could be fragile because with a
>>>> multi-user system and no hard references between publications and tasks,
>>>> answering the question "which is the publication for me" is hard because
>>>> another user could have submitted a publish too. While not totally perfect,
>>>> this could work.
>>>>
>>>
>>> In theory if a user queried for a publication from a specific publisher
>>> that was created between the start and end times of the task, that should
>>> unambiguously identify the correct publication. But depending on timestamps
>>> is not a particularly robust nor confidence-inspiring way to reference a
>>> resource.
>>>
>> Agreed and Agreed
>>
>>
>>>
>>>>
>>>> 3) Have the user create a publication directly like any other REST
>>>> resource, and help the user understand the state of that resource over
>>>> time. I believe the proposal at the start of this thread is recommending
>>>> this solution. I'm also +1 on this solution.
>>>>
>>>
>>> I think the problem with this is that a user cannot create a
>>> publication. A user can only ask a plugin to create a publication. Until
>>> the plugin creates the publication, there is no publication.
>>>
>>
>> Note a publication is an object, but really we mean a publication and
>> its related PublishedArtifact, PublishedMetadata, etc objects. It would be
>> straightforward for a user to create a publication using the viewset and
>> have the task associated with it call the publisher to build out the
>> associated PublishedArtifact, PublishedContent, PublishedMetadata, etc. We
>> should explore if this is good or not, but it is possible.
>>
>> As an aside, this is related to a problem everyone should be aware of:
>> the existence of a publication does not guarantee that publication is
>> finished publishing. Even with option 1, where the task creates the
>> publication and links to it in the task status, while the publisher is
>> running it must save the Publication so that the PublishedArtifact, etc can
>> link to it. So for any given publication, in order to know if it's "fully
>> finished and consistent" you must be able to check the status of the
>> associated task that produced it.
>>
>>
>>>
>>>> As an aside, I don't think considering versioned repos as a possible
>>>> solution is helping us with this problem. The scope of the current problem
>>>> is relatively small and the scope of planning for versioned repos is large.
>>>>
>>>>
>>> Versioned repos is a potential solution. In that scenario, a user would
>>> request publication of a specific repo version (perhaps defaulting to the
>>> latest), the publication would be linked to that version, and that is an
>>> easy mechanism for the user to find the publication they want. Ultimately
>>> the user is interested in working with a specific content set anyway. They
>>> get a repo to a state where it has the content they want, and then they
>>> publish that content set. No matter what we do with publications, users
>>> will think of them in terms of related content sets. A repo version is that
>>> immutable content set they can work with confidently.
>>>
>>
>> It's neat to me that versions are snapshots of content and
>> publications are snapshots of content. Publications already provide much of
>> the value proposition of versioned repos. They allow you
>> to work with specific content sets like you describe. Also they allow for
>> rollback. So that is all great for our users. For this thread, I want to
>> bring the conversation back to where it started, solving a small problem
>> about linking two resources that already exist.
>>
>>
>>> It helps the rollback scenario a lot as well. Versioning repos allows a
>>> user to see what the differences are between two content sets, and thus two
>>> different publications, which informs them about when and how far back they
>>> should roll back a distribution.
>>>
>>
>>> - user discovers a horrible flaw in a piece of content
>>> - user queries for which version of the repo introduced that piece of
>>> content
>>> - user updates the distribution to serve the publication that came
>>> before the one which introduced the piece of content, optionally
>>> re-publishing that version in case its publication was deleted or had never
>>> been made in the first place.
>>>
>>> --
>>>
>>> Michael Hrivnak
>>>
>>> Principal Software Engineer, RHCE
>>>
>>> Red Hat
>>>
>>
>>
>