[Pulp-dev] [pulp 3] proposed change to publishing REST api

Wed Oct 25 15:03:59 UTC 2017

On Tue, Oct 24, 2017 at 10:00 PM, Michael Hrivnak <mhrivnak at redhat.com>
wrote:

>
>
> On Tue, Oct 24, 2017 at 2:11 PM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>
>> Thanks everyone for all the discussion! I'll try to recap the problem and
>> some of the solutions I've heard. I'll also share some of my perspective on
>> them too.
>>
>> What problem are we solving?
>> When a user calls "publish" (the action API endpoint) they get a 202 w/ a
>> link to the task. That task will produce a publication. How can the user
>> find the publication that was produced by the task? How can the user be
>> sure the publication is fully complete?
>>
>>
>> What are our options?
>> 1) Start linking to created objects from task status. I believe it's been
>> clearly stated why we can't do this. If it's not clear, or if there
>> are other things we should consider, let's talk about it. Acknowledging or
>> establishing agreement on this is crucial because a change like this would
>> bring back a lot of the user pain from Pulp 2. I believe the HAL suggestion
>> falls into this area.
>>
>
> I may have missed something, but I do not think this is clear. I know that
> Pulp 2's API included a lot of unstructured data, but that is not at all
> what I'm suggesting here.
>
> It is standard and recommended practice for REST API responses to include
> links to resources along with information about what type of resource each
> link references. We could include a reference to the created resource and
> an identifier for what type of resource it is, and that would be well
> within the bounds of good REST API design. HAL is just one of several ways
> to accomplish that, and I'm not pitching any particular solution there. In
> any case, I'm not sure what the problem would be with this approach.
>

I agree it is standard practice for a resource to include links to other
resources, but the proposal to include "generic" links is different and
creates a different user experience. I believe referencing the task from
the publication will be easier for users and clients. When a user looks up
a publication, they will always know they'll get either 0 or 1 links to a
task, and they can use that link to check the state of the publication. If
we link to "generic" resources (like a publication) from a task, then if I
ask a user "do you expect task ede3af3e-d5cf-4e18-8c57-69ac4d4e4de6 to
contain a link to a publication or not?" they can't know until they query
it. I think that ambiguity was a pain point in Pulp 2. I don't totally
reject this solution, but I think this is an undesirable property.
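
To make the difference concrete, here is a rough Python sketch of the two
link shapes. This is not actual Pulp code: the class names, the
created_resources field, and the state strings are all made up for
illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    id: str
    state: str  # assumed values: "waiting", "running", "completed", "failed"
    # Option 1 shape (hypothetical): generic links whose types are
    # unknown to the client until the task is actually queried.
    created_resources: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class Publication:
    id: str
    # Proposed shape: a publication always has zero or one task.
    task: Optional[Task] = None


def publication_state(publication: Publication) -> str:
    """Report a publication's state through its optional task link."""
    if publication.task is None:
        return "no task"
    return publication.task.state


def publication_links(task: Task) -> List[str]:
    """With generic links, the client must scan and filter by type."""
    return [r["href"] for r in task.created_resources
            if r["type"] == "publication"]


publish_task = Task(id="task-1", state="running")
pub = Publication(id="pub-1", task=publish_task)

print(publication_state(pub))           # running
print(publication_links(publish_task))  # []
```

With the first shape the client knows up front what it will find; with the
second it has to look at each task to learn whether a publication link is
present at all.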

>
>>
>> 2) Have the user find the publication via query that sorts on time and
>> filters only for a specific publisher. This could be fragile because with a
>> multi-user system and no hard references between publications and tasks,
>> answering the question "which is the publication for me" is hard because
>> another user could have submitted a publish too. While not totally perfect,
>> this could work.
>>
>
> In theory if a user queried for a publication from a specific publisher
> that was created between the start and end times of the task, that should
> unambiguously identify the correct publication. But depending on timestamps
> is not a particularly robust nor confidence-inspiring way to reference a
> resource.
>
Agreed and agreed.
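
For comparison, a small Python sketch of the two timestamp-based lookups
discussed for option 2. Everything here (the Publication fields, the
publisher names, the timestamps) is invented for illustration, and it
assumes that publish tasks for the same publisher never overlap in time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Publication:
    id: str
    publisher: str
    created: datetime


def latest_for_publisher(publications: List[Publication],
                         publisher: str) -> Publication:
    """Sort-by-time lookup: newest publication for the publisher."""
    candidates = [p for p in publications if p.publisher == publisher]
    return max(candidates, key=lambda p: p.created)


def in_task_window(publications: List[Publication], publisher: str,
                   started: datetime, finished: datetime) -> List[Publication]:
    """Window lookup: publications created while the task was running."""
    return [p for p in publications
            if p.publisher == publisher and started <= p.created <= finished]


pubs = [
    Publication("pub-mine", "publisher-1", datetime(2017, 10, 24, 12, 0, 5)),
    # Another user published with the same publisher after my task ended.
    Publication("pub-other", "publisher-1", datetime(2017, 10, 24, 12, 0, 20)),
]
task_started = datetime(2017, 10, 24, 12, 0, 0)
task_finished = datetime(2017, 10, 24, 12, 0, 10)

print(latest_for_publisher(pubs, "publisher-1").id)  # pub-other
window = in_task_window(pubs, "publisher-1", task_started, task_finished)
print([p.id for p in window])                        # ['pub-mine']
```

The window query finds the right publication here, but only because the
timestamps line up; nothing in the data actually ties the publication to
the task.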

>
>>
>> 3) Have the user create a publication directly like any other REST
>> resource, and help the user understand the state of that resource over
>> time. I believe the proposal at the start of this thread is recommending
>> this solution. I'm also +1 on this solution.
>>
>
> I think the problem with this is that a user cannot create a publication.
> A user can only ask a plugin to create a publication. Until the plugin
> creates the publication, there is no publication.
>

Note a publication is an object, but really we mean a publication and its
related PublishedArtifact, PublishedMetadata, etc. objects. It would be
straightforward for a user to create a publication using the viewset and
have the task associated with it call the publisher to build out the
associated PublishedArtifact, PublishedContent, PublishedMetadata, etc. We
should explore whether this is good or not, but it is possible.

As an aside, this is related to a problem everyone should be aware of: the
existence of a publication does not guarantee that publication is finished
publishing. Even with option 1, where the task creates the publication and
links to it in the task status, while the publisher is running it must save
the Publication so that the PublishedArtifact, etc. can link to it. So for
any given publication, in order to know if it's "fully finished and
consistent" you must be able to check the status of the associated task
that produced it.
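
A toy Python illustration of that point, written as a generator so the
intermediate states can be observed. The names (run_publisher, is_complete,
the in-memory store, the state strings, the file paths) are hypothetical
and only stand in for the real models and task system.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Task:
    state: str = "waiting"  # assumed states: waiting, running, completed


@dataclass
class Publication:
    task: Task
    published_artifacts: List[str] = field(default_factory=list)


def is_complete(publication: Publication) -> bool:
    """A publication is only trustworthy once its task has completed."""
    return publication.task.state == "completed"


def run_publisher(task: Task, store: List[Publication],
                  paths: List[str]) -> Iterator[Publication]:
    """Yield after every step so callers can inspect partial results."""
    task.state = "running"
    publication = Publication(task=task)
    store.append(publication)  # saved early so artifacts can link to it
    yield publication
    for path in paths:
        publication.published_artifacts.append(path)
        yield publication
    task.state = "completed"


store: List[Publication] = []
task = Task()
steps = run_publisher(task, store, ["repodata/repomd.xml", "Packages/a.rpm"])

publication = next(steps)
print(len(store), is_complete(publication))  # 1 False

for _ in steps:  # let the publisher run to the end
    pass
print(is_complete(publication))              # True
```

The publication is visible in the store from the first step, which is
exactly why its existence alone says nothing about whether it is finished.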

>
>> As an aside, I don't think considering versioned repos as a possible
>> solution is helping us with this problem. The scope of the current problem
>> is relatively small and the scope of planning for versioned repos is large.
>>
>>
> Versioned repos is a potential solution. In that scenario, a user would
> request publication of a specific repo version (perhaps defaulting to the
> latest), the publication would be linked to that version, and that is an
> easy mechanism for the user to find the publication they want. Ultimately
> the user is interested in working with a specific content set anyway. They
> get a repo to a state where it has the content they want, and then they
> publish that content set. No matter what we do with publications, users
> will think of them in terms of related content sets. A repo version is that
> immutable content set they can work with confidently.
>

It's neat to me that versions are snapshots of content and
publications are snapshots of content. Publications already deliver much of
the value proposition of versioned repos. They allow you
to work with specific content sets like you describe. Also they allow for
rollback. So that is all great for our users. For this thread, I want to
bring the conversation back to where it started, solving a small problem
about linking two resources that already exist.

> It helps the rollback scenario a lot as well. Versioning repos allows a
> user to see what the differences are between two content sets, and thus two
> different publications, which informs them about when and how far back they
> should roll back a distribution.
>

> - user discovers a horrible flaw in a piece of content
> - user queries for which version of the repo introduced that piece of
> content
> - user updates the distribution to serve the publication that came before
> the one which introduced the piece of content, optionally re-publishing
> that version in case its publication was deleted or had never been made in
> the first place.
>
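
The rollback steps listed above could be sketched roughly like this in
Python. Versioned repos are not designed yet, so every structure here (a
list of content sets as versions, a dict mapping version number to
publication, a dict for the distribution) is a made-up stand-in.

```python
from typing import Dict, List, Optional, Set


def version_that_introduced(versions: List[Set[str]],
                            unit: str) -> Optional[int]:
    """Return the number of the first version containing `unit`."""
    for number, content in enumerate(versions):
        if unit in content:
            return number
    return None


def roll_back(distribution: Dict[str, str], versions: List[Set[str]],
              publications: Dict[int, str], unit: str) -> bool:
    """Point the distribution at the publication before `unit` appeared."""
    introduced = version_that_introduced(versions, unit)
    if introduced is None or introduced == 0:
        return False  # unit never appeared, or there is nothing earlier
    previous = introduced - 1
    if previous not in publications:
        # Stand-in for re-publishing a version whose publication was
        # deleted or never made.
        publications[previous] = f"publication-{previous}"
    distribution["publication"] = publications[previous]
    return True


versions = [{"a-1.0"}, {"a-1.0", "b-1.0"}, {"a-1.0", "b-1.0", "bad-2.0"}]
publications = {0: "publication-0", 1: "publication-1", 2: "publication-2"}
distribution = {"publication": "publication-2"}

print(version_that_introduced(versions, "bad-2.0"))  # 2
roll_back(distribution, versions, publications, "bad-2.0")
print(distribution["publication"])                   # publication-1
```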
> --
>
> Michael Hrivnak
>
> Principal Software Engineer, RHCE
>
> Red Hat
>