[Pulp-dev] [pulp 3] proposed change to publishing REST api

David Davis daviddavis at redhat.com
Wed Oct 25 15:24:43 UTC 2017

I don’t know that the ambiguity around whether a task has a publication or
not is a big deal. If I call the publication endpoint, I’d expect a
publication task which either has 1 publication or 0 (if the publication
failed) attached to it.

In terms of ambiguity, I see a worse problem around adding a task_id field
to publications. As a user, I don’t know if a publication failed or not
when I get back a publication object. Instead, I have to look up the task
to see if it is a real (or successful) publication. Moreover, since we
allow users to remove/clean up tasks, that task may not even exist anymore.
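
To make that concern concrete, here is a minimal in-memory sketch. It is not Pulp code: the `Task` and `Publication` classes, their field names, and the state strings are all assumptions made for illustration. It shows how a publication that carries only a task_id loses its success/failure information once the task is cleaned up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    id: str
    state: str  # assumed state names, e.g. "running", "completed", "failed"


@dataclass
class Publication:
    id: str
    task_id: Optional[str] = None  # hypothetical link from publication to task


def publication_status(pub: Publication, tasks: Dict[str, Task]) -> str:
    """Resolve a publication's status by following its task_id link."""
    if pub.task_id is None:
        return "unknown"
    task = tasks.get(pub.task_id)
    if task is None:
        # The task was removed or cleaned up, so the outcome cannot be recovered.
        return "unknown"
    return task.state


tasks = {"t1": Task(id="t1", state="failed")}
pub = Publication(id="p1", task_id="t1")

status_before_cleanup = publication_status(pub, tasks)
del tasks["t1"]  # simulate a user cleaning up old tasks
status_after_cleanup = publication_status(pub, tasks)
```

Before the cleanup the lookup reports the failure; afterwards the same publication object can only be reported as "unknown".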


On Wed, Oct 25, 2017 at 11:03 AM, Brian Bouterse <bbouters at redhat.com>
wrote:

> On Tue, Oct 24, 2017 at 10:00 PM, Michael Hrivnak <mhrivnak at redhat.com>
> wrote:
>> On Tue, Oct 24, 2017 at 2:11 PM, Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>> Thanks everyone for all the discussion! I'll try to recap the problem
>>> and some of the solutions I've heard. I'll also share some of my
>>> perspective on them too.
>>> What problem are we solving?
>>> When a user calls "publish" (the action API endpoint) they get a 202 w/
>>> a link to the task. That task will produce a publication. How can the user
>>> find the publication that was produced by the task? How can the user be
>>> sure the publication is fully complete?
>>> What are our options?
>>> 1) Start linking to created objects from task status. I believe it's been
>>> clearly stated why we can't do this. If it's not clear, or if there
>>> are other things we should consider, let's talk about it. Acknowledging or
>>> establishing agreement on this is crucial because a change like this would
>>> bring back a lot of the user pain from pulp2. I believe the HAL suggestion
>>> falls into this area.
>> I may have missed something, but I do not think this is clear. I know
>> that Pulp 2's API included a lot of unstructured data, but that is not at
>> all what I'm suggesting here.
>> It is standard and recommended practice for REST API responses to include
>> links to resources along with information about what type of resource each
>> link references. We could include a reference to the created resource and
>> an identifier for what type of resource it is, and that would be well
>> within the bounds of good REST API design. HAL is just one of several ways
>> to accomplish that, and I'm not pitching any particular solution there. In
>> any case, I'm not sure what the problem would be with this approach.
> I agree it is a standard practice for a resource to include links to other
> resources, but the proposal to include "generic" links is different and
> creates a different user experience. I believe referencing the task from
> the publication will be easier for users and clients. When a user looks up
> a publication, they will always know they'll get either 0 or 1 links to a
> task. They can use that to check the state of the publication. If we link to
> "generic" resources (like a publication) from a task, then if I ask a user
> "do you expect task ede3af3e-d5cf-4e18-8c57-69ac4d4e4de6 to contain a
> link to a publication or not?" they can't know until they query it. I think
> that ambiguity was a pain point in Pulp2. I don't totally reject this
> solution, but this is an undesirable property (I think).
>>> 2) Have the user find the publication via query that sorts on time and
>>> filters only for a specific publisher. This could be fragile because with a
>>> multi-user system and no hard references between publications and tasks,
>>> answering the question "which is the publication for me" is hard because
>>> another user could have submitted a publish too. While not totally perfect,
>>> this could work.
>> In theory if a user queried for a publication from a specific publisher
>> that was created between the start and end times of the task, that should
>> unambiguously identify the correct publication. But depending on timestamps
>> is not a particularly robust nor confidence-inspiring way to reference a
>> resource.
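
A rough sketch of the timestamp-window lookup described in option 2 may help show why it feels fragile. Everything here is hypothetical: the `Publication` fields, the `publications_in_window` helper, and the premise that two publish tasks for the same publisher could overlap in time (the thread does not say whether they can).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Publication:
    id: str
    publisher_id: str
    created: datetime


def publications_in_window(
    pubs: List[Publication],
    publisher_id: str,
    started: datetime,
    finished: datetime,
) -> List[Publication]:
    """Return publications from one publisher created while a task was running."""
    return [
        p
        for p in pubs
        if p.publisher_id == publisher_id and started <= p.created <= finished
    ]


pubs = [
    Publication("p1", "publisher-a", datetime(2017, 10, 25, 15, 0, 5)),
    Publication("p2", "publisher-a", datetime(2017, 10, 25, 15, 0, 7)),
    Publication("p3", "publisher-b", datetime(2017, 10, 25, 15, 0, 6)),
]

# Assumed start and end times of "my" publish task.
candidates = publications_in_window(
    pubs,
    "publisher-a",
    started=datetime(2017, 10, 25, 15, 0, 0),
    finished=datetime(2017, 10, 25, 15, 0, 10),
)
candidate_ids = [p.id for p in candidates]
```

If a second user's publish for the same publisher lands inside the window, the query returns two candidates and the timestamps alone cannot say which one belongs to which task.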
> Agreed and Agreed
>>> 3) Have the user create a publication directly like any other REST
>>> resource, and help the user understand the state of that resource over
>>> time. I believe the proposal at the start of this thread is recommending
>>> this solution. I'm also +1 on this solution.
>> I think the problem with this is that a user cannot create a publication.
>> A user can only ask a plugin to create a publication. Until the plugin
>> creates the publication, there is no publication.
> Note a publication is an object, but really we mean a publication and its
> related PublishedArtifact, PublishedMetadata, etc. objects. It would be
> straightforward for a user to create a publication using the viewset and
> have the task associated with it call the publisher to build out the
> associated PublishedArtifact, PublishedContent, PublishedMetadata, etc. We
> should explore if this is good or not, but it is possible.
> As an aside, this is related to a problem everyone should be aware of: the
> existence of a publication does not guarantee that publication is finished
> publishing. Even with option 1, where the task creates the publication and
> links to it in the task status, while the publisher is running it must save
> the Publication so that the PublishedArtifact, etc can link to it. So for
> any given publication, in order to know if it's "fully finished and
> consistent" you must be able to check the status of the associated task
> that produced it.
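
As a sketch of what "check the status of the associated task" could look like on the client side, the snippet below polls a task until it reaches a final state and only then treats the publication as usable. The state names, `wait_for_task`, and `publication_is_usable` are assumptions for illustration, not Pulp's API; `fetch_state` stands in for whatever HTTP call would return the task's current state.

```python
import time
from typing import Callable

# Assumed names for states in which a task will no longer change.
FINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_task(
    fetch_state: Callable[[], str],
    interval: float = 0.0,
    max_polls: int = 10,
) -> str:
    """Poll fetch_state() until it returns a final state, then return that state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in FINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError("task did not reach a final state")


def publication_is_usable(fetch_state: Callable[[], str]) -> bool:
    """A publication is only trusted once its task has completed successfully."""
    return wait_for_task(fetch_state) == "completed"


# Simulated sequence of task states returned by successive polls.
states = iter(["waiting", "running", "completed"])
usable = publication_is_usable(lambda: next(states))
```

The point of the sketch is only that the existence of the publication object is never consulted; the task's final state is the sole signal of completeness.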
>>> As an aside, I don't think considering versioned repos as a possible
>>> solution is helping us with this problem. The scope of the current problem
>>> is relatively small and the scope of planning for versioned repos is large.
>> Versioned repos is a potential solution. In that scenario, a user would
>> request publication of a specific repo version (perhaps defaulting to the
>> latest), the publication would be linked to that version, and that is an
>> easy mechanism for the user to find the publication they want. Ultimately
>> the user is interested in working with a specific content set anyway. They
>> get a repo to a state where it has the content they want, and then they
>> publish that content set. No matter what we do with publications, users
>> will think of them in terms of related content sets. A repo version is that
>> immutable content set they can work with confidently.
> It's neat to me that versions are snapshots of content and
> publications are snapshots of content. Publications already provide much of
> the value proposition of versioned repos. They allow you
> to work with specific content sets like you describe. Also they allow for
> rollback. So that is all great for our users. For this thread, I want to
> bring the conversation back to where it started, solving a small problem
> about linking two resources that already exist.
>> It helps the rollback scenario a lot as well. Versioning repos allows a
>> user to see what the differences are between two content sets, and thus two
>> different publications, which informs them about when and how far back they
>> should roll back a distribution.
>> - user discovers a horrible flaw in a piece of content
>> - user queries for which version of the repo introduced that piece of
>> content
>> - user updates the distribution to serve the publication that came before
>> the one which introduced the piece of content, optionally re-publishing
>> that version in case its publication was deleted or had never been made in
>> the first place.
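
The rollback steps above can be sketched with plain data structures. This is only an illustration under stated assumptions: repo versions are modeled as an ordered list of content sets, publications as a dict keyed by version number, and `first_version_with` and `rollback_target` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Set


def first_version_with(versions: List[Set[str]], unit: str) -> Optional[int]:
    """Return the number of the earliest version containing the unit, if any."""
    for number, content in enumerate(versions):
        if unit in content:
            return number
    return None


def rollback_target(versions: List[Set[str]], bad_unit: str) -> Optional[int]:
    """Return the version just before the one that introduced bad_unit."""
    introduced = first_version_with(versions, bad_unit)
    if introduced is None or introduced == 0:
        return None  # never introduced, or nothing earlier to roll back to
    return introduced - 1


# Each entry is the immutable content set of one repo version (assumed model).
versions = [
    {"a"},
    {"a", "b"},
    {"a", "b", "bad"},
    {"a", "b", "bad", "c"},
]
# Existing publications keyed by repo version; version 2 was never published.
publications: Dict[int, str] = {0: "pub-0", 1: "pub-1", 3: "pub-3"}

target = rollback_target(versions, "bad")
# None here would mean the target version has to be re-published first.
publication_to_serve = publications.get(target) if target is not None else None
```

Here the flawed unit first appears in version 2, so the distribution would be pointed at the publication of version 1.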
>> --
>> Michael Hrivnak
>> Principal Software Engineer, RHCE
>> Red Hat