[Pulp-dev] Task responses in Pulp 3

Tue Nov 28 04:42:10 UTC 2017

On Mon, Nov 27, 2017 at 9:52 AM, David Davis <daviddavis at redhat.com> wrote:

> TL;DR
>
> In Pulp 3 do we want to return a single task or list of tasks for
> endpoints like sync, publish, etc?
>

Thanks for raising this. I noticed the same recently and wasn't crazy about
it either.

>
> # Background
>
> Currently all task responses are lists of tasks in Pulp 3. For example, if
> you call the sync endpoint, you get a list of one task back that tracks the
> progress of the sync. Allowing a list of tasks allows Pulp to return
> multiple user-facing tasks with a single operation. However, every
> endpoint that produces tasks currently produces exactly 1 task. So why
> add a list now if it only contains 1 item? With Pulp's REST API being
> semver governed from the 3.0 release, we couldn't ever return multiple
> tasks in Pulp < 4.0 because changing a single item to a list would be
> backwards incompatible in the REST API.
>

The only use case that comes to mind from Pulp 2 is applicability. A client
hits an endpoint requesting lots of recalculation, the request handler
batches up that work into several tasks, and the response could include
references to those tasks. But as you point out below, Pulp 2 uses task
groups to handle that use case.

>
> # Going Forward
>
> Returning a list of tasks creates a somewhat suboptimal user experience
> where users have to query multiple endpoints to know if their job is done.
> Maybe we should do this differently. Perhaps anytime an operation needs
> multiple tasks, we should have a single task tracking them because that is
> a better user experience.
>

You could have a single task that runs and then does whatever is necessary
to spawn child tasks. The problem is that if you have a backlog of tasks,
all the spawned tasks go to the back of the line, roughly doubling the
amount of time spent waiting for an available worker.
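
To make the doubling concrete, here is a rough sketch of a toy model, not
Pulp's actual scheduler. It assumes a single worker, a FIFO queue, a steady
backlog of tasks that each take one time unit, and a parent task that
finishes instantly. The function name and parameters are hypothetical.

```python
def child_start_time(backlog, spawned_by_parent):
    """Time at which the child tasks start running, in task-durations.

    Toy model (assumption): one worker, FIFO queue, every queued task
    takes one time unit, and the backlog stays at `backlog` tasks because
    new work keeps arriving as fast as it is processed.
    """
    if spawned_by_parent:
        # The parent waits behind the backlog, then its children are
        # queued behind a fresh backlog of the same size.
        parent_start = backlog
        return parent_start + backlog
    # Children queued directly by the request handler wait only once.
    return backlog


print(child_start_time(10, spawned_by_parent=True))   # 20
print(child_start_time(10, spawned_by_parent=False))  # 10
```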

You could have a single task that's in waiting state and already has child
tasks, which were queued by the original http request handler. That would
change the meaning of child tasks to go beyond just tasks created by the
parent task, but it is not unreasonable.

>
> This is effectively the "group tasks" concept we had in Pulp 2 that we have
> left out of Pulp 3 currently. The user would monitor a single task which
> shows counts of 'done' and 'not done' or something like that. With counts
> though, you wouldn't know which of the tasks are done and not done; maybe
> in addition to, or as a replacement of, 'spawned_tasks', the viewset could
> be split into 'finished_spawned_tasks' and 'unfinished_spawned_tasks'. I
> don't like these names, but you get the idea.
>

In Pulp 2 we have a "task group" concept. It's just a simple way to
identify that a collection of normal tasks are members of a group, and you
can track a single summary endpoint that shows how many of those tasks are
in each state. It's simple and effective. At the REST level, a request
handler generates a response with a reference to a group instead of a task.
A client can then easily query for tasks in the group and track/present
their individual states and progress as it sees fit.
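
As a minimal sketch of what such a summary amounts to, assuming each task
is a plain dict with a "state" key (the field names and state values here
are illustrative, not the actual Pulp schema):

```python
from collections import Counter


def summarize_group(tasks):
    """Count the tasks of a group by state."""
    return dict(Counter(task["state"] for task in tasks))


# Hypothetical members of one task group.
tasks = [
    {"id": "a", "state": "finished"},
    {"id": "b", "state": "running"},
    {"id": "c", "state": "waiting"},
    {"id": "d", "state": "finished"},
]

summary = summarize_group(tasks)
print(summary)  # {'finished': 2, 'running': 1, 'waiting': 1}
```

A client that wants per-task detail can still walk the member list; the
summary only saves it from doing so on every poll.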

>
>
> Always returning a single task could paint us into a corner or it could
> force us to provide a better user experience. Thoughts?
>
>

I would avoid overloading the task concept and instead build in the
expectation that a 202 response can include a reference to a single task or
to a task group, using normal REST constructs (media types or named hrefs
for example) to identify what is in any given response. That would keep
data structures focused on simple and clearly-differentiated problems while
retaining API flexibility.
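
On the client side, the named-href variant could look roughly like the
sketch below. The key names "task" and "task_group" and the example URLs
are assumptions for illustration only; nothing here reflects an agreed
Pulp 3 response format.

```python
def tracking_target(body):
    """Return (kind, href) for the body of a 202 response.

    Assumption: the body carries exactly one of two hypothetical keys,
    "task" or "task_group", each holding an href to poll.
    """
    found = [(key, body[key]) for key in ("task", "task_group") if key in body]
    if len(found) != 1:
        raise ValueError("expected exactly one of 'task' or 'task_group'")
    return found[0]


print(tracking_target({"task": "/tasks/1/"}))
print(tracking_target({"task_group": "/task-groups/7/"}))
```

The client decides how to poll based on which key is present, so a task
and a task group each keep their own simple representation.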

-- 

Michael Hrivnak

Principal Software Engineer, RHCE

Red Hat