[rest-practices] Async operations and statelessness

Tue Apr 27 16:58:39 UTC 2010

Folks,

I've been experimenting with various ways for a client to indicate that
it wishes an action to be executed asynchronously, as opposed to
blocking for completion. The idea would be to make this asynchrony
optional so as to facilitate really simple clients which may not be set
up to easily poll for completion.

Now I don't really like the idea of using an "?async=true" style of
query parameter, as this would tend to bleed knowledge of the URI
structure onto the client side.

In the case where the POST entity-body is non-empty (e.g. for migrate we
might indicate a target host, or for reboot maybe a grace period or
somesuch), then the async preference could simply be encoded in the task
representation. But some actions are likely to have an empty body, and
I'm leery about adding one just for the sake of the client's async
preference. 

Another option would be to leverage the standard Expect header, with a
custom expectation-extension in the style of the familiar "Expect:
100-Continue" usage.

So the client of a long-running action could set a header something
like "Expect: 202-Accepted", which the service would take as an explicit
instruction to spin off an async task and return a 202. If the action
could not be carried out in a non-blocking fashion for some reason, the
service would bail with a 417 Expectation Failed.
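
To make the proposal concrete, here is a rough sketch of the decision the
service would make. It is only an illustration under stated assumptions: the
function names, the can_run_async flag, the /tasks/<id> URI layout and the use
of a Location header are all hypothetical, and "202-Accepted" is the custom
expectation suggested above, not a standard token.

```python
import itertools

# Hypothetical in-memory source of task ids, for illustration only.
_task_ids = itertools.count(1)


def wants_async(headers):
    """Return True if the request carries the proposed 'Expect: 202-Accepted'."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = [t.strip().lower() for t in lowered.get("expect", "").split(",")]
    return "202-accepted" in tokens


def handle_action(headers, can_run_async):
    """Pick the response for a POST on an action URI.

    Returns a (status_code, response_headers) pair.
    """
    if not wants_async(headers):
        # No async preference: block until completion, as a simple client expects.
        return 200, {}
    if not can_run_async:
        # The expectation cannot be met, so fail instead of silently blocking.
        return 417, {}
    # Assumed URI layout for the task resource the client would poll.
    return 202, {"Location": "/tasks/%d" % next(_task_ids)}
```

A client that omits the header gets the blocking behaviour, so the really
simple clients mentioned earlier are unaffected.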

Is there a whiff of sulphur around extending the semantics of an
existing header in this way? Is it likely to cause trouble with
intermediating proxies? There are some worrying comments in the RFC[1]
to the effect that proxies must understand the expectation.

Cheers,
Eoghan

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.20

On Tue, 2010-04-20 at 19:05 +0100, Eoghan Glynn wrote:
> Thanks for the feedback, folks. 
> 
> So I've been chewing away on our requirements around minimal updates to
> search queries and also the asynchronous actions discussed on this
> thread.
> 
> I'm tending towards a design whereby we maintain a sort of three-level
> cache of VM-related representations:
> 
> 1. The VM collection, which represents our best state of knowledge of
> the current status of each VM. This is used to populate the response to
> GET /vms?query=... style operations.
> 
> 2. A list of completed VM state changes, only the most recent for each
> VM. The length of this list is bounded by the VM collection size, but
> it's also aggressively reaped, so in general it would be much smaller.
> The idea is to return a link to the next free slot in this list along
> with the result of any queries evaluated over the entire VM collection
> in #1, so that a client can periodically refresh its query result set
> without re-evaluating the entire query. The client effectively polls on
> the tail of the update list, only seeing those state changes that
> occurred _after_ its initial GET. A retrieval on a reaped VM state-change
> resource is simply 301-redirected back to the main collection.
> 
> 3. A potentially larger queue of pending or in-progress actions, of
> which there may be more than one outstanding per VM (in the case of a
> sequence of actions being submitted in quick succession). A reference to
> a resource representing the corresponding action is returned with the
> initial 202 Accepted response to the POST on an action URI. Once the
> task is completed, whether it succeeded or failed, this resource becomes
> part of the audit trail and is effectively persisted, presumably with a
> longer lifespan than any reasonable client.     
> 
> So #2 allows clients to efficiently refresh their queries without any
> dependence on special handling of If-Modified-Since, whereas #3 skates
> around the statefulness prohibition.
> 
> Cheers,
> Eoghan
> 
> 
> On Tue, 2010-04-20 at 08:42 -0400, Bryan Kearney wrote:
> > On 04/19/2010 03:20 PM, Eoghan Glynn wrote:
> > >
> > > Hi Folks,
> > >
> > > I wanted to get your feeling on the question of breaking statelessness
> > > when asynchronous operations are used, in particular in terms of the
> > > protocol used to check the outcome of the deferred task.
> > >
> > > So say we need to model a potentially long-lived operation, such as
> > > migrating a VM. To avoid tying up a connection for the duration, the
> > > server could respond to the "POST /vms/999/reboot" request with a 202
> > > Accepted, along with a unique URI to be used subsequently to check on
> > > the status of the migration.
> > >
> > 
> > Perhaps, you model it like a collection of actions. So.. you POST to
> > 
> > /vms/999/rebootRequests
> > 
> > which gives you back a resource which looks like (pardon the xml)
> > 
> > <rebootRequest>
> >     <requested>SOME DATE</requested>
> >     <requestor>SOME DUDE</requestor>
> >     <status>SOME STATUS</status>
> > </rebootRequest>
> > 
> > This way, the request itself is a resource. Assuming infinite DB space,
> > you keep it for the life of the machine.
> > 
> > Or.. a post to reboot could redirect you to /vms/999/actions
> > 
> > Which gives you the same basic object
> > 
> > <action>
> >     <type>reboot</type>
> >     <requested>SOME DATE</requested>
> >     <requestor>SOME DUDE</requestor>
> >     <status>SOME STATUS</status>
> > </action>
> > 
> > 
> > > Now it's the lifecycle of the temporary resource represented by this URI
> > > that seems to be potentially problematic. While the async operation is
> > > still in flight, there's no problem. However the question is how long
> > > _after_ the operation has completed should we maintain this resource so
> > > that the client can eventually determine the outcome? We've no guarantee
> > > that the client will poll regularly, so say we impose some arbitrary
> > > expiry, maybe 10 minutes after the task has completed. But even for that
> > > limited time we seem to be breaking one of the fundamental Fielding
> > > commandments, the one that demands session state is kept entirely on the
> > > client side. After the task completes, this URI no longer represents a
> > > resource per se, rather we'd just be keeping it around for a while to
> > > support our conversation with the client.
> > >
> > > So another more extreme approach would be to limit the client to:
> > >
> > > (a) checking whether the async operation is still being processed, so
> > > that the status URI is only valid until the task completes,
> > >
> > > (b) inferring from the current state of the VM whether the task may have
> > > succeeded or failed, for example the client gets a big hint that its
> > > stop operation has failed if the task has completed but the VM state
> > > hasn't transitioned to DOWN (of course there's a race here, as the VM
> > > may simply have been restarted in the meantime by another agent),
> > >
> > > and,
> > >
> > > (c) getting an indirect indication of the failure reason by scanning an
> > > event/audit log.
> > 
> > 
> > I am also fine with querying the state of the object.. but I bet you 
> > have a need for a log anyways.
> > 
> > -- bk
> 