[rest-practices] Async operations and statelessness

Tue Apr 20 18:05:04 UTC 2010

Thanks for the feedback, folks. 

So I've been chewing away on our requirements around minimal updates to
search queries and also the asynchronous actions discussed on this
thread.

I'm tending towards a design whereby we maintain a sort of three-level
cache of VM-related representations:

1. The VM collection, which represents our best state of knowledge of
the current status of each VM. This is used to populate the response to
GET /vms?query=... style operations.

2. A list of completed VM state changes, only the most recent for each
VM. The length of this list is bounded by the VM collection size, but
it's also aggressively reaped, so in general it would be much smaller.
The idea is to return a link to the next free slot in this list along
with the result of any queries evaluated over the entire VM collection
in #1, so that a client can periodically refresh its query result set
without re-evaluating the entire query. The client effectively polls on
the tail of the update list, only seeing those state changes that
occurred _after_ its initial GET. A retrieval on a reaped VM state-change
resource is simply 301-redirected back to the main collection.

3. A potentially larger queue of pending or in-progress actions, of
which there may be more than one outstanding per VM (in the case of a
sequence of actions being submitted in quick succession). A reference to
a resource representing the corresponding action is returned with the
initial 202 Accepted response to the POST on an action URI. Once the
task is completed, whether it succeeded or failed, this resource becomes
part of the audit trail and is effectively persisted, presumably with a
longer lifespan than any reasonable client.
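
To make #2 a bit more concrete, here is a minimal sketch in Python. It is
only an illustration under assumptions: the UpdateList class, its method
names, the (status, body, cursor) return shape and the "/vms" path are all
invented here, and reaping is reduced to "drop the oldest entries".

```python
from collections import OrderedDict


class UpdateList:
    """Latest completed state change per VM, addressed by a growing slot number."""

    def __init__(self):
        self._next_seq = 0            # the "next free slot" handed to clients
        self._by_seq = OrderedDict()  # seq -> (vm_id, state), oldest first
        self._seq_of_vm = {}          # vm_id -> seq of its latest change
        self._reaped_below = 0        # slots below this may have been reaped

    def record(self, vm_id, state):
        # Keep only the most recent change per VM, so the list can never
        # grow beyond the size of the VM collection.
        old = self._seq_of_vm.pop(vm_id, None)
        if old is not None:
            del self._by_seq[old]
        seq = self._next_seq
        self._by_seq[seq] = (vm_id, state)
        self._seq_of_vm[vm_id] = seq
        self._next_seq += 1
        return seq

    def cursor(self):
        # Returned alongside a full query result; the client polls from here.
        return self._next_seq

    def reap(self, keep):
        # Drop the oldest entries until at most `keep` remain.
        while len(self._by_seq) > keep:
            seq, (vm_id, _state) = self._by_seq.popitem(last=False)
            del self._seq_of_vm[vm_id]
            self._reaped_below = seq + 1

    def since(self, cursor):
        # A cursor pointing into the reaped region may have missed changes,
        # so send the client back to the main collection.
        if cursor < self._reaped_below:
            return 301, "/vms", None
        changes = [c for seq, c in self._by_seq.items() if seq >= cursor]
        return 200, changes, self._next_seq
```

A client would keep the cursor it got with its initial GET, call since()
with it on each poll, and replace it with the cursor returned each time.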

So #2 allows clients to efficiently refresh their queries without any
dependence on special handling of If-Modified-Since, whereas #3 skates
around the statefulness prohibition.
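
The lifecycle in #3 could be sketched along the following lines. Again this
is an assumption-laden illustration, not a design: ActionStore, its methods
and status strings are hypothetical, and the /vms/<id>/actions/<n> URI form
just extends the /vms/999/actions idea from the quoted mail below.

```python
import itertools


class ActionStore:
    """Pending actions plus a persisted audit trail of completed ones."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # action_id -> action; may hold several per VM
        self.audit = {}    # action_id -> completed action, kept long-term

    def submit(self, vm_id, kind):
        # Models the POST on an action URI: queue the work and answer
        # 202 Accepted with a reference to the action resource.
        action_id = next(self._ids)
        self.pending[action_id] = {"vm": vm_id, "type": kind, "status": "PENDING"}
        return 202, f"/vms/{vm_id}/actions/{action_id}"

    def complete(self, action_id, succeeded, detail=""):
        # Success or failure, the action moves into the audit trail, so its
        # URI keeps identifying a real resource after the task has finished.
        action = self.pending.pop(action_id)
        action["status"] = "SUCCEEDED" if succeeded else "FAILED"
        action["detail"] = detail
        self.audit[action_id] = action

    def get(self, action_id):
        # Models a GET on the action URI, whether in flight or completed.
        action = self.pending.get(action_id) or self.audit.get(action_id)
        if action is None:
            return 404, None
        return 200, dict(action)
```

Because a completed action is an audit record in its own right, the server
holds no per-client conversation state; any client can GET it at any time.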

Cheers,
Eoghan

On Tue, 2010-04-20 at 08:42 -0400, Bryan Kearney wrote:
> On 04/19/2010 03:20 PM, Eoghan Glynn wrote:
> >
> > Hi Folks,
> >
> > I wanted to get your feeling on the question of breaking statelessness
> > when asynchronous operations are used, in particular in terms of the
> > protocol used to check the outcome of the deferred task.
> >
> > So say we need to model a potentially long-lived operation, such as
> > migrating a VM. To avoid tying up a connection for the duration, the
> > server could respond to the "POST /vms/999/reboot" request with a 202
> > Accepted, along with a unique URI to be used subsequently to check on
> > the status of the migration.
> >
> 
> Perhaps, you model it like a collection of actions. So.. you POST to
> 
> /vms/999/rebootRequests
> 
> which gives you back a resource which looks like (pardon the xml)
> 
> <rebootRequest>
>     <requested>SOME DATE</requested>
>     <requestor>SOME DUDE</requestor>
>     <status>SOME STATUS</status>
> </rebootRequest>
> 
> This way, the request itself is a resource. Assuming infinite DB space,
> you keep it for the life of the machine.
> 
> Or.. a post to reboot could redirect you to /vms/999/actions
> 
> Which gives you the same basic object
> 
> <action>
>     <type>reboot</type>
>     <requested>SOME DATE</requested>
>     <requestor>SOME DUDE</requestor>
>     <status>SOME STATUS</status>
> </action>
> 
> 
> > Now it's the lifecycle of the temporary resource represented by this URI
> > that seems to be potentially problematic. While the async operation is
> > still in flight, there's no problem. However the question is how long
> > _after_ the operation has completed should we maintain this resource so
> > that the client can eventually determine the outcome? We've no guarantee
> > that the client will poll regularly, so say we impose some arbitrary
> > expiry, maybe 10 minutes after the task has completed. But even for that
> > limited time we seem to be breaking one of the fundamental Fielding
> > commandments, the one that demands session state is kept entirely on the
> > client side. After the task completes, this URI no longer represents a
> > resource per se, rather we'd just be keeping it around for a while to
> > support our conversation with the client.
> >
> > So another more extreme approach would be to limit the client to:
> >
> > (a) checking whether the async operation is still being processed, so
> > that the status URI is only valid until the task completes,
> >
> > (b) inferring from the current state of the VM whether the task may have
> > succeeded or failed, for example the client gets a big hint that its
> > stop operation has failed if the task has completed but the VM state
> > hasn't transitioned to DOWN (of course there's a race here, as the VM
> > may simply have been restarted in the meantime by another agent),
> >
> > and,
> >
> > (c) getting an indirect indication of the failure reason by scanning an
> > event/audit log.
> 
> 
> I am also fine with querying the state of the object.. but I bet you 
> have a need for a log anyways.
> 
> -- bk