[Ovirt-devel] Re: Thoughts about taskomatic redesign

Scott Seago sseago at redhat.com
Fri Jun 27 14:24:35 UTC 2008


David Lutterkort wrote:
> On Wed, 2008-06-25 at 18:32 -0700, Ian Main wrote:
>   
>> So I've been doing some thinking on this, and here's what I've come up with to date. As usual any input is appreciated.
>>
>> I wanted to make sure we had some basic requirements we could discuss.  Taskomatic should:
>>
>> - Execute many tasks in a timely manner (presumably via a distributed/multi-threaded setup)
>> - Execute tasks in the correct order.
>> - Implement transaction support (if we can/want it)
>> - Have good error reporting.
>> - Include authentication and encryption.
>> - It would be nice to be able to see the state of the queues in the WUI.
>> - Implement at least the following tasks: 
>>    - start/stop/create/destroy VMs
>>    - clear host functionality.  Destroy all VMs/pools.
>>    - Migrate VMs.
>>    - Storage pool creation/destruction/management
>>     
>
> One fundamental question is: should tasks follow actions in the UI
> closely, or should they be based on a set of basic actions, and the WUI
> (or API) performs more complicated actions by queueing several tasks at
> once. For example, to provide 'clear the host' functionality, you could
> either have a 'clear the host' task or have the WUI queue tasks to
> migrate each VM on that host to somewhere else. 
>
> It's actually a little more complicated than that: you'd first have to
> tell the host to not start any more VMs, and only after that can you
> determine the set of VMs that need to be migrated away; even worse, any
> queued task that wants to start a VM on that host then needs to be
> changed to start the VM on a different host.
>
>   
For this particular example, our current method is as follows:
1) mark the host as unavailable for starting any new VMs (this isn't 
done as a task, as it's a simple db flag)
2) insert a single "clear this host" task associated with the host (so 
the WUI doesn't have to figure out what VMs are currently on the host)

Taskomatic will then need to figure out what VMs are running on the host 
and migrate them appropriately. We don't have to worry about new VMs 
appearing while the task is queued, though, as they're effectively 
locked out by the host now being disabled. At the moment, we don't have 
to worry about pending VM starts, since the "start this VM" task doesn't 
specify a particular host. If we do end up adding the ability to choose 
the host to start a VM on, I still don't see it as a problem here, as we 
already have to handle the disabled-host case (or, for that matter, 
overallocated or crashed hosts, etc.) -- basically the user will have to 
specify whether to fail if the chosen host isn't available (i.e. the VM 
_must_ stay on this host) or whether, for an unavailable host, the start 
should be treated as a normal "start this VM anywhere" call.
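
To make the shape of this concrete, here's a rough sketch of the WUI 
side of "clear this host" (model and column names here are purely 
illustrative, not necessarily what's in our schema):

    # Hypothetical sketch of the WUI side of "clear this host".
    # Host/Task are assumed ActiveRecord-style models; the
    # is_disabled column and the task fields are made up.
    def clear_host(host)
      # step 1: plain db flag, no task needed -- new VM starts
      # are locked out as soon as this commits
      host.update_attribute(:is_disabled, true)

      # step 2: one task for the whole host; taskomatic works
      # out which VMs are on the host when the task runs
      Task.create!(:action  => 'clear_host',
                   :host_id => host.id,
                   :state   => 'queued')
    end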

We could also rewrite this such that the WUI creates a separate migrate 
task for each VM -- it's really a question of 1) whether we want to 
build this logic into the WUI or into taskomatic; and 2) how we want 
status reporting to work.

In particular, per-VM migrate tasks would give us a more granular notion 
of task status -- on the other hand, we'd also like a high-level 
status: did 'clear this host' pass?

So we _may_ want to model tasks as more complex actions -- i.e. 
supporting sub-tasks. The WUI submits a "clear this host" task, and at 
some point taskomatic breaks this task down into subtasks (each 
referencing the main task). The main task is done when each of the 
subtasks has completed.
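
Something along these lines, say (the parent_id column and the callback 
are made up for illustration, not an existing schema):

    # Hypothetical subtask sketch: a task with a parent_id is a
    # subtask; the parent is effectively a no-op task that
    # finishes when all of its children have.
    class Task < ActiveRecord::Base
      belongs_to :parent,   :class_name => 'Task', :foreign_key => 'parent_id'
      has_many   :subtasks, :class_name => 'Task', :foreign_key => 'parent_id'

      # call this whenever a subtask reaches a terminal state
      def check_parent_done
        return if parent.nil?
        if parent.subtasks.all? { |t| t.state == 'finished' }
          parent.update_attribute(:state, 'finished')
        end
      end
    end
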
> I am not sure which of those approaches is better here, it depends a lot
> on what tasks might need to be implemented. Both have advantages and
> drawbacks. Using 'complex' tasks will probably be a little easier to
> implement to start with, but it might be hard to make sure that there
> are no races, and all dependencies etc. of those tasks are captured
> correctly. 'Simple' tasks will make it easier to understand possible
> interactions between tasks, but will make the task queuing/processing
> logic a little hairier, and you'd want to make sure you have as small
> and as expressive a set of base tasks as possible.
>
>   
>> Now, if we break the system down into basic components, I'm thinking:
>>
>> - Task producer: This is the WUI now but could also be some other
>> script in the future.  Creates task and adds them to the queue.
>>
>> - Task ordering system: At some point I think a single process needs
>> to separate out the tasks and order them.  I think it would be useful
>> to actually move them into separate independent queues that can be
>> executed in parallel.  This would be done such that each action in a
>> given queue would need to be done in order, but each queue will not
>> have dependencies on anything in another queue and so they can be
>> worked on in parallel.  While this could be a bottleneck I think that
>> the logic required to order the tasks will not be onerous and should
>> keep up with high loads.
>>     
>
> I strongly disagree with that. There should be one single queue,
> implemented as a table (or a few interrelated tables) in a relational
> database. I don't think that you'll get a task processing system that is
> robust and free of races without transactional guarantees from a
> relational database.
>
> For ordering/dependency tracking, you need for each task T a set of
> prerequisite tasks (i.e. tasks that need to happen before T can be
> started). Each task will go through a few states; at a minimum, something
> like 'new' (task just queued) -> 'in progress' (action associated with
> task was started) -> 'complete' (action successfully finished) plus an
> error state. You probably need to split those states into a few more to
> help the UI display more information, but these four states are enough
> to get a basic parallelized task system going.
>
> A task is runnable when it is in state 'new' and all of its
> prerequisites are in state 'complete'. It's pretty easy to query all
> runnable tasks with a single query.
>
>   
Currently we have states like that -- with a couple of additions:
  queued:    same as 'new' above
  running:   same as 'in progress' above
  finished:  same as 'complete' above
  failed:    task completed unsuccessfully
  canceled:  task was canceled without ever being attempted, either by 
user action or by taskomatic (really this is a subset of 'failed')
  paused:    I'm not sure if we're using this state now

What we _don't_ have in the current implementation is:
  1) any notion of task dependencies (i.e. task 66 must run after 33, 
44, and 55 have completed; 33 and 44 must complete successfully, but the 
success state of 55 is unimportant)
  2) any notion of compound tasks (i.e. task 100 is composed of subtasks 
101, 102, and 103; these subtasks have their own dependencies and are 
tracked just like top-level tasks, and task 100 is essentially a no-op 
task with dependencies on its subtasks)
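
For (1), something like a join table plus a runnable-tasks query would 
presumably do it -- a sketch, with table and column names invented for 
illustration:

    # Hypothetical dependency schema: task_dependencies(task_id,
    # prereq_id, require_success) is a made-up join table.
    class TaskDependency < ActiveRecord::Base
      belongs_to :task
      belongs_to :prereq, :class_name => 'Task', :foreign_key => 'prereq_id'
    end

    # A 'queued' task is runnable once no prereq is still live,
    # and no prereq that had to succeed has failed (those tasks
    # get canceled in a separate pass rather than run).
    def runnable_tasks
      Task.find_by_sql(<<-SQL)
        SELECT t.* FROM tasks t
        WHERE t.state = 'queued'
          AND NOT EXISTS (
            SELECT 1 FROM task_dependencies d
            JOIN tasks p ON p.id = d.prereq_id
            WHERE d.task_id = t.id
              AND (p.state IN ('queued', 'running')
                   OR (d.require_success
                       AND p.state IN ('failed', 'canceled'))))
      SQL
    end

This is basically the single-query runnability check David mentions, 
extended with the "must complete successfully" flag from example (1).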
>> - Task implementation system: This would take a given queue and
>> implement the tasks within it, dispatching requests to hosts as
>> needed.  Any errors occurring in this system will be reported and
>> possibly we could implement rollback.
>>
>> - Host/vm/pool State: In addition to the above, in order to implement
>> queue ordering and determine for certain that a given task succeeded,
>> we'll require host/vm/storage pool state information that is as up to
>> date as possible.
>>
>> So in terms of implementing this, a lot of it comes down to technology selection.
>>
>> Queues could continue to be implemented in postgresql.  It would be
>> nice, however, to have something that was event-driven and did not
>> require polling.
>>     
>
> If you wrap all access to the task queue in a simple API, you don't even
> need to poll, since you know the places where tasks can become runnable:
> (1) when new tasks are added to the queue and (2) when the state of
> existing tasks change. It's important though that nobody can insert
> tasks into the queue behind your back.
>
> I would go with a hybrid approach: find new runnable tasks whenever you
> hit conditions (1) and (2), and poll, just to guard against silly errors
> in the task processing logic (and log any tasks found that way loudly)
>
>   
Right now (1) is still handled by polling, since "adding a task to the 
queue" is implemented by "creating a task object and saving it" -- i.e. 
the queue is implemented in postgresql. So if we keep the task list as a 
PG table, we'd still need to poll for new tasks, unless the "insert new 
task" action on the WUI did something other than just adding a row -- 
such as sending a message via AMQP (or some other form of 
WUI->taskomatic communication). Right now there is no direct link 
between the two other than db table polling.
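
If we wanted a wakeup channel without pulling in a whole messaging 
layer, postgres's own LISTEN/NOTIFY would be one option. A sketch of 
the hybrid notify-plus-guard-poll loop (the channel name and the 
find_runnable_tasks/process helpers are made up; this uses the 'pg' 
gem API for illustration):

    require 'pg'

    conn = PG.connect(:dbname => 'ovirt')
    conn.exec("LISTEN new_tasks")
    # The WUI side would run "NOTIFY new_tasks" right after
    # inserting the task row.

    loop do
      # block until notified, but wake every 60s anyway as the
      # guard poll David suggests
      notified = conn.wait_for_notify(60)
      tasks = find_runnable_tasks
      if notified.nil? && !tasks.empty?
        # found only by the guard poll -- log loudly, since a
        # notification was missed somewhere
        puts "WARNING: runnable tasks found by poll: #{tasks.map { |t| t.id }.inspect}"
      end
      tasks.each { |t| process(t) }
    end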
>> I think a single ruby process could be used to order the tasks and
>> place them in per-thread/process queues.
>>     
>
> I don't think that two-level queueing is really needed. Just have a
> single ruby process that pulls all the runnable tasks off the queue and
> spawns a thread/process for each of them. Each of those workers is then
> responsible for getting the action associated with the task started.
> When using synchronous actions (e.g. libvirt calls) the worker will have
> to sit around and wait for the result of the call and update the task
> accordingly. For async calls (e.g. qpid), somebody needs to hang around
> and accept completion/error messages and update the corresponding task
> in the queue.
>
> Of course, you should bound the maximum number of workers, but the limit
> can probably be pretty high, since the workers cause minimal load on the
> queue processor.
>
>   
We may not need two-level queuing, but generating the dependency graph 
will still have to happen somewhere -- "order the tasks" as stated above 
will still be necessary, but it's more a case of "work out the 
dependencies for each task". When the WUI inserts a task for "create 
this VM" or "migrate this VM", the dependency information will not be 
known by the WUI. So something on the taskomatic side will have to work 
this out as part of the "notice new tasks in the queue and process them" 
step.

So when there are new tasks in the queue, we'd need to do something 
like the following (a rough sketch follows the list):
1) if any new tasks have necessary subtasks, those need to be broken 
out. So we need a 'pre-process' step where this happens -- i.e. for 
'clear this host' we've got to iterate over the list of VMs and insert a 
migrate sub-task for each one
2) for each new task, determine whether it depends on any existing 
'queued' or 'running' tasks -- if so, insert these into the task 
dependency list for that task. Presumably we can ignore completed tasks 
here.
3) actually run tasks: runnable tasks are all 'queued' tasks for which 
all dependencies are met; cancel/fail any task whose dependency required 
successful completion of a prerequisite that has since failed.
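
A sketch of that pass (the helper names -- break_out_subtasks, 
prereqs_for, dispatch, and so on -- are all hypothetical):

    def process_queue
      new_tasks = Task.find(:all, :conditions => "state = 'queued'")

      # step 1: pre-process -- expand compound tasks ('clear
      # this host') into their subtasks
      new_tasks.each { |t| break_out_subtasks(t) if t.compound? }

      # step 2: record dependencies against live tasks only;
      # anything already finished can't block us
      new_tasks.each do |t|
        prereqs_for(t).each do |p|
          next unless ['queued', 'running'].include?(p.state)
          TaskDependency.create!(:task_id => t.id, :prereq_id => p.id)
        end
      end

      # step 3: run whatever is now runnable, and cancel tasks
      # whose required prereqs have already failed
      runnable_tasks.each { |t| dispatch(t) }
      cancel_tasks_with_failed_prereqs
    end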
>> One possibility is that the task implementers could be a mix of ruby
>> processes running on the WUI and C or C++ applications running on the
>> node that use qpid modeling to represent the host, VMs, and storage
>> pools.
>>     
>
> It's helpful to distinguish between tasks and actions associated with
> tasks. So a task is just something that needs to be done at some point,
> has a state, prerequisites, and a corresponding action. The action is
> what makes libvirt calls etc.
>
> That way, you can treat all tasks the same for purposes of queue
> processing; the different actions come into play (1) when the task is
> queued, to determine which tasks are prerequisites, and (2) when the task is
> run by executing its action.
>
> David
>
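
That task-vs-action split seems right to me, and it maps naturally onto 
a dispatch table keyed by task type -- a minimal sketch (the ACTIONS 
hash and the *Action classes are invented for illustration):

    ACTIONS = {
      'start_vm'   => lambda { |task| VmAction.start(task.vm_id) },
      'migrate_vm' => lambda { |task| VmAction.migrate(task.vm_id) },
      'clear_host' => lambda { |task| HostAction.clear(task.host_id) },
    }

    # queue logic only ever sees Task rows; the action lookup
    # happens at dispatch time
    def dispatch(task)
      task.update_attribute(:state, 'running')
      begin
        ACTIONS[task.action].call(task)  # this is what talks to libvirt etc.
        task.update_attribute(:state, 'finished')
      rescue => e
        task.update_attribute(:state, 'failed')
        puts "task #{task.id} failed: #{e.message}"
      end
    end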

Scott




