[Ovirt-devel] Thoughts about taskomatic redesign

Daniel P. Berrange berrange at redhat.com
Mon Jun 23 21:03:34 UTC 2008


On Mon, Jun 23, 2008 at 11:06:08AM +0200, Chris Lalancette wrote:
> Ian, et al.,
>      I've been doing some thinking about both what taskomatic needs to do in its
> next incarnation, along with ways of how to do it.
> 
> WHAT:
> 1)  Taskomatic needs to be able to run on multiple machines at the same time,
> accessing a central database

This is an over-specialization - basically taskomatic needs to be parallelized
whether it runs on one or many machines.

> 2)  Taskomatic needs to be able to fire off tasks relating to different VMs (or
> storage pools) concurrently (whether it's just run on one machine or many).

It strikes me that this is avoiding the more general problem - that there are
explicit dependencies between tasks. Serializing tasks per-VM does not express
this concept of dependencies directly.

So as an example, a task starting a VM may have a dependency on a task to
start a storage pool (or refresh the volume list in an existing pool). Now,
while these two tasks are pending, another VM start task is scheduled which
has a dependency on the same storage task.

Or the admin may have some runtime policy to the effect that during the hours
9-5 they want VM 'x' to be running on a machine, and then at 5pm shut down 'x'
and start up 'y' in its place. This has a strict ordering requirement between
the two VMs - they can't be scheduled independently because there won't be RAM
for 'y' until 'x' is shut down.
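To make the idea concrete, here's a rough sketch of what explicit task
dependencies could look like - all the names here are illustrative, not a
proposal for the actual schema:

```ruby
# Rough sketch: tasks carry explicit dependency links rather than being
# serialized per-VM. Task/claim names are hypothetical.
Task = Struct.new(:id, :action, :deps, :state)

refresh = Task.new(1, "refresh_pool", [],           :queued)
start_x = Task.new(2, "start_vm_x",  [refresh.id], :queued)
start_y = Task.new(3, "start_vm_y",  [refresh.id], :queued)

# A task is runnable once every task it depends on has completed.
def runnable?(task, all)
  task.deps.all? { |d| all.fetch(d).state == :done }
end

all = { refresh.id => refresh, start_x.id => start_x, start_y.id => start_y }
runnable?(start_x, all)  # false - the pool refresh hasn't completed yet
refresh.state = :done
runnable?(start_x, all)  # true - both VM starts are now unblocked
```

Both VM start tasks hang off the same storage task, which covers the first
example above; the 9-5 policy case is just a dependency edge from "start y"
to "stop x".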

> HOW:
> 1)  I think we should actually have two modes for taskomatic: standalone (i.e. I
> am the only taskomatic), and multi-host (there are other taskomatics).  The
> reason for this is in the standalone case, we probably want to fork one
> taskomatic process for each VM (or storage pool) we want to perform actions on.
>  In the multi-host case, we don't know how many other taskomatics might be out
> there doing tasks, so we keep one process per machine (this should be a
> command-line option/config file option)

Having two modes inserts an artificial distinction that doesn't really
exist. Even if there is only a single instance of taskomatic running on a
single machine in the data center, there is going to be parallelization because
the world has gone heavily SMP, whether multi-socket or multi-core or both.
By the very nature of its work taskomatic is not going to be bottlenecked on
CPU, instead spending a lot of time waiting on results from operations. To
maximise utilization of a single node taskomatic will want to be heavily
parallelized, whether fork() based or thread based, on some multiple of the
number of CPUs.

On a machine with 4 logical CPUs we perhaps want 16 taskomatic threads running.
So whether those 16 threads are on a single 4-CPU machine, or a pair of 2-CPU
machines, is not a distinction we need to consider. We just scale horizontally
to add capacity as required.
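The "multiple of the CPU count" pool could be as simple as this (a sketch,
assuming a thread-based design; the multiplier and the queue contents are
made up):

```ruby
# Sketch: a fixed pool of worker threads, sized as a multiple of the
# logical CPU count, draining a shared work queue.
require 'etc'

MULTIPLIER = 4
workers = Etc.nprocessors * MULTIPLIER

queue = Queue.new
20.times { |i| queue << "task-#{i}" }
workers.times { queue << :stop }   # poison pills to shut the pool down

threads = workers.times.map do
  Thread.new do
    while (task = queue.pop) != :stop
      # ...run the task (libvirt call, storage op, etc.); workers spend
      # most of their time blocked here waiting on I/O, not on CPU...
    end
  end
end
threads.each(&:join)
```

Since the workers are I/O-bound, oversubscribing the CPUs like this keeps
the node busy; adding a second machine just means more workers pulling from
the same task table.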

> 2)  We need to lock rows in the database as each taskomatic wakes up and finds
> work to do.  Luckily both postgres and activerecord support row locking, so the
> underlying infrastructure is there.

We only need row locking if you're working on the model where you keep the
transaction open for the duration of taskomatic's processing for that
particular job and commit/rollback on completion. It may be that you simply
immediately mark a task as 'in progress' and commit that change right at
the start, then later have a second transaction where you fill in the result
of the task, whether success or failure.
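In pure-Ruby terms the two-transaction pattern looks like this (a sketch:
with a real database each synchronized block would be a short SQL
transaction; here a Mutex and a hash stand in for the task table):

```ruby
# Sketch of "claim immediately, record outcome later" - no transaction
# is held open across the long-running work itself.
store_lock = Mutex.new
task = { id: 42, state: :queued, result: nil }

# "Transaction" 1: atomically claim the task and commit right away.
claimed = store_lock.synchronize do
  if task[:state] == :queued
    task[:state] = :in_progress
    true
  else
    false   # some other taskomatic got there first
  end
end

# ...long-running libvirt/storage work happens here, outside any
# database transaction...

# "Transaction" 2: record the outcome, success or failure.
store_lock.synchronize do
  task[:state]  = :done
  task[:result] = "success"
end if claimed
```

The atomic claim is what makes multiple taskomatics safe against each other
without holding row locks for the lifetime of a job.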

> 
> In the standalone case, taskomatic should wake up, look at how many different
> VMs (or storage pools) there are currently tasks queued for, and fork off that
> many workers to do work (i.e. if you have start_vm 1, start_vm 2, stop_vm 1 in
> the queue, you would fork off two workers).  Each worker would lock all of the
> rows of the database corresponding with their VM (i.e. the first worker would
> lock all rows having to do with VM 1), and then busy themselves with executing
> the actions for that VM serially.  I guess the locking isn't strictly necessary
> here, since we can tell each worker which VM or storage ID it should work on,
> but it makes it more like the multihost case.

Forking a thread per VM doesn't work because there can be ordering requirements
between tasks on different VMs, and/or storage. Explicit task dependencies
need to be tracked. At which point, each taskomatic process/thread in existence
simply waits for a task to arrive which has no pending dependencies, claims
it, and goes to work on it. Completing the task will then satisfy dependent
tasks, allowing them to be processed, and so on. There is no need to specialize
a particular worker process to a particular object.
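A sketch of that claim loop - any idle worker takes the next task whose
dependencies are all complete, with no worker pinned to a VM (the task
records here are invented for illustration):

```ruby
# Each record lists the task ids it depends on.
tasks = [
  { id: 1, deps: [],  state: :queued },   # e.g. refresh storage pool
  { id: 2, deps: [1], state: :queued },   # e.g. start VM 'x'
  { id: 3, deps: [1], state: :queued },   # e.g. start VM 'y'
]

# Claim the next task with no pending dependencies, or nil if none.
def claim_next(tasks)
  done = tasks.select { |t| t[:state] == :done }.map { |t| t[:id] }
  tk = tasks.find { |t| t[:state] == :queued && (t[:deps] - done).empty? }
  tk[:state] = :in_progress if tk
  tk
end

order = []
while (tk = claim_next(tasks))
  order << tk[:id]
  tk[:state] = :done   # completing a task unblocks its dependants
end
order  # => [1, 2, 3] - the storage task first, then both VM starts
```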

> Note that in both standalone and multihost case, it's OK for multiple
> taskomatics to be sending commands to identical managed nodes.  Libvirtd itself
> is serial, so commands might get intertwined, but that's OK since we are
> explicitly making sure our taskomatics work on different VMs or storage pools.

Don't rely on libvirtd being serial - we may well find ourselves making it
fully parallelized, allowing operations to be made & executed concurrently. At
the very least we'll have concurrent execution when adding async background
jobs.

> 3)  Transaction support in taskomatic (hi slinaberry!).  I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us.  In fact, I
> might argue that rolling back is worse in this case; if you modified external
> state, and then crashed, when you come back you might "roll-back" your VM state
> to something that's totally invalid, and you'll need to be corrected by
> host-status anyway.  Does anyone have further thoughts here?

I agree that life probably isn't going to be as simple as just rolling back.
I think it's much more likely we'll need to explicitly track the failure against
the task. So, more like the example I mentioned earlier, where we mark
the task as in progress in the DB, and then later update it with the outcome of
the task. If a task failed, you'd then want to fail any tasks depending on it,
and tasks depending on those, etc. This gives oVirt the ability to track
the failures and automatically re-schedule new tasks to try again, or let the
admin choose a different action.

Simply rolling back the transaction means you're not capturing any of this,
and just re-trying over and over without necessarily solving the problem.
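The failure cascade is mechanical once dependencies are explicit - a sketch
(task data invented for illustration):

```ruby
# When a task fails, transitively fail everything that depends on it,
# so oVirt can surface or reschedule the whole chain.
tasks = {
  1 => { deps: [],  state: :failed },   # e.g. storage pool refresh failed
  2 => { deps: [1], state: :queued },   # VM start blocked on 1
  3 => { deps: [2], state: :queued },   # blocked on 2, fails transitively
  4 => { deps: [],  state: :queued },   # independent task, unaffected
}

def cascade_failures(tasks)
  loop do
    changed = false
    tasks.each_value do |t|
      next unless t[:state] == :queued
      if t[:deps].any? { |d| tasks[d][:state] == :failed }
        t[:state] = :failed
        changed = true
      end
    end
    break unless changed   # repeat until no new failures propagate
  end
end

cascade_failures(tasks)
```

Every task then carries an explicit outcome in the DB, which is exactly
what a blanket rollback would throw away.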

Regards,
Daniel.
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
