[Pulp-list] Asynchronous Task Dispatching

Jay Dobies jason.dobies at redhat.com
Thu Apr 14 17:28:37 UTC 2011


On 04/14/2011 01:13 PM, Jason L Connor wrote:
> On Thu, 2011-04-14 at 11:50 -0400, Jay Dobies wrote:
>> Why do we need to query the database for tasks? Can we keep the task
>> stuff in memory and snap out its state when it changes? Or are you
>> trying to solve the cross-process task question at the same time? 
> 
> There are a few reasons:
>      1. I want to get the task persistence stuff working first; we can
>         look at what optimizations we need once it does

I'd argue this, but the other points make it not worth it.

>      2. I'm currently trying to keep the multi-process deployment option
>         open, and volatile memory storage is not conducive to that

Fair enough. You know that if we didn't keep that door open, we'd run
into a case where we needed it.

>      3. I'm trying not to introduce any task state consistency bugs, at
>         least, not initially
>      4. To be honest, dequeueing tasks, running tasks, timing out tasks,
>         and canceling tasks (i.e. what the dispatcher does), all
>         represent state changes and most would have to hit the db anyway

That makes sense. One possibility would be to queue up those state
changes in memory and flush the writes to the database on a separate
frequency. But that's probably overkill; we're not in the business of
designing uber Python tasking systems.
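
Just so the idea is concrete, though, something along these lines is what
I had in mind (rough sketch only; the collection object and its
update_state method are stand-ins for whatever persistence layer you end
up with):

import threading
import time

class DeferredTaskStateWriter:
    """Queue up task state changes in memory and flush them to the db on
    a fixed interval instead of writing on every change."""

    def __init__(self, task_collection, flush_interval=5):
        self._collection = task_collection   # stand-in for the persistence layer
        self._interval = flush_interval
        self._pending = {}                   # task id -> most recent state
        self._lock = threading.Lock()
        flusher = threading.Thread(target=self._flush_loop)
        flusher.daemon = True
        flusher.start()

    def record_state(self, task_id, state):
        # Only the latest state per task ever needs to reach the db.
        with self._lock:
            self._pending[task_id] = state

    def _flush_loop(self):
        while True:
            time.sleep(self._interval)
            with self._lock:
                pending, self._pending = self._pending, {}
            for task_id, state in pending.items():
                self._collection.update_state(task_id, state)  # hypothetical method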

> I'm thinking that once the persistence stuff actually works I can
> revisit it looking for optimizations and features needed to support
> multi-process access (if we decide to go that route).
> 
> In the meantime, I was thinking about a 30 second delay between task
> queue checks, with an on-demand dispatcher wake-up whenever a new task
> is enqueued. This should keep our async sub-system fairly responsive in
> terms of repo syncs and the like while keeping db io down to something
> reasonable.

You lost me after the 30 seconds part. How will the on-demand dispatcher
handle situations where there are other tasks ahead of the new one in line?

So let's take the case of a single thread doing the work. It just popped
a new task off the queue, starting the 30 second timer. Left on the queue
are tasks B, C, and D.

During that 30 second timeframe, a request to sync a repo comes in. I'd
expect it to be put on the queue to execute after D, but the way I read
your comment it'll jump to the front of the line by being run by the
on-demand dispatcher.

I'm probably missing something though, so maybe some more explanation
will help clear it up.
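
For what it's worth, here's roughly the behavior I'd expect, assuming the
wake-up just cuts the 30 second wait short rather than handing the new
task straight to the dispatcher (a rough sketch with made-up names, not a
suggestion for the actual implementation):

import collections
import threading

class Dispatcher:
    """Wait up to 30 seconds or until something is enqueued, then always
    work from the head of the queue so a new task can't jump the line."""

    def __init__(self, task_queue, poll_interval=30):
        self._queue = task_queue          # FIFO; backed by the db in the real thing
        self._poll_interval = poll_interval
        self._wake_up = threading.Event()

    def enqueue(self, task):
        self._queue.append(task)          # new work always goes to the tail...
        self._wake_up.set()               # ...the wake-up only shortens the wait

    def run(self):
        while True:
            self._wake_up.wait(self._poll_interval)
            self._wake_up.clear()
            while self._queue:
                self._queue.popleft().run()   # B, C, D still go before the new sync

# dispatcher = Dispatcher(collections.deque())

If that's the intent, my ordering concern goes away; if the wake-up
actually runs the new task directly, that's the part I'd like to
understand.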

Also (again, I don't know how you're implementing it), based on the
discussions in this thread it sounds like the queue of tasks will only
exist in the database and not in memory. How is that going to affect the
uniqueness check? Does that become a series of database retrievals or
can we do that check atomically in the DB?
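
The atomic version I'm picturing is just a unique index plus an insert
that fails on duplicates, e.g. with pymongo (collection and field names
here are made up):

import pymongo
from pymongo.errors import DuplicateKeyError

db = pymongo.MongoClient().pulp_database

# One-time setup: whatever uniquely identifies a task (method name plus
# repo id, say) becomes a unique key.
db.tasks.create_index("unique_signature", unique=True)

def enqueue_if_unique(task_doc):
    """Insert the task only if an equivalent one isn't already queued.
    The index enforces the check, so there's no fetch-then-compare race."""
    try:
        db.tasks.insert_one(task_doc)
        return True
    except DuplicateKeyError:
        return False

If the queue only lives in the db anyway, letting an index do the work
seems nicer than pulling tasks back out to compare in memory.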


> It fits in with the time granularity of 1 minute that I've been
> advertising as well.
> 
> 
> Any other thoughts?

17 minutes. We'll let the Star Wars ASCII movie keep them entertained in
the meantime.




-- 
Jay Dobies
RHCE# 805008743336126
Freenode: jdob
http://pulpproject.org