[Pulp-list] Coordinator Usage for Repositories

Thu Mar 15 14:18:47 UTC 2012

I see the coordinator in place for repo delete, sync, and publish, but it 
still needs to be added in other places. I figured I'd start a discussion 
of where and how.

= Repo Create =
I actually don't think we need it here. I think the create is atomic 
enough that the race condition of multiple creates for the same ID is 
acceptable.
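
Here is a minimal sketch of why an atomic create makes that race harmless, 
assuming the store enforces uniqueness of the repo ID. RepoStore, 
DuplicateRepoId, and race_creates are hypothetical names, and the lock 
only stands in for whatever unique constraint the real database provides. 
Ten concurrent creates for one ID yield exactly one success; the rest fail 
cleanly instead of corrupting anything.

```python
import threading


class DuplicateRepoId(Exception):
    """Raised when a repo with the requested ID already exists."""


class RepoStore:
    """Hypothetical in-memory stand-in for the repo collection.

    The lock plays the role of a unique index on the repo ID: the
    existence check and the insert happen as one atomic step.
    """

    def __init__(self):
        self._repos = {}
        self._lock = threading.Lock()

    def create(self, repo_id):
        with self._lock:
            if repo_id in self._repos:
                raise DuplicateRepoId(repo_id)
            self._repos[repo_id] = {"id": repo_id}
            return self._repos[repo_id]


def race_creates(store, repo_id, attempts=10):
    """Fire concurrent creates for the same ID; return how many succeeded."""
    results = []
    results_lock = threading.Lock()

    def worker():
        try:
            store.create(repo_id)
            outcome = True
        except DuplicateRepoId:
            outcome = False
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


print(race_creates(RepoStore(), "f16-x86_64"))  # → 1
```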

The reason I'm against using it is that it's going to royally mess up 
my create workflow in the RPM extension. That create is actually going 
to be three operations: create repo, add importer, add distributor. If 
the create doesn't immediately tell me whether it succeeded or failed, I 
can't go on to the other steps. That means the user would have to wait 
for the create to complete and, if anything goes wrong, delete the repo 
and try again.
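
A rough sketch of that three-step create, assuming each server call 
returns or raises immediately (the property a queued create would take 
away). FakeServer, PulpError, create_rpm_repo, and the 
"yum_importer"/"yum_distributor" type IDs are all hypothetical 
placeholders, not Pulp's actual API. Because every step answers right 
away, a failure in a later step can be rolled back inline.

```python
class PulpError(Exception):
    """Hypothetical error raised when a server call fails."""


class FakeServer:
    """Hypothetical stand-in for the server calls the RPM extension makes.

    Each call returns (or raises) immediately, which is what the create
    workflow depends on.
    """

    def __init__(self, fail_on=None):
        self.repos = {}
        self.fail_on = fail_on  # name of a step to fail, for testing

    def _maybe_fail(self, step):
        if step == self.fail_on:
            raise PulpError(step + " failed")

    def create_repo(self, repo_id):
        self._maybe_fail("create_repo")
        self.repos[repo_id] = {"importer": None, "distributor": None}

    def add_importer(self, repo_id, importer_type):
        self._maybe_fail("add_importer")
        self.repos[repo_id]["importer"] = importer_type

    def add_distributor(self, repo_id, distributor_type):
        self._maybe_fail("add_distributor")
        self.repos[repo_id]["distributor"] = distributor_type

    def delete_repo(self, repo_id):
        self.repos.pop(repo_id, None)


def create_rpm_repo(server, repo_id):
    """Create repo, add importer, add distributor; undo the repo on failure.

    Returns True on success, False if any step failed.
    """
    try:
        server.create_repo(repo_id)
    except PulpError:
        return False  # nothing was created, so nothing to undo
    try:
        server.add_importer(repo_id, "yum_importer")
        server.add_distributor(repo_id, "yum_distributor")
    except PulpError:
        server.delete_repo(repo_id)  # don't leave a half-configured repo
        return False
    return True
```

If create_repo were instead queued behind the coordinator, the first try 
block would have no definite answer to act on, and the rollback could 
only happen after the user waited for the queued create to finish.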

= Add or Remove Importer/Distributor =
As much as I want to say this falls under the same rationale as create 
repo, it doesn't. It's actually closer to updating a repo than to 
creating one, so I think it needs to block on everything repo update 
blocks on.

= Repo Update =
Technically speaking, the actual data in a repo is so benign that 
changing it won't have any repercussions on a running operation. Still, 
we should treat it like any other update.

= Update Importer/Distributor Config =
Again, this is just like updating a repo. We can't allow it while a sync 
is in progress.
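
The per-operation rules above can be summarized as a small lookup. This 
is a sketch under assumptions: the operation names are hypothetical, and 
the exact blocker set (sync, publish, delete on the same repo) is my 
guess at "everything repo update blocks on"; the proposal itself only 
pins down that these operations share one rule set and that a running 
sync is part of it.

```python
# Hypothetical names for the "update-class" operations discussed above.
UPDATE_CLASS = frozenset({
    "update_repo",
    "add_importer",
    "remove_importer",
    "add_distributor",
    "remove_distributor",
    "update_importer_config",
    "update_distributor_config",
})

# Assumption: the operations already under the coordinator are what an
# update-class operation must wait for on the same repo.
UPDATE_BLOCKERS = frozenset({"sync", "publish", "delete"})

# Repo create is intentionally absent: under this proposal it bypasses
# the coordinator entirely.


def update_must_wait(requested, running_on_same_repo):
    """Return True if an update-class operation must wait.

    `running_on_same_repo` is an iterable of operation names currently
    running against the same repo ID.
    """
    if requested not in UPDATE_CLASS:
        raise ValueError(f"{requested!r} is not an update-class operation")
    return not UPDATE_BLOCKERS.isdisjoint(running_on_same_repo)


print(update_must_wait("add_importer", ["sync"]))        # → True
print(update_must_wait("update_repo", ["update_repo"]))  # → False
```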

= Concerns? =
(this is mostly me thinking out loud)

I know I've said it before, but do we need to entertain multiple queues? 
If we have 4 repos synchronizing, you're locked out of any repo 
manipulation operations until one of those syncs finishes. By 
manipulation operations I mean I can't create or update a repo, or add 
importers/distributors to one, while 4 totally separate repos are 
syncing.
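
A small demonstration of that lockout, assuming a single task pool of 4 
worker threads (POOL_SIZE is an assumption, as is modeling a sync as a 
wait on an event). It uses the standard library's ThreadPoolExecutor 
rather than Pulp's actual task queue. Four syncs on unrelated repos fill 
every worker, so a create that conflicts with nothing still sits in the 
queue until a sync finishes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # assumption: the task pool has 4 worker threads


def demo_lockout():
    """Return (create_was_blocked, create_ran_after_syncs_finished)."""
    release_syncs = threading.Event()
    create_started = threading.Event()

    def sync(repo_id):
        release_syncs.wait()  # stands in for a long-running sync
        return repo_id

    def create(repo_id):
        create_started.set()
        return repo_id

    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        syncs = [pool.submit(sync, f"repo-{i}") for i in range(POOL_SIZE)]
        pool.submit(create, "new-repo")
        # Every worker is busy syncing an unrelated repo, so the create
        # stays queued even though nothing conflicts with it.
        blocked = not create_started.wait(timeout=0.5)
        release_syncs.set()  # let the syncs finish
        for f in syncs:
            f.result()
        ran_after = create_started.wait(timeout=5)
    return blocked, ran_after


print(demo_lockout())  # → (True, True)
```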

That has the potential to be really annoying if you have a lot of 
scheduled syncs. You could be in the middle of some admin operations 
when one or more scheduled syncs kick in and basically take over all 
of Pulp's processing capacity. That may be alleviated by recommending 
off-hours syncing.

And that's just within repo operations. Being delayed from creating or 
updating a repo because I've triggered a handful of consumer operations 
is also a rough user experience (by "delayed" I mean the coordinator 
isn't the one blocking it; it's the sheer lack of open threads in the 
task pool).

That said, I haven't fully thought through what a multiple queue setup 
would look like. It probably gets really tricky very fast. I just want 
to make sure we understand how the user experience is going to change 
now that many operations that previously didn't need one now rely on an 
open thread in the task pool.

I wonder if it makes sense to use creative math for the task weighting 
concept to ensure there will always be some open space for 
non-sync/publish tasks. For instance, say syncs weigh 3 and the total 
allocated weight points in the task queue is 8. That means sync 
operations could never block the entire task queue; two syncs use 6 
points and a third would need 9, so it just doesn't fit. That would 
always leave 2 spots open for the smaller operations like 
create/delete/update (assume for this example they each weigh 1). The 
coordinator would still prevent smaller operations on the repos being 
synced from taking place, but it would let them slide through for 
unrelated repos.
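
A minimal sketch of that weighting idea, using the numbers above 
(capacity 8, sync weight 3, small operations weight 1). TaskQueue, 
try_start, and finish are hypothetical names, and the conflict check is a 
simplified stand-in for the coordinator: it only refuses operations on a 
repo that is currently syncing.

```python
from dataclasses import dataclass, field

CAPACITY = 8  # total weight points in the task queue (from the example)
WEIGHTS = {"sync": 3, "create": 1, "update": 1, "delete": 1}


@dataclass
class TaskQueue:
    capacity: int = CAPACITY
    running: list = field(default_factory=list)  # (operation, repo_id) pairs

    def used(self):
        return sum(WEIGHTS[op] for op, _ in self.running)

    def conflicts(self, repo_id):
        # Simplified coordinator: nothing else may touch a syncing repo.
        return ("sync", repo_id) in self.running

    def try_start(self, op, repo_id):
        """Admit the operation if it fits and doesn't conflict."""
        if self.used() + WEIGHTS[op] > self.capacity:
            return False  # not enough weight points left
        if self.conflicts(repo_id):
            return False  # blocked by a sync on the same repo
        self.running.append((op, repo_id))
        return True

    def finish(self, op, repo_id):
        self.running.remove((op, repo_id))


q = TaskQueue()
requests = [
    ("sync", "repo-a"), ("sync", "repo-b"), ("sync", "repo-c"),
    ("create", "repo-d"), ("update", "repo-a"),
    ("update", "repo-e"), ("create", "repo-f"),
]
print([q.try_start(op, repo) for op, repo in requests])
# → [True, True, False, True, False, True, False]
```

The third sync is refused on weight alone, the update to repo-a is 
refused by the conflict check even though a point is free, and the two 
remaining points go to operations on unrelated repos.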

-- 
Jay Dobies
Freenode: jdob @ #pulp
http://pulpproject.org | http://blog.pulpproject.org