[Pulp-list] Coordinator Usage for Repositories
Jay Dobies
jason.dobies at redhat.com
Thu Mar 15 14:18:47 UTC 2012
I see the coordinator in place for repo delete, sync, and publish, but it
still needs to be added in other places. I figured I'd start a discussion
of where and how.
= Repo Create =
I actually don't think we need it here. I think create is atomic enough
that the race condition of multiple creates for the same ID is acceptable.
The reason I'm against using it is that it's going to royally mess up
my create workflow in the RPM extension. That create is actually going
to be three operations: create repo, add importer, add distributor. If
the create doesn't immediately tell me whether it succeeded or failed, I
can't go on to the other steps. That means the user would have to wait
for the create to complete, delete the repo, and try again.
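To make the ordering problem concrete, here is a minimal Python sketch of that three-step create. Everything in it is hypothetical: the `accepted`/`postponed` response states, the client methods `create_repo`, `add_importer` and `add_distributor`, and the placeholder type IDs are stand-ins, not the actual Pulp or RPM extension API. It only shows that a step coming back "postponed" leaves the workflow with nowhere to go.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical response states; not the real Pulp REST API.
ACCEPTED = "accepted"    # the call ran and finished immediately
POSTPONED = "postponed"  # the coordinator queued it behind another task


@dataclass
class Response:
    state: str


def create_rpm_repo(client, repo_id: str) -> Tuple[bool, str]:
    """Run create repo -> add importer -> add distributor in order.

    Each step needs the previous one to have actually finished, so any
    result other than ACCEPTED stops the workflow at that step.
    """
    steps = [
        ("create repo", lambda: client.create_repo(repo_id)),
        # Type IDs below are placeholders, not real Pulp plugin IDs.
        ("add importer", lambda: client.add_importer(repo_id, "example_importer")),
        ("add distributor", lambda: client.add_distributor(repo_id, "example_distributor")),
    ]
    for name, call in steps:
        resp = call()
        if resp.state != ACCEPTED:
            return False, f"{name}: {resp.state}"
    return True, "created"


class StubClient:
    """Fake client; `postpone` names the one call that comes back queued."""

    def __init__(self, postpone: Optional[str] = None):
        self.postpone = postpone
        self.calls = []

    def _respond(self, call: str) -> Response:
        self.calls.append(call)
        return Response(POSTPONED if call == self.postpone else ACCEPTED)

    def create_repo(self, repo_id):
        return self._respond("create_repo")

    def add_importer(self, repo_id, type_id):
        return self._respond("add_importer")

    def add_distributor(self, repo_id, type_id):
        return self._respond("add_distributor")
```

With `StubClient()` the call returns `(True, "created")`; with `StubClient(postpone="create_repo")` it returns `(False, "create repo: postponed")` and `add_importer` is never called, which is exactly the dead end described above.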
= Add or Remove Importer/Distributor =
As much as I want to say this falls under the same rationale as create
repo, it doesn't. It's actually closer to updating a repo than it is a
create, so I think it needs to block on everything repo update blocks on.
= Repo Update =
Technically speaking, the actual data in a repo is so benign that
changing it won't have any repercussions on a running operation. Still,
we should treat it like any other update.
= Update Importer/Distributor Config =
Again, this is just like updating a repo. We can't allow it while a sync
is in progress.
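A rough Python sketch of the conflict rule the sections above add up to. The operation names and the `must_wait` helper are hypothetical, and it assumes the simplest possible rule: any two coordinated operations on the same repo conflict, operations on different repos never do, and create bypasses the coordinator entirely.

```python
from typing import Iterable, Tuple

# Hypothetical operation names. "create" is deliberately left out:
# under this proposal it runs immediately and skips the coordinator.
COORDINATED = {
    "delete", "sync", "publish",          # already coordinated today
    "update",                             # repo update
    "add_importer", "remove_importer",
    "add_distributor", "remove_distributor",
    "update_importer_config", "update_distributor_config",
}


def must_wait(op: str, repo_id: str, running: Iterable[Tuple[str, str]]) -> bool:
    """Return True if `op` on `repo_id` has to be postponed.

    `running` holds (operation, repo_id) pairs currently executing.
    Simplifying assumption: two coordinated operations on the same repo
    always conflict, and operations on different repos never do.
    """
    if op not in COORDINATED:
        return False
    return any(other in COORDINATED and other_repo == repo_id
               for other, other_repo in running)
```

For example, updating an importer config on a repo that is syncing has to wait, adding a distributor to a different repo does not, and create never waits.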
= Concerns? =
(this is mostly me thinking out loud)
I know I've said it before, but do we need to entertain multiple queues?
If we have 4 repos synchronizing and they fill the task pool, you're locked
out of any repo manipulation operations until one of those syncs finishes.
By manipulation operations I mean I can't create or update a repo, or add
importers/distributors to one, while 4 totally separate repos are syncing.
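The lock-out comes from pool exhaustion rather than from the coordinator, and a standard-library `ThreadPoolExecutor` reproduces it. This is a sketch that assumes a pool of four threads (matching the four-sync example) and uses a `threading.Event` as a stand-in for a long-running sync; it is not how Pulp's task pool is actually implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # assumed pool size, matching the four-sync example


def demo_pool_exhaustion():
    release = threading.Event()   # set() when the "syncs" may finish
    started = []                  # names of tasks as they begin running

    def sync(repo):
        started.append(f"sync {repo}")
        release.wait()            # stand-in for a long-running sync

    def update(repo):
        started.append(f"update {repo}")

    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        for repo in ("a", "b", "c", "d"):
            pool.submit(sync, repo)
        # Repo "e" is unrelated, so no coordinator would block this update;
        # it still has to wait because every thread is busy.
        update_future = pool.submit(update, "e")
        time.sleep(0.1)
        update_was_waiting = "update e" not in started
        release.set()             # let the syncs complete
        update_future.result()    # the update runs once a thread frees up
    return update_was_waiting, started
```

`demo_pool_exhaustion()` reports that the update was still waiting while the four syncs held every thread, and that it ran only after they were released.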
That has the potential to be really annoying if you have a lot of
scheduled syncs. You could be in the middle of some admin operations
when one or more scheduled syncs kick in and basically take over all
of Pulp's processing capacity. That may be alleviated by recommending
off-hours syncing.
And that's just within repo operations. Being delayed from creating a
new repo or updating one because I've triggered a handful of consumer
operations is also a rough user experience (I say "delayed" because it
isn't the coordinator blocking it, just the sheer lack of open threads
in the task pool).
That said, I haven't fully thought through what a multiple queue setup
would look like. It probably gets really tricky very fast. I just want
to make sure we understand how the user experience is going to change
now that many operations that previously didn't need an open thread in
the task pool now rely on one.
I wonder if it makes sense to use creative math for the task weighting
concept to ensure there will always be some open space for non-sync/publish
tasks. For instance, say syncs weigh 3 and the total allocated weight
points in the task queue is 8. That means sync operations could never
block the entire task queue; they just don't fit (two syncs take 6 points,
and a third would need 9). That would always leave at least 2 points open
for the smaller operations like create/delete/update (assume for this
example they each weigh 1). The coordinator would prevent smaller
operations on the repos being synced from taking place, but it would let
them slide through for unrelated repos.
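The weighting arithmetic can be checked with a small admission-control sketch. The capacity of 8, the sync weight of 3 and the weight of 1 for create/update/delete come from the example above; giving publish a weight of 3 as well, the `WeightedPool` class, and the simplified same-repo conflict check are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

CAPACITY = 8  # total weight points in the task queue (from the example)
# sync=3 and 1 for create/update/delete come from the example;
# publish=3 is an added assumption.
WEIGHTS = {"sync": 3, "publish": 3, "create": 1, "update": 1, "delete": 1}


@dataclass
class WeightedPool:
    """Hypothetical admission check combining weights and repo conflicts."""
    capacity: int = CAPACITY
    running: List[Tuple[str, str]] = field(default_factory=list)

    def used(self) -> int:
        return sum(WEIGHTS[op] for op, _ in self.running)

    def try_start(self, op: str, repo_id: str) -> bool:
        # Simplified coordinator rule: nothing else runs on a busy repo.
        if any(repo == repo_id for _, repo in self.running):
            return False
        # Weight rule: the task must fit in the remaining points.
        if self.used() + WEIGHTS[op] > self.capacity:
            return False
        self.running.append((op, repo_id))
        return True


pool = WeightedPool()
print([pool.try_start("sync", r) for r in "abcd"])  # [True, True, False, False]
print(pool.used())                                   # 6
print(pool.try_start("update", "a"))                 # False: repo a is syncing
print(pool.try_start("update", "e"))                 # True
print(pool.try_start("create", "f"))                 # True
print(pool.try_start("update", "g"))                 # False: 8 of 8 points used
```

Two syncs use 6 of the 8 points, so the third and fourth are refused while two 1-point operations on unrelated repos still get in. The update on repo `a` is refused by the conflict check even though points are free, which is the "slide through for unrelated repos" behavior described above.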
--
Jay Dobies
Freenode: jdob @ #pulp
http://pulpproject.org | http://blog.pulpproject.org