[Pulp-dev] rethinking workers vs queues

Tue Oct 31 14:08:47 UTC 2017

+1

This approach makes sense to me.

On 10/30/2017 05:26 PM, Michael Hrivnak wrote:
> While it's on my mind, I just want to get this idea out to others for future consideration. I do not think we
> should necessarily make any changes to Pulp 3.0 based on this.
> 
> Setup
> -------
> 
> What is a Pulp worker? We tend to think of a worker as a process, or a pair of processes in a parent-child
> relationship, with a number from 0-7 (or a higher number if you configure Pulp as such). Each worker has a
> systemd unit file and a queue. We know how many should be running and monitor them. If you have multiple
> machines, each machine has a defined set of numbered workers.
> 
> Pulp tracks each worker in the database. Why? For resource reservation. For any given resource (usually a
> repository), all not-complete tasks are assigned to the same worker so they go into one FIFO queue, which
> preserves order-of-operation. Having one worker per queue guarantees that no more than one task will run at a
> time for a given resource.
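
To make the reservation rule above concrete, here is a minimal sketch. It is not Pulp's actual code: the class `Reservations`, its methods, and the "shortest queue" policy for picking a worker are all hypothetical, chosen only to show how routing every not-complete task for a resource to one FIFO queue preserves order.

```python
from collections import defaultdict, deque


class Reservations:
    """Toy model: one FIFO queue per worker, one worker per busy resource."""

    def __init__(self, worker_names):
        self.queues = {name: deque() for name in worker_names}
        self.owner = {}                  # resource -> worker holding its reservation
        self.pending = defaultdict(int)  # resource -> count of not-complete tasks

    def reserve(self, resource, task):
        # Reuse the existing reservation while the resource has unfinished
        # tasks; otherwise pick the shortest queue (an assumed policy).
        worker = self.owner.get(resource)
        if worker is None:
            worker = min(self.queues, key=lambda w: len(self.queues[w]))
            self.owner[resource] = worker
        self.queues[worker].append((resource, task))
        self.pending[resource] += 1
        return worker

    def complete_next(self, worker):
        # The worker finishes the task at the head of its queue; the
        # reservation is dropped once the resource has no tasks left.
        resource, task = self.queues[worker].popleft()
        self.pending[resource] -= 1
        if self.pending[resource] == 0:
            del self.pending[resource]
            del self.owner[resource]
        return task
```

In this sketch a "sync" followed by a "publish" on the same repository always lands in the same queue, so the publish cannot start before the sync finishes.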
> 
> Difficulty arises when we deal with workers going offline. What if a worker dies unexpectedly and leaves its
> queue behind, orphaned? How can we quiesce a worker (stop assigning it work) so it can be taken offline
> gracefully? In a clustered environment, such as Pulp running in Kubernetes or OpenShift, users will expect the
> ability to scale the number of workers up and down, and so we'll need to address these challenges. The
> containerized-Pulp use case helps clarify, I think, the role of workers vs. queues.
> 
> Pitch
> ------
> 
> Workers are stateless processes. They are a commodity that should come and go just as easily as the processes
> that handle http requests. The only long-term state associated with a worker is its queue, and I propose that
> we (eventually) stop defining a queue based on which worker created it.
> 
> Today: a worker starts, creates a queue for itself, and informs Pulp it is ready to receive work in that queue.
> 
> Future: a worker starts, the worker informs Pulp it is ready, and Pulp tells the worker which queues it should
> work from.
> 
> Queues become the first-class resource in Pulp that tasks are assigned to. Pulp monitors workers to ensure
> that each queue is assigned to exactly one healthy worker, but it does not care as much which one.
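
A rough sketch of the "future" handshake described above, under my own assumptions: `QueuePool` and `worker_ready` are hypothetical names, and handing each new worker at most one unassigned queue is just one possible policy. The point is only that the queue names outlive any particular worker.

```python
class QueuePool:
    """Toy model: queues are long-lived; workers are interchangeable."""

    def __init__(self, queue_names):
        # queue name -> worker currently serving it (None means unassigned)
        self.assignment = {name: None for name in queue_names}

    def worker_ready(self, worker_id):
        """A worker announces itself; return the queues it should work from."""
        unassigned = [q for q, w in self.assignment.items() if w is None]
        # Assumed policy: hand a new worker one unassigned queue, if any.
        for queue in unassigned[:1]:
            self.assignment[queue] = worker_id
        return [q for q, w in self.assignment.items() if w == worker_id]
```

A worker that gets back an empty list here would still be free to serve a general-purpose queue; it simply holds no resource-reserving queue yet.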
> 
> Use Cases
> --------------
> 
> If a worker process dies and a new one starts up, Pulp can assign the orphaned queue to the new worker.
> 
> If a worker dies (gracefully or not) and a new one does not show up, Pulp can assign the orphaned queue to
> another worker, which would do double-duty until one of the queues was emptied, at which point Pulp could
> choose to delete that queue.
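
The two failure cases above could look roughly like this. Everything here is illustrative: `reassign_orphans`, `drop_drained_queue`, and the "least-loaded worker" choice are assumptions of mine, not anything Pulp implements.

```python
def reassign_orphans(assignment, healthy_workers):
    """Give every queue whose worker is gone to the least-loaded healthy worker.

    assignment: dict of queue name -> worker id, mutated in place.
    """
    load = {w: 0 for w in healthy_workers}
    if not load:
        return  # nobody to take over; queues stay orphaned for now
    for worker in assignment.values():
        if worker in load:
            load[worker] += 1
    for queue, worker in assignment.items():
        if worker not in load:
            target = min(load, key=load.get)
            assignment[queue] = target
            load[target] += 1


def drop_drained_queue(assignment, queues, worker):
    """If a worker is doing double duty, delete one of its empty queues."""
    mine = [q for q, w in assignment.items() if w == worker]
    if len(mine) < 2:
        return None
    for queue in mine:
        if not queues[queue]:
            del assignment[queue]
            del queues[queue]
            return queue
    return None
```

If a replacement worker does show up, it can simply be included in `healthy_workers` and will pick up the orphaned queue because its load is zero.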
> 
> If a new additional worker shows up, Pulp could potentially assign it only to the general "celery" queue.
> Based on some policy, a new resource-reserving queue could optionally be created in the future, only if/when
> it was needed, and assigned to that worker.
> 
> Pulp as a clustered app would own and manage a pool of queues. The number of queues would be influenced by
> user settings (maybe a min and max), how much work is being requested at any given time, and how many
> processes are available to do work. The cluster would manage the full lifecycle of each queue.
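
One very simple way the three influences above could combine, purely as an assumption for discussion: want one queue per busy resource, but no more queues than workers, clamped to the user's min and max. The function name and parameters are hypothetical.

```python
def target_queue_count(min_queues, max_queues, busy_resources, available_workers):
    """Return how many resource-reserving queues the cluster should keep."""
    # No point in more queues than there are resources with pending work,
    # or than there are workers to serve them (assumed heuristic).
    wanted = min(busy_resources, available_workers)
    # User settings bound the result on both sides.
    return max(min_queues, min(max_queues, wanted))
```

A real policy would likely need some hysteresis so queues are not created and deleted on every small change in load.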
> 
> Pulp would monitor a pool of workers who are effectively anonymous. They would have no meaningful identity
> from a scheduling standpoint. They come and go through outside influence, but the application would make no
> effort to manage their lifecycle. Pulp would only tell each worker which queues it should work from.
> 
> Summary
> -----------
> 
> Details aside, the important points are:
> 
> - Focus on the queue as the owner of state.
> - For purposes of scheduling tasks, worker processes are anonymous.
> - Pulp manages a pool of queues, monitors a pool of workers, and assigns queues to workers as workers come and go.
> 
> Thoughts? Would it help to elaborate with concrete examples? Maybe a metaphor...
> 
> Black Friday
> ---------------
> 
> Extending our familiar Black Friday metaphor... starting with a recap.
> 
> Customers at a retail store are standing in one long line to check out. A traffic cop at the head of the line
> tells each person which register to go to, based on some rules. (Each register represents a worker's queue.)
> 
> This proposal is that we should think about the line at each register separately from the cashier. (The line
> is a queue, and the cashier is a worker process.) One cashier coming on duty can take over another's register
> so they can go on break. If a cashier has to close their register to go on break, the cashier next door might
> run back and forth between two registers for a while until one of the lines is empty. An entire shift of 16
> fresh cashiers might show up and relieve the previous shift. (This is similar to migrating worker processes
> from one machine in a cluster to another; the queues stay the same, but they get matched with new anonymous
> workers.)
> 
> -- 
> 
> Michael Hrivnak
> 
> Principal Software Engineer, RHCE 
> 
> Red Hat
> 
