[Pulp-dev] Tasking System Changes and Feedback

Daniel Alley dalley at redhat.com
Tue Apr 13 16:54:47 UTC 2021


>
> Are there any benefits to improving RQ vs the invented here method? I'm
> just curious about the cost of maintaining a tasking system versus being
> part of a community built one. This feels like the kind of problem that
> many other applications should have in the Python world -- or are there
> elements of Pulp's deployment architecture that make it unique here?
>

This shouldn't be viewed through an "invented here" lens, because most of
what the proposal does would actually reduce our dependence on "invented
here".

Basically there is a fundamental problem with having task state split
between the database and some external service (RQ): the two are
incredibly difficult to keep consistent.  There is a lot of existing
complexity around resource locking that would completely go away if we just
switched to keeping the "task queue" in the database, using normal
transactions and row/table locks rather than separate "lock objects" in
additional tables, etc.
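
To make that concrete, here is a minimal sketch of a worker claiming a task
with an ordinary transaction instead of a separate lock object.  The table
layout, the state names and the function names are all made up for
illustration (this is not the PoC's schema), and it uses sqlite3 only so
that it runs standalone.  On PostgreSQL one would more likely reach for
SELECT ... FOR UPDATE SKIP LOCKED, but that is an assumption on my part.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical task table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task ("
        "id INTEGER PRIMARY KEY, state TEXT NOT NULL, worker TEXT)"
    )
    return conn


def claim_next_task(conn, worker_name):
    """Claim the oldest waiting task; return its id, or None if none is left."""
    with conn:  # one transaction: commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM task WHERE state = 'waiting' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The "AND state = 'waiting'" guard means a task that another worker
        # claimed in the meantime is simply not updated (rowcount == 0).
        cur = conn.execute(
            "UPDATE task SET state = 'running', worker = ? "
            "WHERE id = ? AND state = 'waiting'",
            (worker_name, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None
```

The point is that "who owns this task" lives in the same row as the rest of
the task state, so there is no second store to reconcile.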

The idea is that we already have all of this information about the tasks in
the database (reporting what happened and so on), and if we just store two
extra pieces of information - the function to execute, and the parameters
to execute it with - we will essentially have the "front half" of a task
queue, one that we can much more easily keep in a consistent state with
everything else.
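
As a rough sketch of those two extra pieces of information, assuming the
function is stored as a dotted import path and the parameters as a JSON
list (both are my assumptions for illustration, as are the column and
function names):

```python
import importlib
import json
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical task table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task ("
        "id INTEGER PRIMARY KEY, state TEXT NOT NULL, "
        "func TEXT NOT NULL, args TEXT NOT NULL, result TEXT)"
    )
    return conn


def enqueue(conn, func_path, *args):
    """Record a task: which function to run and with which parameters."""
    with conn:
        cur = conn.execute(
            "INSERT INTO task (state, func, args) VALUES ('waiting', ?, ?)",
            (func_path, json.dumps(args)),
        )
    return cur.lastrowid


def run_task(conn, task_id):
    """Load the stored function and parameters, run it, record the result."""
    func_path, raw_args = conn.execute(
        "SELECT func, args FROM task WHERE id = ?", (task_id,)
    ).fetchone()
    # Resolve "package.module.name" to the actual callable.
    module_name, _, func_name = func_path.rpartition(".")
    func = getattr(importlib.import_module(module_name), func_name)
    result = func(*json.loads(raw_args))
    with conn:
        conn.execute(
            "UPDATE task SET state = 'completed', result = ? WHERE id = ?",
            (json.dumps(result), task_id),
        )
    return result
```

For example, enqueue(conn, "math.gcd", 12, 18) followed by run_task on the
returned id would call math.gcd(12, 18) and store the result next to the
rest of the task record.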

This is actually a fairly common problem, and the usual answer to it is
called the transactional outbox pattern:
https://microservices.io/patterns/data/transactional-outbox.html
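
A minimal sketch of the pattern, under the assumption that the "outbox" is
just the task table itself: the domain change and the task that goes with
it are written in one transaction, so either both exist or neither does.
The repository table, the "sync" function name and the helper names here
are hypothetical, and sqlite3 is used only to keep the example standalone.

```python
import json
import sqlite3


def make_db():
    """Create hypothetical repository and task tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repository (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE task ("
        "id INTEGER PRIMARY KEY, state TEXT NOT NULL, "
        "func TEXT NOT NULL, args TEXT NOT NULL)"
    )
    return conn


def create_repo_and_queue(conn, name, func):
    """Insert a repository and its follow-up task in a single transaction."""
    with conn:  # if the task insert fails, the repository insert rolls back
        cur = conn.execute("INSERT INTO repository (name) VALUES (?)", (name,))
        repo_id = cur.lastrowid
        conn.execute(
            "INSERT INTO task (state, func, args) VALUES ('waiting', ?, ?)",
            (func, json.dumps([repo_id])),
        )
    return repo_id
```

If the second insert fails (say, because func is None and violates the NOT
NULL constraint), the repository row is rolled back along with it, which is
exactly the consistency that is hard to get when the task lives in a
separate service.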

Regarding the "back half", which deals with actually spawning the process,
I'm less certain.  Maybe Matthias can explain what the plan is there.  IMO
even if we continued using RQ for that portion (part 3 in the diagram in
the link), the change to the "front half" (everything up to and including
the pulp resource manager) makes a lot of sense and would be a significant
net reduction in complexity.

> This is sort of an aside to this general change. Are Pulp tasks cleaned up
> from the database today?
>

They aren't.  We don't clean up anything automatically; cleanup is
user-driven.

On Tue, Apr 13, 2021 at 11:18 AM Eric Helms <ehelms at redhat.com> wrote:

>
>
> On Thu, Apr 8, 2021 at 5:24 PM Daniel Alley <dalley at redhat.com> wrote:
>
>> Eric,
>>
>> * The idea is to move away from RQ entirely.  RQ is fine (and vastly
>> better than Celery IMO), but managing task state across both 1) the
>> database and 2) a separate, external registry is still problematic.  If all
>> of the information can simply be kept in the database, then it will be much
>> easier to maintain consistent state.
>>
>
> Are there any benefits to improving RQ vs the invented here method? I'm
> just curious about the cost of maintaining a tasking system versus being
> part of a community built one. This feels like the kind of problem that
> many other applications should have in the Python world -- or are there
> elements of Pulp's deployment architecture that make it unique here?
>
>
>> * *Maybe*.  We're considering using Redis as a cache to improve content
>> serving performance (after all, caching is one of the primary uses of
>> Redis). If we do, then Redis would remain in the architecture, but it could
>> potentially be an optional component and would be easier to remove at some
>> point in the future.
>> * We'd just be adding a small amount of information to each task record,
>> and it wouldn't prevent cleanup later.
>>
>
> This is sort of an aside to this general change. Are Pulp tasks cleaned up
> from the database today?
>
>
>>
>>
>>
>> On Thu, Apr 8, 2021 at 4:42 PM Eric Helms <ehelms at redhat.com> wrote:
>>
>>> A few initial questions that get a bit into the stack but will help the
>>> Foreman project think on the proposed changes:
>>>
>>>  * Does this move away from RQ entirely or just RQ workers?
>>>  * Do the new workers remove Pulp 3's use of Redis all together?
>>>  * Will using the database result in any additional build up of tasking
>>> information that can impact performance over time? (Or does all task data
>>> get cleaned up eventually?)
>>>
>>> Thanks for sending this along early.
>>>
>>> On Fri, Apr 2, 2021 at 4:43 PM Brian Bouterse <bmbouter at redhat.com>
>>> wrote:
>>>
>>>> FYI, @mdellweg and I have been collaborating on the tasking system
>>>> changes. This email is to share some info to transition the work to
>>>> @mdellweg while I'm out. With the new-style disabled by default I am hoping
>>>> it can go into 3.13.
>>>>
>>>> ## The PoC and ticket info
>>>>
>>>> The PoC is basically functional, but it's a PoC:
>>>> https://github.com/pulp/pulpcore/pull/1222/
>>>>
>>>> * The epic is being tracked here which recaps why we're doing this and
>>>> the high level approach. The sub-tasks capture the various detailed
>>>> changes. https://pulp.plan.io/issues/8495
>>>>
>>>> * This is totally separate from the RQ workers you use today, and those
>>>> will continue to be available for a while.
>>>>
>>>> ## Next Steps
>>>>
>>>> * @mdellweg will continue the work and hopefully merge the PoC while
>>>> I'm out
>>>>
>>>> * Once it's demo-able I've asked @mdellweg to give a 20 minute, public
>>>> (hopefully recorded) technical demo. While it is designed to be a drop-in
>>>> replacement from a user perspective, we think sharing the internals will be
>>>> helpful to get feedback and increase the list of those who understand the
>>>> work.
>>>>
>>>> All the best,
>>>> Brian
>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://listman.redhat.com/mailman/listinfo/pulp-dev
>>>>
>>>
>>>
>>> --
>>> Eric Helms
>>> Principal Software Engineer
>>> Satellite
>>>
>>
>
> --
> Eric Helms
> Principal Software Engineer
> Satellite
>