[Pulp-dev] Tasking system improvements

Brian Bouterse bmbouter at redhat.com
Wed Dec 2 22:34:02 UTC 2020


Recently at triage we discussed tasking system improvements. I have mostly
good news to share regarding the resolution of those.

Through postmortem analysis of the system, we identified one specific
improvement, which was merged today. See the issue description for full
details on the failure scenario and the PR that fixed it:
https://pulp.plan.io/issues/7907

We also identified an opportunity for Pulp to avoid these types of problems
with less human intervention, and I wrote up that additional "health check
and recovery" bugfix here: https://pulp.plan.io/issues/7912

@dalley I'm hoping maybe you could consider implementing 7912 if there are
no objections or improvements from others.

The not so great news is that we also identified a variety of race
conditions stemming from spreading our correctness across two data systems,
PostgreSQL and Redis, with no transactional support spanning both. While the
specific fixes above are great, I believe we will need to work longer term
to eliminate both Redis and the resource-manager from the architecture to
fully close the door on these issues. I will write up a motivation for that
separately.
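
To make the class of problem concrete, here is a minimal sketch of how
state split across two stores can diverge when no transaction covers both.
It is illustrative only: the dict and list are in-memory stand-ins for
PostgreSQL and Redis, and dispatch(), orphaned_tasks(), and the
crash_after_db_write flag are hypothetical names, not Pulp's tasking code
or the specific races found in the postmortem.

```python
# Illustrative only: in-memory stand-ins, not Pulp's actual tasking code.

class CrashBetweenWrites(Exception):
    """Simulates a process dying between the two writes."""


def dispatch(db, queue, task_id, crash_after_db_write=False):
    # Write 1: record the task in the "database" (stand-in for PostgreSQL).
    db[task_id] = "waiting"
    if crash_after_db_write:
        # No transaction spans both stores, so write 1 is not rolled back.
        raise CrashBetweenWrites(task_id)
    # Write 2: enqueue the task in the "queue" (stand-in for Redis).
    queue.append(task_id)


def orphaned_tasks(db, queue):
    # Tasks the database believes are waiting but that were never enqueued.
    return sorted(
        t for t, state in db.items() if state == "waiting" and t not in queue
    )


db, queue = {}, []
dispatch(db, queue, "task-1")
try:
    dispatch(db, queue, "task-2", crash_after_db_write=True)
except CrashBetweenWrites:
    pass

print(orphaned_tasks(db, queue))  # prints ['task-2']
```

With a single transactional store, both writes would commit or roll back
together, so the orphaned state above could not be observed.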

Any feedback, ideas, or concerns are welcome.

Cheers,
Brian