[Pulp-list] Pulp 3 waiting tasks

Daniel Alley dalley at redhat.com
Mon Apr 6 21:24:11 UTC 2020


On the few occasions when I've had issues (usually after doing something like
deleting the database while a task was still running), a "redis-cli
FLUSHALL" has solved my problems as well. So if Brian's suggestion (quoted
below) does not resolve things, try that.
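
For reference, that reset is just the one command below (a sketch; note that
FLUSHALL deletes every key in the Redis instance, including any
queued-but-unstarted jobs, so stop the Pulp workers first and restart them
afterwards so they re-register clean state):

# redis-cli FLUSHALL
OK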

On Mon, Apr 6, 2020 at 1:37 PM Brian Bouterse <bmbouter at redhat.com> wrote:

> Thank you. We will look into this bug you've filed.
>
> I believe you can recover your current installation by canceling the tasks
> stuck in the "waiting" state. To cancel, use this API call:
> https://docs.pulpproject.org/restapi.html#operation/tasks_cancel
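>
> For example, with the stuck task from earlier in this thread, canceling is a
> PATCH of the task's state (a sketch; adjust the host, port, and credentials
> for your install):
>
> # curl -u admin:password -X PATCH \
>     -H 'Content-Type: application/json' \
>     -d '{"state": "canceled"}' \
>     http://localhost:24817/pulp/api/v3/tasks/14b76b27-9f34-4297-88ed-5ec13cbe5e50/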
>
> Let me know if this doesn't help get your system back on track.
>
> Thanks,
> Brian
>
>
> On Mon, Apr 6, 2020 at 8:04 AM Bin Li (BLOOMBERG/ 120 PARK) <
> bli111 at bloomberg.net> wrote:
>
>> Brian,
>>
>> I filed a bug to track this issue "https://pulp.plan.io/issues/6449". In
>> the meantime, is it possible to recover from this issue, or do we need to
>> erase the database and reinstall?
>>
>> Thanks
>>
>>
>> From: bmbouter at redhat.com At: 04/03/20 16:45:00
>> To: Bin Li (BLOOMBERG/ 120 PARK ) <bli111 at bloomberg.net>
>> Cc: pulp-list at redhat.com
>> Subject: Re: [Pulp-list] Pulp 3 waiting tasks
>>
>> So the problematic thing I see in this output is "resource-manager | 0".
>> This tells me that Pulp's record of the task is in PostgreSQL (and it was
>> never run), but RQ has lost the task from the "resource-manager" queue in
>> Redis. So the next question is: how did that happen?
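>>
>> (If you want to verify that on your side: RQ stores each queue as a Redis
>> list under "rq:queue:<name>", so something like the sketch below should
>> report zero queued jobs even while the Pulp task sits in "waiting".)
>>
>> # redis-cli LLEN rq:queue:resource-manager
>> (integer) 0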
>>
>> Would you be willing to file a bug and link it here so that I could try
>> to reproduce it on our end?
>>
>> Thanks!
>> Brian
>>
>>
>> On Fri, Apr 3, 2020 at 4:35 PM Bin Li (BLOOMBERG/ 120 PARK) <
>> bli111 at bloomberg.net> wrote:
>>
>>> Brian,
>>>
>>> Here is the rq info output. Thanks for looking into this.
>>>
>>> # rq --version
>>> rq, version 1.2.2
>>>
>>> # rq info
>>> 134692@pulpmaster |██ 7
>>> 182536@pulpmaster | 0
>>> 134343@pulpmaster |██ 7
>>> 191144@pulpmaster | 0
>>> 130945@pulpmaster |██ 7
>>> 135922@pulpmaster | 0
>>> 182528@pulpmaster | 0
>>> 182532@pulpmaster | 0
>>> 191145@pulpmaster | 0
>>> 135796@pulpmaster | 0
>>> 191148@pulpmaster | 0
>>> 191152@pulpmaster | 0
>>> 191151@pulpmaster | 0
>>> 135306@pulpmaster | 0
>>> 135679@pulpmaster | 0
>>> 182539@pulpmaster | 0
>>> 182547@pulpmaster | 0
>>> 182530@pulpmaster | 0
>>> 134332@pulpmaster |██ 7
>>> 191147@pulpmaster | 0
>>> 131701@pulpmaster |██ 5
>>> 134330@pulpmaster |██ 7
>>> 134688@pulpmaster |██ 5
>>> 182548@pulpmaster | 0
>>> 134929@pulpmaster | 0
>>> 135180@pulpmaster | 0
>>> 135503@pulpmaster | 0
>>> 182546@pulpmaster | 0
>>> 131485@pulpmaster |██ 7
>>> 131269@pulpmaster |██ 7
>>> 32603@pulpmaster | 0
>>> 191146@pulpmaster | 0
>>> 131053@pulpmaster |██ 7
>>> 134339@pulpmaster |██ 7
>>> 134336@pulpmaster |██ 7
>>> 191150@pulpmaster | 0
>>> 182542@pulpmaster | 0
>>> 182540@pulpmaster | 0
>>> 32609@pulpmaster | 0
>>> 191153@pulpmaster | 0
>>> 131593@pulpmaster |████████ 21
>>> 135051@pulpmaster | 0
>>> 134696@pulpmaster |██ 7
>>> 191149@pulpmaster | 0
>>> 131377@pulpmaster |██ 5
>>> 134694@pulpmaster |██ 7
>>> 134690@pulpmaster |██ 7
>>> 131161@pulpmaster |██ 5
>>> 136342@pulpmaster | 0
>>> 32626@pulpmaster | 0
>>> 131810@pulpmaster |██ 7
>>> 136462@pulpmaster | 0
>>> 130836@pulpmaster |██ 5
>>> resource-manager | 0
>>> 54 queues, 144 jobs total
>>>
>>> 191146@pulpmaster (b'pulpp-ob-581' 191146): idle 191146@pulpmaster
>>> 191147@pulpmaster (b'pulpp-ob-581' 191147): idle 191147@pulpmaster
>>> 191153@pulpmaster (b'pulpp-ob-581' 191153): idle 191153@pulpmaster
>>> resource-manager (b'pulpp-ob-581' 187238): idle resource-manager
>>> 191144@pulpmaster (b'pulpp-ob-581' 191144): idle 191144@pulpmaster
>>> 191151@pulpmaster (b'pulpp-ob-581' 191151): idle 191151@pulpmaster
>>> 191149@pulpmaster (b'pulpp-ob-581' 191149): idle 191149@pulpmaster
>>> 191145@pulpmaster (b'pulpp-ob-581' 191145): idle 191145@pulpmaster
>>> 191148@pulpmaster (b'pulpp-ob-581' 191148): idle 191148@pulpmaster
>>> 191150@pulpmaster (b'pulpp-ob-581' 191150): idle 191150@pulpmaster
>>> 191152@pulpmaster (b'pulpp-ob-581' 191152): idle 191152@pulpmaster
>>> 11 workers, 54 queues
>>>
>>> Updated: 2020-04-03 16:30:23.373244
>>>
>>> From: bmbouter at redhat.com At: 04/03/20 16:23:22
>>> To: Bin Li (BLOOMBERG/ 120 PARK ) <bli111 at bloomberg.net>
>>> Cc: pulp-list at redhat.com
>>> Subject: Re: [Pulp-list] Pulp 3 waiting tasks
>>>
>>> Since the stalled task has no "worker" assigned, it tells me the task has
>>> not traveled through the resource-manager yet. All tasks in Pulp 3
>>> (currently) go through the resource-manager. I can see from your ps output
>>> that there is 1 resource-manager running (which is good), and the status
>>> API agrees with that (also good).
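>>>
>>> (As an aside, the tasks API can list everything stuck this way; a sketch,
>>> adjusting host, port, and credentials for your install:
>>>
>>> # curl -u admin:password 'http://localhost:24817/pulp/api/v3/tasks/?state=waiting'
>>>
>>> Every entry should show "worker": null, like the one you pasted.)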
>>>
>>> So what does RQ think the situation is? Can you paste the output of `rq
>>> info`, please?
>>>
>>> Also, what version of RQ do you have installed?
>>>
>>> Thanks,
>>> Brian
>>>
>>>
>>> On Fri, Apr 3, 2020 at 9:39 AM Bin Li (BLOOMBERG/ 120 PARK) <
>>> bli111 at bloomberg.net> wrote:
>>>
>>>> Here is more info. The log is very big; I will send it to you shortly.
>>>>
>>>> # ./sget status
>>>> {
>>>>   "database_connection": {
>>>>     "connected": true
>>>>   },
>>>>   "online_content_apps": [
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:30.135954Z",
>>>>       "name": "187254@pulpp-ob-581"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:30.132849Z",
>>>>       "name": "187257@pulpp-ob-581"
>>>>     }
>>>>   ],
>>>>   "online_workers": [
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:29.898377Z",
>>>>       "name": "191147@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.796937Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/268261b9-f46d-4d37-ab47-0b50ca382637/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:19.087502Z",
>>>>       "name": "191150@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.807418Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/4fb4d87c-2c3c-4f64-b6f3-e05d9aaf6fc0/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:29.498852Z",
>>>>       "name": "191146@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.810402Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/7b15b6bd-1437-47b8-9832-0b44b326e0fa/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:29.798941Z",
>>>>       "name": "191149@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.817391Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/62523740-e109-4828-bcbb-e8459c0944c5/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:29.598962Z",
>>>>       "name": "191144@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.818322Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/02e33d62-797d-4797-8fdc-b999efc8cd12/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:16.685771Z",
>>>>       "name": "191153@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.831154Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/23e2a484-a877-4083-bcd4-38a0e89fcb49/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:18.487964Z",
>>>>       "name": "191145@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.869871Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/9e63708f-bbc0-473d-8de1-8788a1c91f51/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:29.898354Z",
>>>>       "name": "191151@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.880995Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/ddd49126-5531-471a-bea1-3aab07bcf8b4/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:18.887949Z",
>>>>       "name": "191148@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.893280Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/2ef1e562-845f-4ae7-8007-9b7db8cf73a0/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:29.798877Z",
>>>>       "name": "191152@pulpp-ob-581.bloomberg.com",
>>>>       "pulp_created": "2020-04-02T13:36:11.917095Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/6e2cf918-af8e-4c5d-bc8f-bef3d3a83dca/"
>>>>     },
>>>>     {
>>>>       "last_heartbeat": "2020-04-03T13:10:15.684710Z",
>>>>       "name": "resource-manager",
>>>>       "pulp_created": "2020-01-23T18:24:49.246717Z",
>>>>       "pulp_href": "/pulp/api/v3/workers/d46e4da0-9735-445b-a502-2aff7ce13ef7/"
>>>>     }
>>>>   ],
>>>>   "redis_connection": {
>>>>     "connected": true
>>>>   },
>>>>   "storage": {
>>>>     "free": 32543019880448,
>>>>     "total": 33521607376896,
>>>>     "used": 978587496448
>>>>   },
>>>>   "versions": [
>>>>     {
>>>>       "component": "pulpcore",
>>>>       "version": "3.2.1"
>>>>     },
>>>>     {
>>>>       "component": "pulp_rpm",
>>>>       "version": "3.2.0"
>>>>     },
>>>>     {
>>>>       "component": "pulp_file",
>>>>       "version": "0.2.0"
>>>>     }
>>>>   ]
>>>> }
>>>>
>>>>
>>>> # ps -awfux |grep pulp
>>>> root 180078 0.0 0.0 107992 616 pts/1 S+ Apr02 0:00 | \_ tail -f /var/log/pulp/pulp-config.log
>>>> root 184836 0.0 0.0 124448 2044 pts/2 S+ Apr02 0:00 | \_ vi bbpulp3.py
>>>> root 43270 0.0 0.0 112708 984 pts/3 S+ 09:11 0:00 \_ grep --color=auto pulp
>>>> pulp 187224 0.0 0.0 228600 19188 ? Ss Apr02 0:04 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.app.wsgi:application --bind 127.0.0.1:24817 --access-logfile -
>>>> pulp 187251 1.4 0.0 528708 109752 ? S Apr02 20:48 \_ /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.app.wsgi:application --bind 127.0.0.1:24817 --access-logfile -
>>>> pulp 187231 0.0 0.0 269476 27976 ? Ss Apr02 0:05 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.content:server --bind 127.0.0.1:24816 --worker-class aiohttp.GunicornWebWorker -w 2 --access-logfile -
>>>> pulp 187254 0.0 0.0 485860 68592 ? S Apr02 0:18 \_ /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.content:server --bind 127.0.0.1:24816 --worker-class aiohttp.GunicornWebWorker -w 2 --access-logfile -
>>>> pulp 187257 0.0 0.0 486132 68604 ? S Apr02 0:19 \_ /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.content:server --bind 127.0.0.1:24816 --worker-class aiohttp.GunicornWebWorker -w 2 --access-logfile -
>>>> pulp 187238 0.0 0.0 486428 71128 ? Ss Apr02 1:20 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker -n resource-manager --pid=/var/run/pulpcore-resource-manager/resource-manager.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191144 0.0 0.0 486392 71064 ? Ss Apr02 0:51 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-1/reserved-resource-worker-1.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191145 0.0 0.0 486404 71064 ? Ss Apr02 0:50 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-2/reserved-resource-worker-2.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191146 0.0 0.0 486404 71044 ? Ss Apr02 0:50 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-3/reserved-resource-worker-3.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191147 0.0 0.0 486404 71036 ? Ss Apr02 0:52 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-4/reserved-resource-worker-4.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191148 0.0 0.0 486164 71056 ? Ss Apr02 0:51 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-5/reserved-resource-worker-5.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191149 0.0 0.0 486168 71060 ? Ss Apr02 0:52 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-6/reserved-resource-worker-6.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191150 0.0 0.0 486148 71040 ? Ss Apr02 0:50 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-7/reserved-resource-worker-7.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191151 0.0 0.0 486400 71060 ? Ss Apr02 0:51 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-8/reserved-resource-worker-8.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191152 0.0 0.0 486164 71044 ? Ss Apr02 0:52 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-9/reserved-resource-worker-9.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>> pulp 191153 0.0 0.0 486392 71068 ? Ss Apr02 0:52 /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq worker -w pulpcore.tasking.worker.PulpWorker --pid=/var/run/pulpcore-worker-10/reserved-resource-worker-10.pid -c pulpcore.rqconfig --disable-job-desc-logging
>>>>
>>>> From: bmbouter at redhat.com At: 04/03/20 09:05:47
>>>> To: Bin Li (BLOOMBERG/ 120 PARK ) <bli111 at bloomberg.net>
>>>> Cc: pulp-list at redhat.com
>>>> Subject: Re: [Pulp-list] Pulp 3 waiting tasks
>>>>
>>>> While you are experiencing the issue, can you capture the status API
>>>> output?
>>>>
>>>> Also, can you paste the output of the workers on that system with `ps
>>>> -awfux | grep pulp`?
>>>>
>>>> Also, do you see any errors in the log? Could you share a copy of the
>>>> log?
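>>>>
>>>> If you want to pre-filter before sending, a sketch like the line below
>>>> would pull out recent errors, assuming the logs live under /var/log/pulp/:
>>>>
>>>> # grep -iE 'error|traceback' /var/log/pulp/*.log | tail -n 50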
>>>>
>>>> On Fri, Apr 3, 2020 at 9:01 AM Bin Li (BLOOMBERG/ 120 PARK) <
>>>> bli111 at bloomberg.net> wrote:
>>>>
>>>>> We have been seeing many waiting tasks. They seem to be stuck forever.
>>>>> e.g.
>>>>> pulpp-ob-581 /home/bli4/pulp3-script # ./get
>>>>> /pulp/api/v3/tasks/14b76b27-9f34-4297-88ed-5ec13cbe5e50/
>>>>> HTTP/1.1 200 OK
>>>>> Allow: GET, PATCH, DELETE, HEAD, OPTIONS
>>>>> Connection: keep-alive
>>>>> Content-Length: 323
>>>>> Content-Type: application/json
>>>>> Date: Fri, 03 Apr 2020 12:56:02 GMT
>>>>> Server: nginx/1.16.1
>>>>> Vary: Accept, Cookie
>>>>> X-Frame-Options: SAMEORIGIN
>>>>>
>>>>> {
>>>>>   "created_resources": [],
>>>>>   "error": null,
>>>>>   "finished_at": null,
>>>>>   "name": "pulpcore.app.tasks.base.general_update",
>>>>>   "progress_reports": [],
>>>>>   "pulp_created": "2020-04-02T13:00:14.881212Z",
>>>>>   "pulp_href": "/pulp/api/v3/tasks/14b76b27-9f34-4297-88ed-5ec13cbe5e50/",
>>>>>   "reserved_resources_record": [],
>>>>>   "started_at": null,
>>>>>   "state": "waiting",
>>>>>   "worker": null
>>>>> }
>>>>>
>>>>> What could be the reason for these stuck waiting tasks? How should we
>>>>> troubleshoot the issue?