[Pulp-list] Tasks stuck in waiting state
Brian Bouterse
bbouters at redhat.com
Thu Sep 22 20:28:55 UTC 2016
When using Pulp with Qpid (the default broker), there is a hard-to-reproduce
deadlocking bug [0]. The bug is in Qpid, not Pulp, but we are very
interested in seeing it resolved.
Your stuck task operations will be cleared out naturally if all the Pulp
workers are killed and restarted. If it's really bad, you could consider
running `sudo pkill -9 -f celery`, which kills all Pulp workers.
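A minimal sketch of the kill-and-restart step, assuming the workers run as
user `apache` (as in the ps output quoted below) and are managed by the
systemd units pulp_resource_manager, pulp_workers and pulp_celerybeat;
check your unit names with `systemctl list-units 'pulp*'`. The names
`run`, `restart_pulp_workers` and `DRY_RUN` are hypothetical, for
illustration only. Adding `-u apache` to pkill keeps it from also matching
the `sudo` process, whose own command line contains "celery".

```shell
#!/bin/sh
# Sketch: force-kill all Pulp Celery processes, then start them again.
# ASSUMPTIONS: workers run as user "apache"; systemd units are named
# pulp_resource_manager, pulp_workers and pulp_celerybeat.
# Set DRY_RUN=1 to print the privileged commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_pulp_workers() {
    # SIGKILL (-9): deadlocked workers usually need it.
    # -u apache matches by effective user, so the root-owned sudo is skipped.
    run sudo pkill -9 -u apache -f celery

    if [ "${DRY_RUN:-0}" != 1 ]; then
        # Wait up to ~10 seconds for the killed processes to disappear.
        tries=0
        while pgrep -u apache -f celery >/dev/null 2>&1 && [ "$tries" -lt 10 ]; do
            tries=$((tries + 1))
            sleep 1
        done
    fi

    run sudo systemctl restart pulp_resource_manager pulp_workers pulp_celerybeat
}
```

Running `DRY_RUN=1 restart_pulp_workers` first shows exactly which commands
would be executed.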
You could also cancel all outstanding tasks with pulp-admin and then
kill and restart the workers, at which point your system will be empty
once the processes finish starting. Note that deadlocked workers usually
need to be killed with SIGKILL before being restarted.
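The cancel-then-restart route could look like the sketch below. It assumes
that `pulp-admin tasks list` prints one `Task Id:` line per task, that
`--state waiting` works the same way `--state running` does in the output
quoted below, and that `pulp-admin tasks cancel --task-id` exists in your
pulp-admin version; verify all three against your own installation.
`PULP_USER`, `PULP_PASS`, `extract_task_ids` and `cancel_tasks_in_state`
are hypothetical names.

```shell
#!/bin/sh
# Sketch: cancel every Pulp task in a given state via pulp-admin.
# ASSUMPTIONS: "Task Id: <id>" lines in `tasks list` output, and the
# "tasks list --state" / "tasks cancel --task-id" options.
# PULP_USER and PULP_PASS are hypothetical credential variables.

# Read `pulp-admin tasks list` output on stdin; print one task ID per line.
extract_task_ids() {
    awk -F': *' '/^[ \t]*Task Id:/ { sub(/[ \t\r]+$/, ""); print $2 }'
}

# Cancel all tasks in state $1 (for example "waiting" or "running").
cancel_tasks_in_state() {
    pulp-admin -u "$PULP_USER" -p "$PULP_PASS" tasks list --state "$1" |
        extract_task_ids |
        while read -r id; do
            # </dev/null stops pulp-admin from consuming the ID list on stdin
            pulp-admin -u "$PULP_USER" -p "$PULP_PASS" \
                tasks cancel --task-id "$id" </dev/null
        done
}
```

After `cancel_tasks_in_state waiting` (and `running`, if needed), kill and
restart the workers as described above.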
Many users never experience this problem. The few who do usually run
into it again. Several devs have tried to reproduce it, but we have not
been able to.
The Qpid project is aware of the issue and investigating. I believe they
have some RPMs that provide a new version of python-qpid specifically
patched for this issue. I'm waiting for them to produce RPMs for the
different distros so that affected users can evaluate whether it
resolves their issue.
One other option to be aware of is that Pulp also supports RabbitMQ, and
Pulp deployments using it have not experienced this deadlocking issue.
See the docs and server.conf for more info. FYI, Pulp releases are
currently tested only against Qpid.
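Before switching brokers, it helps to see what is configured now. This
sketch prints the broker-related keys from server.conf together with the
section they appear in. It assumes Pulp 2 reads /etc/pulp/server.conf and
keeps these settings as `broker_url` under `[tasks]` and `url`/`transport`
under `[messaging]`; commented-out keys mean the built-in defaults apply.
`show_broker_settings` is a hypothetical helper, and the actual RabbitMQ
values should come from the Pulp docs, not from this sketch.

```shell
#!/bin/sh
# Sketch: list broker-related settings from Pulp's server.conf.
# ASSUMPTION: key names broker_url ([tasks]) and url/transport ([messaging]).
# Usage: show_broker_settings [path]   (defaults to /etc/pulp/server.conf)

show_broker_settings() {
    conf="${1:-/etc/pulp/server.conf}"
    awk '
        /^\[/ { section = $0 }    # remember the current [section] header
        /^[ \t]*(broker_url|transport|url)[ \t]*[:=]/ {
            print section " " $0  # active (uncommented) keys only
        }
    ' "$conf"
}
```

An empty result means no broker keys are set, so Pulp falls back to its
defaults.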
[0]: https://issues.apache.org/jira/browse/QPID-7317
-Brian
On 09/21/2016 09:00 PM, Erinn Looney-Triggs wrote:
> I have 52 tasks that are stuck in a waiting state with nothing in a
> running state. I don't know much about Pulp at this point; I am just
> fighting my way through Satellite in an attempt to make it stable, but
> this looks a bit odd to me:
>
> pulp-admin -u admin -p tasks list | grep -i waiting | wc -l
> 52
>
> pulp-admin -u admin -p tasks list --state running
> +----------------------------------------------------------------------+
> Tasks
> +----------------------------------------------------------------------+
>
> No tasks found
>
> All of the tasks except one are unit_update operations; the
> remaining one is a sync operation.
>
> I have restarted the Pulp processes many times without clearing
> these out. I can kill them off, of course, but I would prefer to know
> what is going on here. Also, chances are very good this will happen again.
>
> Thanks,
> -Erinn
>
> The technical details:
> RHEL 7.2
>
> rpm -qa | grep pulp
> pulp-katello-1.0.1-1.el7sat.noarch
> rubygem-smart_proxy_pulp-1.2.2-1.el7sat.noarch
> python-pulp-repoauth-2.8.3.4-1.el7sat.noarch
> python-pulp-client-lib-2.8.3.4-1.el7sat.noarch
> pulp-docker-plugins-2.0.1.1-1.el7sat.noarch
> pulp-selinux-2.8.3.4-1.el7sat.noarch
> pulp-server-2.8.3.4-1.el7sat.noarch
> pulp-client-1.0-1.noarch
> python-pulp-common-2.8.3.4-1.el7sat.noarch
> pulp-rpm-admin-extensions-2.8.3.5-1.el7sat.noarch
> python-pulp-docker-common-2.0.1.1-1.el7sat.noarch
> pulp-ostree-plugins-1.1.1-2.el7sat.noarch
> pulp-puppet-plugins-2.8.3.3-1.el7sat.noarch
> python-pulp-bindings-2.8.3.4-1.el7sat.noarch
> python-isodate-0.5.0-4.pulp.el7sat.noarch
> python-pulp-streamer-2.8.3.4-1.el7sat.noarch
> python-pulp-oid_validation-2.8.3.4-1.el7sat.noarch
> python-pulp-agent-lib-2.8.3.4-1.el7sat.noarch
> python-pulp-ostree-common-1.1.1-2.el7sat.noarch
> pulp-rpm-handlers-2.8.3.5-1.el7sat.noarch
> pulp-admin-client-2.8.3.4-1.el7sat.noarch
> pulp-rpm-plugins-2.8.3.5-1.el7sat.noarch
> python-pulp-rpm-common-2.8.3.5-1.el7sat.noarch
> python-pulp-puppet-common-2.8.3.3-1.el7sat.noarch
> pulp-puppet-tools-2.8.3.3-1.el7sat.noarch
>
> ps -awfux | grep celery
> root 65959 0.0 0.0 112648 972 pts/0 S+ 18:57 0:00 |
> \_ grep --color=auto celery
> apache 52282 0.1 0.0 685240 63396 ? Ssl 18:46 0:00
> /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n
> resource_manager@%h -Q resource_manager -c 1 --events --umask 18
> --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
> apache 52406 0.0 0.0 595524 53092 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n
> resource_manager@%h -Q resource_manager -c 1 --events --umask 18
> --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
> apache 52424 0.1 0.0 685240 63272 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-0.pid
> --heartbeat-interval=30
> apache 52692 0.0 0.0 610828 56536 ? Sl 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-0.pid
> --heartbeat-interval=30
> apache 52426 0.1 0.0 684664 63404 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-1.pid
> --heartbeat-interval=30
> apache 52714 0.0 0.0 595524 53052 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-1.pid
> --heartbeat-interval=30
> apache 52428 0.1 0.0 684668 63428 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-2.pid
> --heartbeat-interval=30
> apache 52715 0.0 0.0 595528 53056 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-2.pid
> --heartbeat-interval=30
> apache 52430 0.1 0.0 684660 63236 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-3.pid
> --heartbeat-interval=30
> apache 52745 0.0 0.0 595520 53072 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-3.pid
> --heartbeat-interval=30
> apache 52432 0.1 0.0 684664 63224 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-4.pid
> --heartbeat-interval=30
> apache 52749 0.0 0.0 595520 53096 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-4.pid
> --heartbeat-interval=30
> apache 52434 0.1 0.0 684668 63388 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-5.pid
> --heartbeat-interval=30
> apache 52750 0.0 0.0 595528 53056 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-5.pid
> --heartbeat-interval=30
> apache 52436 0.1 0.0 684660 65480 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-6.pid
> --heartbeat-interval=30
> apache 52724 0.0 0.0 595524 55092 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-6.pid
> --heartbeat-interval=30
> apache 52440 0.1 0.0 684664 63364 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-7.pid
> --heartbeat-interval=30
> apache 52720 0.0 0.0 595524 53088 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-7.pid
> --heartbeat-interval=30
> apache 52444 0.1 0.0 684664 63248 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-8@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-8.pid
> --heartbeat-interval=30
> apache 52747 0.0 0.0 595524 53080 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-8@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-8.pid
> --heartbeat-interval=30
> apache 52453 0.1 0.0 684684 65432 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-9@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-9.pid
> --heartbeat-interval=30
> apache 52752 0.0 0.0 595516 53060 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-9@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-9.pid
> --heartbeat-interval=30
> apache 52459 0.1 0.0 684696 63416 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-10@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-10.pid
> --heartbeat-interval=30
> apache 52725 0.0 0.0 595524 53056 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-10@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-10.pid
> --heartbeat-interval=30
> apache 52468 0.1 0.0 684696 63400 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-11@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-11.pid
> --heartbeat-interval=30
> apache 52716 0.0 0.0 595524 53044 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-11@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-11.pid
> --heartbeat-interval=30
> apache 52472 0.1 0.0 684688 63416 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-12@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-12.pid
> --heartbeat-interval=30
> apache 52729 0.0 0.0 595516 53044 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-12@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-12.pid
> --heartbeat-interval=30
> apache 52479 0.1 0.0 684692 63424 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-13@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-13.pid
> --heartbeat-interval=30
> apache 52722 0.0 0.0 595520 53084 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-13@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-13.pid
> --heartbeat-interval=30
> apache 52486 0.1 0.0 685272 63432 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-14@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-14.pid
> --heartbeat-interval=30
> apache 52708 0.0 0.0 669764 54612 ? Sl 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-14@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-14.pid
> --heartbeat-interval=30
> apache 52491 0.1 0.0 684652 63360 ? Ssl 18:46 0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-15@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-15.pid
> --heartbeat-interval=30
> apache 52731 0.0 0.0 595516 53040 ? S 18:46 0:00 \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-15@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-15.pid
> --heartbeat-interval=30
> apache 52570 0.7 0.0 690292 44016 ? Ssl 18:46 0:05
> /usr/bin/python /usr/bin/celery beat
> --app=pulp.server.async.celery_instance.celery
> --scheduler=pulp.server.async.scheduler.Scheduler