[Pulp-list] Pulp publish task timeout?
Brian Bouterse
bbouters at redhat.com
Mon Dec 12 18:13:39 UTC 2016
I think the celery worker is either segfaulting or being killed by the
kernel's OOM killer. If the OOM killer is killing it, there should be
evidence in the logs. A segfault in pure Python code is unlikely, so if it
is a segfault it probably happens while calling out to the system via
subprocess, which Pulp does in various places. I haven't looked through the
platform and rpm publish code for subprocess usage, but doing so would
probably hint at the problem. To really debug something like that you would
want to capture a coredump. I think celery has the ability to capture
coredumps, but I've never done it.
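One way to look for the OOM evidence mentioned above is to scan the kernel log (for example the output of `dmesg` or `journalctl -k`) for OOM-killer entries, and to translate the signal number from the WorkerLostError into its name. The sketch below is a minimal, hypothetical helper, not part of Pulp or celery; it assumes the kernel logs lines of the form "Out of memory: Kill process <pid> (<name>)" (older kernels) or "Out of memory: Killed process <pid> (<name>)" (newer kernels), and the sample lines and PID are invented for illustration.

```python
import re
import signal

# Assumed kernel OOM-killer message format; older kernels log "Kill process",
# newer ones "Killed process". Adjust if your kernel words it differently.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return (pid, process_name) pairs for every OOM-killer entry found."""
    kills = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def describe_exit_signal(signum):
    """Map a signal number to its name, e.g. 9 -> 'SIGKILL', 11 -> 'SIGSEGV'."""
    return signal.Signals(signum).name


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, feed in `dmesg` or `journalctl -k` output.
    sample = [
        "Out of memory: Kill process 12345 (celery) score 912 or sacrifice child",
        "Killed process 12345 (celery) total-vm:9876543kB, anon-rss:7654321kB",
    ]
    print(find_oom_kills(sample))   # [(12345, 'celery')]
    print(describe_exit_signal(9))  # SIGKILL
```

A process killed by the OOM killer receives SIGKILL (signal 9), while a segfault is normally reported as SIGSEGV (signal 11 on Linux), so the signal name in the worker error is a useful first clue when deciding which log to dig into.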
The pulp-smash tests for publish showed it working. Is it possible that
this is an environment issue? Could you try reproducing the issue on
separate hardware to rule that out? If it is reproducible, I recommend
opening a bug [0].
[0]: https://pulp.plan.io/projects/pulp/issues/new
-Brian
On Mon, Dec 12, 2016 at 11:49 AM, David Gersting <dgersting at systems.wvu.edu>
wrote:
> Hello everyone,
>
> I've been banging my head against the desk for a while on this one, and
> could use the group's help.
>
> I have a rather large repo (OEL 6's base repo with 36,684 RPMs) that I'm
> trying to mirror locally to speed up our OS patching, and every time I
> try to publish the repo the task fails just after the "Publishing Delta
> RPMs" step starts. After some digging it seems to me that the worker is
> timing out. Has anyone else seen this and/or know how I can fix it or
> increase the timeout for this task?
>
> I've attached the full shell output for anyone who wants it, but the
> error message I'm seeing from the worker is:
> # journalctl --unit=pulp_worker-5
> *SNIP*
> Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) Task pulp.server.managers.repo.publish.publish[e3d25854-757c-40af-8979-d0b7287263ed] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
> Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) Traceback (most recent call last):
> Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1171, in mark_as_worker_lost
> Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) human_status(exitcode)),
> Dec 12 10:48:19 *HOSTNAME* pulp[1403]: celery.worker.job:ERROR: (1403-27776) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
> Dec 12 10:48:21 *HOSTNAME* pulp[49191]: py.warnings:WARNING: (49191-27776) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. Se
> Dec 12 10:48:21 *HOSTNAME* pulp[49191]: py.warnings:WARNING: (49191-27776) "MongoClient opened before fork. Create MongoClient "
> Dec 12 10:48:21 *HOSTNAME* pulp[49191]: py.warnings:WARNING: (49191-27776)
> Dec 12 10:48:22 *HOSTNAME* pulp[49191]: pulp.server.async.tasks:INFO: Task failed : [e3d25854-757c-40af-8979-d0b7287263ed]
>
>
>
> Any help would be much appreciated!
>
> --
> David Gersting
> Linux Systems Administrator
> WVU Information Technology Services
>
>
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list
>