[Pulp-list] repo sync runs hanging
Pier, Bryce
Bryce.Pier at Capella.edu
Fri Jan 30 23:06:23 UTC 2015
I’ve been having a lot of trouble with my Pulp server lately related to rpm repo syncs hanging/stalling. I thought the issue might have been related to the 2.6 beta build I was running (it fixed my bug, 1176698), but it doesn’t appear to be just that version.
I built a new Pulp server this week on version 2.5.3-0.2.rc. The RHEL 6.6 VM has 8 vCPUs, 8 GB of RAM, and 400 GB of SAN LUNs attached. Both /var/lib/pulp and /var/lib/mongodb are symlinked to the SAN LUN for performance.
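For reference, the symlink layout is roughly the sketch below. It is demonstrated under a temp directory so it can run anywhere; "san" stands in for the real LUN mount point on my box.

```shell
#!/bin/sh
# Sketch of the SAN-backed layout. ROOT stands in for / and "san" for the
# actual LUN mount point, so nothing on the real system is touched.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/san/pulp" "$ROOT/san/mongodb" "$ROOT/var/lib"

# Point /var/lib/pulp and /var/lib/mongodb at the SAN-backed directories.
ln -s "$ROOT/san/pulp"    "$ROOT/var/lib/pulp"
ln -s "$ROOT/san/mongodb" "$ROOT/var/lib/mongodb"

# Both paths now resolve onto the LUN.
readlink "$ROOT/var/lib/pulp"
readlink "$ROOT/var/lib/mongodb"
```

(On the real box the directories also need the right ownership for the apache and mongodb users, of course.)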
Initially this new server was working great: I created and synced several rpm repos without any issues, but today the sync hangs/stalls started again. I’m beginning to wonder if something about the 2.5+ architecture isn’t handling the nearly 100,000 rpms that have been pulled into it.
When the stall happens it is always during the download of RPMs from the feed, but nothing is logged and no errors are thrown. I’ve let the process sit and run overnight and it never resumes. After canceling the sync task, I have to stop all of the Pulp processes, and one of the workers never stops:
# for s in {goferd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do service $s stop; done
goferd: unrecognized service
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Stopping pulp_celerybeat... OK
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> resource_manager at dvpuap01.capella.edu: QUIT -> 2387
> Waiting for 1 node -> 2387.....
> resource_manager at dvpuap01.capella.edu: OK
celery init v10.0.
Using config script: /etc/default/pulp_workers
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> reserved_resource_worker-5 at dvpuap01.capella.edu: QUIT -> 2664
> reserved_resource_worker-2 at dvpuap01.capella.edu: QUIT -> 2570
> reserved_resource_worker-4 at dvpuap01.capella.edu: QUIT -> 2633
> reserved_resource_worker-1 at dvpuap01.capella.edu: QUIT -> 2540
> reserved_resource_worker-7 at dvpuap01.capella.edu: QUIT -> 2723
> reserved_resource_worker-3 at dvpuap01.capella.edu: QUIT -> 2602
> reserved_resource_worker-6 at dvpuap01.capella.edu: QUIT -> 2692
> reserved_resource_worker-0 at dvpuap01.capella.edu: QUIT -> 2513
> Waiting for 8 nodes -> 2664, 2570, 2633, 2540, 2723, 2602, 2692, 2513............
> reserved_resource_worker-5 at dvpuap01.capella.edu: OK
> Waiting for 7 nodes -> 2570, 2633, 2540, 2723, 2602, 2692, 2513....
> reserved_resource_worker-2 at dvpuap01.capella.edu: OK
> Waiting for 6 nodes -> 2633, 2540, 2723, 2602, 2692, 2513....
> reserved_resource_worker-4 at dvpuap01.capella.edu: OK
> Waiting for 5 nodes -> 2540, 2723, 2602, 2692, 2513....
> reserved_resource_worker-1 at dvpuap01.capella.edu: OK
> Waiting for 4 nodes -> 2723, 2602, 2692, 2513....
> reserved_resource_worker-7 at dvpuap01.capella.edu: OK
> Waiting for 3 nodes -> 2602, 2692, 2513....
> reserved_resource_worker-3 at dvpuap01.capella.edu: OK
> Waiting for 2 nodes -> 2692, 2513.....
> reserved_resource_worker-0 at dvpuap01.capella.edu: OK
> Waiting for 1 node -> 2692.................................................................................................................................................................................................................................................................................................................................................................................................................................................................^C
Session terminated, killing shell... ...killed.
If I run the for loop again, everything appears to clean up, but there is always a single process that I have to kill manually:
apache 2763 1 2 15:34 ? 00:02:09 /usr/bin/python -m celery.__main__ worker -c 1 -n reserved_resource_worker-6 at dvpuap01.capella.edu --events --app=pulp.server.async.app --loglevel=INFO --logfile=/var/log/pulp/reserved_resource_worker-6.log --pidfile=/var/run/pulp/reserved_resource_worker-6.pid
After killing this final process, I usually stop mongodb, start everything back up, and try the sync again. I’ve also tried rebooting the VM, but it doesn’t seem to be any more effective than just stopping and starting the services.
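For what it’s worth, my manual cleanup boils down to the loop below. This is just a sketch: on the real box the pidfiles live in /var/run/pulp; here a temp directory and a background sleep stand in for the pidfile directory and the stuck worker so the loop is runnable anywhere.

```shell
#!/bin/sh
# Sketch of the lingering-worker cleanup. PIDDIR would be /var/run/pulp on
# a real Pulp box; a background sleep stands in for the stuck celery worker.
PIDDIR=$(mktemp -d)
sleep 300 &
echo $! > "$PIDDIR/reserved_resource_worker-6.pid"

for pidfile in "$PIDDIR"/reserved_resource_worker-*.pid; do
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        echo "killing lingering worker pid $pid"
        kill -TERM "$pid"
        sleep 1
        # Escalate to SIGKILL if the worker ignored SIGTERM
        # (harmless on an already-dead pid).
        kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null || true
    fi
    rm -f "$pidfile"
done
```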
Below are the repos I’ve successfully synced so far on the new server. (Notice the rhel-6-optional one has 7368 rpm units but hasn’t successfully finished downloading yet, even though I’ve killed and restarted it 3 times this afternoon.)
# pulp-admin rpm repo list
+----------------------------------------------------------------------+
RPM Repositories
+----------------------------------------------------------------------+
Id:                   ol5_x86_64_latest
Display Name:         ol5_x86_64_latest
Description:          None
Content Unit Counts:
  Erratum:            1116
  Package Category:   9
  Package Group:      103
  Rpm:                6761
  Srpm:               2292

Id:                   ol6_x86_64_latest
Display Name:         ol6_x86_64_latest
Description:          None
Content Unit Counts:
  Erratum:            1659
  Package Category:   14
  Package Group:      207
  Rpm:                13215
  Srpm:               3812

Id:                   epel6
Display Name:         epel6
Description:          None
Content Unit Counts:
  Erratum:            3668
  Package Category:   3
  Package Group:      208
  Rpm:                11135
  Yum Repo Metadata File: 1

Id:                   epel5
Display Name:         epel5
Description:          None
Content Unit Counts:
  Erratum:            1953
  Package Category:   5
  Package Group:      36
  Rpm:                6678
  Yum Repo Metadata File: 1

Id:                   epel7
Display Name:         epel7
Description:          None
Content Unit Counts:
  Erratum:            1252
  Package Category:   4
  Package Environment: 1
  Package Group:      209
  Rpm:                7161
  Yum Repo Metadata File: 1

Id:                   rhel-6-os
Display Name:         rhel-6-os
Description:          None
Content Unit Counts:
  Erratum:            2842
  Package Category:   10
  Package Group:      202
  Rpm:                14574
  Yum Repo Metadata File: 1

Id:                   rhel-5-os
Display Name:         rhel-5-os
Description:          None
Content Unit Counts:
  Erratum:            3040
  Package Category:   6
  Package Group:      99
  Rpm:                16668
  Yum Repo Metadata File: 1

Id:                   rhel-6-optional
Display Name:         rhel-6-optional
Description:          None
Content Unit Counts:
  Rpm:                7368
Thanks,
- Bryce