[Pulp-list] Need help/advice with import tasks intermittently causing a time-out condition
Brian Bouterse
bbouters at redhat.com
Wed Dec 6 19:42:16 UTC 2017
That actually sounds normal if work is being dispatched slowly into Pulp.
If you expect two workers, and the /status/ API shows two workers, then it
should be healthy. I also wrote a reply to the YouTube question about this:
https://www.youtube.com/watch?v=PpinNWOpksA&lc=UgyHs_RFkeLbU6L9HeR4AaABAg.8_qLVyV5tza8_qMzDLvKrK
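To script that check, you can parse the /status/ response rather than eyeball it. A minimal sketch, assuming the Pulp 2 response shape (live workers listed under "known_workers", each with an "_id"); the worker names below are hypothetical examples, substitute your own:

```python
import json

# Hypothetical worker names - substitute the ones your deployment uses.
EXPECTED = {
    "resource_manager",
    "reserved_resource_worker-1@worker1",
    "reserved_resource_worker-2@worker2",
}

def missing_workers(status_json, expected=EXPECTED):
    """Return the expected worker names absent from a /status/ body.

    `status_json` is the JSON text returned by GET /pulp/api/v2/status/;
    Pulp 2 reports each live worker under "known_workers" with an "_id".
    """
    known = {w["_id"] for w in json.loads(status_json).get("known_workers", [])}
    return sorted(expected - known)

# Example response where worker-2 has dropped off:
sample = json.dumps({"known_workers": [
    {"_id": "resource_manager"},
    {"_id": "reserved_resource_worker-1@worker1"},
]})
print(missing_workers(sample))  # ['reserved_resource_worker-2@worker2']
```

An empty list means every expected process has checked in, which matches what pulp-admin status reports as "discovered".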
On Wed, Dec 6, 2017 at 2:31 PM, Deej Howard <Deej.Howard at neulion.com> wrote:
> I used the qpid-stat -q utility on my installation, and
> I saw something that confused me. I would have expected the
> resource_manager queue to have more message traffic as compared to my
> workers, but this is not the case, and in fact one of my two workers seems
> to have no message traffic at all. I suspect this indicates some sort of
> misconfiguration somewhere, does that sound correct?
>
>
>
> [root at 7d53bac13e28 /]# qpid-stat -q
>
> Queues
>   queue                                              dur  autoDel  msg  msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
>   ==========================================================================================================================
>   ...extra output omitted for brevity...
>   celery                                             Y             0    206    206     0      171k     171k      2     2
>   celeryev.911e1280-9618-40bb-a54f-813db11d4d3e           Y        0    96.9k  96.9k   0      78.3m    78.3m     1     2
>   pulp.task                                          Y             0    0      0       0      0        0         3     1
>   reserved_resource_worker-1 at worker1.celery.pidbox      Y        0    0      0       0      0        0         1     2
>   reserved_resource_worker-1 at worker1.dq             Y    Y        0    0      0       0      0        0         1     2
>   reserved_resource_worker-2 at worker2.celery.pidbox      Y        0    0      0       0      0        0         1     2
>   reserved_resource_worker-2 at worker2.dq             Y    Y        0    1.07k  1.07k   0      1.21m    1.21m    1     2
>   resource_manager                                   Y             0    533    533     0      820k     820k     1     2
>   resource_manager at resource_manager.celery.pidbox        Y       0    0      0       0      0        0         1     2
>   resource_manager at resource_manager.dq              Y    Y        0    0      0       0      0        0         1     2
>
>
> The pulp-admin status output definitely shows both workers and the
> resource_manager as being “discovered”, so what gives?
>
>
>
> *From:* Deej Howard [mailto:Deej.Howard at neulion.com]
> *Sent:* Tuesday, December 05, 2017 6:42 PM
> *To:* 'Dennis Kliban' <dkliban at redhat.com>
> *Cc:* 'pulp-list' <pulp-list at redhat.com>
> *Subject:* RE: [Pulp-list] Need help/advice with import tasks
> intermittently causing a time-out condition
>
>
>
> That video was very useful, Dennis – thanx for passing it
> on!
>
>
>
> It sounds like the solution to the problem I’m seeing lies
> with the client-side operations, based on the repo reservation methodology
> that is in place. It would really be useful if there were some sort of API
> call that could be made so the client code could decide if the operation
> were just hung due to network issues (and abort or otherwise handle that
> state), or if there is an active repo reservation in place that is waiting
> to clear before the operation can proceed. I can also appreciate that this
> has at least the potential of changing dynamically from the viewpoint of a
> client’s operations (because the repo reservation can be put on/taken off
> for other tasks that are already in the queue), and it would be good for
> the client to be able to determine that its task is progressing (or not) as
> far as getting assigned/executed. Sounds like I need to dig deeper into
> what I can accomplish with the REST API to get a better idea of the exact
> status of the import operation, and base decisions on that status rather
> than just “30 attempts every 2 seconds”.
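[Editorial note: the task documents Pulp 2 returns from GET /pulp/api/v2/tasks/<task_id>/ carry a "state" field, which is enough to tell "still queued behind a reservation" apart from "actually running". A minimal triage sketch; the helper name is invented, and it assumes the standard Pulp 2 state values:]

```python
# Terminal states a Pulp 2 task can end in.
TERMINAL = {"finished", "error", "canceled"}

def triage(task_doc):
    """Classify a Pulp 2 task document (hypothetical helper).

    `task_doc` is the parsed JSON from GET /pulp/api/v2/tasks/<task_id>/.
    A long-lived "waiting" state usually means an earlier task still holds
    the repo reservation; "running" means the import is actually underway.
    """
    state = task_doc.get("state")
    if state in TERMINAL:
        return "done:" + state
    if state == "waiting":
        return "queued (reservation likely held by an earlier task)"
    if state == "running":
        return "in progress"
    return "unknown state: " + repr(state)

print(triage({"state": "waiting"}))
print(triage({"state": "finished"}))  # done:finished
```

A client seeing "waiting" can reasonably keep polling, while a vanished task or an unreachable API points at a network or broker problem instead.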
>
> If nothing else, I now have a better understanding and
> some additional troubleshooting tools to track down exactly what is (and is
> not) going on!
>
>
>
> *From:* Dennis Kliban [mailto:dkliban at redhat.com <dkliban at redhat.com>]
> *Sent:* Tuesday, December 05, 2017 1:07 PM
> *To:* Deej Howard <Deej.Howard at neulion.com>
> *Cc:* pulp-list <pulp-list at redhat.com>
> *Subject:* Re: [Pulp-list] Need help/advice with import tasks
> intermittently causing a time-out condition
>
>
>
> The tasking system in Pulp locks a repository during an import of a
> content unit. If clients are uploading content to the same repository, the
> import operation has to wait for any previous imports to the same repo to
> complete. It's possible that you are not waiting long enough. Unfortunately,
> this portion of Pulp is not well documented; however, there is a 40-minute
> video[0] on YouTube that provides insight into how the tasking system works
> and how to troubleshoot it.
>
> [0] https://youtu.be/PpinNWOpksA
>
>
>
> On Tue, Dec 5, 2017 at 12:43 PM, Deej Howard <Deej.Howard at neulion.com>
> wrote:
>
> Hi, I’m hoping someone can help me solve a strange problem
> I’m having with my Pulp installation, or at least give me a good idea where
> I should look further to get it solved. The most irritating aspect of the
> problem is that it doesn’t reliably reproduce.
>
> The failure condition is realized when a client is adding
> a new artifact. In all cases, the client is able to successfully “upload”
> the artifact to Pulp (successful according to the response from the Pulp
> server). The problem comes in at the next step where the client directs
> Pulp to “import” the uploaded artifact, and then awaits a successful task
> result before proceeding. This is set up within a loop; up to 30 queries
> for a successful response to the import task are made, with a 2-second
> interval between queries. If the import doesn’t succeed within those
> constraints, the operation is treated as having timed out, and further
> actions with that artifact (specifically, a publish operation) are
> abandoned. Many times that algorithm works with no problem at all, but far
> too often, that successful response is not received within the 30
> iterations. It surprises me that there would be a failure at this point,
> actually – I wouldn’t expect an “import” operation to be very complicated
> or take a lot of time (but I’m certainly not intimate with the details of
> Pulp implementation either). Is it just a case that my expectations of the
> “import” operation are unreasonable, and I should relax the loop parameters
> to allow more attempts/more time between attempts for this to succeed? As
> I’ve mentioned, this doesn’t always fail, I’d even go so far as to claim
> that it succeeds “most of the time”, but I need more consistency than that
> for this to be deemed production-worthy.
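[Editorial note: the loop described above boils down to something like the sketch below. This is an illustration only; fetch_state stands in for whatever call the client makes to read the task's "state" field from the tasks API:]

```python
import time

def wait_for_task(fetch_state, attempts=30, interval=2.0):
    """Poll until the task reaches a terminal state or the budget runs out.

    `fetch_state` is any zero-argument callable returning the task's
    current state string (e.g. wrapping GET /pulp/api/v2/tasks/<id>/).
    """
    for _ in range(attempts):
        state = fetch_state()
        if state in ("finished", "error", "canceled"):
            return state
        time.sleep(interval)
    return "timed-out"  # budget exhausted; the task may still be queued

# Simulated task that waits two polls, runs one, then finishes.
states = iter(["waiting", "waiting", "running", "finished"])
print(wait_for_task(lambda: next(states), interval=0.0))  # finished
```

Raising the attempt count (or backing off the interval) only helps if the task is genuinely queued behind a reservation rather than lost, which is why inspecting the reported state is worth doing before tuning the loop parameters.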
>
> I’ve tried monitoring operations using pulp-admin to make
> sure that tasks are being managed properly (they seem to be, but I’m not
> yet any sort of Pulp expert), and I’ve also monitored the Apache mod_status
> output to see if there is anything obvious (there’s not, but I’m no Apache
> expert either). I’ve also found nothing obvious in any Pulp log output.
> I’d be deeply grateful if anyone can offer any sort of wisdom, help or
> advice on this issue, I’m at the point where I’m not sure where to look
> next to get this resolved. I’d seriously hate to have to abandon Pulp
> because I can’t get it to perform consistently and reliably (not only
> because of the amount of work this would represent, but because I like
> working with Pulp and appreciate what it has to offer).
>
>
>
> I have managed to put together a test case that seems to
> reliably demonstrate the problem – sort of. This test case uses 16 clients
> running in parallel, each of which has from 1-10 artifacts to upload (most
> clients have only 5). When I say that it “sort of” demonstrates the
> problem, the most recent run failed on 5 of those clients (all with the
> condition mentioned above), while the previous run failed on 8, and the one
> before that on 9, with no consistency of which client will fail to upload
> which artifact.
>
>
>
> Other observations:
>
> - Failure conditions don’t seem to have anything to do with the
> client’s platform, geographical location, or be attached to a specific
> client.
> - One failure on a client doesn’t imply the next attempt from that
> same client will also fail; in fact, more often than not it doesn’t.
> - Failure conditions don’t seem to have anything to do with the
> artifact being uploaded.
> - There is no consistency around which artifact fails to upload (it’s
> not always the first artifact from a client, or the third, etc.)
>
>
>
> Environment Details
>
> - Pulp 2.14.3 using Docker containers based on CentOS 7: one
> Apache/Pulp API container, one Qpid message broker container, one Mongo DB
> container, one Celery worker management container, one resource
> manager/task assignment container, and two Pulp worker containers. All
> containers are running within a single Docker host, dedicated to only
> Pulp-related operations. The diagram at
> http://docs.pulpproject.org/en/2.14/user-guide/scaling.html was used
> as a guide for this setup.
> - Ubuntu/Mac/Windows-based clients are using a Java application plugin
> to do artifact uploads. Clients are dispersed across multiple geographical
> sites, including the same site where the Pulp server resides.
> - Artifacts are company-proprietary (configured as a Pulp plugin), but
> essentially are a single ZIP file with attached metadata for tracking and
> management purposes.
>
>
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list
>