[Pulp-list] Need help/advice with import tasks intermittently causing a time-out condition

Deej Howard Deej.Howard at neulion.com
Wed Dec 6 01:42:20 UTC 2017


                That video was very useful, Dennis – thanks for passing it
on!



                It sounds like the solution to the problem I’m seeing lies
with the client-side operations, given the repo reservation methodology
that is in place.  It would be really useful if there were some sort of
API call the client code could make to decide whether the operation is
just hung due to network issues (and abort or otherwise handle that
state), or whether an active repo reservation is waiting to clear before
the operation can proceed.  I can also appreciate that this state has at
least the potential to change dynamically from the viewpoint of a client’s
operations (because the repo reservation can be put on/taken off for other
tasks already in the queue), and it would be good for the client to be
able to determine whether its task is actually progressing toward being
assigned and executed.  It sounds like I need to dig deeper into what I
can accomplish with the API (or REST) to get a better idea of the exact
status of the import operation, and base decisions on that status rather
than just “30 attempts every 2 seconds”.
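
                Something along those lines is what I’m picturing – a rough
sketch in Python with the requests library, where the server URL and
credentials are placeholders and the set of task states is just my reading
of the Pulp 2 docs:

import time
import requests

PULP = "https://pulp.example.com"   # placeholder server
AUTH = ("admin", "admin")           # placeholder credentials

def wait_for_task(task_id, max_wait=600, interval=2):
    """Poll a task until it reaches a terminal state or max_wait elapses."""
    deadline = time.time() + max_wait
    while time.time() < deadline:
        resp = requests.get("%s/pulp/api/v2/tasks/%s/" % (PULP, task_id),
                            auth=AUTH)
        resp.raise_for_status()
        state = resp.json()["state"]
        if state == "finished":
            return True
        if state in ("error", "canceled", "timed out"):
            return False
        # "waiting" means the task is still queued behind the repo
        # reservation; "running" means it is actually executing.  Either
        # way, keep polling rather than declaring a timeout outright.
        time.sleep(interval)
    raise RuntimeError("gave up on task %s" % task_id)

That way the client only gives up after a hard deadline, and can report
whether the task was still queued or genuinely stuck when it did.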

                If nothing else, I now have a better understanding and some
additional troubleshooting tools to track down exactly what is (and is not)
going on!



*From:* Dennis Kliban [mailto:dkliban at redhat.com]
*Sent:* Tuesday, December 05, 2017 1:07 PM
*To:* Deej Howard <Deej.Howard at neulion.com>
*Cc:* pulp-list <pulp-list at redhat.com>
*Subject:* Re: [Pulp-list] Need help/advice with import tasks
intermittently causing a time-out condition



The tasking system in Pulp locks a repository during the import of a content
unit. If multiple clients are uploading content to the same repository, each
import operation has to wait for any previous imports to that repo to
complete, so it's possible that you are simply not waiting long enough.
Unfortunately this portion of Pulp is not well documented; however, there is
a 40-minute video[0] on YouTube that provides insight into how the tasking
system works and how to troubleshoot it.

[0] https://youtu.be/PpinNWOpksA
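
For reference, the whole sequence looks roughly like this over REST (a
sketch only – the repo id, unit type id, and file name below are
placeholders for whatever your plugin uses):

import requests

PULP = "https://pulp.example.com"   # placeholder server
AUTH = ("admin", "admin")           # placeholder credentials

# 1. Request an upload id.
upload_id = requests.post("%s/pulp/api/v2/content/uploads/" % PULP,
                          auth=AUTH).json()["upload_id"]

# 2. Send the file bytes (offset 0 here; large files go up in chunks).
with open("artifact.zip", "rb") as f:
    requests.put("%s/pulp/api/v2/content/uploads/%s/0/" % (PULP, upload_id),
                 data=f.read(), auth=AUTH)

# 3. Import the uploaded unit into the repo.  This is the step that takes
#    the repo reservation: the call returns immediately with spawned
#    task(s), and those tasks queue behind any earlier imports into the
#    same repo.
report = requests.post(
    "%s/pulp/api/v2/repositories/myrepo/actions/import_upload/" % PULP,
    json={"upload_id": upload_id,
          "unit_type_id": "my_unit_type",   # placeholder plugin type
          "unit_key": {},
          "unit_metadata": {}},
    auth=AUTH).json()
print([t["task_id"] for t in report["spawned_tasks"]])

Polling the spawned task, rather than assuming a fixed time budget, is the
reliable way to find out when the import has actually completed.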



On Tue, Dec 5, 2017 at 12:43 PM, Deej Howard <Deej.Howard at neulion.com>
wrote:

                Hi, I’m hoping someone can help me solve a strange problem
I’m having with my Pulp installation, or at least give me a good idea where
I should look further to get it solved.  The most irritating aspect of the
problem is that it doesn’t reliably reproduce.

                The failure condition is realized when a client is adding a
new artifact.  In all cases, the client is able to successfully “upload”
the artifact to Pulp (successful according to the response from the Pulp
server).  The problem comes in at the next step, where the client directs
Pulp to “import” the uploaded artifact and then awaits a successful task
result before proceeding.  This is set up within a loop: up to 30 queries
for a successful response to the import task are made, with a 2-second
interval between queries.  If the import doesn’t succeed within those
constraints, the operation is treated as having timed out, and further
actions with that artifact (specifically, a publish operation) are
abandoned.  Many times that algorithm works with no problem at all, but far
too often the successful response is not received within the 30
iterations.  It actually surprises me that there would be a failure at this
point – I wouldn’t expect an “import” operation to be very complicated
or take a lot of time (but I’m certainly not intimate with the details of
the Pulp implementation either).  Is it just the case that my expectations
of the “import” operation are unreasonable, and I should relax the loop
parameters to allow more attempts/more time between attempts?  As I’ve
mentioned, this doesn’t always fail; I’d even go so far as to claim that it
succeeds “most of the time”, but I need more consistency than that for
this to be deemed production-worthy.
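
                In outline, the loop is shaped like this (a simplified
sketch – check_state() stands in for the real task status query our client
performs):

import time

def import_succeeded(check_state, attempts=30, interval=2):
    """check_state() returns the current state string of the import task."""
    for _ in range(attempts):
        if check_state() == "finished":
            return True
        time.sleep(interval)
    # Treated as a timeout: the publish step for this artifact is abandoned.
    return False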

                I’ve tried monitoring operations using pulp-admin to make
sure that tasks are being managed properly (they seem to be, but I’m not
yet any sort of Pulp expert), and I’ve monitored the Apache mod_status
output to see if there is anything obvious (there’s not, but I’m no Apache
expert either).  I’ve found nothing obvious in any Pulp log output either.
I’d be deeply grateful if anyone can offer any wisdom, help, or advice on
this issue; I’m at the point where I’m not sure where to look next to get
this resolved.  I’d seriously hate to have to abandon Pulp
because I can’t get it to perform consistently and reliably (not only
because of the amount of work this would represent, but because I like
working with Pulp and appreciate what it has to offer).



                I have managed to put together a test case that seems to
reliably demonstrate the problem – sort of.  This test case uses 16 clients
running in parallel, each of which has from 1 to 10 artifacts to upload
(most clients have only 5).  When I say that it “sort of” demonstrates the
problem: the most recent run failed on 5 of those clients (all with the
condition mentioned above), while the previous run failed on 8, and the one
before that on 9, with no consistency in which client fails to upload
which artifact.
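
                The harness is roughly shaped like this (a simplified
sketch – upload_and_import() stands in for the real client’s
upload/import/poll sequence):

from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_and_import(client_id, artifact_num):
    """Placeholder for the real upload, import, and poll sequence."""
    return True  # substitute the actual client logic here

def run_test(clients=16, artifacts_per_client=5):
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = {pool.submit(upload_and_import, c, a): (c, a)
                   for c in range(clients)
                   for a in range(artifacts_per_client)}
        failures = [futures[f] for f in as_completed(futures)
                    if not f.result()]
    print("failed (client, artifact):", failures)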



Other observations:

   - Failure conditions don’t seem to be tied to the client’s platform,
   geographical location, or any specific client.
   - One failure on a client doesn’t imply that the next attempt from that
   same client will also fail; in fact, more often than not it doesn’t.
   - Failure conditions don’t seem to have anything to do with the artifact
   being uploaded.
   - There is no consistency in which artifact fails to upload (it’s not
   always the first artifact from a client, or the third, etc.).



Environment Details

   - Pulp 2.14.3 using Docker containers based on CentOS 7: one Apache/Pulp
   API container, one Qpid message broker container, one MongoDB container,
   one Celery worker management container, one resource manager/task
   assignment container, and two Pulp worker containers.  All containers
   run within a single Docker host dedicated to Pulp-related operations.
   The diagram at
   http://docs.pulpproject.org/en/2.14/user-guide/scaling.html was used as
   a guide for this setup.
   - Ubuntu-, Mac-, and Windows-based clients use a Java application plugin
   to do artifact uploads.  Clients are dispersed across multiple
   geographical sites, including the site where the Pulp server resides.
   - Artifacts are company-proprietary (a custom content type defined via a
   Pulp plugin), but each is essentially a single ZIP file with attached
   metadata for tracking and management purposes.

