[Pulp-list] Need help/advice with import tasks intermittently causing a time-out condition

Tue Dec 5 17:43:43 UTC 2017

Hi, I’m hoping someone can help me solve a strange problem I’m having with
my Pulp installation, or at least point me to where I should look next.
The most irritating aspect of the problem is that it doesn’t reproduce
reliably.

The failure occurs when a client adds a new artifact.  In all cases, the
client successfully “uploads” the artifact to Pulp (successful according to
the response from the Pulp server).  The problem comes at the next step,
where the client directs Pulp to “import” the uploaded artifact and then
waits for a successful task result before proceeding.  The wait is a
polling loop: up to 30 queries for a successful response to the import
task, with a 2-second interval between queries (about a minute in total).
If the import doesn’t succeed within those limits, the operation is treated
as having timed out, and further actions with that artifact (specifically,
a publish operation) are abandoned.

Many times that algorithm works with no problem at all, but far too often
the successful response is not received within the 30 iterations.  It
surprises me that there would be a failure at this point; I wouldn’t expect
an “import” operation to be very complicated or to take a lot of time (but
I’m certainly not intimate with the details of Pulp’s implementation
either).  Are my expectations of the “import” operation simply
unreasonable, and should I relax the loop parameters to allow more attempts
or more time between attempts?  As I’ve mentioned, this doesn’t always
fail.  I’d even go so far as to claim that it succeeds “most of the time”,
but I need more consistency than that for this to be deemed
production-worthy.
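
For reference, the polling loop described above can be sketched roughly as
follows.  This is an illustration in Python, not the Java client's actual
code.  `fetch_state` is a hypothetical callable that queries the import task
and returns its state as a string, and the state names ("waiting",
"running", "finished", "error", "canceled") are assumptions modelled on
Pulp 2 task reports; check them against what the server actually returns.

```python
import time
from typing import Callable, Tuple

MAX_ATTEMPTS = 30        # number of queries, as described in the post
INTERVAL_SECONDS = 2.0   # delay between queries, as described in the post

# Assumed terminal failure states; verify against the real task reports.
FAILURE_STATES = {"error", "canceled"}


def wait_for_import(
    fetch_state: Callable[[], str],
    max_attempts: int = MAX_ATTEMPTS,
    interval: float = INTERVAL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str, int]:
    """Poll an import task until it finishes, fails, or the attempts run out.

    Returns (outcome, last_state, attempts_made), where outcome is
    "finished", "failed", or "timed_out".
    """
    last_state = "unknown"
    for attempt in range(1, max_attempts + 1):
        last_state = fetch_state()
        if last_state == "finished":
            return ("finished", last_state, attempt)
        if last_state in FAILURE_STATES:
            return ("failed", last_state, attempt)
        if attempt < max_attempts:
            sleep(interval)  # no sleep after the final query
    return ("timed_out", last_state, max_attempts)
```

Returning the last observed state alongside the outcome is a deliberate
choice in this sketch: a time-out where the task was still "waiting" would
suggest it had not yet been picked up by a worker, while "running" would
suggest the import itself was slow.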

I’ve tried monitoring operations using pulp-admin to make sure that tasks
are being managed properly (they seem to be, but I’m not yet any sort of
Pulp expert), and I’ve also monitored the Apache mod_status output to see
if there is anything obvious (there’s not, but I’m no Apache expert
either).  I’ve also found nothing obvious in any Pulp log output.  I’d be
deeply grateful if anyone can offer any sort of wisdom, help or advice on
this issue; I’m at the point where I’m not sure where to look next to get
this resolved.  I’d seriously hate to have to abandon Pulp because I can’t
get it to perform consistently and reliably (not only because of the amount
of work this would represent, but because I like working with Pulp and
appreciate what it has to offer).

I have managed to put together a test case that seems to reliably
demonstrate the problem, sort of.  This test case uses 16 clients running
in parallel, each of which has between 1 and 10 artifacts to upload (most
clients have only 5).  I say “sort of” because the number of failures
varies from run to run: the most recent run failed on 5 of those clients
(all with the condition mentioned above), the previous run failed on 8, and
the one before that on 9, with no consistency in which client fails to
import which artifact.
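
A harness of this shape could be sketched as below, again in Python rather
than the Java plugin actually in use.  `upload_and_import` is a
hypothetical callable standing in for the real upload-then-import step; it
is assumed to return True on success and False on a time-out.  The sketch
only shows how per-client failures might be collected for comparison
between runs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_clients(
    client_artifacts: Dict[str, List[str]],
    upload_and_import: Callable[[str, str], bool],
    max_workers: int = 16,
) -> Dict[str, List[str]]:
    """Run every client's uploads in parallel, one thread per client.

    Returns a mapping of client name to the artifacts that failed,
    containing only the clients that had at least one failure.
    """

    def run_one(item):
        client, artifacts = item
        # Each client processes its own artifacts sequentially.
        failed = [a for a in artifacts if not upload_and_import(client, a)]
        return client, failed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(run_one, client_artifacts.items()))

    return {client: failed for client, failed in results.items() if failed}
```

Logging the returned mapping for each run makes it easier to confirm that
failures really are spread across clients and artifacts rather than
clustered on a few.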

Other observations:

   - Failure conditions don’t seem to have anything to do with the client’s
   platform, geographical location, or be attached to a specific client.
   - One failure on a client doesn’t imply the next attempt from that same
   client will also fail; in fact, more often than not it succeeds.
   - Failure conditions don’t seem to have anything to do with the artifact
   being uploaded.
   - There is no consistency around which artifact fails to upload (it’s
   not always the first artifact from a client, or the third, etc.)

Environment Details

   - Pulp 2.14.3 using Docker containers based on CentOS 7: one Apache/Pulp
   API container, one Qpid message broker container, one MongoDB container,
   one Celery worker management container, one resource manager/task
   assignment container, and two Pulp worker containers.  All containers are
   running within a single Docker host, dedicated to only Pulp-related
   operations.  The diagram at
   http://docs.pulpproject.org/en/2.14/user-guide/scaling.html was used as
   a guide for this setup.
   - Ubuntu/Mac/Windows-based clients are using a Java application plugin
   to do artifact uploads.  Clients are dispersed across multiple geographical
   sites, including the same site where the Pulp server resides.
   - Artifacts are company-proprietary (configured as a Pulp plugin), but
   essentially are a single ZIP file with attached metadata for tracking and
   management purposes.