[Pulp-list] REST API performance-related question(s)

Deej Howard Deej.Howard at neulion.com
Fri Dec 15 17:43:37 UTC 2017


Hi, I’m using the 2.14.3 release in a Docker-based configuration
(details below), and I’ve noticed some performance issues in a
script-based artifact cleanup job that runs daily. The artifacts in
question are of our own construction, incorporated via the Pulp plugin
mechanism, and all reside in a single repository (there are around 22K
artifacts in that one repo at this point). The Python script makes
various Pulp REST API calls, and I’ve added some extra code to report
how long each call takes. The “query” calls have acceptable performance
(typically less than a second), but others are much slower: calls to
“unassociate” and “orphans” take around 10s, and calls to “publish”
take around 45s.
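
A minimal sketch of that kind of per-call timing: a context manager that
records elapsed wall-clock time under a label, so that the repeated
cleanup cycles can be totalled per call type. The names `timed` and
`summarize` are hypothetical and not taken from the original script, and
`time.sleep` stands in for the real REST calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Append the wall-clock seconds spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


def summarize(timings):
    """Return {label: (call_count, total_seconds, worst_seconds)}."""
    return {label: (len(d), sum(d), max(d)) for label, d in timings.items()}


if __name__ == "__main__":
    timings = {}
    for _ in range(3):
        # Stand-ins for the real REST calls (hypothetical durations).
        with timed("unassociate", timings):
            time.sleep(0.01)
        with timed("publish", timings):
            time.sleep(0.02)
    for label, (n, total, worst) in summarize(timings).items():
        print(f"{label}: {n} calls, {total:.2f}s total, {worst:.2f}s worst")
```

Keeping every duration (rather than only the last one) makes it easy to
see whether a slow step is slow on every cycle or only occasionally.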

I’m looking for some guidance on how to improve this performance. I’m
not the original author of this code, but I was lucky(?) enough to
inherit it. The core algorithm runs some queries to get the essential
“keys” for the artifacts in question, calls “unassociate” with the
relevant JSON payload for those artifacts, then calls “orphans” to do
the actual clean-up, and finally calls “publish” once that completes.
This cycle may run several times within the cleanup script, once per
“grouped artifact”.

Some specific questions I have:
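
The cycle described above could be sketched roughly as follows, assuming
the Pulp 2 endpoints `repositories/<repo_id>/actions/unassociate/`,
`content/orphans/[<type_id>/]` and `repositories/<repo_id>/actions/publish/`.
The repo, type and distributor IDs, the Mongo-style `{"$or": [...]}` unit
filter, and the `send` callback are all hypothetical; the real script's
payloads may differ.

```python
API_ROOT = "/pulp/api/v2"  # Pulp 2 REST API prefix


def unassociate_request(repo_id, type_id, unit_filters):
    """POST that removes units matching unit_filters from one repository."""
    body = {"criteria": {"type_ids": [type_id], "filters": {"unit": unit_filters}}}
    return ("POST", f"{API_ROOT}/repositories/{repo_id}/actions/unassociate/", body)


def orphans_request(type_id=None):
    """DELETE that purges orphaned units, optionally limited to one content type."""
    path = f"{API_ROOT}/content/orphans/"
    if type_id:
        path += f"{type_id}/"
    return ("DELETE", path, None)


def publish_request(repo_id, distributor_id):
    """POST that republishes the repository through the given distributor."""
    return ("POST", f"{API_ROOT}/repositories/{repo_id}/actions/publish/",
            {"id": distributor_id})


def cleanup_group(repo_id, type_id, distributor_id, unit_keys, send):
    """One cleanup cycle for a single artifact group: unassociate, orphans, publish.

    unit_keys is a list of unit-key dicts gathered by the earlier queries.
    send(method, path, body) is a hypothetical caller-supplied function that
    performs the HTTP request (and waits for any resulting task).
    """
    steps = [
        unassociate_request(repo_id, type_id, {"$or": unit_keys}),
        orphans_request(type_id),
        publish_request(repo_id, distributor_id),
    ]
    for method, path, body in steps:
        send(method, path, body)


if __name__ == "__main__":
    def print_send(method, path, body):
        print(method, path, body)

    cleanup_group("my-repo", "my_artifact", "my_distributor",
                  [{"name": "widget", "version": "1.0"}], print_send)
```

Because every group goes through its own orphan purge and publish, the two
slowest steps repeat once per group. Orphans are, by definition, units that
belong to no repository, so the purge is not scoped to this repo.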

   - Is the methodology outlined above appropriate for removing artifacts
   from a repository, or would some other mechanism be better/more efficient?
   - The documentation for implementing support for new types[1] mentions
   a type definition JSON file that belongs in /usr/lib/pulp/plugins/types[2].
   Unfortunately, it’s not clear which of the Pulp components (Qpid?
   MongoDB? Resource manager? Workers?) use that information, and our
   installation appears to have no files at all in that directory. We have
   other repo types installed (puppet, python), so I would have expected at
   least one such file, especially since puppet_module is the example used
   in the documentation. A type definition sounds like it could improve
   performance by adding search indexes or similar shortcuts. Where can I
   find more details about this and/or more extensive examples?
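
For reference, a minimal sketch that writes a one-type definition file.
The field names (`id`, `display_name`, `description`, `unit_key`,
`search_indexes`) are assumed from the puppet_module example on the
type_defs page [2] and should be checked against it; the type ID, key
fields, index fields and the `<type_id>.json` file name are hypothetical
placeholders, and a temporary directory stands in for
/usr/lib/pulp/plugins/types.

```python
import json
import os
import tempfile


def write_type_def(directory, type_id, display_name, description,
                   unit_key, search_indexes):
    """Write a single-type definition file and return its path.

    Layout assumed from the type_defs documentation; all values passed in
    below are hypothetical.
    """
    doc = {
        "types": [
            {
                "id": type_id,
                "display_name": display_name,
                "description": description,
                "unit_key": list(unit_key),            # fields that uniquely identify a unit
                "search_indexes": list(search_indexes),  # extra fields to index for queries
            }
        ]
    }
    path = os.path.join(directory, f"{type_id}.json")
    with open(path, "w") as f:
        json.dump(doc, f, indent=4)
    return path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()  # stand-in for /usr/lib/pulp/plugins/types
    path = write_type_def(out_dir, "my_artifact", "My Artifact",
                          "ZIP file with tracking metadata",
                          unit_key=["name", "version"], search_indexes=["group"])
    with open(path) as f:
        print(f.read())
```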



[1] https://docs.pulpproject.org/dev-guide/newtypesupport/plugin/example.html

[2] https://docs.pulpproject.org/dev-guide/newtypesupport/plugin/type_defs.html

Environment Details

   - Pulp 2.14.3 using Docker containers based on CentOS 7: one Apache/Pulp
   API container, one Qpid message broker container, one MongoDB container,
   one Celery worker management container, one resource manager/task
   assignment container, and two Pulp worker containers. All containers run
   on a single Docker host dedicated to Pulp-related operations. The
   diagram at http://docs.pulpproject.org/en/2.14/user-guide/scaling.html
   was used as a guide for this setup.
   - The artifacts are company-proprietary (configured as a Pulp plugin),
   but each is essentially a single ZIP file with attached metadata for
   tracking and management purposes.