[Pulp-list] Children resource usage

Brian Bouterse bbouters at redhat.com
Wed Jun 24 16:51:51 UTC 2015



Salvatore,

Thanks for the note describing the setup you've done. It's great to
see users clustering Pulp!

I've done work with clustering Pulp (starting with 2.6.1) and put
together a clustering guide [0] which was tested by QE (and me).

Pulp still has two single points of failure (pulp_celerybeat, and
pulp_resource_manager) but we're working on fixing those in a future
version of Pulp.

Even after fixing those issues, Pulp will still have trouble
guaranteeing consistency when using a replica set with mongodb. That
is going to be harder to fix, and we're still in the planning phase.
You can follow that issue, its discussion, and its subissues here [1].
That being said, it should *mostly* work today, but your mileage may
vary.

Generally, the clustering doc [0] is the preferred way to scale Pulp
within a single data center or over low latency network connections.
You didn't use nodes, but to clarify for others: the nodes feature
is more for replicating content data between data centers, or for
when one of the Pulp installations needs to be network disconnected
from the other.

[0]: http://pulp.readthedocs.org/en/latest/user-guide/scaling.html#clustering-pulp
[1]: https://pulp.plan.io/issues/1014

-Brian

On 06/23/2015 04:08 AM, Salvatore Di Nardo wrote:
> Not an expert here, as I started working with Pulp only recently,
> but I tried to set up a 2-3 server configuration with a master and 1
> or 2 clients. The aim was to spread the load (in an active-active
> configuration, so not a clustered one) and to avoid a single point
> of failure.
> 
> Sadly, I got stuck on the fact that nodes need OAuth
> authentication, but it was not working properly, and other pages
> declared OAuth deprecated and soon to be removed from Pulp.
> 
> How nodes are supposed to work, then, is a mystery to me. Since the
> documentation contradicted itself and I didn't manage to make it
> work (SSL issues even though I had disabled it everywhere), I opted
> for a totally different approach:
> 
> I created a single Pulp server and mounted a NAS volume.
> 
> I moved /var/lib/pulp and /var/lib/mongodb to the NAS and replaced
> those paths with NFS mounts. Symbolic links could work for mongodb,
> but not for Pulp, as some paths need to be available to Apache,
> which by default does not follow symlinks.
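> 
> Roughly, the mounts look like this (the export names here are made
> up, adjust them for your own NAS):
> 
>     # /etc/fstab (hypothetical export paths)
>     nas:/export/pulp      /var/lib/pulp      nfs  defaults,_netdev  0 0
>     nas:/export/mongodb   /var/lib/mongodb   nfs  defaults,_netdev  0 0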
> 
> Once the Pulp content was located on the NAS, I exported that
> volume to 2 more Apache servers and made the same 'published'
> directory available through those Apache servers (you can reuse the
> pulp.conf in /etc/httpd/conf.d, as it needs only minor changes).
> All the clients actually connect to the Apache servers, so I can
> scale horizontally as much as I want, and the Pulp server only does
> the repo syncs, so its load is actually quite low.
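> 
> Stripped down to the essentials, the serving side on the extra
> Apache servers boils down to something like the following. The
> exact Alias and Directory paths come from your own pulp.conf, so
> treat this as a sketch rather than the shipped config:
> 
>     Alias /pulp/repos /var/lib/pulp/published
>     <Directory /var/lib/pulp/published>
>         Options +Indexes
>         Require all granted   # or "Allow from all" on httpd 2.2
>     </Directory>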
> 
> The good: with this configuration the Pulp server can be restarted,
> reinstalled, or shut down, and the repos will still be available to
> the hosts, since they connect to the Apache servers. This helps
> with Pulp maintenance. Having Pulp unavailable only means that
> there will be no new syncs to update the repositories, but the
> repos themselves remain available.
> 
> The bad: this is all nice, but only if you use Pulp as a pure RPM
> repo manager. If you also use Pulp to register the hosts, then this
> configuration is of no use to you. Since the hosts have to
> register, they have to connect to the Pulp server, and only Pulp
> can 'push' changes to the hosts, so the single point of failure
> comes back.
> 
> The workaround (no, it's not "ugly" :) ): in my work environment we
> use Puppet to define the server configuration and the running
> services, so we can rebuild a host automatically without manual
> intervention. This includes the repo configuration and the
> installed packages, so we don't need to register hosts in specific
> host groups, as Puppet does everything (better).
> 
> Actually, during my host registration tests I didn't like the logic
> behind it. We host several thousand hosts, and we need to be able
> to reinstall them when needed without manual intervention. Puppet
> copes with that, so when I was looking at how to register a host I
> was surprised that a host cannot register itself into a specific
> host group. You have to do that by hand on the Pulp server (more
> exactly: using pulp-admin). So any time a machine registers itself
> you have some manual task to do on Pulp, which is not scalable for
> us, so in the end we skipped this part, used Pulp just as a local
> RPM repo, and continued to use Puppet for the rest.
> 
> 
> On 22/06/15 15:11, Sean Waite wrote:
>> By children, I'm referring to child nodes - the subservers that
>> can sync from a "parent" node.
>> 
>> Looking again at the resources, below is what I have. It does
>> look like the 1.7g proc is actually a worker.
>> 
>> Some statistics on what I have here (resident memory):
>> 
>>   2 celery__main__worker procs listed as "resource_manager" - 41m each
>>   2 celery__main__worker procs listed as "reserved_resource_worker" - 42m and 1.7g respectively
>>   1 mongo process - 972m
>>   1 celerybeat - 24m
>>   a pile of httpd procs - 14m each
>>   1 qpid - 21m
>> 
>> For disk utilization, the mongo db is around 3.8G and my
>> directory containing all of the rpms etc is around 95G.
>> 
>> We're on a system with only 3.5G of available memory, which is
>> probably part of the problem. We're looking at expanding it; I'm
>> just trying to figure out how much to expand it by. From your
>> numbers above, we'd need 6-7G of memory + 2*N gigs for the
>> workers. Should I expect maybe 3-4 workers at any one time? I've
>> got 2 now, but that is at an idle state.
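>> 
>> (Back-of-the-envelope with those numbers: at 4 workers that would
>> be 6-7G + 4*2G = 14-15G in the worst case, but more like 8-9G if
>> the workers stay at a few hundred MB each. Please correct me if
>> that math is off.)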
>> 
>> 
>> On Mon, Jun 22, 2015 at 9:24 AM, Brian Bouterse
>> <bbouters at redhat.com> wrote:
>> 
> Hi Sean,
> 
> I'm not really sure what you mean by the term 'children'. Maybe
> you mean process or consumer?
> 
> I expect pulp_resource_manager to use less than 1.7G of memory,
> but it's possible. Memory analysis can be a little tricky, so more
> details are needed about how this is being measured to be sure.
> 
> The biggest memory process within Pulp by far is mongodb. If you
> can, ensure that at least 4G of RAM is available on the machine
> that you are running mongodb on.
> 
> I looked into the docs and we don't talk much about the memory 
> requirements. Feel free to file a bug on that if you want. Roughly
> I expect the following amounts of RAM to be available per process:
> 
> pulp_celerybeat: 256MB - 512MB
> pulp_resource_manager: 256MB - 512MB
> pulp_workers: this process spawns N workers; each worker could use
>   256MB - 2GB depending on what it's doing
> httpd: 1GB
> mongodb: 4GB
> qpidd/rabbitMQ: ???
> 
> Note that all the pulp_* processes have a parent and child process;
> for the numbers above I consider each parent/child pair together. I
> usually show the inheritance using `sudo ps -awfux`.
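> 
> If you want one number per service, a few lines of Python with
> psutil will add up the resident memory of each top-level celery
> process together with its children. Just a sketch, not something
> Pulp ships: it assumes psutil is installed and that the process
> names contain "celery":
> 
>     import psutil
> 
>     for proc in psutil.process_iter():
>         try:
>             if 'celery' not in proc.name():
>                 continue
>             parent = proc.parent()
>             if parent is not None and 'celery' in parent.name():
>                 # children get counted with their parent below
>                 continue
>             total = proc.memory_info().rss
>             for child in proc.children(recursive=True):
>                 total += child.memory_info().rss
>             print('pid %d: %d MB resident' % (proc.pid, total / (1024 * 1024)))
>         except psutil.NoSuchProcess:
>             pass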
> 
> I'm interested to see what others think about these numbers too.
> 
> -Brian
> 
> 
> On 06/22/2015 08:46 AM, Sean Waite wrote:
>> Hi,
> 
>> I've got a pulp server running, and I'd like to add some
>> children. The server itself is a bit hard up on resources, so
>> we're going to rebuild with a larger one. How many resources
>> would the children use? Are they fairly beefy process/memory hogs?
> 
>> We've got a large number of repositories. pulp-resource-manager
>> seems to be using 1.7G of memory, with mongodb at around .7G.
> 
>> Any pointers on how much I might be able to expect?
> 
>> Thanks
> 
>> -- 
>> Sean Waite
>> swaite at tracelink.com
>> Cloud Operations Engineer                GPG 3071E870
>> TraceLink, Inc.
>> 
>> Be Excellent to Each Other
>> 
> 
> 
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Sean Waite
>> swaite at tracelink.com
>> Cloud Operations Engineer                GPG 3071E870
>> TraceLink, Inc.
>> 
>> Be Excellent to Each Other
>> 
>> 
> 
> 
> 
> 



