[Pulp-list] Children resource usage
Brian Bouterse
bbouters at redhat.com
Fri Jun 26 18:35:28 UTC 2015
Hi Sean,
I don't know specifically which version it will land in, but at the
earliest it would be 2.8.0. You can follow the progress by watching
these two issues [0] [1].
[0]: https://pulp.plan.io/issues/1060
[1]: https://pulp.plan.io/issues/898
-Brian
On 06/26/2015 10:07 AM, Sean Waite wrote:
> Thanks Brian and Salvatore. We're using it in a similar fashion,
> with pulp just acting as an https target for yum and managing
> everything on the backend. I'll read the clustering guide; it could
> be useful. Any idea what release the HA celerybeat and
> resource_manager are targeted for?
>
> Sean
>
> On Wed, Jun 24, 2015 at 12:51 PM, Brian Bouterse
> <bbouters at redhat.com> wrote:
>
> Salvatore,
>
> Thanks for the note describing the setup you've done. It's great
> to see users clustering Pulp!
>
> I've done work with clustering Pulp (starting with 2.6.1) and put
> together a clustering guide [0] which was tested by QE (and me).
>
> Pulp still has two single points of failure (pulp_celerybeat and
> pulp_resource_manager) but we're working on fixing those in a
> future version of Pulp.
>
> Even after fixing those issues Pulp will still have trouble
> guaranteeing consistency when using a replica_set with mongodb.
> That is going to be harder to fix and we're still in the planning
> phase. You can follow that issue, discussion, and its subissues
> here [1]. That being said, it should *mostly* work today, but your
> mileage may vary.
>
> Generally, the clustering doc [0] is the preferred way to scale
> Pulp within a single data center or over low-latency network
> connections. You didn't use nodes, but to clarify for others: the
> nodes feature is more for replicating content data between data
> centers, or for when one of the Pulp installations needs to be
> network-disconnected from the other.
>
> [0]: http://pulp.readthedocs.org/en/latest/user-guide/scaling.html#clustering-pulp
>
> [1]: https://pulp.plan.io/issues/1014
>
> -Brian
>
> On 06/23/2015 04:08 AM, Salvatore Di Nardo wrote:
>> Not an expert here, as I started working with Pulp only recently,
>> but I tried to set up a 2-3 server configuration with a master and
>> 1 or 2 clients. The aim was to spread the load (in an active-active
>> configuration, so not a clustered configuration) and avoid a single
>> point of failure.
>
>> Sadly, I got stuck on the fact that nodes need OAuth
>> authentication, which was not working properly, while other pages
>> declared OAuth deprecated and soon to be removed from Pulp.
>
>> How nodes are supposed to work is therefore a mystery. Since the
>> documentation contradicted itself and I didn't manage to make it
>> work (SSL issues even after I disabled it everywhere), I opted for
>> a totally different approach:
>
>> I created a single Pulp server and mounted a NAS volume.
>
>> I moved /var/lib/pulp and /var/lib/mongodb to the NAS and replaced
>> those paths with NFS mounts. Symbolic links could work for mongodb,
>> but not for Pulp, as some paths need to be served by Apache, which
>> by default doesn't follow symlinks.
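>>
>> (To make that concrete - a minimal sketch only, assuming the NAS
>> exports are called nas:/export/pulp and nas:/export/mongodb, which
>> are hypothetical names:
>>
>>   # stop the Pulp services, httpd and mongodb first, then:
>>   mount -t nfs nas:/export/pulp /var/lib/pulp
>>   mount -t nfs nas:/export/mongodb /var/lib/mongodb
>>
>> after copying the original directory contents onto those exports.)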
>
>> Once the Pulp data was located on the NAS, I exported that volume
>> to 2 more Apache servers and made the same 'published' directory
>> available through those Apache servers (you can reuse the pulp.conf
>> in /etc/httpd/conf.d, as it needs only minor changes). All the
>> clients actually connect to the Apache servers, so I can scale
>> horizontally as much as I want, and the Pulp server only does the
>> repo syncs, so its load is actually quite low.
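>>
>> (For illustration only - this is not the exact stock pulp.conf
>> content, just the shape of it - each extra Apache server needs the
>> NFS mount plus a stanza along the lines of:
>>
>>   <Directory /var/lib/pulp/published>
>>       Options FollowSymLinks Indexes
>>       # Apache 2.2 syntax; use "Require all granted" on Apache 2.4
>>       Order allow,deny
>>       Allow from all
>>   </Directory>
>>
>> with the Alias lines from the stock Pulp conf files still pointing
>> at the same published directory.)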
>
>> The good: with this configuration the Pulp server can be restarted,
>> reinstalled, or shut down, and the repos will still be available to
>> the hosts, since they connect to the Apache servers. This helps
>> with Pulp maintenance. Having Pulp unavailable only means there
>> will be no new syncs to update the repositories, but the repos
>> themselves stay available.
>
>> The bad: this is all nice, but only if you use Pulp as a pure RPM
>> repo manager. If you also use Pulp to register the hosts, then this
>> configuration is of no use to you. Since the hosts have to
>> register, they have to connect to the Pulp server, and only Pulp
>> can 'push' changes to the hosts, so the single point of failure
>> comes back.
>
>> The workaround (no, it's not "ugly" :) ): in my work environment we
>> use Puppet to define the server configuration and the running
>> services, so we can rebuild a server automatically without manual
>> intervention. This includes repo configuration and installed
>> packages, so we don't need to register hosts in specific host
>> groups, as Puppet does everything (better).
>
>> Actually, during my host registration tests I didn't like the logic
>> behind it. We host several thousand hosts and we need to be able to
>> reinstall them when needed without manual intervention. Puppet
>> copes with that, so when I was looking at how to register a host I
>> was surprised that a host cannot register itself into a specific
>> host group. You have to do that by hand on the Pulp server (more
>> precisely: using pulp-admin). So any time a machine registers
>> itself you have a manual task to do on Pulp, which is not scalable
>> for us, so in the end we skipped this part, used Pulp just as a
>> local RPM repo, and continued to use Puppet for the rest.
>
>
>> On 22/06/15 15:11, Sean Waite wrote:
>>> By children, I'm referring to child nodes - the subservers
>>> that can sync from a "parent" node.
>>>
>>> Looking again at the resources, below is what I have. It does
>>> look like the 1.7g proc is actually a worker.
>>>
>>> Some statistics on what I have here (resident memory):
>>> 2 celery__main__worker procs listed as "resource_manager" - 41m memory each
>>> 2 celery__main__worker procs listed as "reserved_resource_worker" - 42m and 1.7g respectively
>>> 1 mongo process - 972m
>>> 1 celerybeat - 24m
>>> a pile of httpd procs - 14m each
>>> 1 qpid - 21m
>>>
>>> For disk utilization, the mongo db is around 3.8G and my
>>> directory containing all of the rpms etc is around 95G.
>>>
>>> We're on a system with only 3.5G of available memory, which is
>>> probably part of the problem. We're looking at expanding it; I'm
>>> just trying to figure out how much to expand it by. From your
>>> numbers above, we'd need 6-7G of memory + 2*N gigs for the
>>> workers. Should I expect maybe 3-4 workers at any one time?
>>> I've got 2 now, but that is in an idle state.
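>>>
>>> (Rough math using the worst-case per-process numbers quoted below:
>>> 6-7G base + 2G per worker gives roughly 10-11G for 2 workers and
>>> 14-15G for 4 workers, so something like a 16G box would leave a
>>> bit of headroom.)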
>>>
>>>
>>> On Mon, Jun 22, 2015 at 9:24 AM, Brian Bouterse
>>> <bbouters at redhat.com> wrote:
>>>
>> Hi Sean,
>
>> I'm not really sure what you mean by the term 'children'. Maybe
>> you mean a process or a consumer?
>
>> I expect pulp_resource_manager to use less than 1.7G of memory, but
>> it's possible. Memory analysis can be a little tricky, so more
>> details are needed about how this is being measured to be sure.
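>>
>> For example, one way to look at it (the process names in the grep
>> are just the usual suspects):
>>
>>   ps -eo pid,ppid,rss,vsz,args | grep -E 'celery|mongod|qpidd|httpd' | grep -v grep
>>
>> RSS (resident) is the interesting column; VSZ is typically much
>> larger and misleading, and summing RSS across many httpd or worker
>> processes double-counts shared pages a bit.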
>
>> The biggest memory consumer within Pulp by far is mongodb. If you
>> can, ensure that at least 4G of RAM is available on the machine you
>> are running mongodb on.
>
>> I looked into the docs and we don't talk much about the memory
>> requirements. Feel free to file a bug on that if you want.
>> Roughly I expect the following amounts of RAM to be available per
>> process:
>
>> pulp_celerybeat: 256MB - 512MB
>> pulp_resource_manager: 256MB - 512MB
>> pulp_workers: this process spawns N workers; each worker could use
>> 256MB - 2GB depending on what it's doing
>> httpd: 1GB
>> mongodb: 4GB
>> qpidd/rabbitMQ: ???
>
>> Note that all the pulp_* processes have a parent and a child
>> process; for the numbers above I count each parent/child pair
>> together. I usually show the inheritance using `sudo ps -awfux`.
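>>
>> A rough one-liner (just a sketch) to total parent+child RSS for,
>> say, the resource manager:
>>
>>   ps -eo rss,args | grep [r]esource_manager | awk '{s+=$1} END {print s/1024 " MB"}'
>>
>> The [r] trick keeps the grep itself out of the match; ps reports
>> RSS in kB, so the awk divides by 1024.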
>
>> I'm interested to see what others think about these numbers too.
>
>> -Brian
>
>
>> On 06/22/2015 08:46 AM, Sean Waite wrote:
>>> Hi,
>
>>> I've got a pulp server running, and I'd like to add some
>>> children. The server itself is a bit hard up on resources, so
>>> we're going to rebuild with a larger one. How many resources
>>> would the children use? Are they fairly beefy process/memory
>>> hogs?
>
>>> We've got a large number of repositories. pulp-resource-manager
>>> seems to be using 1.7G of memory, with .7G for mongodb.
>
>>> Any pointers on how much to expect?
>
>>> Thanks
>
>>> -- Sean Waite swaite at tracelink.com
>>> Cloud Operations Engineer GPG 3071E870 TraceLink, Inc.
>
>>> Be Excellent to Each Other
>
>
>
>>>
>>>
>>>
>>>
>>>
>>> -- Sean Waite swaite at tracelink.com
>>> Cloud Operations Engineer GPG 3071E870 TraceLink, Inc.
>>>
>>> Be Excellent to Each Other
>>>
>>>
>
>
>
>
>
>
>
>
>
> -- Sean Waite swaite at tracelink.com
> Cloud Operations Engineer GPG 3071E870 TraceLink, Inc.
>
> Be Excellent to Each Other