[Pulp-list] High Available Pulp Setup
Baird, Josh
jbaird at follett.com
Wed Nov 27 20:12:57 UTC 2013
I think you are overthinking this a little bit. Pulp has a "nodes" concept which can provide native replication of repositories to one or more child/parent nodes. Here's how we do it:
* One Pulp server in each datacenter
* "Parent" pulp server in primary DC
* "Child" pulp server in secondary DCs
* Content is synced to the parent Pulp server from various repositories
* Content is then automatically replicated per a replication schedule to each child node
* Clients point to their nearest Pulp server
* This is done via intelligent DNS (F5 BIG-IP GTM) that hands out the IP address for the nearest Pulp server depending on the source of the DNS query.
I don't see a need to have more than one Pulp server in any given datacenter. One server can easily handle the load for one datacenter. If it goes down, our BIG-IP device notices the failure and starts handing out another Pulp server that is healthy in another datacenter. Our datacenters are very well connected so bandwidth is not a concern. This scenario requires no shared storage or fancy/complicated clustering.
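To make the failover behaviour above concrete, here is a rough Python sketch of the selection logic: answer with the Pulp server in the client's own datacenter if it is healthy, otherwise fall back to a healthy one elsewhere. This is only an illustration and not how a BIG-IP GTM is actually configured. The PulpServer class, the pick_server function, the datacenter names and the IP addresses (taken from the 192.0.2.0/24 documentation range) are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulpServer:
    """Hypothetical record for one Pulp server (not part of Pulp itself)."""
    name: str
    datacenter: str
    ip: str
    healthy: bool = True


def pick_server(client_dc, servers, dc_preference):
    """Return the first healthy server, walking datacenters nearest-first.

    dc_preference maps a datacenter to its fallback datacenters, ordered
    from nearest to farthest. Returns None if no server is healthy.
    """
    # Try the client's own datacenter first, then its fallbacks in order.
    fallbacks = [dc for dc in dc_preference.get(client_dc, []) if dc != client_dc]
    for dc in [client_dc] + fallbacks:
        for server in servers:
            if server.datacenter == dc and server.healthy:
                return server
    return None


if __name__ == "__main__":
    # Assumed layout: parent in dc1, child in dc2, child currently down.
    servers = [
        PulpServer("pulp-parent", "dc1", "192.0.2.10"),
        PulpServer("pulp-child", "dc2", "192.0.2.20", healthy=False),
    ]
    preference = {"dc1": ["dc2"], "dc2": ["dc1"]}
    chosen = pick_server("dc2", servers, preference)
    print(chosen.ip if chosen else "no healthy Pulp server")
```

Because every datacenter holds a full replica, the fallback only has to pick a different address; nothing about storage or the database has to move during a failover.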
Also, as of now, Pulp cannot handle Debian-based repositories.
Thanks,
Josh
-----Original Message-----
From: pulp-list-bounces at redhat.com [mailto:pulp-list-bounces at redhat.com] On Behalf Of Arnold Bechtoldt
Sent: Wednesday, November 27, 2013 3:07 PM
To: pulp-list at redhat.com
Subject: [Pulp-list] High Available Pulp Setup
Hey,
we want to set up Pulp with a two-site HA concept.
There will be two servers in each of two DCs: two per DC for fast failover within a DC, and another (identically configured) two in the second DC to be able to work when the first DC is completely down.
Repositories to be mirrored:
* RHEL server with additional repositories/channels
* EPEL
* Foreman (low prio)
* Puppet Labs (yum.puppetlabs.com)
* rpm repos of some hardware vendors
* rpm repos of some software community projects
* several rpm repos of own software
and the same is required for Ubuntu and maybe SLES (ASAP).
Geo-redundant SAN (both DCs) via NFS is available.
If I understand Pulp correctly, pulp-server mainly requires httpd with mod_wsgi, mongodb and storage (/var/lib/pulp/contents), and pulp-admin can run on any host. pulp-consumer is currently not planned for use.
Apart from the nodes feature, there are no docs on Pulp HA on the web (or PEBKAC). I would add some as soon as I am able to.
We have tested Pulp to mirror the repos mentioned above and cloned some, too.
Some questions remain open:
* do I need 4 x independent storage space?
* do I have to manage 2 or 4 pulp servers with the same content/sync-tasks/clone-tasks? note: every server must be able to provide current mirrors of upstream in a short time (5-10 min) after a failover
* is it expected behaviour that pulp doesn't re-download missing contents to /var/lib/pulp/contents/ of a repo (I intentionally removed some)?
* is there a way to import contents of a repo (mirror) in another pulp server with the same repo settings/parameters?
* does a mongodb replication (master->3 x slave) make sense?
Note: Pulp needs to run on only one system at a time.
Active/Active over both DCs isn't a must. The release of packages of the most important mirrors to the consuming hosts will be staged.
Thank you for developing Pulp and for sharing your ideas on this topic.
Arnold
--
Arnold Bechtoldt
IT Engineering & Operations
inovex GmbH
Zur Gießerei 16
D-76227 Karlsruhe
Tel: 07231 31 91 0
Fax: 07231 31 91 91
Mobil: 0173 3181 117
arnold.bechtoldt at inovex.de
www.inovex.de
Sitz der Gesellschaft: Pforzheim
AG Mannheim, HRB 502126
Geschäftsführer: Stephan Müller