[Pulp-list] High Available Pulp Setup

Wed Nov 27 20:12:57 UTC 2013

I think you are overthinking this a little bit. Pulp has a "nodes" concept that can provide native replication of repositories from a parent node to one or more child nodes. Here's how we do it:

* One Pulp server in each datacenter
  * "Parent" pulp server in primary DC
  * "Child" pulp server in secondary DCs
* Content is synced to the parent Pulp server from various repositories
* Content is then automatically replicated to each child node on a replication schedule
* Clients point to their nearest Pulp server
  * This is done via intelligent DNS (F5 BIG-IP GTM), which hands out the IP address of the nearest Pulp server based on the source of the DNS query.

I don't see a need for more than one Pulp server in any given datacenter; one server can easily handle the load for one datacenter. If it goes down, our BIG-IP device notices the failure and starts handing out the address of a healthy Pulp server in another datacenter. Our datacenters are very well connected, so bandwidth is not a concern. This scenario requires no shared storage or fancy/complicated clustering.
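The nearest-server-with-failover behaviour described above can be modelled roughly as follows. This is a minimal sketch under stated assumptions, not the F5 BIG-IP GTM configuration and not any Pulp API: it assumes one Pulp server per datacenter, that the DNS layer knows which datacenter a query comes from, and that health is a simple up/down flag. `PulpServer`, `resolve`, the datacenter names and the IP addresses (taken from documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PulpServer:
    """Hypothetical record for one Pulp server (one per datacenter here)."""
    name: str
    datacenter: str
    ip: str
    healthy: bool = True  # stands in for the load balancer's health monitor


def resolve(client_dc, servers, dc_preference):
    """Return the IP a GSLB-style DNS would hand out to a client in client_dc.

    Hypothetical model, not BIG-IP GTM's actual algorithm: prefer a healthy
    server in the client's own datacenter, otherwise fall back to the first
    healthy server in dc_preference order.
    """
    healthy = [s for s in servers if s.healthy]
    # Normal case: answer with the healthy server local to the client.
    for s in healthy:
        if s.datacenter == client_dc:
            return s.ip
    # Failover: local server is down, so pick another datacenter in order.
    for dc in dc_preference:
        for s in healthy:
            if s.datacenter == dc:
                return s.ip
    raise LookupError("no healthy Pulp server available")


if __name__ == "__main__":
    servers = [
        PulpServer("pulp-parent", "dc1", "192.0.2.10"),
        PulpServer("pulp-child", "dc2", "198.51.100.10"),
    ]
    order = ["dc1", "dc2"]
    print(resolve("dc2", servers, order))  # local child in dc2
    servers[1].healthy = False             # child in dc2 fails its health check
    print(resolve("dc2", servers, order))  # parent in dc1 takes over
```

Preferring the local datacenter keeps client traffic local in normal operation; the fallback order only matters once the local server fails its health check, which is why no shared storage is needed for this kind of failover.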

Also, as of now, Pulp cannot handle Debian-based repositories.

Thanks,

Josh

-----Original Message-----
From: pulp-list-bounces at redhat.com [mailto:pulp-list-bounces at redhat.com] On Behalf Of Arnold Bechtoldt
Sent: Wednesday, November 27, 2013 3:07 PM
To: pulp-list at redhat.com
Subject: [Pulp-list] High Available Pulp Setup

Hey,

We want to set up Pulp with a two-site HA concept.

There will be two servers in each of two DCs: two per DC for fast failover within a DC, and another (identically configured) pair in the second DC so we can keep working when the first DC is completely down.

Repositories to be mirrored:

* RHEL server with additional repositories/channels
* EPEL
* Foreman (low prio)
* Puppet Labs (yum.puppetlabs.com)
* rpm repos of some hardware vendors
* rpm repos of some software community projects
* several rpm repos of own software

and the same is required for Ubuntu and maybe SLES (ASAP).

A geo-redundant SAN (spanning both DCs) is available via NFS.

If I understand Pulp correctly, pulp-server mainly requires httpd with mod_wsgi, MongoDB and storage (/var/lib/pulp/contents), while pulp-admin can run on any host. pulp-consumer is currently not planned for use.

Apart from the nodes feature, there are no docs on Pulp HA on the web (or PEBKAC); I would add some as soon as I am able to.

We have tested mirroring the repos mentioned above with Pulp and cloned some, too.
Some questions remain open:

* do I need four independent storage spaces?
* do I have to manage 2 or 4 Pulp servers with the same content/sync tasks/clone tasks? Note: every server must be able to provide current mirrors of upstream within a short time (5-10 min) after a failover.
* is it expected behaviour that Pulp doesn't re-download content missing from a repo's /var/lib/pulp/contents/ (I intentionally removed some)?
* is there a way to import the contents of a repo (mirror) into another Pulp server with the same repo settings/parameters?
* does MongoDB replication (master -> 3 x slave) make sense?

Note: Pulp only needs to run on one system at a time.
Active/active across both DCs isn't a must. Packages from the most important mirrors will be released to the consuming hosts in stages.

Thank you for developing Pulp and for sharing your ideas on this topic.

Arnold

--
Arnold Bechtoldt
IT Engineering & Operations

inovex GmbH

Zur Gießerei 16
D-76227 Karlsruhe
Tel: 07231 31 91 0
Fax: 07231 31 91 91
Mobil: 0173 3181 117
arnold.bechtoldt at inovex.de
www.inovex.de

Sitz der Gesellschaft: Pforzheim
AG Mannheim, HRB 502126
Geschäftsführer: Stephan Müller