[rhelv6-list] Highly available OpenLDAP

Fri Jan 31 23:14:22 UTC 2014

Hi all,

I'm looking for a little input on what other folks are doing to solve a problem we are trying to address. The scenario is as follows:

We were an NIS shop for many, many years. Our environment was (and still is) heavily dependent on NIS, and netgroups in particular, to function correctly.

About 5 or 6 years ago we migrated from NIS to LDAP (using RFC2307 to provide NIS maps via LDAP). The environment at the time consisted of fewer than 200 servers (150 in the primary site, the rest in a secondary site), mostly HP-UX, with Linux hosting the "utility" services (LDAP, DNS, mysql, httpd, VNC).

We use LDAP only to provide the standard NIS "maps" (with a few small custom maps, too).

We maintain our LDAP servers with the RHEL-provided OpenLDAP, with a single master in our primary site plus 2 replica servers in our primary site and 2 replica servers in our secondary site. Replication used the slurpd mechanism (we started on RHEL3).

Life was good :)

Fast forward to the current environment, and a merger with a different Unix team (and migrating that environment from NIS to LDAP as well). We now have close to 1000 servers (a mix of physical and VM): roughly 400 each for our 2 primary sites and the rest scattered across another 3 sites. The mix is now much more heavily Linux (70%), with the remaining 30% split between HP-UX and Solaris.

We have increased the number of replicas by adding 2 more replicas in each of the new sites.

We are still (mostly) using slurpd for replication, although with the impending migration of our LDAP master from RHEL5 to RHEL6, we must change to using sync-repl. No problem, as this is (IMO) a much better replication method and relieves the worries and headaches that occur when a replica for some reason becomes "broken" for some period of time. We have already started this migration, and our master now handles both slurpd (to old replicas) and sync-repl (from new replicas).

In our environment, each site is configured to point to LDAP services by IP address. There are two IP addresses per site, which are "load-balanced" by alternating which IP is first and second in the config files based on whether the last octet of the client IP address is even or odd. This is done as a very basic way to distribute the load.
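
A minimal sketch of that even/odd ordering, for illustration only. The replica addresses and the function name are hypothetical, not taken from the real environment:

```python
import ipaddress

def ordered_ldap_servers(client_ip, replicas):
    """Order a site's two replicas by the parity of the client's last octet."""
    # The low 8 bits of an IPv4 address are its last octet.
    last_octet = int(ipaddress.IPv4Address(client_ip)) & 0xFF
    first, second = replicas
    # Even last octet: keep the given order. Odd: swap it.
    return [first, second] if last_octet % 2 == 0 else [second, first]

# Hypothetical replica addresses for one site.
SITE_REPLICAS = ("10.1.0.11", "10.1.0.12")

print(ordered_ldap_servers("10.1.5.40", SITE_REPLICAS))
print(ordered_ldap_servers("10.1.5.41", SITE_REPLICAS))
```

The resulting order is what would be written into each client's LDAP config file. Note that this only spreads the load; it does nothing for a client whose first-listed server is down.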

Now comes the crux of the problem: what happens when an LDAP server becomes unavailable for some reason?

If the client is HP-UX (ldapclientd), Solaris (ldap_cachemgr) or RHEL6 (nslcd) there is not much of an issue as long as 1 LDAP replica in each site is functioning. The specific LDAP daemon for each platform will have a small hiccup while it times out and fails over to the next LDAP replica... a few seconds, not a big deal.

If, however, the client is RHEL4 (yes, still!) or RHEL5 then the problem is much bigger! On these versions, each process that needs to use LDAP must go thru the exact same timeout process - the systems become very bogged down, or even unusable depending on the server load.
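
To make the cost concrete, here is a rough sketch of the failover walk that a client has to perform. The function name, the 3-second timeout and the `connect` parameter are assumptions made for illustration; the real client libraries have their own timeout settings:

```python
import socket

def first_reachable(servers, port=389, timeout=3.0,
                    connect=socket.create_connection):
    """Try each server in order; return (server, failed_attempts).

    Returns (None, failed_attempts) if no server accepts a connection.
    """
    failed = 0
    for host in servers:
        try:
            # An unreachable host can block here for up to `timeout` seconds.
            conn = connect((host, port), timeout)
        except OSError:
            failed += 1
            continue
        conn.close()
        return host, failed
    return None, failed
```

A single per-host daemon (ldapclientd, ldap_cachemgr, nslcd) walks this list once and remembers the result. On RHEL4 and RHEL5, every process that needs LDAP walks it separately, so the delay is multiplied by the number of such processes.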

In one subset of our larger environment (about 40%), we run nscd which can help alleviate some of this issue but not all of it. We are planning to enable nscd on the remainder very soon - the historical reasoning for why those servers do not use nscd is unknown.

Last year, I started investigating and testing the use of LVS (Linux Virtual Server) to provide a highly available (aka, clustered), load-balanced front-end that would direct client requests for a single IP address (per site) to the backend LDAP servers. Results were very good, and I proposed this plan to our management.

DENIED!

It was deemed to be "too complex to manage" by our team, and redundant to the BigIP F5 service offering within the company. I tend to favor self-management of infrastructure components which are critical to maintaining system functionality, but what do I know?  :)

So, we are now looking down the route of using F5 (managed by another team) to front-end our LDAP.

But, another option has been proposed: what if we make each Linux server an LDAP replica that keeps itself up to date with sync-repl and have each server use only itself for LDAP services? The setup of this would be fairly straightforward, and could be easily integrated into our build process.
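
As a sketch of how a build process might stamp out the consumer configuration, the helper below renders a slapd.conf-style syncrepl stanza. The helper name, hostnames, DNs and the placeholder credential are all hypothetical, and a cn=config deployment would express the same settings differently:

```python
def render_syncrepl(provider_uri, search_base, bind_dn, rid=1):
    """Render a slapd.conf syncrepl stanza for a consumer of provider_uri."""
    # slapd accepts a rid of at most three decimal digits.
    if not 0 <= rid <= 999:
        raise ValueError("rid must be between 0 and 999")
    lines = [
        f"syncrepl rid={rid:03d}",
        f"    provider={provider_uri}",
        "    type=refreshAndPersist",
        f'    searchbase="{search_base}"',
        "    bindmethod=simple",
        f'    binddn="{bind_dn}"',
        "    credentials=CHANGE_ME",  # placeholder, not a real secret
        '    retry="60 +"',           # retry every 60 seconds, indefinitely
    ]
    return "\n".join(lines)

# Hypothetical values for illustration.
print(render_syncrepl(
    "ldap://ldap-master.example.com",
    "dc=example,dc=com",
    "cn=replicator,dc=example,dc=com",
))
```

Each server's own client config would then point at ldap://127.0.0.1/ only. One thing to weigh is that every such consumer holds a persistent connection to its provider, so the provider side needs to be sized for several hundred of them.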

Since we don't make massive volumes of changes, I feel like the network load for LDAP would probably drop significantly, and we wouldn't have to worry about many of these other issues. I know that this solves the problem only for Linux, but Solaris and HP-UX already handle the problem case and are being phased out of our environment.

Anyway, thanks for reading this novel - had not intended to write so much, but wanted to set the foundation for my question.

What are you people doing to solve this problem? Are you using F5? Do you think the "every server a replica" approach makes sense?

I am posting to both RHEL5 and RHEL6 lists, sorry if you see it twice.

Thanks in advance for your input.

Kevin
