[Freeipa-users] problems with ipa server no longer responding to ldap

siology.io siology.io at gmail.com
Sun Sep 11 21:38:12 UTC 2016


Hello there.

My setup is that I have five IPA servers: two in one location (alder,
auth-syd2), two in another location (auth-wlg, auth-wlg2), and one in yet
another location (waffle), which is reached over a long,
mostly-but-possibly-notably-not-entirely reliable VPN connection.

I'm having an issue with an IPA server falling over. By 'falling over' I
mean that it no longer responds to LDAP queries (although TCP port 389
still shows as open in nmap). When I run 'systemctl stop ipa' the command
never seems to return, so up to now the only fix I have is to reboot that
server.

All machines are CentOS 7. All are using
ipa-server-4.2.0-15.0.1.el7.centos.18.x86_64. Replication occurs between:
alder<->auth-wlg, alder<->auth-syd2, auth-wlg<->auth-wlg2, and
auth-wlg<->waffle; possibly notably, *not* between alder and waffle directly.

The problem of LDAP being unavailable occurs on alder only; the other IPA
servers seem to be reliable. Unfortunately, alder is also our most used
server.

The error logs from alder look like this: http://pastebin.com/TxCVjWTe
(the reboot was done at around 19:55).

I did notice, while investigating / googling the errors in this log
(starting with the attr_replace (nsslapd-referral) one), that on my servers
this LDAP query:

ldapsearch -ZZ -h alder.blah.com -D "cn=Directory Manager" -W -b "o=ipaca" \
  "(&(objectclass=nstombstone)(nsUniqueId=ffffffff-ffffffff-ffffffff-ffffffff))" \
  | grep "nsds50ruv\|nsDS5ReplicaId"

returns results similar to this:

nsDS5ReplicaId: 96
nsds50ruv: {replicageneration} 5733d428000000600000
nsds50ruv: {replica 96 ldap://alder.blah.com:389} 5733d474000000600000 57
nsds50ruv: {replica 91 ldap://auth-syd2.blah.com:389} 576337b90004005b000
nsds50ruv: {replica 97 ldap://auth-wlg.blah.com:389} 5733d49a000000610000
nsds50ruv: {replica 1095 ldap://auth-wlg2.blah.com:389} 574fa5b0000004470
nsds50ruv: {replica 1090 ldap://waffle.bsh.blah.com:389} 576b1add00000442
nsds50ruv: {replica 1085 ldap://waffle.bsh.blah.com:389} 576b22f10000043d

i.e. waffle is listed twice. If I run that LDAP query on waffle, though, I
get no results at all (but the command does at least return), so I don't
know waffle's nsDS5ReplicaId at the moment. I understand that once I know
it, I can clean-ruv the other ID off the other IPA servers? I don't *think*
any of this is related to my original issue above, but it might be a
smoking gun, I don't know; just mentioning it in case.

At the moment I've not got a lot to go on. Has anyone else seen errors like
those in the pastebin, or might know where to look for more useful info?
Possibly also worth noting that alder and auth-syd2 are AWS EC2 instances.
The rest are VMs on site(s).