[Freeipa-users] Replication Issues

Mark Reynolds mareynol at redhat.com
Tue Mar 7 13:45:57 UTC 2017


What version of 389-ds-base are you using?

rpm -qa | grep 389-ds-base


Comments below.

On 03/06/2017 02:37 PM, Christopher Young wrote:
> I've seen similar posts, but in the interest of asking fresh and
> trying to understand what is going on, I thought I would ask for
> advice on how best to handle this situation.
>
> In the interest of providing some history:
> I have three (3) FreeIPA servers.  Everything is running 4.4.0 now.
> The originals (orldc-prod-ipa01, orldc-prod-ipa02) were upgraded from
> the 3.x branch quite a while back.  Everything had been working fine,
> however I ran into a replication issue (that I _think_ may have been a
> result of IPv6 being disabled by my default Ansible roles).  I thought
> I had resolved that by reinitializing the 2nd replica,
> orldc-prod-ipa02.
>
> In any case, I feel like the replication has never been fully stable
> since then, and I have all types of errors in messages that indicate
> something is off.  I had since introduced a 3rd replica such that the
> agreements would look like so:
>
> orldc-prod-ipa01 -> orldc-prod-ipa02 ---> bohdc-prod-ipa01
>
> It feels like orldc-prod-ipa02 & bohdc-prod-ipa01 are out of sync.
> I've tried reinitializing them in order but with no positive results.
> At this point, I feel like I'm ready to 'bite the bullet' and tear
> them down quickly (remove them from IPA, delete the local
> DBs/directories) and rebuild them from scratch.
>
> I want to minimize my impact as much as possible (which I can somewhat
> do by redirecting LDAP/DNS requests via my load-balancers temporarily)
> and do this right.
>
> (Getting to the point...)
>
> I'd like advice on the order of operations to do this.  Given the
> errors (I'll include samples at the bottom of this message), does it
> make sense for me to remove the replicas on bohdc-prod-ipa01 &
> orldc-prod-ipa02 (in that order), wipe out any directories/residual
> pieces (I'd need some idea of what to do there), and then create new
> replicas? -OR-  Should I export/back up the LDAP DB and rebuild
> everything from scratch?
>
> I need advice and ideas.  Furthermore, if there is someone with
> experience in this that would be interested in making a little money
> on the side, let me know, because having an extra brain and set of
> hands would be welcome.
>
> DETAILS:
> =================
>
>
> ERRORS I see on orldc-prod-ipa01 (the one whose LDAP DB seems the most
> up-to-date since my changes are usually directed at it):
> ------
> Mar  6 14:36:24 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:24.434956575 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa         : INFO
>   LDAP bind...
> Mar  6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa         : INFO
>   Commencing sync process
> Mar  6 14:36:26 orldc-prod-ipa01 ipa-dnskeysyncd:
> ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO     Initial LDAP dump
> is done, sychronizing with ODS and BIND
> Mar  6 14:36:27 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:27.799519203 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:30 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:30.994760069 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:34 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:34.940115481 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client
> 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of
> '56.10.in-addr.arpa/IN': AXFR-style IXFR started
> Mar  6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client
> 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of
> '56.10.in-addr.arpa/IN': AXFR-style IXFR ended
> Mar  6 14:36:37 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:37.977875463 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:40 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:40.999275184 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:36:45 orldc-prod-ipa01 ns-slapd:
> [06/Mar/2017:14:36:45.211260414 -0500] NSMMReplicationPlugin -
> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa02:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> ------
These messages indicate that the replica does not have the same database
as the master, so either the master or the replica needs to be
reinitialized.  More on this below.
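
One way to confirm the mismatch is to compare the replica generation
recorded in the RUV tombstone entry on each server.  The following is a
minimal sketch, not a definitive procedure.  It assumes the affected
suffix is o=ipaca (taken from the referral URLs in the logs) and that you
can bind as Directory Manager.  The helper name replica_generation is
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read ldapsearch LDIF output on stdin and print the
# value that follows "{replicageneration}" in the nsds50ruv attribute.
replica_generation() {
    sed -n 's/^nsds50ruv: {replicageneration} \(.*\)$/\1/p'
}

# Assumed usage: run once per server and compare the printed values.
#   ldapsearch -o ldif-wrap=no -x \
#     -H ldap://orldc-prod-ipa01.passur.local \
#     -D "cn=Directory Manager" -W -b "o=ipaca" \
#     '(&(nsuniqueid=ffffffff-ffff-ffff-ffff-ffffffffffff)(objectclass=nstombstone))' \
#     nsds50ruv | replica_generation
```

If the values differ between two servers that share an agreement, that
agreement will keep logging the generation ID message until one side is
reinitialized from the other.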
>
>
> Errors on orldc-prod-ipa02:
> ------
> Mar  6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd: ipa         : INFO
> Commencing sync process
> Mar  6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd:
> ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO     Initial LDAP dump
> is done, sychronizing with ODS and BIND
> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:05.934405274 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:05.937278142 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:05.939434025 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
These are harmless "errors" which have been removed in newer versions of
389-ds-base.
> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:06.882795654 -0500]
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
> If replication stops, the consumer may need to be reinitialized.
> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:06.886029272 -0500] NSMMReplicationPlugin -
> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
> as up to date, or we purged
This "could" also be a known issue that is fixed in newer versions of
389-ds-base.  Or this is a valid error message due to the replica being
stale for a very long time and records actually being purged from the
changelog before they were replicated. 
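
To see which agreements are affected and how often, the errors log can
be summarized per agreement.  This is only a sketch: the function name
summarize_repl_errors is hypothetical, and it assumes each log entry is
on a single line (as in the server's own errors log, not as wrapped in
this email).

```shell
#!/bin/sh
# Hypothetical helper: read ns-slapd log lines on stdin and print, for each
# replication agreement, how many "purged from the changelog" and
# "different database generation ID" messages it produced.
summarize_repl_errors() {
    awk '
        /has been purged from the changelog/ { kind = "purged" }
        /different database generation ID/   { kind = "generation" }
        kind != "" {
            # Extract NAME from agmt="cn=NAME"
            if (match($0, /agmt="cn=[^"]*"/)) {
                name = substr($0, RSTART + 9, RLENGTH - 10)
                count[name " " kind]++
            }
            kind = ""
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}

# Assumed usage (INSTANCE is a placeholder for your directory server
# instance name):
#   summarize_repl_errors < /var/log/dirsrv/slapd-INSTANCE/errors
```

Each output line has the form "COUNT AGREEMENT KIND", which makes it
easier to tell a single broken agreement from a server-wide problem.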

> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:06.888679268 -0500] NSMMReplicationPlugin -
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
> Data required to update replica has been purged from the changelog.
> The replica must be reinitialized.
> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:06.960804253 -0500] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa01:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
Okay, so your replication agreements/servers are not in sync.  I suspect
you created a new replica and used it to initialize a valid replica,
which broke things.  Something like that.  You need to find a "good"
replica server and reinitialize the other replicas from that server.
These errors need to be addressed ASAP, as they are halting replication
for those agreements, which explains the "instability" you are describing.
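
As a rough sketch of what that could look like, the helper below only
prints the commands rather than running them.  It assumes that
orldc-prod-ipa01 is the "good" server (you said it seems the most
up-to-date), that reinitialization follows your existing agreements
(ipa01 -> ipa02 -> bohdc), and that each replica also carries a CA; drop
the ipa-csreplica-manage line for any replica that does not.  The
function name print_reinit is made up for illustration.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the re-initialize
# commands to run ON the stale replica ($1), pulling data FROM the source
# server ($2).  Review the output before running anything.
print_reinit() {
    replica="$1"
    source_host="$2"
    echo "# run on $replica:"
    # Domain data (users, groups, DNS, ...)
    echo "ipa-replica-manage re-initialize --from $source_host"
    # The pki-tomcat agreements in the logs belong to the CA suffix,
    # which is managed with a separate tool.
    echo "ipa-csreplica-manage re-initialize --from $source_host"
}

# Assumed order: fix ipa02 from ipa01 first, then bohdc from ipa02.
print_reinit orldc-prod-ipa02.passur.local orldc-prod-ipa01.passur.local
print_reinit bohdc-prod-ipa01.passur.local orldc-prod-ipa02.passur.local
```

Wait for each reinitialization to finish and check the errors log before
moving on to the next server.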

Mark
> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:08.960622608 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:08.968927168 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:08.976952118 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:09 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:09.972315877 -0500] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa01:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:10.034810948 -0500]
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
> If replication stops, the consumer may need to be reinitialized.
> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:10.040020359 -0500] NSMMReplicationPlugin -
> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
> as up to date, or we purged
> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:10.042846879 -0500] NSMMReplicationPlugin -
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
> Data required to update replica has been purged from the changelog.
> The replica must be reinitialized.
> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:13.013253769 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:13.021514225 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:13.027521508 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:13.110566247 -0500] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
> (orldc-prod-ipa01:389): The remote replica has a different database
> generation ID than the local database.  You may have to reinitialize
> the remote replica, or the local replica.
> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:14.179819300 -0500]
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
> If replication stops, the consumer may need to be reinitialized.
> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:14.188353328 -0500] NSMMReplicationPlugin -
> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
> as up to date, or we purged
> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:14.196463928 -0500] NSMMReplicationPlugin -
> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
> Data required to update replica has been purged from the changelog.
> The replica must be reinitialized.
> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:17.068292919 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:17.071241757 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
> [06/Mar/2017:14:16:17.073793922 -0500] attrlist_replace - attr_replace
> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
> failed.
> ------
>
>
> Thanks in advance!!!
>
> -- Chris
>
