[Freeipa-users] Replication Issues

Mark Reynolds mareynol at redhat.com
Tue Mar 7 21:23:18 UTC 2017



On 03/07/2017 11:29 AM, Christopher Young wrote:
> Thank you very much for the response!
>
> To start:
> ----
> [root at orldc-prod-ipa01 ~]# rpm -qa 389-ds-base
> 389-ds-base-1.3.5.10-18.el7_3.x86_64
> ----
You are on the latest version with the latest replication fixes.
>
> So, I believe a good part of my problem is that I'm not _positive_
> which replica is good at this point (though my directory really isn't
> that huge).
>
> Do you have any pointers on a good method of comparing the directory
> data between them?  I was wondering if anyone knows of any tools to
> facilitate that.  I was thinking that it might make sense for me to
> dump the DB and restore, but I really don't know that procedure.  As I
> mentioned, my directory really isn't that large at all; however, I'm
> not positive of the best step-by-step method to proceed.  (I know
> I'm not helping things :) )
Heh, well only you know what your data should be.  You can always run
db2ldif.pl on each server and compare the LDIF files that are generated
(a rough sketch follows the link below).  Then pick the one you think is
the most up to date.

https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/10/html/Administration_Guide/Populating_Directory_Databases-Exporting_Data.html#Exporting-db2ldif
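For example, a minimal sketch (the instance name "PASSUR-LOCAL" is an
assumption, check /etc/dirsrv/ for yours; "-w -" makes the script prompt
for the Directory Manager password):

----
# On each server, export the main userRoot backend to LDIF:
db2ldif.pl -Z PASSUR-LOCAL -n userRoot -D "cn=Directory Manager" -w - \
    -a /tmp/$(hostname -s)-userRoot.ldif

# Copy the exports to one box and compare; entry order can differ
# between servers, so expect some noise from a plain diff:
diff /tmp/orldc-prod-ipa01-userRoot.ldif /tmp/orldc-prod-ipa02-userRoot.ldif
----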

Once you decide on a server, you need to reinitialize all the other
servers/replicas from the "good" one.  Use "ipa-replica-manage
re-initialize" for this.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Identity_Management_Guide/ipa-replica-manage.html#initialize
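For instance, using your hostnames (run this on each stale replica,
pulling from the good one; and since your pki-tomcat/o=ipaca agreements
show the same generation ID errors, the CA suffix likely needs the same
treatment via ipa-csreplica-manage):

----
# On orldc-prod-ipa02 (and then bohdc-prod-ipa01), re-seed the domain
# suffix from the chosen good master:
ipa-replica-manage re-initialize --from orldc-prod-ipa01.passur.local

# Likely also needed for the CA (o=ipaca) suffix:
ipa-csreplica-manage re-initialize --from orldc-prod-ipa01.passur.local
----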

That's it.

Good luck,
Mark

>
> Would it be acceptable to just 'assume' one of the replicas is good
> (taking the risk of whatever missing pieces I'll have to deal with),
> completely removing the others, and then rebuilding the replicas from
> scratch?
>
> If I go that route, what are the potential pitfalls?
>
>
> I want to decide on an approach and try and resolve this once and for all.
>
> Thanks again! It really is appreciated as I've been frustrated with
> this for a while now.
>
> -- Chris
>
> On Tue, Mar 7, 2017 at 8:45 AM, Mark Reynolds <mareynol at redhat.com> wrote:
>> What version of 389-ds-base are you using?
>>
>> rpm -qa | grep 389-ds-base
>>
>>
>> comments below..
>>
>> On 03/06/2017 02:37 PM, Christopher Young wrote:
>>
>> I've seen similar posts, but in the interest of asking fresh and
>> trying to understand what is going on, I thought I would ask for
>> advice on how best to handle this situation.
>>
>> In the interest of providing some history:
>> I have three (3) FreeIPA servers.  Everything is running 4.4.0 now.
>> The originals (orldc-prod-ipa01, orldc-prod-ipa02) were upgraded from
>> the 3.x branch quite a while back.  Everything had been working fine,
>> however I ran into a replication issue (that I _think_ may have been a
>> result of IPv6 being disabled by my default Ansible roles).  I thought
>> I had resolved that by reinitializing the 2nd replica,
>> orldc-prod-ipa02.
>>
>> In any case, I feel like the replication has never been fully stable
>> since then, and I have all types of errors in messages that indicate
>> something is off.  I had since introduced a 3rd replica such that the
>> agreements would look like so:
>>
>> orldc-prod-ipa01 -> orldc-prod-ipa02 ---> bohdc-prod-ipa01
>>
>> It feels like orldc-prod-ipa02 & bohdc-prod-ipa01 are out of sync.
>> I've tried reinitializing them in order but with no positive results.
>> At this point, I feel like I'm ready to 'bite the bullet' and tear
>> them down quickly (remove them from IPA, delete the local
>> DBs/directories) and rebuild them from scratch.
>>
>> I want to minimize my impact as much as possible (which I can somewhat
>> do by redirecting LDAP/DNS request via my load-balancers temporarily)
>> and do this right.
>>
>> (Getting to the point...)
>>
>> I'd like advice on the order of operations to do this.  Given the
>> errors (I'll include samples at the bottom of this message), does it
>> make sense for me to remove the replicas on bohdc-prod-ipa01 &
>> orldc-prod-ipa02 (in that order), wipe out any directories/residual
>> pieces (I'd need some idea of what to do there), and then create new
>> replicas? -OR-  Should I export/backup the LDAP DB and rebuild
>> everything from scratch?
>>
>> I need advice and ideas.  Furthermore, if there is someone with
>> experience in this that would be interested in making a little money
>> on the side, let me know, because having an extra brain and set of
>> hands would be welcome.
>>
>> DETAILS:
>> =================
>>
>>
>> ERRORS I see on orldc-prod-ipa01 (the one whose LDAP DB seems the most
>> up-to-date since my changes are usually directed at it):
>> ------
>> Mar  6 14:36:24 orldc-prod-ipa01 ns-slapd:
>> [06/Mar/2017:14:36:24.434956575 -0500] NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa02:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> Mar  6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa         : INFO
>>   LDAP bind...
>> Mar  6 14:36:25 orldc-prod-ipa01 ipa-dnskeysyncd: ipa         : INFO
>>   Commencing sync process
>> Mar  6 14:36:26 orldc-prod-ipa01 ipa-dnskeysyncd:
>> ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO     Initial LDAP dump
>> is done, sychronizing with ODS and BIND
>> Mar  6 14:36:27 orldc-prod-ipa01 ns-slapd:
>> [06/Mar/2017:14:36:27.799519203 -0500] NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa02:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> Mar  6 14:36:30 orldc-prod-ipa01 ns-slapd:
>> [06/Mar/2017:14:36:30.994760069 -0500] NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa02:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> Mar  6 14:36:34 orldc-prod-ipa01 ns-slapd:
>> [06/Mar/2017:14:36:34.940115481 -0500] NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa02:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> Mar  6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client
>> 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of
>> '56.10.in-addr.arpa/IN': AXFR-style IXFR started
>> Mar  6 14:36:35 orldc-prod-ipa01 named-pkcs11[32134]: client
>> 10.26.250.66#49635 (56.10.in-addr.arpa): transfer of
>> '56.10.in-addr.arpa/IN': AXFR-style IXFR ended
>> Mar  6 14:36:37 orldc-prod-ipa01 ns-slapd:
>> [06/Mar/2017:14:36:37.977875463 -0500] NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa02:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> Mar  6 14:36:40 orldc-prod-ipa01 ns-slapd:
>> [06/Mar/2017:14:36:40.999275184 -0500] NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa02:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> Mar  6 14:36:45 orldc-prod-ipa01 ns-slapd:
>> [06/Mar/2017:14:36:45.211260414 -0500] NSMMReplicationPlugin -
>> agmt="cn=cloneAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa02:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> ------
>>
>> These messages indicate that the replica does not have the same database as
>> the master.  So either the master or the replica needs to be reinitialized.
>> More on this below...
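>>
>> If you want to confirm which servers actually disagree, you can read the
>> replica update vector (RUV) from each server; the first value, tagged
>> {replicageneration}, is the database generation ID these messages are
>> complaining about.  A minimal sketch (assuming the IPA suffix is
>> dc=passur,dc=local and binding as Directory Manager):
>>
>> ----
>> # Ask each server for its RUV; compare the {replicageneration} values.
>> ldapsearch -o ldif-wrap=no -x -H ldap://orldc-prod-ipa01.passur.local \
>>     -D "cn=Directory Manager" -W \
>>     -b "cn=replica,cn=dc\3Dpassur\2Cdc\3Dlocal,cn=mapping tree,cn=config" \
>>     nsds50ruv
>> ----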
>>
>>
>> Errors on orldc-prod-ipa02:
>> ------
>> Mar  6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd: ipa         : INFO
>> Commencing sync process
>> Mar  6 14:16:04 orldc-prod-ipa02 ipa-dnskeysyncd:
>> ipa.ipapython.dnssec.keysyncer.KeySyncer: INFO     Initial LDAP dump
>> is done, sychronizing with ODS and BIND
>> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:05.934405274 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:05.937278142 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:05 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:05.939434025 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>>
>> These are harmless "errors" which have been removed in newer versions of
>> 389-ds-base.
>>
>> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:06.882795654 -0500]
>> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
>> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
>> If replication stops, the consumer may need to be reinitialized.
>> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:06.886029272 -0500] NSMMReplicationPlugin -
>> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
>> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
>> as up to date, or we purged
>>
>> This "could" also be a known issue that is fixed in newer versions of
>> 389-ds-base.  Or this is a valid error message due to the replica being
>> stale for a very long time and records actually being purged from the
>> changelog before they were replicated.
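>>
>> If you suspect changelog trimming, the relevant settings live on
>> cn=changelog5,cn=config; a quick way to check them (same bind
>> assumption as above):
>>
>> ----
>> # Show how long (or how many entries) the changelog keeps before purging.
>> ldapsearch -x -D "cn=Directory Manager" -W -b "cn=changelog5,cn=config" \
>>     nsslapd-changelogmaxage nsslapd-changelogmaxentries
>> ----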
>>
>> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:06.888679268 -0500] NSMMReplicationPlugin -
>> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
>> Data required to update replica has been purged from the changelog.
>> The replica must be reinitialized.
>> Mar  6 14:16:06 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:06.960804253 -0500] NSMMReplicationPlugin -
>> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa01:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>>
>> Okay, so your replication agreements/servers are not in sync.  I suspect
>> you created a new replica and used that to initialize a valid replica,
>> which broke things.  Something like that.  You need to find a "good"
>> replica server and reinitialize the other replicas from that server.
>> These errors need to be addressed ASAP, as they are halting replication
>> for those agreements, which explains the "instability" you are describing.
>>
>> Mark
>>
>> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:08.960622608 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:08.968927168 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:08 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:08.976952118 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:09 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:09.972315877 -0500] NSMMReplicationPlugin -
>> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa01:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:10.034810948 -0500]
>> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
>> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
>> If replication stops, the consumer may need to be reinitialized.
>> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:10.040020359 -0500] NSMMReplicationPlugin -
>> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
>> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
>> as up to date, or we purged
>> Mar  6 14:16:10 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:10.042846879 -0500] NSMMReplicationPlugin -
>> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
>> Data required to update replica has been purged from the changelog.
>> The replica must be reinitialized.
>> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:13.013253769 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:13.021514225 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:13.027521508 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:13 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:13.110566247 -0500] NSMMReplicationPlugin -
>> agmt="cn=masterAgreement1-orldc-prod-ipa01.passur.local-pki-tomcat"
>> (orldc-prod-ipa01:389): The remote replica has a different database
>> generation ID than the local database.  You may have to reinitialize
>> the remote replica, or the local replica.
>> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:14.179819300 -0500]
>> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389) -
>> Can't locate CSN 58bdf8f5000200070000 in the changelog (DB rc=-30988).
>> If replication stops, the consumer may need to be reinitialized.
>> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:14.188353328 -0500] NSMMReplicationPlugin -
>> changelog program - agmt="cn=meTobohdc-prod-ipa01.passur.local"
>> (bohdc-prod-ipa01:389): CSN 58bdf8f5000200070000 not found, we aren't
>> as up to date, or we purged
>> Mar  6 14:16:14 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:14.196463928 -0500] NSMMReplicationPlugin -
>> agmt="cn=meTobohdc-prod-ipa01.passur.local" (bohdc-prod-ipa01:389):
>> Data required to update replica has been purged from the changelog.
>> The replica must be reinitialized.
>> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:17.068292919 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:17.071241757 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> Mar  6 14:16:17 orldc-prod-ipa02 ns-slapd:
>> [06/Mar/2017:14:16:17.073793922 -0500] attrlist_replace - attr_replace
>> (nsslapd-referral, ldap://orldc-prod-ipa01.passur.local:389/o%3Dipaca)
>> failed.
>> ------
>>
>>
>> Thanks in advance!!!
>>
>> -- Chris
>>
>>



