[Freeipa-users] Replication failing on FreeIPA 4.2.0

Nathan Peters Nathan.Peters at globalrelay.net
Sun Jan 17 09:10:15 UTC 2016


After some work, I was able to get my system back to a state where it appears to be replicating correctly, though not on FreeIPA 4.2.0.  Because this is a production system with several hundred users and computers attached to it, wiping the domain was not an option, so I decided to gamble that the new replication topology features would help.

I replaced each CentOS 7 domain controller with a Fedora 23 host running FreeIPA 4.2.3, and while doing so I noticed some odd behavior with the RUVs (replica update vectors).  I know about the current bug where deleting a replica doesn't delete its RUV, and I ran into it. To clean up, I would add a task entry like this with ldapmodify:

dn: cn=clean 4, cn=cleanallruv, cn=tasks, cn=config
objectclass: top
objectclass: extensibleObject
replica-base-dn: dc=mydomain,dc=net
replica-id: 4
replica-force-cleaning: yes
cn: clean 4
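Since one such task entry is needed per stale replica ID, it can help to generate the LDIF rather than edit it by hand. A minimal sketch, assuming the entry layout shown above is the only thing needed; `cleanallruv_task_ldif` is a hypothetical helper, not part of FreeIPA or 389 DS, and it writes the DN without the spaces after commas (equivalent for LDAP).

```python
def cleanallruv_task_ldif(replica_id: int, base_dn: str, force: bool = True) -> str:
    """Build a cleanallruv task entry like the one above (hypothetical helper).

    The result is meant to be fed to `ldapmodify -a` (add mode).
    """
    if replica_id < 1:
        raise ValueError("replica ID must be a positive integer")
    cn = f"clean {replica_id}"
    lines = [
        f"dn: cn={cn},cn=cleanallruv,cn=tasks,cn=config",
        "objectclass: top",
        "objectclass: extensibleObject",
        f"replica-base-dn: {base_dn}",
        f"replica-id: {replica_id}",
        # Same attribute as in the entry above; "yes" forces the clean.
        f"replica-force-cleaning: {'yes' if force else 'no'}",
        f"cn: {cn}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(cleanallruv_task_ldif(4, "dc=mydomain,dc=net"), end="")
```

The output could then be saved to a file and added with something like `ldapmodify -x -D "cn=Directory Manager" -W -a -f clean4.ldif`.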

The task would fail only if the server I ran it on did not have a current replication agreement with the new Fedora replica for that host.  For example, if the old CentOS host had replica ID 4 and the new Fedora host had 15, running the task on a server with an agreement to 15 would clean RUV 4, but running it on a server without an agreement to 15 would fail.

After a while I had every server in an agreement with every other server, and I got all the old RUVs cleared.

I was still seeing strange error messages in my logs with FreeIPA 4.2.3, so I decided to upgrade all the way to 4.3.0.

Here are the 4.2.3 errors:

[16/Jan/2016:22:29:12 -0800] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica dc=mydomain,dc=net: LDAP error - 53
[16/Jan/2016:22:29:13 -0800] NSMMReplicationPlugin - agmt_delete: begin
[16/Jan/2016:22:32:51 -0800] slapi_ldap_bind - Error: could not bind id [cn=Replication Manager masterAgreement1-dc2-ipa-dev-van.mydomain.net-pki-tomcat,ou=csusers,cn=config] authentication mechanism [SIMPLE]: error 32 (No such object) errno 0 (Success)

Of the 4 servers, 3 upgraded to 4.3.0 smoothly; the fourth hung for an hour during the %post section of the dnf install, with the ns-slapd process taking 100% CPU on all 4 cores, until I stopped it.  Running ipa-server-upgrade afterwards fixed everything.

Using the new replication topology graph and management controls in the web UI, I was able to find some missing segments and replace some that were, for some reason, only one-way.

Replication now appears to be proceeding smoothly: instead of the hundreds of error log entries per second that I reported in my earlier posts, I am only getting about 3 every 5 minutes.  The bugs that were present in 4.2.0 and 4.2.3 seem to be almost entirely gone.

I have run the new topology suffix verification commands, and they report that everything is OK.

I still get these errors in batches of 3, but they don't seem to harm my system's ability to operate and replicate properly:

[17/Jan/2016:01:07:27 -0800] attrlist_replace - attr_replace (nsslapd-referral, ldap://dc1-ipa-dev-nvan.mydomain.net:389/o%3Dipaca) failed.
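To check a rate like "about 3 every 5 minutes" against the errors log, the entries can be bucketed by time window and subsystem. A minimal sketch, assuming the `[timestamp] subsystem - message` line format seen in this thread and English month abbreviations; `parse_line` and `count_per_window` are hypothetical names, not part of any 389 DS tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines like:
# [17/Jan/2016:01:07:27 -0800] attrlist_replace - attr_replace (...) failed.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<subsystem>\S+) - (?P<msg>.*)$")


def parse_line(line: str):
    """Return (timestamp, subsystem, message), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return ts, m["subsystem"], m["msg"]


def count_per_window(lines, window_minutes: int = 5) -> Counter:
    """Count parsed entries per (window start, subsystem) bucket."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # blank lines, wrapped continuation lines, etc.
        ts, subsystem, _ = parsed
        # Round the timestamp down to the start of its window.
        start = ts.replace(minute=ts.minute - ts.minute % window_minutes, second=0)
        counts[(start, subsystem)] += 1
    return counts
```

Usage might look like opening the instance's errors log (typically under `/var/log/dirsrv/slapd-INSTANCE/`) and printing `sorted(count_per_window(f).items())`.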

-----Original Message-----
From: freeipa-users-bounces at redhat.com [mailto:freeipa-users-bounces at redhat.com] On Behalf Of Nathan Peters
Sent: January-15-16 10:00 AM
To: Ludwig Krispenz
Cc: freeipa-users at redhat.com
Subject: Re: [Freeipa-users] Replication failing on FreeIPA 4.2.0

No dice on the rebuild and RUV cleaning. I'm still getting a pile of these on dc1-van:

[15/Jan/2016:17:55:25 +0000] NSMMReplicationPlugin - agmt="cn=meTodc1-ipa-dev-nvan.mydomain.net" (dc1-ipa-dev-nvan:389): Skipping update operation with no message_id (uniqueid 6e6784a0-b5c911e5-b1f1cd78-f19552bb, CSN 569932db000000040000):
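The CSN in that message encodes which replica originated the update. A minimal sketch, assuming the usual 389 DS CSN layout of 20 hex digits (8 for the timestamp, 4 for the sequence number, 4 for the replica ID, 4 for the sub-sequence number); `parse_csn` is a hypothetical helper, not part of FreeIPA or 389 DS.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # seconds since the Unix epoch
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN into its fields (assumed layout, see above)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}")
    return CSN(
        timestamp=int(csn[0:8], 16),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    csn = parse_csn("569932db000000040000")
    print(csn.replica_id, datetime.fromtimestamp(csn.timestamp, tz=timezone.utc))
```

For the CSN above this yields replica ID 4, which can be compared with the replica IDs shown by `ipa-replica-manage list-ruv` to see whether the skipped update comes from a replica ID that was supposed to have been cleaned.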

I'm also getting these on dc1-nvan: 

[15/Jan/2016:17:45:36 +0000] attrlist_replace - attr_replace (nsslapd-referral, ldap://dc1-ipa-dev-van.mydomain.net:389/o%3Dipaca) failed.

-----Original Message-----
From: Ludwig Krispenz [mailto:lkrispen at redhat.com] 
Sent: January-15-16 12:19 AM
To: Nathan Peters
Cc: Rob Crittenden; freeipa-users at redhat.com
Subject: Re: [Freeipa-users] Replication failing on FreeIPA 4.2.0


On 01/15/2016 08:32 AM, Nathan Peters wrote:
> I think I've finally started to make some progress on this.  I did a lot of googling and found some things to run manually against 389 DS through ldapmodify commands to clean RUVs.  During this process the server crashed, and when it came back online, suddenly all my ghost RUVs were visible through ipa-replica-manage list-ruv.  It was really strange: I had about 5 of them from winsync agreements that kept failing and needing re-initialization, and another 5 from my earlier reinstallations of the 2 other domain controllers.
>
> I ran some more RUV cleanup commands through LDAP and the ghost RUVs all appear to be gone.  I'm not sure, though, how the crash suddenly made them visible, or why they had to be cleaned directly through ldapmodify when ipa-replica-manage could neither see nor clean them.

After a crash, the RUV can be rebuilt from the changelog, and the changelog can contain references to cleaned replica IDs, so they come back to life. The cleanallruv task was enhanced to also clean the changelog, but this fix is only in 389-ds-base 1.3.4.2 and later.

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project