[Freeipa-users] Replication has stopped and server errors

Fri Jan 6 16:58:21 UTC 2017

On 06.01.2017 00:29, sipazzo wrote:
> I have 6 IPA servers in 3 locations running 4.2.0-15.0.1 on RHEL 7.
> ipa1-dev is the CA renewal and CRL master server, and it is where most
> of our updates (host enrollment, password changes) take place.
> The servers had been running fine, but over the holidays we started
> having replication issues. Looking at
> /var/log/dirsrv/slapd-REALM-COM/errors showed the following:
>
> All servers currently have these errors for each replica they are
> connected to:
> NSMMReplicationPlugin - agmt="cn=meToipa2-dr.example.local" 
> (ipa2-dr:389): Incremental update failed and requires administrator action
> [04/Jan/2017:15:39:48 -0800] agmt="cn=meToipa1-dr.example.local" 
> (ipa1-dr:389) - Can't locate CSN 583c8e74000600110000 in the changelog 
> (DB rc=-30988). If replication stops, the consumer may need to be 
> reinitialized
> NSMMReplicationPlugin - agmt="cn=meToipa1-prod.example.local" 
> (ipa1-prod:389): Data required to update replica has been purged. The 
> replica must be reinitialized.
> [04/Jan/2017:13:33:26 -0800] NSMMReplicationPlugin - 
> agmt="cn=meToipa2-dev.example.local" (ipa2-dev:389): Incremental 
> update failed and requires administrator action
> [04/Jan/2017:13:33:26 -0800] NSMMReplicationPlugin - 
> agmt="cn=meToipa1-prod.example.local" (ipa1-prod:389): Incremental 
> update failed and requires administrator action
> [04/Jan/2017:13:33:27 -0800] agmt="cn=meToipa2-prod.example.local" 
> (ipa2-prod:389) - Can't locate CSN 586d69f0000400120000 in the 
> changelog (DB rc=-30988). If replication stops, the consumer may need 
> to be reinitialized.
> All servers also have errors like these, which are worrisome, but
> they go back quite a way:
> NSACLPlugin - The ACL target cn=dns,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target cn=dns,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target cn=groups,cn=compat,dc=example,dc=local 
> does not exist
> NSACLPlugin - The ACL target 
> cn=computers,cn=compat,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target cn=casigningcert 
> cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target cn=casigningcert 
> cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target ou=sudoers,dc=networkfleet,dc=local 
> does not exist
^^^ These are just INFO messages; you can ignore them.
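If you want to separate that noise from the replication failures that actually need attention, a small filter over the errors log can help. This is a minimal Python sketch, not an IPA or 389-ds tool: it assumes each log entry sits on a single line, as in the real errors file (the email above wraps them), and its phrase lists are taken only from the messages quoted in this thread. It skips the NSACLPlugin notices and groups the "requires administrator action", "must be reinitialized" and "Can't locate CSN" lines by agreement and target host, which also gives a rough picture of which servers each replica talks to.

```python
import re
from collections import defaultdict

# Informational messages that can be ignored (per the reply above).
IGNORABLE = ("NSACLPlugin - The ACL target",)

# Phrases from the quoted replication errors suggesting that the consumer
# of an agreement may need to be reinitialized.
NEEDS_REINIT = (
    "requires administrator action",
    "must be reinitialized",
    "Can't locate CSN",
)

# Captures the agreement name and target, e.g.
#   agmt="cn=meToipa2-dr.example.local" (ipa2-dr:389)
AGMT_RE = re.compile(r'agmt="cn=([^"]+)"\s+\(([^)]+)\)')


def triage(lines):
    """Group log lines by (agreement, target) when they signal that the
    consumer may need reinitialization; skip known INFO noise."""
    flagged = defaultdict(list)
    for line in lines:
        if any(marker in line for marker in IGNORABLE):
            continue
        if not any(phrase in line for phrase in NEEDS_REINIT):
            continue
        match = AGMT_RE.search(line)
        if match:
            agreement, target = match.groups()
            flagged[(agreement, target)].append(line.strip())
    return dict(flagged)
```

Run it on each server, e.g. by passing an open handle for /var/log/dirsrv/slapd-REALM-COM/errors to `triage()`, and compare the resulting agreement lists across servers.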

> All servers except one have a lot of these:
> DSRetroclPlugin - delete_changerecord: could not delete change record
> ipa1-dev has only these:
> [04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin - 
> agmt="cn=masterAgreement1-ipa1-prod.example.local-pki-tomcat" 
> (ipa1-prod:389): Replication bind with SIMPLE auth resumed
> [04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin - 
> agmt="cn=masterAgreement1-ipa2-dr.example.local-pki-tomcat" 
> (ipa2-dr:389): Replication bind with SIMPLE auth resumed
> [04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin - 
> agmt="cn=masterAgreement1-ipa1-dr.example.local-pki-tomcat" 
> (ipa1-dr:389): Replication bind with SIMPLE auth resumed
> [04/Jan/2017:18:36:53 -0800] NSMMReplicationPlugin - 
> agmt="cn=masterAgreement1-ipa2-prod.example.local-pki-tomcat" 
> (ipa2-prod:389): Replication bind with SIMPLE auth resumed
> Three servers (ipa1-dr, ipa2-dr, ipa2-prod) have these errors:
> [01/Jan/2017:14:43:06 -0800] - libdb: BDB2055 Lock table is out of 
> available lock entries
> [01/Jan/2017:14:43:06 -0800] - compactdb: failed to compact changelog; 
> db error - 12 Cannot allocate memory

You probably need https://access.redhat.com/solutions/1241063 to 
increase the number of locks (see also this thread: 
https://lists.fedoraproject.org/pipermail/389-users/2011-June/013299.html).

I would first increase the number of locks and then check whether 
anything improves.
We also don't know what your topology looks like, i.e. which servers 
are connected to each other.
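For the lock increase, the 389 Directory Server setting is nsslapd-db-locks on the ldbm database configuration entry. The Python sketch below only builds the LDIF change record; it does not contact the server, and the value 50000 is a hypothetical example, not a recommendation (size it according to the linked solution). It assumes you then apply the file yourself as Directory Manager.

```python
# DN of the ldbm database plugin configuration entry in 389 Directory Server.
LDBM_CONFIG_DN = "cn=config,cn=ldbm database,cn=plugins,cn=config"


def db_locks_ldif(locks: int) -> str:
    """Return an LDIF change record that replaces nsslapd-db-locks."""
    if locks <= 0:
        raise ValueError("lock count must be positive")
    return (
        f"dn: {LDBM_CONFIG_DN}\n"
        "changetype: modify\n"
        "replace: nsslapd-db-locks\n"
        f"nsslapd-db-locks: {locks}\n"
    )


if __name__ == "__main__":
    # 50000 is a hypothetical example value.
    print(db_locks_ldif(50000), end="")
```

Save the output to a file, apply it with `ldapmodify -x -D "cn=Directory Manager" -W -f <file>`, and restart the instance on that server (assumed here to be dirsrv@REALM-COM, matching the log path), since the database environment settings are read at startup.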

Martin

> Four servers (ipa1-dev, ipa2-dev, ipa1-dr and ipa2-dr) have these errors:
> [04/Jan/2017:15:37:21 -0800] slapd_ldap_sasl_interactive_bind - Error: 
> could not perform interactive bind for id [] mech [GSSAPI]: LDAP error 
> -1 (Can't contact LDAP server) ((null)) errno 107 (Transport 
> endpoint is not connected)
> [04/Jan/2017:15:37:24 -0800] slapd_ldap_sasl_interactive_bind - Error: 
> could not perform interactive bind for id [] mech [GSSAPI]: LDAP error 
> -1 (Can't contact LDAP server) ((null)) errno 107 (Transport 
> endpoint is not connected)
>
> I have tried various combinations of restarting, re-initializing, 
> disconnecting and reconnecting replicas, but I am currently down to 
> only two servers replicating with each other (ipa1-dev and ipa2-dev). 
> We did have a power outage at the dev location, but it does not seem 
> to correspond to when the errors started. I am not sure how to recover 
> from this. Any help is appreciated.
>
>
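The GSSAPI errors above report "Can't contact LDAP server" with errno 107, so a quick first check is whether each replica's LDAP port is reachable at all from each server. The Python sketch below only tests a plain TCP connection to port 389; a successful connect does not prove that a GSSAPI bind will work (Kerberos and DNS must also be healthy). The hostnames you pass in are up to you; the ones in the example follow the agreement names in the logs and are assumed to resolve.

```python
import socket


def ldap_port_reachable(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_replicas(hosts, port: int = 389) -> dict:
    """Map each host to whether its LDAP port accepted a TCP connection."""
    return {host: ldap_port_reachable(host, port) for host in hosts}
```

For example, `check_replicas(["ipa1-dr.example.local", "ipa2-dr.example.local", "ipa2-prod.example.local"])` run on ipa1-dev would show which peers it cannot reach; repeating it from every server helps map the topology.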
