[Freeipa-users] Replication has stopped and server errors
Martin Basti
mbasti at redhat.com
Fri Jan 6 16:58:21 UTC 2017
On 06.01.2017 00:29, sipazzo wrote:
> I have 6 IPA servers in 3 locations running 4.2.0-15.0.1 on RHEL 7.
> Ipa1-dev is the CA Renewal and CRL Master server and where most of our
> updates (host enrollment, password changes) end up taking place.
> Servers had been running fine. Over the holidays, we started having
> some replication issues, and looking at
> /var/log/dirsrv/slapd-REALM-COM/errors showed the following:
>
> All servers currently have these errors for each replica the
> respective IPA servers are connected to:
> NSMMReplicationPlugin - agmt="cn=meToipa2-dr.example.local"
> (ipa2-dr:389): Incremental update failed and requires administrator action
> [04/Jan/2017:15:39:48 -0800] agmt="cn=meToipa1-dr.example.local"
> (ipa1-dr:389) - Can't locate CSN 583c8e74000600110000 in the changelog
> (DB rc=-30988). If replication stops, the consumer may need to be
> reinitialized
> NSMMReplicationPlugin - agmt="cn=meToipa1-prod.example.local"
> (ipa1-prod:389): Data required to update replica has been purged. The
> replica must be reinitialized.
> [04/Jan/2017:13:33:26 -0800] NSMMReplicationPlugin -
> agmt="cn=meToipa2-dev.example.local" (ipa2-dev:389): Incremental
> update failed and requires administrator action
> [04/Jan/2017:13:33:26 -0800] NSMMReplicationPlugin -
> agmt="cn=meToipa1-prod.example.local" (ipa1-prod:389): Incremental
> update failed and requires administrator action
> [04/Jan/2017:13:33:27 -0800] agmt="cn=meToipa2-prod.example.local"
> (ipa2-prod:389) - Can't locate CSN 586d69f0000400120000 in the
> changelog (DB rc=-30988). If replication stops, the consumer may need
> to be reinitialized.
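The CSNs in these messages can be decoded to see how old the missing changes are and which replica originated them. Below is a minimal Python sketch. It assumes the standard 389 Directory Server CSN layout: 8 hex digits of Unix time, followed by 4 hex digits each for the sequence number, the replica ID and the sub-sequence number. The function names are illustrative only.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple

# Matches the CSN in messages such as "Can't locate CSN 583c8e74000600110000 in the changelog"
CSN_RE = re.compile(r"Can't locate CSN ([0-9a-fA-F]{20})")


class CSN(NamedTuple):
    timestamp: datetime  # when the change was made (UTC)
    seqnum: int          # distinguishes changes made within the same second
    replica_id: int      # replica ID of the server that originated the change
    subseqnum: int


def decode_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN into its fields (assumed 8/4/4/4 layout)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {csn!r}")
    return CSN(
        timestamp=datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


def missing_csns(lines: Iterable[str]) -> Iterator[CSN]:
    """Yield a decoded CSN for every "Can't locate CSN" line in an errors log."""
    for line in lines:
        match = CSN_RE.search(line)
        if match:
            yield decode_csn(match.group(1))


if __name__ == "__main__":
    sample = [
        "(ipa1-dr:389) - Can't locate CSN 583c8e74000600110000 in the changelog",
        "(ipa2-prod:389) - Can't locate CSN 586d69f0000400120000 in the changelog",
    ]
    for csn in missing_csns(sample):
        print(csn.timestamp.isoformat(), "replica ID", csn.replica_id)
```

For the two CSNs above, this gives 2016-11-28 20:07:16 UTC from replica ID 17 and 2017-01-04 21:32:32 UTC (13:32:32 -0800) from replica ID 18. The replica IDs can be compared with the output of `ipa-replica-manage list-ruv`.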
> All servers also have these types of errors, which are worrisome, but
> they go back quite a way:
> NSACLPlugin - The ACL target cn=dns,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target cn=dns,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target cn=groups,cn=compat,dc=example,dc=local
> does not exist
> NSACLPlugin - The ACL target
> cn=computers,cn=compat,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target cn=casigningcert
> cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target cn=casigningcert
> cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=example,dc=local does not exist
> NSACLPlugin - The ACL target ou=sudoers,dc=networkfleet,dc=local
> does not exist
^^^ These are just INFO messages; you can ignore them.
> All servers except one have a lot of these:
> DSRetroclPlugin - delete_changerecord: could not delete change record
> Ipa1-dev has only these:
> [04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-ipa1-prod.example.local-pki-tomcat"
> (ipa1-prod:389): Replication bind with SIMPLE auth resumed
> [04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-ipa2-dr.example.local-pki-tomcat"
> (ipa2-dr:389): Replication bind with SIMPLE auth resumed
> [04/Jan/2017:18:36:52 -0800] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-ipa1-dr.example.local-pki-tomcat"
> (ipa1-dr:389): Replication bind with SIMPLE auth resumed
> [04/Jan/2017:18:36:53 -0800] NSMMReplicationPlugin -
> agmt="cn=masterAgreement1-ipa2-prod.example.local-pki-tomcat"
> (ipa2-prod:389): Replication bind with SIMPLE auth resumed
> 3 servers (ipa1-dr, ipa2-dr, ipa2-prod) have these errors:
> [01/Jan/2017:14:43:06 -0800] - libdb: BDB2055 Lock table is out of
> available lock entries
> [01/Jan/2017:14:43:06 -0800] - compactdb: failed to compact changelog;
> db error - 12 Cannot allocate memory
You probably need https://access.redhat.com/solutions/1241063 to
increase the number of locks (or see this thread:
https://lists.fedoraproject.org/pipermail/389-users/2011-June/013299.html).
I would first increase the number of locks and then check whether
anything has improved.
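If the fix in the linked article applies here, the change is a single modify on the ldbm database configuration entry. Below is a minimal sketch that generates the LDIF. It assumes the limit is stored in the `nsslapd-db-locks` attribute of `cn=config,cn=ldbm database,cn=plugins,cn=config`. The value 50000 is only a hypothetical example, so take the actual number from the article.

```python
# Entry holding the Berkeley DB settings for the ldbm backend (assumed here;
# verify against the Red Hat article before applying anything).
LDBM_CONFIG_DN = "cn=config,cn=ldbm database,cn=plugins,cn=config"


def db_locks_ldif(new_limit: int) -> str:
    """Return LDIF that replaces nsslapd-db-locks with new_limit."""
    if new_limit <= 0:
        raise ValueError("lock limit must be a positive integer")
    return "\n".join([
        f"dn: {LDBM_CONFIG_DN}",
        "changetype: modify",
        "replace: nsslapd-db-locks",
        f"nsslapd-db-locks: {new_limit}",
        "",  # trailing newline ends the LDIF record
    ])


if __name__ == "__main__":
    # 50000 is a hypothetical example value, not a recommendation.
    print(db_locks_ldif(50000), end="")
```

Redirect the output to a file (for example, a hypothetical db-locks.ldif) and apply it on each affected server with something like `ldapmodify -x -D "cn=Directory Manager" -W -f db-locks.ldif`. Database lock settings are normally picked up only after the directory server instance is restarted.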
We also don't know what your topology looks like, i.e. which servers
are connected to each other.
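Once you have the list of agreements (for example, by running `ipa-replica-manage list <server>` on each server), one quick way to see which servers can still reach each other is to treat the agreements as an undirected graph and find its connected components. Below is a minimal sketch. The agreement list is hypothetical and includes only the ipa1-dev/ipa2-dev pair you say is still replicating. Treating each agreement as bidirectional is an assumption.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Set, Tuple

SERVERS = ["ipa1-dev", "ipa2-dev", "ipa1-prod", "ipa2-prod", "ipa1-dr", "ipa2-dr"]


def replication_islands(servers: Iterable[str],
                        agreements: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group servers into sets connected by working replication agreements.

    Agreements are treated as undirected edges, assuming each one is paired
    with an agreement in the opposite direction.
    """
    graph = defaultdict(set)
    for a, b in agreements:
        graph[a].add(b)
        graph[b].add(a)

    seen: Set[str] = set()
    islands: List[Set[str]] = []
    for start in servers:
        if start in seen:
            continue
        # Breadth-first search collects every server reachable from start.
        island = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in graph[node]:
                if peer not in seen:
                    seen.add(peer)
                    island.add(peer)
                    queue.append(peer)
        islands.append(island)
    return islands


if __name__ == "__main__":
    # Hypothetical: only the dev pair is still replicating.
    working = [("ipa1-dev", "ipa2-dev")]
    for island in replication_islands(SERVERS, working):
        print(sorted(island))
```

If the result shows more than one set, the servers in different sets cannot exchange changes, even indirectly.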
Martin
> 4 servers (ipa1-dev, ipa2-dev, ipa1-dr and ipa2-dr) have these errors:
> [04/Jan/2017:15:37:21 -0800] slapd_ldap_sasl_interactive_bind - Error:
> could not perform interactive bind for id [] mech [GSSAPI]: LDAP error
> -1 (Can't contact LDAP server) ((null)) errno 107 (Transport
> endpoint is not connected)
> [04/Jan/2017:15:37:24 -0800] slapd_ldap_sasl_interactive_bind - Error:
> could not perform interactive bind for id [] mech [GSSAPI]: LDAP error
> -1 (Can't contact LDAP server) ((null)) errno 107 (Transport
> endpoint is not connected)
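These GSSAPI bind failures report "Can't contact LDAP server", so a basic first check is whether each replica can open a TCP connection to its peers on port 389. Below is a minimal sketch that uses only the standard library. It assumes the peer hostnames resolve from the machine where you run it. A successful connection does not rule out Kerberos problems, which can also cause GSSAPI bind failures.

```python
import socket
from typing import Dict, Iterable


def ldap_port_open(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def check_peers(peers: Iterable[str], port: int = 389) -> Dict[str, bool]:
    """Map each peer hostname to whether its LDAP port accepted a connection."""
    return {peer: ldap_port_open(peer, port) for peer in peers}
```

Run something like `check_peers(["ipa1-dev", "ipa2-dev", "ipa1-dr", "ipa2-dr"])` on each of the four affected servers, and use fully qualified names if the short names do not resolve.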
>
> I have tried various combinations of restarting, re-initializing,
> disconnecting and reconnecting replicas, but I am currently down to only
> two servers replicating with each other (ipa1-dev and ipa2-dev).
> We did have a power outage at the dev location, but it does not seem to
> correspond to when the errors started. I'm not sure how to recover from
> this. Any help is appreciated.
>
>