[Freeipa-users] multi-master replication

Rich Megginson rmeggins at redhat.com
Mon Oct 27 16:25:35 UTC 2014


On 10/27/2014 10:12 AM, Craig White wrote:
>
> *From:*freeipa-users-bounces at redhat.com 
> [mailto:freeipa-users-bounces at redhat.com] *On Behalf Of *Rich Megginson
> *Sent:* Monday, October 27, 2014 6:42 AM
> *To:* freeipa-users at redhat.com
> *Subject:* Re: [Freeipa-users] multi-master replication
>
> On 10/25/2014 06:17 PM, Dmitri Pal wrote:
>
>     On 10/24/2014 07:15 PM, Craig White wrote:
>
>         *From:*freeipa-users-bounces at redhat.com
>         <mailto:freeipa-users-bounces at redhat.com>
>         [mailto:freeipa-users-bounces at redhat.com] *On Behalf Of *Craig
>         White
>         *Sent:* Friday, October 24, 2014 4:02 PM
>         *To:* freeipa-users at redhat.com <mailto:freeipa-users at redhat.com>
>         *Subject:* [Freeipa-users] multi-master replication
>
>         I would have thought that changes go from replica to master
>         and not just master to replica.
>
>         Is there something I have to do to make the changes
>         bi-directional?
>
>         Replying to my own post…
>
>         Logs are my friend  ;-)
>
>         [24/Oct/2014:23:08:17 +0000] NSMMReplicationPlugin -
>         agmt="cn=meToipa001.domain.local " (ipa001:389): Replication
>         bind with GSSAPI auth resumed
>
>         [24/Oct/2014:23:08:17 +0000] NSMMReplicationPlugin -
>         agmt="cn=meToipa001.domain.local " (ipa001:389): Warning:
>         unable to replicate schema: rc=2
>
>         [24/Oct/2014:23:08:17 +0000] NSMMReplicationPlugin -
>         agmt="cn=meToipa001.domain.local " (ipa001:389): Failed to
>         send update operation to consumer (uniqueid
>         e018060f-5bb011e4-81078979-dc802980, CSN
>         544aa346000000030000): Can't contact LDAP server. Will retry
>         later.
>
>         [24/Oct/2014:23:08:17 +0000] NSMMReplicationPlugin -
>         agmt="cn=meToipa001.domain.local " (ipa001:389): Consumer
>         failed to replay change (uniqueid (null), CSN (null)): Can't
>         contact LDAP server(-1). Will retry later.
>
>
>     These NULLs look suspicious.
>     I hope DS gurus will have more for you on Monday.
>
>
> 1) Yes, replication is fully bi-directional.
> 2) What are the exact versions of dirsrv?  rpm -q 389-ds-base on 
> supplier and consumer.
> 3) Can you reproduce the problem using the replication log level on 
> both the supplier and consumer? 
> http://www.port389.org/docs/389ds/FAQ/faq.html#troubleshooting
> ----
>
> 1)I sort of expected it to be – that’s what had me worried when it 
> didn’t work as expected.
>
> 2)]# rpm -qa | grep 389
>
> 389-ds-base-1.2.11.15-47.el6.x86_64
>
> 389-ds-base-libs-1.2.11.15-47.el6.x86_64
>
> 3)Upped the log levels – hopefully this is a reasonable representative 
> from ipa002 (the replica)
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - conn=4 op=229791 
> repl="dc=stt,dc=local": Acquired replica
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - conn=4 op=229791 
> repl="dc=stt,dc=local": StartNSDS90ReplicationRequest: response=0 rc=0
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - conn=4 op=229791 
> Relinquishing consumer connection extension
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> ruv_add_csn_inprogress: successfully inserted csn 544e6b02000000040000 
> into pending list
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - Purged state 
> information from entry 
> krbprincipalname=ldap/ipa002.FQDN at FQDN,cn=services,cn=accounts,dc=stt,dc=local 
> up to CSN 5445307e000000040000
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - changelog program 
> - _cl5GetDBFileByReplicaName: found DB object f32b80 for database 
> /var/lib/dirsrv/slapd-STT-LOCAL/cldb/dababc08-5bb011e4-81078979-dc802980_544aa332000000040000.db4
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - changelog program 
> - _cl5GetDBFileByReplicaName: found DB object f32b80 for database 
> /var/lib/dirsrv/slapd-STT-LOCAL/cldb/dababc08-5bb011e4-81078979-dc802980_544aa332000000040000.db4
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - ruv_update_ruv: 
> successfully committed csn 544e6b02000000040000
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Disconnected from the consumer
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Warning: unable to replicate 
> schema: rc=2
>
> [27/Oct/2014:15:55:47 +0000] - csngen_adjust_time: gen state before 
> 544e6b040001:1414425347:0:1
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - changelog program 
> - _cl5GetDBFile: found DB object f32b80 for database 
> /var/lib/dirsrv/slapd-STT-LOCAL/cldb/dababc08-5bb011e4-81078979-dc802980_544aa332000000040000.db4
>
> [27/Oct/2014:15:55:47 +0000] - _cl5PositionCursorForReplay 
> (agmt="cn=meToipa001.FQDN" (ipa001:389)): Consumer RUV:
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): {replicageneration} 
> 544aa332000000040000
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): {replica 4 
> ldap://ipa001.FQDN:389} 544aa3a1000000040000 544e6b02000000040000 00000000
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): {replica 3 
> ldap://ipa002.FQDN:389} 544aa339000000030000 544aa33f000400030000 00000000
>
> [27/Oct/2014:15:55:47 +0000] - _cl5PositionCursorForReplay 
> (agmt="cn=meToipa001.FQDN" (ipa001:389)): Supplier RUV:
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): {replicageneration} 
> 544aa332000000040000
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): {replica 3 
> ldap://ipa002.FQDN:389} 544aa339000000030000 544e6aa4000300030000 544e6aa3
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): {replica 4 
> ldap://ipa001.FQDN:389} 544aa3a1000000040000 544e6b02000000040000 544e6b03
>
> [27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) - 
> clcache_get_buffer: found thread private buffer cache 7fe85000ea50
>
> [27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) - 
> clcache_get_buffer: _pool is f89af0 _pool->pl_busy_lists is 
> 7fe8500197d0 _pool->pl_busy_lists->bl_buffers is 7fe85000ea50
>
> [27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) - 
> session start: anchorcsn=544aa33f000400030000
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - changelog program 
> - agmt="cn=meToipa001.FQDN" (ipa001:389): CSN 544aa33f000400030000 
> found, position set for replay
>
> [27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) - 
> load=1 rec=1 csn=544aa346000000030000
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): replay_update: Sending add 
> operation 
> (dn="cn=KDC,cn=ipa002.FQDN,cn=masters,cn=ipa,cn=etc,dc=stt,dc=local" 
> csn=544aa346000000030000)
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): replay_update: Consumer could 
> not replay operation with csn 544aa346000000030000
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Failed to send update 
> operation to consumer (uniqueid e018060f-5bb011e4-81078979-dc802980, 
> CSN 544aa346000000030000): Can't contact LDAP server. Will retry later.
>
> [27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain starting
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Connection disconnected by 
> another thread
>
> [27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain: read 
> result for message_id 0
>
> [27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain: result 3, 
> -1, 2, 0, (null)
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Consumer failed to replay 
> change (uniqueid (null), CSN (null)): Can't contact LDAP server(-1). 
> Will retry later.
>
> [27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain: got op 
> result 205 should finish 1
>
> [27/Oct/2014:15:55:47 +0000] - repl5_inc_result_threadmain exiting
>
> [27/Oct/2014:15:55:47 +0000] agmt="cn=meToipa001.FQDN" (ipa001:389) - 
> session end: state=0 load=1 sent=1 skipped=0 skipped_new_rid=0 
> skipped_csn_gt_cons_maxcsn=0 skipped_up_to_date=0 skipped_csn_gt_ruv=0 
> skipped_csn_covered=0
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Warning: unable to send 
> endReplication extended operation (Can't contact LDAP server)
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): State: sending_updates -> 
> start_backoff
>
> [27/Oct/2014:15:55:47 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): State: start_backoff -> 
> start_backoff
>
> [27/Oct/2014:15:55:49 +0000] NSMMReplicationPlugin - conn=4 op=229793 
> Acquired consumer connection extension
>
> [27/Oct/2014:15:55:49 +0000] NSMMReplicationPlugin - conn=4 op=229793 
> repl="dc=stt,dc=local": Released replica held by locking_purl=conn=4 
> id=229791
>
> [27/Oct/2014:15:55:49 +0000] NSMMReplicationPlugin - conn=4 op=229793 
> Relinquishing consumer connection extension
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): State: start_backoff -> backoff
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Trying non-secure 
> slapi_ldap_init_ext
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): binddn = ,  passwd =
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Replication bind with GSSAPI 
> auth resumed
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): No linger to cancel on the 
> connection
>
> [27/Oct/2014:15:55:51 +0000] - _csngen_adjust_local_time: gen state 
> before 544e6b040001:1414425347:0:1
>
> [27/Oct/2014:15:55:51 +0000] - _csngen_adjust_local_time: gen state 
> after 544e6b080000:1414425351:0:1
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): Replica was successfully acquired.
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - 
> agmt="cn=meToipa001.FQDN" (ipa001:389): State: backoff -> sending_updates
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - conn=4 op=229794 
> Acquired consumer connection extension
>
> [27/Oct/2014:15:55:51 +0000] NSMMReplicationPlugin - conn=4 op=229794 
> repl="dc=stt,dc=local": Begin incremental protocol
>
> [27/Oct/2014:15:55:51 +0000] - csngen_adjust_time: gen state before 
> 544e6b080001:1414425351:0:1
>
> [27/Oct/2014:15:55:51 +0000] - csngen_adjust_time: gen state after 
> 544e6b080001:1414425351:0:1
>
> This is the log from ipa002 (which was the replica) and I am confused 
> by the error logged, ‘Can’t contact LDAP server’ because it surely can…
>
> ]# telnet 172.29.31.100 389
>
> Trying 172.29.31.100...
>
> Connected to 172.29.31.100.
>
> Escape character is '^]'.
>
> ^]
>
> telnet> quit
>
> Connection closed.
>

Replication uses Kerberos/GSSAPI and also possibly startTLS (I can't 
remember if ipa uses startTLS for an additional layer of encryption, or 
just uses Kerberos/GSSAPI for encryption).  These might also cause the 
directory server to report connection failures even if the tcp layer 
connections are working.

Can you provided excerpts from the 172.29.31.100 directory server access 
logs from around the time of the connection failures reported above?

> This is a new setup (setup master on Thursday, imported users from 
> OpenLDAP and then setup replica on Friday). Had a few issues that I 
> didn’t understand which may or may not be relevant. First, after I 
> made some changes to SSSD to allow for sudo and hopefully HBAC, it 
> seemed like the directory server crashed and I had to restart it.
>

Crashed?  If you can reproduce this problem, see 
http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-crashes
If there is a crash, we need to see the stack trace.

> Seemed like the same thing happened at the same time when I likewise 
> configured the replica.
>

Again, see above.

> Also, when I was trying to setup the replica, I had to resort to 
> –skip-conncheck as it (presume Kerberos) not only asked for the 
> password for admin@$DOMAIN but also the password for 
> admin at ipa001.$DOMAIN <mailto:admin at ipa001.$DOMAIN> for which I had 
> never specifically setup any password and no password already used in 
> Kerberos worked.
>

Hmm - maybe one of the ipa guys knows what the problem is here.

> Thanks
>
> Craig
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/freeipa-users/attachments/20141027/940d07e7/attachment.htm>


More information about the Freeipa-users mailing list