[Freeipa-users] Replication stopped working

Fri Sep 5 18:06:00 UTC 2014

Update:
m2 and m3 are now in sync!

After making sure ldapsearch was working both ways (m1<=>m2 and
m1<=>m3) using each server's keytab (/etc/dirsrv/ds.keytab) to get
the ticket, I re-initialized both replicas and they were able to
catch up:
@m2 # ipa-replica-manage re-initialize --from m1.example.com
@m3 # ipa-replica-manage re-initialize --from m1.example.com
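
For reference, the check-then-re-initialize sequence above can be
sketched as a small shell script to run on each replica. This is only a
sketch under assumptions: the variable names MASTER, BASEDN, TESTUSER
and DRY_RUN and the helper functions run and check_and_reinit are made
up for illustration, and by default it only prints the commands instead
of running them. The commands themselves are the ones used in this
thread.

```shell
#!/bin/sh
# Sketch: verify GSSAPI LDAP access to the master using the directory
# server's keytab, then re-initialize this replica from the master.
# MASTER, BASEDN, TESTUSER and DRY_RUN are illustrative names only.

MASTER="${MASTER:-m1.example.com}"
BASEDN="${BASEDN:-dc=example,dc=com}"
TESTUSER="${TESTUSER:-testuser}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_and_reinit() {
    # Get a ticket as this host's ldap/ service principal.
    run kinit -k -t /etc/dirsrv/ds.keytab "ldap/$(hostname)" || return 1
    # Confirm that an authenticated search against the master works.
    run ldapsearch -Y GSSAPI -H "ldaps://$MASTER" -b "$BASEDN" "uid=$TESTUSER" || return 1
    # Only then pull a fresh copy of the data from the master.
    run ipa-replica-manage re-initialize --from "$MASTER"
}

check_and_reinit
```

Setting DRY_RUN=0 runs the commands for real; the re-initialize step is
skipped if either the kinit or the ldapsearch fails. The same ldapsearch
should also be tried in the other direction (from m1 to the replica)
before re-initializing.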

Thanks so much for your hint, Martin!

On Fri, Sep 5, 2014 at 12:43 PM, Guillermo Fuentes
<guillermo.fuentes at modernizingmedicine.com> wrote:
> Hi Martin,
>
> Attached are m2.log, m3.log and m4.log files.
>
> 1) All masters are time synced with the same NTP server pool.
> 2) DNS is fine. Forward and reverse lookup.
> 3) ldapsearch:
> m1 to m2 and m3 work:
>   kinit -k -t /etc/dirsrv/ds.keytab ldap/`hostname` # getting ticket on m1
>
>   ldapsearch -Y GSSAPI -H ldaps://m2.example.com  -b
> "dc=example,dc=com"  uid=testuser
>   ldapsearch -Y GSSAPI -H ldaps://m3.example.com  -b
> "dc=example,dc=com"  uid=testuser
>
> m1 to m4 fails:
> # ldapsearch -Y GSSAPI -H ldaps://m4.example.com  -b
> "dc=example,dc=com"  uid=testuser
> SASL/GSSAPI authentication started
> ldap_sasl_interactive_bind_s: Local error (-2)
> additional info: SASL(-1): generic failure: GSSAPI Error: Unspecified
> GSS failure.  Minor code may provide more information (KDC returned
> error string: FINDING_SERVER_KEY)
>
>
> m2 to m1, and m3 to m1 work fine:
>   kinit -k -t /etc/dirsrv/ds.keytab ldap/`hostname`
>   ldapsearch -Y GSSAPI -H ldaps://m1.example.com  -b
> "dc=example,dc=com"  uid=testuser
>
> m4 to m1 fails:
> # ldapsearch -Y GSSAPI -H ldaps://m1.example.com  -b
> "dc=example,dc=com"  uid=testuser
> SASL/GSSAPI authentication started
> ldap_sasl_interactive_bind_s: Invalid credentials (49)
> additional info: SASL(-14): authorization failure: security flags do
> not match required
>
>
> m2 and m3 are now in the same state: connections between them and m1
> are fine, but updates are not applied. m1 logs the following
> (/var/log/dirsrv/slapd-EXAMPLE-COM/errors) for both:
>
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: Sending modify
> operation (dn="uid=testuser,cn=users,cn=accounts,dc=example,dc=com"
> csn=53d66ecb000000040000)
> [05/Sep/2014:12:30:49 -0400] - repl5_inc_result_threadmain: read
> result for message_id 0
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: modifys
> operation (dn="uid=testuser,cn=users,cn=accounts,dc=example,dc=com"
> csn=53d66ecb000000040000) not sent - empty
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: Consumer
> successfully sent operation with csn 53d66ecb000000040000
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): Skipping update operation with
> no message_id (uniqueid 04b0b435-5ef311e3-9c91ec9f-6cd72e64, CSN
> 53d66ecb000000040000):
> [05/Sep/2014:12:30:49 -0400] agmt="cn=meTom3.example.com" (m3:389) -
> load=1 rec=38 csn=53d66ecb000200040000
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: Sending modify
> operation (dn="uid=testuser,cn=users,cn=accounts,dc=example,dc=com"
> csn=53d66ecb000200040000)
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: modifys
> operation (dn="uid=testuser,cn=users,cn=accounts,dc=example,dc=com"
> csn=53d66ecb000200040000) not sent - empty
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: Consumer
> successfully sent operation with csn 53d66ecb000200040000
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): Skipping update operation with
> no message_id (uniqueid 04b0b435-5ef311e3-9c91ec9f-6cd72e64, CSN
> 53d66ecb000200040000):
> [05/Sep/2014:12:30:49 -0400] agmt="cn=meTom3.example.com" (m3:389) -
> load=1 rec=39 csn=53d66ecc000100040000
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: Sending modify
> operation (dn="uid=testuser,cn=users,cn=accounts,dc=example,dc=com"
> csn=53d66ecc000100040000)
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: modifys
> operation (dn="uid=testuser,cn=users,cn=accounts,dc=example,dc=com"
> csn=53d66ecc000100040000) not sent - empty
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): replay_update: Consumer
> successfully sent operation with csn 53d66ecc000100040000
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): Skipping update operation with
> no message_id (uniqueid 04b0b435-5ef311e3-9c91ec9f-6cd72e64, CSN
> 53d66ecc000100040000):
> [05/Sep/2014:12:30:49 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): No more updates to send
> (cl5GetNextOperationToReplay)
> [05/Sep/2014:12:30:49 -0400] - repl5_inc_waitfor_async_results: 0 0
> [05/Sep/2014:12:30:49 -0400] - repl5_inc_result_threadmain: read
> result for message_id 0
> [05/Sep/2014:12:30:49 -0400] - repl5_inc_result_threadmain: read
> result for message_id 0
> [05/Sep/2014:12:30:49 -0400] - repl5_inc_result_threadmain: read
> result for message_id 0
> [05/Sep/2014:12:30:50 -0400] - repl5_inc_result_threadmain: read
> result for message_id 0
> [05/Sep/2014:12:30:50 -0400] - repl5_inc_result_threadmain: read
> result for message_id 0
> [05/Sep/2014:12:30:51 -0400] - repl5_inc_result_threadmain exiting
> [05/Sep/2014:12:30:51 -0400] agmt="cn=meTom2.example.com" (m2:389) -
> session end: state=3 load=1 sent=36 skipped=13
> [05/Sep/2014:12:30:51 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom2.example.com" (m2:389): Successfully released consumer
> [05/Sep/2014:12:30:51 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom2.example.com" (m2:389): Beginning linger on the
> connection
> [05/Sep/2014:12:30:51 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom2.example.com" (m2:389): State: sending_updates ->
> wait_for_changes
> [05/Sep/2014:12:30:51 -0400] - repl5_inc_result_threadmain exiting
> [05/Sep/2014:12:30:51 -0400] agmt="cn=meTom3.example.com" (m3:389) -
> session end: state=3 load=1 sent=36 skipped=13
> [05/Sep/2014:12:30:51 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): Successfully released consumer
> [05/Sep/2014:12:30:51 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): Beginning linger on the
> connection
> [05/Sep/2014:12:30:51 -0400] NSMMReplicationPlugin -
> agmt="cn=meTom3.example.com" (m3:389): State: sending_updates ->
> wait_for_changes
>
> Thanks for your help!
>
> --
> Guillermo Fuentes Rodriguez
> Computer Systems Analyst
> 561-880-2998 x337
> guillermo.fuentes at modmed.com
> 866-799-2146 Toll Free
> 3600 FAU Blvd., Ste. 202, Boca Raton FL 33431
>
>
>
> FORBES 2013 Top 50 - America's Most Promising Companies
> SFBJ 2013 Best Places To Work
> SFBJ 2012 & 2013 #1 Fastest Growing Company S. FL "Fast 50"
> Red Herring 2014 North America Top 100 Company
>
>
>
>
> On Fri, Sep 5, 2014 at 2:24 AM, Martin Kosek <mkosek at redhat.com> wrote:
>> On 09/04/2014 05:11 PM, Guillermo Fuentes wrote:
>>> Hello list,
>>>
>>> We’re running FreeIPA with a master and 3 replicas. The replication
>>> stopped working and currently we’re adding resources only to the
>>> master. This is the environment we have:
>>> m1:
>>>   OS: CentOS release 6.5
>>>   FreeIPA: 3.0.0-37
>>>   CA: pki-ca-9.0.3
>>>
>>>
>>> # ipa-replica-manage list -v `hostname`
>>> m2.example.com: replica
>>>   last init status: None
>>>   last init ended: None
>>>   last update status: 49  - LDAP error: Invalid credentials
>>>   last update ended: None
>>> m3.example.com: replica
>>>   last init status: None
>>>   last init ended: None
>>>   last update status: 0 Replica acquired successfully: Incremental
>>> update succeeded
>>>   last update ended: 2014-09-04 14:28:44+00:00
>>> m4.example.com: replica
>>>   last init status: None
>>>   last init ended: None
>>>   last update status: -2  - LDAP error: Local error
>>>   last update ended: None
>>>
>>> m2:
>>>   OS: CentOS release 6.5
>>>   FreeIPA: 3.0.0-37
>>>
>>> # ipa-replica-manage list -v `hostname`
>>> m1.example.com: replica
>>>   last init status: None
>>>   last init ended: None
>>>   last update status: -1 Incremental update has failed and requires
>>> administrator actionLDAP error: Can't contact LDAP server
>>>   last update ended: 2014-09-03 22:53:21+00:00
>>>
>>> m3:
>>>   OS: CentOS release 6.5
>>>   FreeIPA: 3.0.0-37
>>>
>>> # ipa-replica-manage list -v `hostname`
>>> m1.example.com: replica
>>>   last init status: None
>>>   last init ended: None
>>>   last update status: 0 Replica acquired successfully: Incremental
>>> update succeeded
>>>   last update ended: 2014-09-04 14:31:51+00:00
>>>
>>> m4:
>>>   OS: CentOS release 6.5
>>>   FreeIPA: 3.3.3-28
>>>
>>> # ipa-replica-manage list -v `hostname`
>>> m1.example.com: replica
>>>   last init status: None
>>>   last init ended: None
>>>   last update status: 49 Unable to acquire replicaLDAP error: Invalid
>>> credentials
>>>   last update ended: None
>>>
>>>
>>> Note that although m3 reports “Incremental update succeeded”, users
>>> created on m1 are not replicated to m3, and users created on m3 are
>>> not replicated back to m1.
>>>
>>> We’ve tried different things including re-initializing m2.
>>>
>>> Can somebody point me in the right direction to get replication going again?
>>>
>>> Thanks in advance!
>>>
>>> Guillermo
>>
>> Hello,
>>
>> I think we need more troubleshooting information, which is available in
>> /var/log/dirsrv/slapd-EXAMPLE-COM/errors, especially on m2, m3, m4.
>>
>> A few pointers on what I would try myself:
>> 1) Check that all masters have their time synced (a difference of a few seconds is OK)
>>
>> 2) Check that DNS is all right - all replicas can resolve the master's forward and
>> reverse addresses. The master can resolve all replicas' forward and reverse addresses.
>>
>> This is a common source of replication/Kerberos errors
>> (http://www.freeipa.org/page/Troubleshooting#Kerberos_does_not_work)
>> The error "Can't contact LDAP server" may point to DNS issues.
>>
>> 3) Check that you can do a plain ldapsearch from replica to master. Ideally even
>> authenticated with the keytab from /etc/dirsrv/ds.keytab
>>
>> HTH,
>> Martin

-- 
Guillermo Fuentes Rodriguez
Computer Systems Analyst
561-880-2998 x337
guillermo.fuentes at modmed.com
866-799-2146 Toll Free
3600 FAU Blvd., Ste. 202, Boca Raton FL 33431