[Freeipa-users] IPA Replica Issues (Total update abortedLDAP error: Can't contact LDAP server)

Rich Megginson rmeggins at redhat.com
Thu Apr 3 22:32:37 UTC 2014


On 04/03/2014 03:46 PM, Nevada Sanchez wrote:
> Okay, I updated the gist and extended some of the logs (ipa2-errors 
> does stop at 20:50:21). I'll follow up when I have the debug stuff in 
> place.
>
> https://gist.github.com/nevsan/8b6f78d7396963dc5f70

Another strange thing - it looks as if the initial replica init 
completes successfully.

[02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin - Beginning total 
update of replica "agmt="cn=meToipa2.example.com" (ipa2:389)".

On the replica:

[02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin - 
multimaster_be_state_change: replica dc=example,dc=com is going offline; 
disabling replication
[02/Apr/2014:20:50:18 +0000] - WARNING: Import is running with 
nsslapd-db-private-import-mem on; No other process is allowed to access 
the database
[02/Apr/2014:20:50:21 +0000] - import userRoot: Workers finished; 
cleaning up...
[02/Apr/2014:20:50:21 +0000] - import userRoot: Workers cleaned up.
[02/Apr/2014:20:50:21 +0000] - import userRoot: Indexing complete. 
Post-processing...
[02/Apr/2014:20:50:21 +0000] - import userRoot: Generating 
numSubordinates complete.
[02/Apr/2014:20:50:21 +0000] - import userRoot: Flushing caches...
[02/Apr/2014:20:50:21 +0000] - import userRoot: Closing files...
[02/Apr/2014:20:50:21 +0000] - import userRoot: Import complete. 
Processed 453 entries in 3 seconds. (151.00 entries/sec)
[02/Apr/2014:20:50:21 +0000] NSMMReplicationPlugin - 
multimaster_be_state_change: replica dc=example,dc=com is coming online; 
enabling replication

On the master, access log:

[02/Apr/2014:20:50:17 +0000] conn=1365 op=15 MOD 
dn="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping 
tree,cn=config"

This is the operation that triggers the replica init.  Then 
ipa-replica-install polls for agreement status:
[02/Apr/2014:20:50:19 +0000] conn=1365 op=16 SRCH 
base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping 
tree,cn=config" scope=0 filter="(objectClass=*)" 
attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress 
nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh 
nsds5replicaLastInitEnd"
[02/Apr/2014:20:50:19 +0000] conn=1365 op=16 RESULT err=0 tag=101 
nentries=1 etime=0
[02/Apr/2014:20:50:20 +0000] conn=1365 op=17 SRCH 
base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping 
tree,cn=config" scope=0 filter="(objectClass=*)" 
attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress 
nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh 
nsds5replicaLastInitEnd"
[02/Apr/2014:20:50:20 +0000] conn=1365 op=17 RESULT err=0 tag=101 
nentries=1 etime=0
[02/Apr/2014:20:50:21 +0000] conn=1365 op=18 SRCH 
base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping 
tree,cn=config" scope=0 filter="(objectClass=*)" 
attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress 
nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh 
nsds5replicaLastInitEnd"
[02/Apr/2014:20:50:21 +0000] conn=1365 op=18 RESULT err=0 tag=101 
nentries=1 etime=0
[02/Apr/2014:20:50:22 +0000] conn=1365 op=19 SRCH 
base="cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping 
tree,cn=config" scope=0 filter="(objectClass=*)" 
attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress 
nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh 
nsds5replicaLastInitEnd"
[02/Apr/2014:20:50:22 +0000] conn=1365 op=19 RESULT err=0 tag=101 
nentries=1 etime=1

Something happens here.  The replica init is done, according to the 
replica error log.  We don't have the replica access log from around 
this time to see exactly when the connection was closed, but looking at 
the ipa code, it would appear that ipa did not see a status of "Total 
update succeeded".  Not sure why the master would not have reported 
that, unless there was some problem getting back the status from the 
replica.

[02/Apr/2014:20:50:22 +0000] conn=1365 op=20 UNBIND
[02/Apr/2014:20:50:22 +0000] conn=1365 op=20 fd=114 closed - U1

Then ipa-replica-install closes the connection and reports the error.

>
>
> On Thu, Apr 3, 2014 at 10:38 AM, Rich Megginson <rmeggins at redhat.com 
> <mailto:rmeggins at redhat.com>> wrote:
>
>     On 04/02/2014 09:22 PM, Nevada Sanchez wrote:
>>     Okay. Updated the gist with the additional logs:
>>     https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>
>>
>
>     1) Dirsrv is crashing:
>     [02/Apr/2014:20:49:53 +0000] - 389-Directory/1.3.1.22.a1
>     B2014.073.1751 starting up
>     [02/Apr/2014:20:49:54 +0000] - Db home directory is not set.
>     Possibly nsslapd-directory (optionally nsslapd-db-home-directory)
>     is missing in the config file.
>     [02/Apr/2014:20:49:54 +0000] - I'm resizing my cache now...cache
>     was 710029312 and is now 8000000
>     [02/Apr/2014:20:49:54 +0000] - 389-Directory/1.3.1.22.a1
>     B2014.073.1751 starting up
>     [02/Apr/2014:20:49:54 +0000] - Detected Disorderly Shutdown last
>     time Directory Server was running, recovering database.
>     [02/Apr/2014:20:49:55 +0000] - slapd started. Listening on All
>     Interfaces port 389 for LDAP requests
>
>     Please use the instructions at
>     http://port389.org/wiki/FAQ#Debugging_Crashes to get a core dump
>     and stack trace.
>
>     2) The first occurrence of the connection error is at
>     [02/Apr/2014:20:52:38 +0000] but there isn't anything in the
>     consumer error log after [02/Apr/2014:20:50:21 +0000] and in the
>     consumer access log after [02/Apr/2014:20:50:22 +0000]
>
>
>>     On Wed, Apr 2, 2014 at 9:38 PM, Rich Megginson
>>     <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>
>>         On 04/02/2014 03:01 PM, Nevada Sanchez wrote:
>>>         Okay, I ran it with debug on. The output is quite large. I'm
>>>         not sure what the etiquette is for posting large logs, so I
>>>         threw it on gist here:
>>>         https://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt
>>>         <http://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt>
>>>
>>>
>>>         Let me know if I should copy it into the thread instead.
>>
>>         Ok.  Now can you post excerpts from the dirsrv errors log
>>         from both the master replica and the replica from around the
>>         time of the failure?
>>
>>
>>>
>>>
>>>         On Wed, Apr 2, 2014 at 1:49 PM, Rich Megginson
>>>         <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>>
>>>             On 04/02/2014 11:45 AM, Nevada Sanchez wrote:
>>>>             My apologies. I mistakenly ran the failing ldapsearch
>>>>             from an unpriviliged user (couldn't read
>>>>             slapd-EXAMPLE-COM directory). Running as root, it now
>>>>             works just fine (same result as the one that worked).
>>>>             SSL seems to not be the issue. Also, I haven't change
>>>>             the SSL certs since I first set up the master.
>>>>
>>>>             I have been doing the replica side things from scratch
>>>>             (even so far as starting with a new machine). For the
>>>>             master side, I have just been re-preparing the replica.
>>>>             I hope I don't have to start from scratch with the
>>>>             master replica.
>>>
>>>             I guess the next step would be to do the
>>>             ipa-replica-install using -ddd and review the extra
>>>             debug information that comes out.
>>>
>>>
>>>>
>>>>
>>>>             On Wed, Apr 2, 2014 at 11:45 AM, Rob Crittenden
>>>>             <rcritten at redhat.com <mailto:rcritten at redhat.com>> wrote:
>>>>
>>>>                 Rich Megginson wrote:
>>>>
>>>>                     On 04/02/2014 09:20 AM, Nevada Sanchez wrote:
>>>>
>>>>                         Okay, we might be on to something:
>>>>
>>>>                         ipa -> ipa2
>>>>                         ================================
>>>>                         $
>>>>                         LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM
>>>>                         ldapsearch -xLLLZZ
>>>>                         -h ipa2.example.com
>>>>                         <http://ipa2.example.com>
>>>>                         <http://ipa2.example.com> -s base -b ""
>>>>
>>>>                         'objectclass=*' vendorVersion
>>>>                         dn:
>>>>                         vendorVersion: 389-Directory/1.3.1.22.a1
>>>>                         B2014.073.1751
>>>>                         ================================
>>>>
>>>>                         ipa2 -> ipa
>>>>                         ================================
>>>>                         $
>>>>                         LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM
>>>>                         ldapsearch -xLLLZZ
>>>>                         -h ipa.example.com <http://ipa.example.com>
>>>>                         <http://ipa.example.com> -s base -b ""
>>>>
>>>>                         'objectclass=*' vendorVersion
>>>>                         ldap_start_tls: Connect error (-11)
>>>>                         additional info: TLS error -8172:Peer's
>>>>                         certificate issuer has been
>>>>                         marked as not trusted by the user.
>>>>                         ================================
>>>>
>>>>                         The original IPA trusts the replica (since
>>>>                         it signed the cert, I
>>>>                         assume), but the replica doesn't trust the
>>>>                         main IPA server. I guess
>>>>                         the ZZ option would have shown me the
>>>>                         failure that I missed in my
>>>>                         initial ldapsearch tests.
>>>>
>>>>                     -Z[Z]  Issue StartTLS (Transport Layer
>>>>                     Security) extended
>>>>                     operation. If
>>>>                      you  use  -ZZ, the command will require the
>>>>                     operation to
>>>>                     be suc-
>>>>                      cessful.
>>>>
>>>>                     i.e. use SSL, and force a successful handshake
>>>>
>>>>
>>>>                         Anyway, what's the best way to remedy this
>>>>                         in a way that makes IPA
>>>>                         happy? (I've found that LDAP can have
>>>>                         different requirements on which
>>>>                         certs go where).
>>>>
>>>>
>>>>                     I'm not sure.
>>>>                     ipa-server-install/ipa-replica-prepare/ipa-replica-install
>>>>                     is supposed to take care of installing the CA
>>>>                     cert properly for you. If
>>>>                     you try to hack it and install the CA cert
>>>>                     manually, you will probably
>>>>                     miss something else that ipa install did not do.
>>>>
>>>>                     I think the only way to ensure that you have a
>>>>                     properly configured ipa
>>>>                     server + replicas is to get all of the ipa
>>>>                     commands completing successfully.
>>>>
>>>>                     Which means going back to the drawing board and
>>>>                     starting over from scratch.
>>>>
>>>>
>>>>                 You can compare the certs that each side is using with:
>>>>
>>>>                 # certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM
>>>>
>>>>                 Did you by chance replace the SSL server certs that
>>>>                 IPA uses on your working master?
>>>>
>>>>                 rob
>>>>
>>>>
>>>
>>>
>>
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/freeipa-users/attachments/20140403/8baeb54e/attachment.htm>


More information about the Freeipa-users mailing list