[Freeipa-users] IPA Replica Issues (Total update abortedLDAP error: Can't contact LDAP server)

Mon Apr 7 13:56:14 UTC 2014

On 04/05/2014 09:35 AM, Nevada Sanchez wrote:
> Thanks. I added /var/log/messages to the gist 
> (https://gist.github.com/nevsan/8b6f78d7396963dc5f70)--no 
> <https://gist.github.com/nevsan/8b6f78d7396963dc5f70%29--no> segfaults 
> it seems. Any other kind of disorderly shutdowns that might happen? 
> I'll look into creating a ticket for this.

Only if there is some sort of build issue that causes an undefined 
symbol reference at runtime - that would cause the process to exit 
without a core.

>
>
> On Fri, Apr 4, 2014 at 9:16 PM, Rich Megginson <rmeggins at redhat.com 
> <mailto:rmeggins at redhat.com>> wrote:
>
>     On 04/03/2014 10:25 PM, Nevada Sanchez wrote:
>>     I followed the instructions that would give me a core dump, and
>>     for some reason, I don't see one in
>>     /var/log/dirsrv/slapd-EXAMPLE-COM/, even though I still see the
>>     Disorderly shutdown still shows up in the logs.
>
>     Hmm - check again - it should produce a core file
>
>     grep -i segfault /var/log/messages
>
>
>>     I know that when I explicitly request those attributes, I get "-1
>>     Total update abortedLDAP error: Can't contact LDAP server" for
>>     nds5ReplicaLastInitStatus (see below). Access logs stop
>>     completely on the replica after the time that you mentioned.
>
>     Hmm - looks like a bug.  Please open a ticket.
>
>
>>
>>     ======================================================
>>     [root at ipa2 ipaserver]# ldapsearch ldaps://ipa.example.com:636
>>     <http://ipa.example.com:636/> -D 'cn=Directory Manager' -w #####
>>     -b 'cn=meToipa2.example.com
>>     <http://metoipa2.example.com/>,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
>>     tree,cn=config' '(objectClass=*)' -s base
>>     nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress
>>     nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh
>>     nsds5ReplicaLastInitEnd
>>     # extended LDIF
>>     #
>>     # LDAPv3
>>     # base <cn=meToipa2.example.com
>>     <http://metoipa2.example.com/>,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
>>     tree,cn=config> with scope baseObject
>>     # filter: (objectclass=*)
>>     # requesting: ldaps://ipa.example.com:636
>>     <http://ipa.example.com:636/> (objectClass=*)
>>     nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress
>>     nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh
>>     nsds5ReplicaLastInitEnd
>>     #
>>
>>     # meToipa2.example.com <http://metoipa2.example.com/>, replica,
>>     dc\3Dexample\2Cdc\3Dcom,
>>     mapping tree, config
>>     dn: cn=meToipa2.example.com
>>     <http://metoipa2.example.com/>,cn=replica,cn=dc\3Dexample\2Cd
>>      c\3Dcom,cn=mapping tree,cn=config
>>     nsds5ReplicaLastInitStart: 20140401092800Z
>>     nsds5replicaUpdateInProgress: FALSE
>>     nsds5ReplicaLastInitStatus: -1 Total update abortedLDAP error:
>>     Can't contact L
>>      DAP server
>>     cn: meToipa2.example.com <http://metoipa2.example.com/>
>>     nsds5ReplicaLastInitEnd: 20140401092804Z
>>
>>     # search result
>>     search: 2
>>     result: 0 Success
>>
>>     # numResponses: 2
>>     # numEntries: 1
>>
>>
>>     On Thu, Apr 3, 2014 at 6:32 PM, Rich Megginson
>>     <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>
>>         On 04/03/2014 03:46 PM, Nevada Sanchez wrote:
>>>         Okay, I updated the gist and extended some of the logs
>>>         (ipa2-errors does stop at 20:50:21). I'll follow up when I
>>>         have the debug stuff in place.
>>>
>>>         https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>
>>         Another strange thing - it looks as if the initial replica
>>         init completes successfully.
>>
>>         [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin -
>>         Beginning total update of replica
>>         "agmt="cn=meToipa2.example.com <http://meToipa2.example.com>"
>>         (ipa2:389)".
>>
>>         On the replica:
>>
>>         [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin -
>>         multimaster_be_state_change: replica dc=example,dc=com is
>>         going offline; disabling replication
>>         [02/Apr/2014:20:50:18 +0000] - WARNING: Import is running
>>         with nsslapd-db-private-import-mem on; No other process is
>>         allowed to access the database
>>         [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers
>>         finished; cleaning up...
>>         [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers
>>         cleaned up.
>>         [02/Apr/2014:20:50:21 +0000] - import userRoot: Indexing
>>         complete. Post-processing...
>>         [02/Apr/2014:20:50:21 +0000] - import userRoot: Generating
>>         numSubordinates complete.
>>         [02/Apr/2014:20:50:21 +0000] - import userRoot: Flushing
>>         caches...
>>         [02/Apr/2014:20:50:21 +0000] - import userRoot: Closing files...
>>         [02/Apr/2014:20:50:21 +0000] - import userRoot: Import
>>         complete. Processed 453 entries in 3 seconds. (151.00
>>         entries/sec)
>>         [02/Apr/2014:20:50:21 +0000] NSMMReplicationPlugin -
>>         multimaster_be_state_change: replica dc=example,dc=com is
>>         coming online; enabling replication
>>
>>         On the master, access log:
>>
>>         [02/Apr/2014:20:50:17 +0000] conn=1365 op=15 MOD
>>         dn="cn=meToipa2.example.com
>>         <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>>         tree,cn=config"
>>
>>         This is the operation that triggers the replica init.  Then
>>         ipa-replica-install polls for agreement status:
>>         [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 SRCH
>>         base="cn=meToipa2.example.com
>>         <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>>         tree,cn=config" scope=0 filter="(objectClass=*)"
>>         attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>>         nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>>         nsds5replicaLastInitEnd"
>>         [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 RESULT err=0
>>         tag=101 nentries=1 etime=0
>>         [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 SRCH
>>         base="cn=meToipa2.example.com
>>         <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>>         tree,cn=config" scope=0 filter="(objectClass=*)"
>>         attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>>         nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>>         nsds5replicaLastInitEnd"
>>         [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 RESULT err=0
>>         tag=101 nentries=1 etime=0
>>         [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 SRCH
>>         base="cn=meToipa2.example.com
>>         <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>>         tree,cn=config" scope=0 filter="(objectClass=*)"
>>         attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>>         nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>>         nsds5replicaLastInitEnd"
>>         [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 RESULT err=0
>>         tag=101 nentries=1 etime=0
>>         [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 SRCH
>>         base="cn=meToipa2.example.com
>>         <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>>         tree,cn=config" scope=0 filter="(objectClass=*)"
>>         attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>>         nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>>         nsds5replicaLastInitEnd"
>>         [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 RESULT err=0
>>         tag=101 nentries=1 etime=1
>>
>>         Something happens here.  The replica init is done, according
>>         to the replica error log. We don't have the replica access
>>         log from around this time to see exactly when the connection
>>         was closed, but looking at the ipa code, it would appear that
>>         ipa did not see a status of "Total update succeeded". Not
>>         sure why the master would not have reported that, unless
>>         there was some problem getting back the status from the replica.
>>
>>         [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 UNBIND
>>         [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 fd=114 closed - U1
>>
>>         Then ipa-replica-install closes the connection and reports
>>         the error.
>>
>>
>>>
>>>
>>>         On Thu, Apr 3, 2014 at 10:38 AM, Rich Megginson
>>>         <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>>
>>>             On 04/02/2014 09:22 PM, Nevada Sanchez wrote:
>>>>             Okay. Updated the gist with the additional logs:
>>>>             https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>>>
>>>>
>>>
>>>             1) Dirsrv is crashing:
>>>             [02/Apr/2014:20:49:53 +0000] - 389-Directory/1.3.1.22.a1
>>>             B2014.073.1751 starting up
>>>             [02/Apr/2014:20:49:54 +0000] - Db home directory is not
>>>             set. Possibly nsslapd-directory (optionally
>>>             nsslapd-db-home-directory) is missing in the config file.
>>>             [02/Apr/2014:20:49:54 +0000] - I'm resizing my cache
>>>             now...cache was 710029312 and is now 8000000
>>>             [02/Apr/2014:20:49:54 +0000] - 389-Directory/1.3.1.22.a1
>>>             B2014.073.1751 starting up
>>>             [02/Apr/2014:20:49:54 +0000] - Detected Disorderly
>>>             Shutdown last time Directory Server was running,
>>>             recovering database.
>>>             [02/Apr/2014:20:49:55 +0000] - slapd started. Listening
>>>             on All Interfaces port 389 for LDAP requests
>>>
>>>             Please use the instructions at
>>>             http://port389.org/wiki/FAQ#Debugging_Crashes to get a
>>>             core dump and stack trace.
>>>
>>>             2) The first occurrence of the connection error is at
>>>             [02/Apr/2014:20:52:38 +0000] but there isn't anything in
>>>             the consumer error log after [02/Apr/2014:20:50:21
>>>             +0000] and in the consumer access log after
>>>             [02/Apr/2014:20:50:22 +0000]
>>>
>>>
>>>>             On Wed, Apr 2, 2014 at 9:38 PM, Rich Megginson
>>>>             <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>>>
>>>>                 On 04/02/2014 03:01 PM, Nevada Sanchez wrote:
>>>>>                 Okay, I ran it with debug on. The output is quite
>>>>>                 large. I'm not sure what the etiquette is for
>>>>>                 posting large logs, so I threw it on gist here:
>>>>>                 https://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt
>>>>>                 <http://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt>
>>>>>
>>>>>
>>>>>                 Let me know if I should copy it into the thread
>>>>>                 instead.
>>>>
>>>>                 Ok.  Now can you post excerpts from the dirsrv
>>>>                 errors log from both the master replica and the
>>>>                 replica from around the time of the failure?
>>>>
>>>>
>>>>>
>>>>>
>>>>>                 On Wed, Apr 2, 2014 at 1:49 PM, Rich Megginson
>>>>>                 <rmeggins at redhat.com <mailto:rmeggins at redhat.com>>
>>>>>                 wrote:
>>>>>
>>>>>                     On 04/02/2014 11:45 AM, Nevada Sanchez wrote:
>>>>>>                     My apologies. I mistakenly ran the failing
>>>>>>                     ldapsearch from an unpriviliged user
>>>>>>                     (couldn't read slapd-EXAMPLE-COM directory).
>>>>>>                     Running as root, it now works just fine (same
>>>>>>                     result as the one that worked). SSL seems to
>>>>>>                     not be the issue. Also, I haven't change the
>>>>>>                     SSL certs since I first set up the master.
>>>>>>
>>>>>>                     I have been doing the replica side things
>>>>>>                     from scratch (even so far as starting with a
>>>>>>                     new machine). For the master side, I have
>>>>>>                     just been re-preparing the replica. I hope I
>>>>>>                     don't have to start from scratch with the
>>>>>>                     master replica.
>>>>>
>>>>>                     I guess the next step would be to do the
>>>>>                     ipa-replica-install using -ddd and review the
>>>>>                     extra debug information that comes out.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>                     On Wed, Apr 2, 2014 at 11:45 AM, Rob
>>>>>>                     Crittenden <rcritten at redhat.com
>>>>>>                     <mailto:rcritten at redhat.com>> wrote:
>>>>>>
>>>>>>                         Rich Megginson wrote:
>>>>>>
>>>>>>                             On 04/02/2014 09:20 AM, Nevada
>>>>>>                             Sanchez wrote:
>>>>>>
>>>>>>                                 Okay, we might be on to something:
>>>>>>
>>>>>>                                 ipa -> ipa2
>>>>>>                                 ================================
>>>>>>                                 $
>>>>>>                                 LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM
>>>>>>                                 ldapsearch -xLLLZZ
>>>>>>                                 -h ipa2.example.com
>>>>>>                                 <http://ipa2.example.com>
>>>>>>                                 <http://ipa2.example.com> -s base
>>>>>>                                 -b ""
>>>>>>
>>>>>>                                 'objectclass=*' vendorVersion
>>>>>>                                 dn:
>>>>>>                                 vendorVersion:
>>>>>>                                 389-Directory/1.3.1.22.a1
>>>>>>                                 B2014.073.1751
>>>>>>                                 ================================
>>>>>>
>>>>>>                                 ipa2 -> ipa
>>>>>>                                 ================================
>>>>>>                                 $
>>>>>>                                 LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM
>>>>>>                                 ldapsearch -xLLLZZ
>>>>>>                                 -h ipa.example.com
>>>>>>                                 <http://ipa.example.com>
>>>>>>                                 <http://ipa.example.com> -s base
>>>>>>                                 -b ""
>>>>>>
>>>>>>                                 'objectclass=*' vendorVersion
>>>>>>                                 ldap_start_tls: Connect error (-11)
>>>>>>                                 additional info: TLS error
>>>>>>                                 -8172:Peer's certificate issuer
>>>>>>                                 has been
>>>>>>                                 marked as not trusted by the user.
>>>>>>                                 ================================
>>>>>>
>>>>>>                                 The original IPA trusts the
>>>>>>                                 replica (since it signed the cert, I
>>>>>>                                 assume), but the replica doesn't
>>>>>>                                 trust the main IPA server. I guess
>>>>>>                                 the ZZ option would have shown me
>>>>>>                                 the failure that I missed in my
>>>>>>                                 initial ldapsearch tests.
>>>>>>
>>>>>>                             -Z[Z]  Issue StartTLS (Transport
>>>>>>                             Layer Security) extended
>>>>>>                             operation. If
>>>>>>                              you  use  -ZZ, the command will
>>>>>>                             require the operation to
>>>>>>                             be suc-
>>>>>>                              cessful.
>>>>>>
>>>>>>                             i.e. use SSL, and force a successful
>>>>>>                             handshake
>>>>>>
>>>>>>
>>>>>>                                 Anyway, what's the best way to
>>>>>>                                 remedy this in a way that makes IPA
>>>>>>                                 happy? (I've found that LDAP can
>>>>>>                                 have different requirements on which
>>>>>>                                 certs go where).
>>>>>>
>>>>>>
>>>>>>                             I'm not sure.
>>>>>>                             ipa-server-install/ipa-replica-prepare/ipa-replica-install
>>>>>>                             is supposed to take care of
>>>>>>                             installing the CA cert properly for
>>>>>>                             you. If
>>>>>>                             you try to hack it and install the CA
>>>>>>                             cert manually, you will probably
>>>>>>                             miss something else that ipa install
>>>>>>                             did not do.
>>>>>>
>>>>>>                             I think the only way to ensure that
>>>>>>                             you have a properly configured ipa
>>>>>>                             server + replicas is to get all of
>>>>>>                             the ipa commands completing successfully.
>>>>>>
>>>>>>                             Which means going back to the drawing
>>>>>>                             board and starting over from scratch.
>>>>>>
>>>>>>
>>>>>>                         You can compare the certs that each side
>>>>>>                         is using with:
>>>>>>
>>>>>>                         # certutil -L -d
>>>>>>                         /etc/dirsrv/slapd-EXAMPLE-COM
>>>>>>
>>>>>>                         Did you by chance replace the SSL server
>>>>>>                         certs that IPA uses on your working master?
>>>>>>
>>>>>>                         rob
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/freeipa-users/attachments/20140407/e7049803/attachment.htm>