[Freeipa-users] IPA Replica Issues (Total update abortedLDAP error: Can't contact LDAP server)
Rich Megginson
rmeggins at redhat.com
Mon Apr 7 13:56:14 UTC 2014
On 04/05/2014 09:35 AM, Nevada Sanchez wrote:
> Thanks. I added /var/log/messages to the gist
> (https://gist.github.com/nevsan/8b6f78d7396963dc5f70)--no
> <https://gist.github.com/nevsan/8b6f78d7396963dc5f70%29--no> segfaults
> it seems. Any other kind of disorderly shutdowns that might happen?
> I'll look into creating a ticket for this.
Only if there is some sort of build issue that causes an undefined
symbol reference at runtime - that would cause the process to exit
without a core.
>
>
> On Fri, Apr 4, 2014 at 9:16 PM, Rich Megginson <rmeggins at redhat.com
> <mailto:rmeggins at redhat.com>> wrote:
>
> On 04/03/2014 10:25 PM, Nevada Sanchez wrote:
>> I followed the instructions that would give me a core dump, and
>> for some reason, I don't see one in
>> /var/log/dirsrv/slapd-EXAMPLE-COM/, even though I still see the
>> Disorderly shutdown still shows up in the logs.
>
> Hmm - check again - it should produce a core file
>
> grep -i segfault /var/log/messages
>
>
>> I know that when I explicitly request those attributes, I get "-1
>> Total update abortedLDAP error: Can't contact LDAP server" for
>> nds5ReplicaLastInitStatus (see below). Access logs stop
>> completely on the replica after the time that you mentioned.
>
> Hmm - looks like a bug. Please open a ticket.
>
>
>>
>> ======================================================
>> [root at ipa2 ipaserver]# ldapsearch ldaps://ipa.example.com:636
>> <http://ipa.example.com:636/> -D 'cn=Directory Manager' -w #####
>> -b 'cn=meToipa2.example.com
>> <http://metoipa2.example.com/>,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
>> tree,cn=config' '(objectClass=*)' -s base
>> nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5ReplicaLastInitEnd
>> # extended LDIF
>> #
>> # LDAPv3
>> # base <cn=meToipa2.example.com
>> <http://metoipa2.example.com/>,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
>> tree,cn=config> with scope baseObject
>> # filter: (objectclass=*)
>> # requesting: ldaps://ipa.example.com:636
>> <http://ipa.example.com:636/> (objectClass=*)
>> nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5ReplicaLastInitEnd
>> #
>>
>> # meToipa2.example.com <http://metoipa2.example.com/>, replica,
>> dc\3Dexample\2Cdc\3Dcom,
>> mapping tree, config
>> dn: cn=meToipa2.example.com
>> <http://metoipa2.example.com/>,cn=replica,cn=dc\3Dexample\2Cd
>> c\3Dcom,cn=mapping tree,cn=config
>> nsds5ReplicaLastInitStart: 20140401092800Z
>> nsds5replicaUpdateInProgress: FALSE
>> nsds5ReplicaLastInitStatus: -1 Total update abortedLDAP error:
>> Can't contact L
>> DAP server
>> cn: meToipa2.example.com <http://metoipa2.example.com/>
>> nsds5ReplicaLastInitEnd: 20140401092804Z
>>
>> # search result
>> search: 2
>> result: 0 Success
>>
>> # numResponses: 2
>> # numEntries: 1
>>
>>
>> On Thu, Apr 3, 2014 at 6:32 PM, Rich Megginson
>> <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>
>> On 04/03/2014 03:46 PM, Nevada Sanchez wrote:
>>> Okay, I updated the gist and extended some of the logs
>>> (ipa2-errors does stop at 20:50:21). I'll follow up when I
>>> have the debug stuff in place.
>>>
>>> https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>
>> Another strange thing - it looks as if the initial replica
>> init completes successfully.
>>
>> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin -
>> Beginning total update of replica
>> "agmt="cn=meToipa2.example.com <http://meToipa2.example.com>"
>> (ipa2:389)".
>>
>> On the replica:
>>
>> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin -
>> multimaster_be_state_change: replica dc=example,dc=com is
>> going offline; disabling replication
>> [02/Apr/2014:20:50:18 +0000] - WARNING: Import is running
>> with nsslapd-db-private-import-mem on; No other process is
>> allowed to access the database
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers
>> finished; cleaning up...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers
>> cleaned up.
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Indexing
>> complete. Post-processing...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Generating
>> numSubordinates complete.
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Flushing
>> caches...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Closing files...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Import
>> complete. Processed 453 entries in 3 seconds. (151.00
>> entries/sec)
>> [02/Apr/2014:20:50:21 +0000] NSMMReplicationPlugin -
>> multimaster_be_state_change: replica dc=example,dc=com is
>> coming online; enabling replication
>>
>> On the master, access log:
>>
>> [02/Apr/2014:20:50:17 +0000] conn=1365 op=15 MOD
>> dn="cn=meToipa2.example.com
>> <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config"
>>
>> This is the operation that triggers the replica init. Then
>> ipa-replica-install polls for agreement status:
>> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 SRCH
>> base="cn=meToipa2.example.com
>> <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 RESULT err=0
>> tag=101 nentries=1 etime=0
>> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 SRCH
>> base="cn=meToipa2.example.com
>> <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 RESULT err=0
>> tag=101 nentries=1 etime=0
>> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 SRCH
>> base="cn=meToipa2.example.com
>> <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 RESULT err=0
>> tag=101 nentries=1 etime=0
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 SRCH
>> base="cn=meToipa2.example.com
>> <http://meToipa2.example.com>,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 RESULT err=0
>> tag=101 nentries=1 etime=1
>>
>> Something happens here. The replica init is done, according
>> to the replica error log. We don't have the replica access
>> log from around this time to see exactly when the connection
>> was closed, but looking at the ipa code, it would appear that
>> ipa did not see a status of "Total update succeeded". Not
>> sure why the master would not have reported that, unless
>> there was some problem getting back the status from the replica.
>>
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 UNBIND
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 fd=114 closed - U1
>>
>> Then ipa-replica-install closes the connection and reports
>> the error.
>>
>>
>>>
>>>
>>> On Thu, Apr 3, 2014 at 10:38 AM, Rich Megginson
>>> <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>>
>>> On 04/02/2014 09:22 PM, Nevada Sanchez wrote:
>>>> Okay. Updated the gist with the additional logs:
>>>> https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>>>
>>>>
>>>
>>> 1) Dirsrv is crashing:
>>> [02/Apr/2014:20:49:53 +0000] - 389-Directory/1.3.1.22.a1
>>> B2014.073.1751 starting up
>>> [02/Apr/2014:20:49:54 +0000] - Db home directory is not
>>> set. Possibly nsslapd-directory (optionally
>>> nsslapd-db-home-directory) is missing in the config file.
>>> [02/Apr/2014:20:49:54 +0000] - I'm resizing my cache
>>> now...cache was 710029312 and is now 8000000
>>> [02/Apr/2014:20:49:54 +0000] - 389-Directory/1.3.1.22.a1
>>> B2014.073.1751 starting up
>>> [02/Apr/2014:20:49:54 +0000] - Detected Disorderly
>>> Shutdown last time Directory Server was running,
>>> recovering database.
>>> [02/Apr/2014:20:49:55 +0000] - slapd started. Listening
>>> on All Interfaces port 389 for LDAP requests
>>>
>>> Please use the instructions at
>>> http://port389.org/wiki/FAQ#Debugging_Crashes to get a
>>> core dump and stack trace.
>>>
>>> 2) The first occurrence of the connection error is at
>>> [02/Apr/2014:20:52:38 +0000] but there isn't anything in
>>> the consumer error log after [02/Apr/2014:20:50:21
>>> +0000] and in the consumer access log after
>>> [02/Apr/2014:20:50:22 +0000]
>>>
>>>
>>>> On Wed, Apr 2, 2014 at 9:38 PM, Rich Megginson
>>>> <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>>>
>>>> On 04/02/2014 03:01 PM, Nevada Sanchez wrote:
>>>>> Okay, I ran it with debug on. The output is quite
>>>>> large. I'm not sure what the etiquette is for
>>>>> posting large logs, so I threw it on gist here:
>>>>> https://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt
>>>>> <http://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt>
>>>>>
>>>>>
>>>>> Let me know if I should copy it into the thread
>>>>> instead.
>>>>
>>>> Ok. Now can you post excerpts from the dirsrv
>>>> errors log from both the master replica and the
>>>> replica from around the time of the failure?
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 2, 2014 at 1:49 PM, Rich Megginson
>>>>> <rmeggins at redhat.com <mailto:rmeggins at redhat.com>>
>>>>> wrote:
>>>>>
>>>>> On 04/02/2014 11:45 AM, Nevada Sanchez wrote:
>>>>>> My apologies. I mistakenly ran the failing
>>>>>> ldapsearch from an unpriviliged user
>>>>>> (couldn't read slapd-EXAMPLE-COM directory).
>>>>>> Running as root, it now works just fine (same
>>>>>> result as the one that worked). SSL seems to
>>>>>> not be the issue. Also, I haven't change the
>>>>>> SSL certs since I first set up the master.
>>>>>>
>>>>>> I have been doing the replica side things
>>>>>> from scratch (even so far as starting with a
>>>>>> new machine). For the master side, I have
>>>>>> just been re-preparing the replica. I hope I
>>>>>> don't have to start from scratch with the
>>>>>> master replica.
>>>>>
>>>>> I guess the next step would be to do the
>>>>> ipa-replica-install using -ddd and review the
>>>>> extra debug information that comes out.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 2, 2014 at 11:45 AM, Rob
>>>>>> Crittenden <rcritten at redhat.com
>>>>>> <mailto:rcritten at redhat.com>> wrote:
>>>>>>
>>>>>> Rich Megginson wrote:
>>>>>>
>>>>>> On 04/02/2014 09:20 AM, Nevada
>>>>>> Sanchez wrote:
>>>>>>
>>>>>> Okay, we might be on to something:
>>>>>>
>>>>>> ipa -> ipa2
>>>>>> ================================
>>>>>> $
>>>>>> LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM
>>>>>> ldapsearch -xLLLZZ
>>>>>> -h ipa2.example.com
>>>>>> <http://ipa2.example.com>
>>>>>> <http://ipa2.example.com> -s base
>>>>>> -b ""
>>>>>>
>>>>>> 'objectclass=*' vendorVersion
>>>>>> dn:
>>>>>> vendorVersion:
>>>>>> 389-Directory/1.3.1.22.a1
>>>>>> B2014.073.1751
>>>>>> ================================
>>>>>>
>>>>>> ipa2 -> ipa
>>>>>> ================================
>>>>>> $
>>>>>> LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM
>>>>>> ldapsearch -xLLLZZ
>>>>>> -h ipa.example.com
>>>>>> <http://ipa.example.com>
>>>>>> <http://ipa.example.com> -s base
>>>>>> -b ""
>>>>>>
>>>>>> 'objectclass=*' vendorVersion
>>>>>> ldap_start_tls: Connect error (-11)
>>>>>> additional info: TLS error
>>>>>> -8172:Peer's certificate issuer
>>>>>> has been
>>>>>> marked as not trusted by the user.
>>>>>> ================================
>>>>>>
>>>>>> The original IPA trusts the
>>>>>> replica (since it signed the cert, I
>>>>>> assume), but the replica doesn't
>>>>>> trust the main IPA server. I guess
>>>>>> the ZZ option would have shown me
>>>>>> the failure that I missed in my
>>>>>> initial ldapsearch tests.
>>>>>>
>>>>>> -Z[Z] Issue StartTLS (Transport
>>>>>> Layer Security) extended
>>>>>> operation. If
>>>>>> you use -ZZ, the command will
>>>>>> require the operation to
>>>>>> be suc-
>>>>>> cessful.
>>>>>>
>>>>>> i.e. use SSL, and force a successful
>>>>>> handshake
>>>>>>
>>>>>>
>>>>>> Anyway, what's the best way to
>>>>>> remedy this in a way that makes IPA
>>>>>> happy? (I've found that LDAP can
>>>>>> have different requirements on which
>>>>>> certs go where).
>>>>>>
>>>>>>
>>>>>> I'm not sure.
>>>>>> ipa-server-install/ipa-replica-prepare/ipa-replica-install
>>>>>> is supposed to take care of
>>>>>> installing the CA cert properly for
>>>>>> you. If
>>>>>> you try to hack it and install the CA
>>>>>> cert manually, you will probably
>>>>>> miss something else that ipa install
>>>>>> did not do.
>>>>>>
>>>>>> I think the only way to ensure that
>>>>>> you have a properly configured ipa
>>>>>> server + replicas is to get all of
>>>>>> the ipa commands completing successfully.
>>>>>>
>>>>>> Which means going back to the drawing
>>>>>> board and starting over from scratch.
>>>>>>
>>>>>>
>>>>>> You can compare the certs that each side
>>>>>> is using with:
>>>>>>
>>>>>> # certutil -L -d
>>>>>> /etc/dirsrv/slapd-EXAMPLE-COM
>>>>>>
>>>>>> Did you by chance replace the SSL server
>>>>>> certs that IPA uses on your working master?
>>>>>>
>>>>>> rob
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/freeipa-users/attachments/20140407/e7049803/attachment.htm>
More information about the Freeipa-users
mailing list