[Freeipa-users] ipa replica failed PR_DeleteSemaphore

Tue Mar 15 09:47:49 UTC 2016

On 03/14/2016 05:33 PM, Andrew E. Bruno wrote:
> On Mon, Mar 14, 2016 at 09:35:15AM +0100, Ludwig Krispenz wrote:
>> On 03/12/2016 04:02 PM, Andrew E. Bruno wrote:
>>> On Wed, Mar 09, 2016 at 06:08:04PM +0100, Ludwig Krispenz wrote:
>>>> On 03/09/2016 05:51 PM, Andrew E. Bruno wrote:
>>>>> On Wed, Mar 09, 2016 at 05:21:50PM +0100, Ludwig Krispenz wrote:
>>>>>
>>>>> [09/Mar/2016:11:33:03 -0500] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: PR_DeleteSemaphore: /var/lib/dirsrv/slapd-CBLS-CCR-BUFFALO-EDU/cldb/ed35d212-2cb811e5-af63d574-de3f6355.sema; NSPR error - -5943
>>>> If ds is shut down cleanly, this file should be removed. If ds is killed it
>>>> remains and should be recreated at restart, which is what fails here. Could
>>>> you try another stop, remove the file manually, and start again?
>>> We had our replicas crash again. Curious if it's safe to delete the
>>> other db files as well:
>>>
>>> ls -alh /var/lib/dirsrv/slapd-CBLS-CCR-BUFFALO-EDU/cldb/
>>>    30  DBVERSION
>>> 6.8G  ed35d212-2cb811e5-af63d574-de3f6355_55a95591000000040000.db
>>>     0  ed35d212-2cb811e5-af63d574-de3f6355.sema
>>>   18M  f32bb356-2cb811e5-af63d574-de3f6355_55a955ca000000600000.db
>>>     0  f32bb356-2cb811e5-af63d574-de3f6355.sema
>>>
>>>
>>> Should all these files be deleted if the ds is cleanly shut down, or should
>>> we only remove the *.sema files?
>> The *.db files contain the changelog data. If you delete them you
>> start with a new cl and could get into replication problems requiring
>> reinitialization. You normally should not delete them.
>> The .sema file is used to control how many threads can concurrently access
>> the cl. It should be recreated at restart, so it is safe to delete these
>> files after a crash.
> Sounds good..thanks. We deleted the .sema files after the crash and the
> replicas came back up ok.
>
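
The cleanup described above (remove only the *.sema files, leave the *.db
files alone) could be sketched roughly as follows. This is a minimal sketch,
not a supported tool: it assumes Python 3.7+ and that pgrep is installed, and
the function names remove_stale_sema and ns_slapd_running are made up for
illustration.

```python
import subprocess
from pathlib import Path


def ns_slapd_running() -> bool:
    """Return True if a process named ns-slapd is found (assumes pgrep exists)."""
    result = subprocess.run(["pgrep", "-x", "ns-slapd"], capture_output=True)
    return result.returncode == 0


def remove_stale_sema(cldb_dir, server_running=None):
    """Delete only the *.sema files in cldb_dir and return the names removed.

    The *.db changelog files are never touched. The function refuses to do
    anything while the directory server appears to be running.
    """
    if server_running is None:
        server_running = ns_slapd_running()
    if server_running:
        raise RuntimeError("ns-slapd is running; stop it before removing .sema files")

    removed = []
    for sema in sorted(Path(cldb_dir).glob("*.sema")):
        sema.unlink()
        removed.append(sema.name)
    return removed
```

After a crash it would be called with the changelog directory, for example
remove_stale_sema("/var/lib/dirsrv/slapd-CBLS-CCR-BUFFALO-EDU/cldb"), and the
server started again afterwards.
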
>> If you are getting frequent crashes, we should try to find the reason for
>> the crashes. Could you try to get a core file?
> This time we had two replicas crash and ns-slapd wasn't running so we
> couldn't grab a pstack. Here's a snip from the error logs right before
> the crash (not sure if this is related or not):
>
> [11/Mar/2016:09:57:56 -0500] ldbm_back_delete - conn=0 op=0 [retry: 1] No original_tombstone for changenumber=11573832,cn=changelog!!
> [11/Mar/2016:09:57:57 -0500] ldbm_back_delete - conn=0 op=0 [retry: 1] No original_tombstone for changenumber=11575824,cn=changelog!!
> [11/Mar/2016:09:57:58 -0500] ldbm_back_delete - conn=0 op=0 [retry: 1] No original_tombstone for changenumber=11575851,cn=changelog!!
> [11/Mar/2016:10:00:28 -0500] - libdb: BDB2055 Lock table is out of available lock entries
> [11/Mar/2016:10:00:28 -0500] NSMMReplicationPlugin - changelog program - _cl5CompactDBs: failed to compact 986efe12-71b811e5-9d33a516-e778e883; db error - 12 Cannot allocate memory
> [11/Mar/2016:10:02:07 -0500] - libdb: BDB2055 Lock table is out of available lock entries
> [11/Mar/2016:10:02:07 -0500] - compactdb: failed to compact changelog; db error - 12 Cannot allocate memory
I don't know if this is related to your crashes, but compaction of the
changelog was running, probably for some time, and finally failed. The
idea behind compaction is to compact a fragmented btree and reclaim some
space, but it uses a single transaction for the complete operation and
locks every page it accesses. This can be time consuming, block other
txns, and run out of locks.

There are two options to address this: either increase the number of
configured db locks (the problem is that there is no good hint for how
many locks will be needed), or disable changelog compaction by setting:
dn: cn=changelog5,cn=config
..
nsslapd-changelogcompactdb-interval: 0

I would disable compaction. I don't think there is much benefit (as I
remember it, BDB compaction was slow and not very effective), and it is
better to avoid the side effects.
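
One way to apply the setting above is as an LDIF modify operation. The
following is only a sketch that builds that LDIF text: the helper name
build_modify_ldif is made up for illustration, and applying the result with
ldapmodify is an assumption, not something stated above.

```python
def build_modify_ldif(dn, attribute, value):
    """Return LDIF text that replaces a single attribute value on dn."""
    return "\n".join([
        f"dn: {dn}",
        "changetype: modify",
        f"replace: {attribute}",
        f"{attribute}: {value}",
        "",
    ])


# Setting from the message above: an interval of 0 disables changelog compaction.
DISABLE_COMPACTION_LDIF = build_modify_ldif(
    "cn=changelog5,cn=config",
    "nsslapd-changelogcompactdb-interval",
    0,
)

if __name__ == "__main__":
    print(DISABLE_COMPACTION_LDIF, end="")
```

The printed LDIF could then be piped into something like
ldapmodify -x -D "cn=Directory Manager" -W, where the bind DN is an
assumption about the local setup.
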
> [11/Mar/2016:12:36:18 -0500] - slapd_poll(377) timed out
> [11/Mar/2016:13:06:17 -0500] - slapd_poll(377) timed out
>
> We just upgraded to ipa 4.2 centos 7.2 and if we see any more crashes
> we'll try and get more info.
>
> Thanks again.
>
> --Andrew
>
>

-- 
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Paul Argiry, Charles Cachera, Michael Cunningham, Michael O'Neill