[Freeipa-users] lock table errors

Tue Feb 23 17:12:07 UTC 2016

> On 02/23/2016 05:10 PM, Andy Thompson wrote:
> >>>> On 02/23/2016 03:02 PM, Andy Thompson wrote:
> >>>>> Came across one of my replicas this morning with the following in
> >>>>> the error log
> >>>>>
> >>>>> [20/Feb/2016:17:23:38 -0500] - libdb: BDB2055 Lock table is out of
> >>>>> available lock entries
> >>>>> [20/Feb/2016:17:23:38 -0500] entryrdn-index - _entryrdn_delete_key:
> >>>>> Deleting C1 failed; Cannot allocate memory(12)
> >>>>> [20/Feb/2016:17:23:38 -0500] - database index operation failed BAD
> >>>>> 1031, err=12 Cannot allocate memory
> >>>>> [20/Feb/2016:17:23:38 -0500] -
> >>>>> index_del_entry(changenumber=1328662,cn=changelog, 0x26) failed (12)
> >>>>> [20/Feb/2016:17:23:38 -0500] DSRetroclPlugin - delete_changerecord:
> >>>>> could not delete change record 1328662 (rc: 1)
> >>>>> [20/Feb/2016:17:23:38 -0500] - libdb: BDB2055 Lock table is out of
> >>>>> available lock entries
> >>>>> [20/Feb/2016:17:23:38 -0500] entryrdn-index - _entryrdn_get_elem:
> >>>>> Failed to position cursor at the key: 1328666: Cannot allocate
> >>>>> memory(12)
> >>>>> [20/Feb/2016:17:23:38 -0500] entryrdn-index - _entryrdn_delete_key:
> >>>>> Failed to position cursor at the key: 1328666: Cannot allocate
> >>>>> memory(12)
> >>>>> [20/Feb/2016:17:23:38 -0500] - libdb: BDB2055 Lock table is out of
> >>>>> available lock entries
> >>>>> [20/Feb/2016:17:23:38 -0500] - database index operation failed BAD
> >>>>> 1031, err=12 Cannot allocate memory
> >>>>> [20/Feb/2016:17:23:38 -0500] -
> >>>>> index_del_entry(changenumber=1328663,cn=changelog, 0x26) failed (12)
> >>>>> [20/Feb/2016:17:23:38 -0500] NSMMReplicationPlugin - changelog
> >>>>> program - _cl5CompactDBs: failed to compact
> >>>>> 5f1d2b12-cf1411e4-b055ba8a-f4b484f7; db error - 12 Cannot allocate
> >>>>> memory
> >>>>> [20/Feb/2016:17:23:38 -0500] DSRetroclPlugin - delete_changerecord:
> >>>>> could not delete change record 1328663 (rc: 1)
> >>>>> [20/Feb/2016:17:23:41 -0500] ldbm_back_delete - conn=0 op=0 [retry:
> >>>>> 1] No original_tombstone for changenumber=1330335,cn=changelog!!
> >>>>>
> >>>>> And then nothing.  Was troubleshooting some clients that were
> >>>>> having issues resolving some trusted domain users.
> >>>>> I restarted IPA and it rolled through a few thousand missing
> >>>>> change records
> >>>>>
> >>>>> [23/Feb/2016:08:39:34 -0500] DSRetroclPlugin - delete_changerecord:
> >>>>> could not delete change record 1328696 (rc: 32)
> >>>>>
> >>>>> Any thoughts as to what might have caused the lock table errors?
> >>>> In BerkeleyDB this means that the number of pages which would have
> >>>> to be locked in one transaction exceeds the configured number of
> >>>> locks. This could happen if, e.g., a large group is deleted and the
> >>>> memberof attribute has to be modified for each member of the group
> >>>> inside the same transaction.
> >>> Are there any configuration options to increase that setting?  And
> >>> would it have caused the replica to become unresponsive?
> >> you can change
> >>
> >> nsslapd-db-locks
> >>
> >> in the entry:
> >>
> >> dn: cn=config,cn=ldbm database,cn=plugins,cn=config
> >>
> >> Yes. In that state it would not process updates. The txn should
> >> eventually be aborted and the system should recover, but ..
> > Is there any rule of thumb, or anything I can look at, to get an idea of
> > what I should increase that to, or whether it should even be necessary?
> >
> > The current setting has a default of 10000
> >
> > cn=database,cn=monitor,cn=ldbm database,cn=plugins,cn=config
> >
> > currently shows
> >
> > nsslapd-db-current-locks: 82
> >
> > What might cause that to spike up that significantly to deplete the locks?
> > That's a pretty huge jump.
> I have given you an example of what operation could use a high number of
> page locks. Finding out what was going on in your case would require
> investigating which operations were active when the problem started, what
> the modified or added entries looked like ......
> >
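
For reference, here is a minimal Python sketch of the two things discussed above: how far the current lock count is from the configured limit, and what an LDIF change to nsslapd-db-locks on the config entry might look like. The function names are made up for illustration, and the new value of 20000 is only a placeholder, not a recommendation from this thread.

```python
# Config entry and attribute name as quoted earlier in this thread.
CONFIG_DN = "cn=config,cn=ldbm database,cn=plugins,cn=config"
LOCKS_ATTR = "nsslapd-db-locks"


def lock_utilization(current_locks: int, configured_locks: int) -> float:
    """Return the fraction of configured locks currently in use."""
    if configured_locks <= 0:
        raise ValueError("configured_locks must be positive")
    return current_locks / configured_locks


def build_db_locks_ldif(new_value: int) -> str:
    """Build an LDIF modify record replacing nsslapd-db-locks (hypothetical helper)."""
    if new_value <= 0:
        raise ValueError("new_value must be positive")
    return "\n".join([
        f"dn: {CONFIG_DN}",
        "changetype: modify",
        f"replace: {LOCKS_ATTR}",
        f"{LOCKS_ATTR}: {new_value}",
        "",
    ])


if __name__ == "__main__":
    # 82 current locks against the default of 10000, as reported above.
    print(f"lock utilization: {lock_utilization(82, 10000):.2%}")
    # 20000 is an illustrative value only.
    print(build_db_locks_ldif(20000))
```

The generated record could be fed to ldapmodify as Directory Manager; the utilization figure just makes the gap between 82 and 10000 explicit.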

Right, is there anything I can look at now that might give me any useful information?  The access log looks pretty normal around that time.  When the error occurred there would have been very little going on in the system other than internal processing and normal user access.  My environment is almost entirely an AD trust setup with HBAC and sudo.  There are very few users and groups in the local database, so I can't think of anything that could even have produced a large transaction.
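
In case it helps with narrowing down the window, below is a rough Python sketch of how the errors log could be summarized: it counts the BDB2055 lock errors per timestamp and collects the change numbers the retro changelog plugin failed to delete. It assumes each log entry sits on a single line in the real file (the excerpts above are wrapped by the mail client), and the function name and regular expressions are my own, not anything shipped with 389-ds.

```python
import re
from collections import Counter

LOCK_ERROR = "BDB2055"
# Matches the leading "[20/Feb/2016:17:23:38 -0500]" timestamp of an entry.
TIMESTAMP_RE = re.compile(r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")
# Matches the retro changelog message seen in the log above.
CHANGE_RE = re.compile(r"could not delete change record (\d+)")


def summarize_errors(lines):
    """Return (lock errors per timestamp, change numbers that failed to delete)."""
    lock_errors = Counter()
    failed_changes = []
    for line in lines:
        stamp = TIMESTAMP_RE.match(line)
        if stamp and LOCK_ERROR in line:
            lock_errors[stamp.group(1)] += 1
        change = CHANGE_RE.search(line)
        if change:
            failed_changes.append(int(change.group(1)))
    return lock_errors, failed_changes


if __name__ == "__main__":
    # Two entries taken from the log above, joined back onto single lines.
    sample = [
        "[20/Feb/2016:17:23:38 -0500] - libdb: BDB2055 Lock table is out of "
        "available lock entries",
        "[20/Feb/2016:17:23:38 -0500] DSRetroclPlugin - delete_changerecord: "
        "could not delete change record 1328662 (rc: 1)",
    ]
    errors, changes = summarize_errors(sample)
    print(dict(errors), changes)
```

The earliest timestamp with a lock error is the point I would line up against the access log to see which operations were in flight.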

I'm checking with the Windows group to see whether anything out of the ordinary was going on in AD at the time, but there were no changes scheduled.  Is it possible that AD changes could be responsible?

-andy