[Freeipa-users] Freeipa 4.2.0 hangs intermittently

Mon Aug 29 16:53:50 UTC 2016

Hi Thierry,

My machine has 30 GB of RAM, and the 389-ds version is 1.3.4.

ldapsearch shows that nsslapd-cachememsize has been updated to 200 MB:

ldapsearch -LLL -o ldif-wrap=no -D "cn=directory manager" -w 'mypassword' \
  -b 'cn=userRoot,cn=ldbm database,cn=plugins,cn=config' | grep nsslapd-cachememsize
nsslapd-cachememsize: 209715200

So it seems to have been updated, although the warning in the log
(WARNING: ipaca: entry cache size 10485760B is less than db size 11599872B)
confuses me a bit.

There is one more setting in the ldapsearch output that looks a bit low:

nsslapd-dncachememsize: 10485760
maxdncachesize: 10485760

Should I raise these to a higher value as well?
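To compare values like these side by side, here is a small Python sketch. It parses the `attr: value` lines that `ldapsearch -LLL -o ldif-wrap=no` prints and converts the byte counts to MiB. It assumes plain single-valued attributes (no base64 `::` values), and the function names are hypothetical.

```python
from typing import Dict


def parse_ldif_attrs(text: str) -> Dict[str, str]:
    """Collect 'name: value' pairs from unwrapped LDIF, skipping dn and comments.

    Assumes plain (non-base64) single-valued attributes; a repeated
    attribute name keeps only its last value.
    """
    attrs: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("dn:"):
            continue
        name, sep, value = line.partition(": ")
        if sep:
            attrs[name] = value
    return attrs


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB (1 MiB = 1048576 bytes)."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    sample = (
        "nsslapd-cachememsize: 209715200\n"
        "nsslapd-dncachememsize: 10485760\n"
        "maxdncachesize: 10485760\n"
    )
    for name, value in parse_ldif_attrs(sample).items():
        print(f"{name}: {to_mib(int(value)):.1f} MiB")
```

With the sample values, this prints 200.0 MiB for nsslapd-cachememsize and 10.0 MiB for each of the two DN cache settings.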

When the issue happened, both memory usage and overall system load were
very low. I will try to reproduce the issue, at least in my QA environment,
probably by simulating simultaneous logins to a large number of hosts.

thanks
Rakesh

On Mon, Aug 29, 2016 at 8:16 PM, thierry bordaz <tbordaz at redhat.com> wrote:

> Hi Rakesh,
>
> These tunings depend on the memory available on your machine.
> nsslapd-cachememsize allows the entry cache to consume up to 200 MB, but its
> actual memory footprint is known to exceed that.
> 200 MB for both looks good to me. How large is your machine? Which version
> of 389-ds are you running?
>
> These warnings do not change your settings. They only point out that the
> entry caches of 'ipaca' and 'retrocl' are small, which is fine. The entry
> cache size matters mostly for userRoot.
> You can double-check the actual values after a restart with ldapsearch on
> 'cn=userRoot,cn=ldbm database,cn=plugins,cn=config' and
> 'cn=config,cn=ldbm database,cn=plugins,cn=config'.
>
> One next step is to measure the response time of DS, to find out whether it
> is responsible for the hang.
> The logs, and possibly pstack output taken during those intermittent hangs,
> will help determine that.
>
> regards
> thierry
>
>
>
>
>
> On 08/29/2016 04:25 PM, Rakesh Rajasekharan wrote:
>
> I tried increasing nsslapd-dbcachesize and nsslapd-cachememsize to 200 MB
> in my QA environments.
>
> However, I still see these messages in my log files:
> [29/Aug/2016:04:34:37 +0000] - WARNING: ipaca: entry cache size 10485760B is less than db size 11599872B; We recommend to increase the entry cache size nsslapd-cachememsize.
> [29/Aug/2016:04:34:37 +0000] - WARNING: changelog: entry cache size 2097152B is less than db size 441647104B; We recommend to increase the entry cache size nsslapd-cachememsize.
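One way to read these warnings is to note that each one names the backend it refers to (ipaca, changelog), and neither is the userRoot backend that the LDIF below modifies. The following minimal Python sketch pulls the backend, cache size and db size out of such lines. It assumes the exact wording quoted here; other 389-ds versions may phrase the warning differently.

```python
import re
from typing import List, Tuple

# Matches the warning text quoted above, e.g.
# "WARNING: ipaca: entry cache size 10485760B is less than db size 11599872B"
WARNING_RE = re.compile(
    r"WARNING: (?P<backend>\S+): entry cache size (?P<cache>\d+)B "
    r"is less than db size (?P<db>\d+)B"
)


def undersized_caches(log_text: str) -> List[Tuple[str, int, int, int]]:
    """Return (backend, cache_bytes, db_bytes, shortfall_bytes) per warning."""
    results = []
    for match in WARNING_RE.finditer(log_text):
        cache = int(match.group("cache"))
        db = int(match.group("db"))
        results.append((match.group("backend"), cache, db, db - cache))
    return results


if __name__ == "__main__":
    log = (
        "[29/Aug/2016:04:34:37 +0000] - WARNING: ipaca: entry cache size "
        "10485760B is less than db size 11599872B; We recommend to increase "
        "the entry cache size nsslapd-cachememsize.\n"
        "[29/Aug/2016:04:34:37 +0000] - WARNING: changelog: entry cache size "
        "2097152B is less than db size 441647104B; We recommend to increase "
        "the entry cache size nsslapd-cachememsize.\n"
    )
    for backend, cache, db, short in undersized_caches(log):
        print(f"{backend}: cache={cache}B db={db}B short by {short}B")
```

For the two lines above, this reports ipaca short by 1114112 bytes and changelog short by 439549952 bytes.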
>
> These are the LDIF files I used to modify the values.
> Modify the entry cache size:
> cat modify-cache-mem-size.ldif
> dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
> changetype: modify
> replace: nsslapd-cachememsize
> nsslapd-cachememsize: 209715200
>
> Modify the DB cache size:
> cat modfy-db-cache-size.ldif
> dn: cn=config,cn=ldbm database,cn=plugins,cn=config
> changetype: modify
> replace: nsslapd-dbcachesize
> nsslapd-dbcachesize: 209715200
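Both files above change only userRoot and the global ldbm config, which could explain why the per-backend warnings still mention ipaca and changelog. If you want to raise another backend's entry cache, a sketch that generates the same kind of modify LDIF for any backend name might look like the one below. It assumes the other backends follow the same `cn=<name>,cn=ldbm database,cn=plugins,cn=config` DN pattern as userRoot. The helper name is hypothetical.

```python
def cachememsize_ldif(backend: str, size_bytes: int) -> str:
    """Build a modify LDIF entry that sets nsslapd-cachememsize for one backend.

    Assumes the backend entry lives at cn=<backend>,cn=ldbm database,...
    like the userRoot entry in the files above.
    """
    dn = f"cn={backend},cn=ldbm database,cn=plugins,cn=config"
    return (
        f"dn: {dn}\n"
        "changetype: modify\n"
        "replace: nsslapd-cachememsize\n"
        f"nsslapd-cachememsize: {size_bytes}\n"
    )


if __name__ == "__main__":
    # A blank line between entries lets one file carry several changes.
    entries = [cachememsize_ldif(name, 209715200) for name in ("userRoot", "ipaca")]
    print("\n".join(entries), end="")
```

You can save the output to a file and apply it with ldapmodify, the same way as the hand-written files.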
>
> After modifying them, I restarted the IPA services.
>
> Is there anything else I need to take care of? The logs suggest it is
> still not picking up the updated values.
>
> Thanks
> Rakesh
>
> On Mon, Aug 29, 2016 at 6:07 PM, Rakesh Rajasekharan <rakesh.rajasekharan at gmail.com> wrote:
>
>> Hi Thierry,
>>
>> Because of these issues, we had to revert production back to the OpenLDAP
>> setup we were running earlier.
>>
>> I have now made a few TCP-related changes in sysctl.conf and have also
>> increased nsslapd-dbcachesize and nsslapd-cachememsize to 200 MB.
>>
>> I will start migrating hosts back to IPA again and see whether I hit the
>> earlier issue.
>>
>> I will report back once I have something.
>>
>>
>> Thanks,
>> Rakesh
>>
>>
>>
>> On Thu, Aug 25, 2016 at 2:17 PM, thierry bordaz <tbordaz at redhat.com> wrote:
>>
>>>
>>>
>>> On 08/25/2016 10:15 AM, Rakesh Rajasekharan wrote:
>>>
>>> All of the troubleshooting checks look fine.
>>>
>>>
>>> However, running logconv.pl gives me this output:
>>>
>>> ----- Recommendations -----
>>>
>>>  1.  You have unindexed components, this can be caused from a search on
>>> an unindexed attribute, or your returned results exceeded the
>>> allidsthreshold.  Unindexed components are not recommended. To refuse
>>> unindexed searches, switch 'nsslapd-require-index' to 'on' under your
>>> database entry (e.g. cn=UserRoot,cn=ldbm database,cn=plugins,cn=config).
>>>
>>>  2.  You have a significant difference between binds and unbinds.  You
>>> may want to investigate this difference.
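logconv.pl already aggregates these numbers. To find which connections are behind them, a rough Python scan of the raw access log can count BIND and UNBIND operations and the RESULT lines flagged as unindexed. It assumes the usual 389-ds access log line shapes shown in the comments: `notes=U` for an unindexed search and, in newer releases, `notes=A` for a fully unindexed one. Treat the regexes as a starting point rather than a complete parser.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed 389-ds access log shapes, one operation per line, e.g.
#   [..] conn=5 op=0 BIND dn="uid=admin,..." method=128 version=3
#   [..] conn=5 op=1 RESULT err=0 tag=101 nentries=1 etime=0 notes=U
#   [..] conn=5 op=2 UNBIND
OP_RE = re.compile(r"\bconn=\d+ op=-?\d+ (BIND|UNBIND)\b")
NOTES_RE = re.compile(r"\bop=-?\d+ RESULT\b.*\bnotes=([A-Z,]+)")


def summarize(lines: Iterable[str]) -> Counter:
    """Count BIND, UNBIND and unindexed (notes=U or notes=A) RESULT lines."""
    counts: Counter = Counter()
    for line in lines:
        op = OP_RE.search(line)
        if op:
            counts[op.group(1)] += 1
            continue
        notes = NOTES_RE.search(line)
        if notes and {"U", "A"} & set(notes.group(1).split(",")):
            counts["unindexed"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_access.py /path/to/access
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        c = summarize(log)
    print(f"BIND={c['BIND']} UNBIND={c['UNBIND']} "
          f"difference={c['BIND'] - c['UNBIND']} unindexed={c['unindexed']}")
```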
>>>
>>>
>>> I suspect this could explain why things slow down and IPA hangs.
>>> I think I now have something I can use to nail down this issue.
>>>
>>> On a side note, I was previously running OpenLDAP and migrated over to
>>> FreeIPA.
>>>
>>> Thanks
>>> Rakesh
>>>
>>>
>>>
>>> On Wed, Aug 24, 2016 at 12:38 PM, Petr Spacek <pspacek at redhat.com> wrote:
>>>
>>>> On 23.8.2016 18:44, Rakesh Rajasekharan wrote:
>>>> > I think there's something seriously wrong with my system.
>>>> >
>>>> > I am not able to run any IPA commands:
>>>> >
>>>> > klist
>>>> > Ticket cache: KEYRING:persistent:0:0
>>>> > Default principal: admin at XYZ.COM
>>>> >
>>>> > Valid starting       Expires              Service principal
>>>> > 2016-08-23T16:26:36  2016-08-24T16:26:22  krbtgt/XYZ.COM at XYZ.COM
>>>> >
>>>> >
>>>> > [root at prod-ipa-master-1a :~] ipactl status
>>>> > Directory Service: RUNNING
>>>> > krb5kdc Service: RUNNING
>>>> > kadmin Service: RUNNING
>>>> > ipa_memcached Service: RUNNING
>>>> > httpd Service: RUNNING
>>>> > pki-tomcatd Service: RUNNING
>>>> > ipa-otpd Service: RUNNING
>>>> > ipa: INFO: The ipactl command was successful
>>>> >
>>>> >
>>>> >
>>>> > [root at prod-ipa-master :~] ipa user-find p-testuser
>>>> > ipa: ERROR: Kerberos error: ('Unspecified GSS failure.  Minor code may provide more information', 851968)/("Cannot contact any KDC for realm 'XYZ.COM'", -1765328228)
>>>>
>>>
>>> Hi Rakesh,
>>>
>>> Since you have a reproducible test case, would you rerun the command above?
>>> While it runs, you may monitor the DS process load (top). If it is
>>> high, please capture a few pstacks of it.
>>> Also, would you attach the part of the DS access log written during the
>>> command?
>>>
>>> regards
>>> thierry
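To cut out the part of the access log that covers one reproduction of the command, a small Python filter on the leading timestamp can help. It assumes timestamps in the `[25/Aug/2016:10:00:00 +0000]` form seen in this thread, with no fractional seconds and an English month abbreviation. Lines without a timestamp are skipped. The start and stop times are whatever window you note down while rerunning the command.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional

# Timestamp format of the 389-ds log lines quoted in this thread.
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_ts(line: str) -> Optional[datetime]:
    """Return the leading [..] timestamp of a log line, or None."""
    if not line.startswith("["):
        return None
    end = line.find("]")
    if end == -1:
        return None
    try:
        return datetime.strptime(line[1:end], TS_FORMAT)
    except ValueError:
        return None


def lines_between(lines: Iterable[str], start: datetime, stop: datetime) -> Iterator[str]:
    """Yield lines stamped within [start, stop]; start/stop must be tz-aware."""
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and start <= ts <= stop:
            yield line


if __name__ == "__main__":
    sample = [
        "[25/Aug/2016:09:59:59 +0000] conn=1 op=0 BIND",
        "[25/Aug/2016:10:01:00 +0000] conn=2 op=0 SRCH",
        "a line without a timestamp",
    ]
    window_start = datetime(2016, 8, 25, 10, 0, 0, tzinfo=timezone.utc)
    window_stop = datetime(2016, 8, 25, 10, 5, 0, tzinfo=timezone.utc)
    for line in lines_between(sample, window_start, window_stop):
        print(line)
```

Only the 10:01:00 line falls inside the sample window.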
>>>
>>> >
>>>>
>>>> This is weird because the server seems to be up.
>>>>
>>>> Please follow
>>>> http://www.freeipa.org/page/Troubleshooting#Authentication.2FKerberos
>>>>
>>>> Petr^2 Spacek
>>>>
>>>> >
>>>> >
>>>> > Thanks
>>>> >
>>>> > Rakesh
>>>> >
>>>> > On Tue, Aug 23, 2016 at 10:01 PM, Rakesh Rajasekharan <rakesh.rajasekharan at gmail.com> wrote:
>>>> >
>>>> >> I changed the logging level to 4 by modifying nsslapd-accesslog-level.
>>>> >>
>>>> >> But the hang is still there, though I don't see the segfault now.
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Tue, Aug 23, 2016 at 9:02 PM, Rakesh Rajasekharan <rakesh.rajasekharan at gmail.com> wrote:
>>>> >>
>>>> >>> My disk was filling up too fast.
>>>> >>>
>>>> >>> The logs under /var/log/dirsrv quickly grew to around 5 GB.
>>>> >>>
>>>> >>> Is there a way to make the logging less verbose?
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Tue, Aug 23, 2016 at 6:41 PM, Petr Spacek <pspacek at redhat.com> wrote:
>>>> >>>
>>>> >>>> On 23.8.2016 15:07, Rakesh Rajasekharan wrote:
>>>> >>>>> I was able to fix that, maybe only temporarily. When I checked the
>>>> >>>>> network, another process was running and consuming a lot of network
>>>> >>>>> bandwidth (I have no idea who started it; I seriously need to start
>>>> >>>>> restricting people's access to this machine).
>>>> >>>>>
>>>> >>>>> After killing it, performance improved drastically.
>>>> >>>>>
>>>> >>>>> But now I have suddenly started experiencing the same hang again.
>>>> >>>>>
>>>> >>>>> This time, I get the following errors when I check dmesg:
>>>> >>>>>
>>>> >>>>> [  301.236976] ns-slapd[3124]: segfault at 0 ip 00007f1de416951c sp 00007f1dee1dba70 error 4 in libcos-plugin.so[7f1de4166000+b000]
>>>> >>>>> [ 1116.248431] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies.  Check SNMP counters.
>>>> >>>>> [11831.397037] ns-slapd[22550]: segfault at 0 ip 00007f533d82251c sp 00007f5347894a70 error 4 in libcos-plugin.so[7f533d81f000+b000]
>>>> >>>>> [11832.727989] ns-slapd[22606]: segfault at 0 ip 00007f6231eb951c sp 00007f623bf2ba70 error 4 in libcos-plugin.so[7f6231eb6000+b00
>>>> >>>>
>>>> >>>> Okay, this one is serious. The LDAP server crashed.
>>>> >>>>
>>>> >>>> 1. Make sure all your packages are up-to-date.
>>>> >>>>
>>>> >>>> Please see
>>>> >>>> http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes
>>>> >>>> for further instructions on how to debug this.
>>>> >>>>
>>>> >>>> Petr^2 Spacek
>>>> >>>>
>>>> >>>>>
>>>> >>>>> and in /var/log/dirsrv/example-com/errors:
>>>> >>>>>
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291138 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291139 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291140 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291141 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291142 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291143 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291144 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291145 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:50 +0000] - Retry count exceeded in delete
>>>> >>>>> [23/Aug/2016:12:49:50 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3292734 (rc: 51)
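The rc values in these lines are standard LDAP result codes (RFC 4511): 32 is noSuchObject and 51 is busy. A quick Python tally of which codes appear makes it easier to see whether a burst is dominated by one of them. It assumes the exact message wording shown above.

```python
import re
from collections import Counter
from typing import Dict

# LDAP result code names from RFC 4511 for the codes seen in this log.
RESULT_NAMES = {32: "noSuchObject", 51: "busy"}

# Assumes the DSRetroclPlugin message wording quoted above.
RC_RE = re.compile(r"could not delete change record (\d+) \(rc: (\d+)\)")


def rc_summary(log_text: str) -> Dict[str, int]:
    """Count failed change-record deletions per LDAP result code."""
    counts = Counter(int(m.group(2)) for m in RC_RE.finditer(log_text))
    return {
        f"{rc} ({RESULT_NAMES.get(rc, 'other')})": n
        for rc, n in sorted(counts.items())
    }


if __name__ == "__main__":
    sample = (
        "DSRetroclPlugin - delete_changerecord: could not delete change record 3291138 (rc: 32)\n"
        "DSRetroclPlugin - delete_changerecord: could not delete change record 3291139 (rc: 32)\n"
        "DSRetroclPlugin - delete_changerecord: could not delete change record 3292734 (rc: 51)\n"
    )
    print(rc_summary(sample))
```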
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> Can I do something about this error? I tried restarting IPA a couple
>>>> >>>>> of times, but that did not help.
>>>> >>>>>
>>>> >>>>> Thanks
>>>> >>>>> Rakesh
>>>> >>>>>
>>>> >>>>> On Mon, Aug 22, 2016 at 2:27 PM, Petr Spacek <pspacek at redhat.com> wrote:
>>>> >>>>>
>>>> >>>>>> On 19.8.2016 19:32, Rakesh Rajasekharan wrote:
>>>> >>>>>>> I am running my setup in the AWS cloud, and entropy is low, at around
>>>> >>>>>>> 180.
>>>> >>>>>>>
>>>> >>>>>>> I plan to increase it by installing haveged. But could low entropy by
>>>> >>>>>>> any chance cause this intermittent hang?
>>>> >>>>>>> Also, the hang is mostly observed when registering around 20 clients
>>>> >>>>>>> at the same time.
>>>> >>>>>>
>>>> >>>>>> Possibly, I'm not sure. If you want to dig into this, I would do this:
>>>> >>>>>> 1. Find out which process hangs on the client (using pstree or similar):
>>>> >>>>>> $ pstree
>>>> >>>>>>
>>>> >>>>>> 2. Find out which server and port the hanging client is connected to:
>>>> >>>>>> $ lsof -p <PID of the hanging process>
>>>> >>>>>>
>>>> >>>>>> 3. Jump to the server and see which process is bound to the target port:
>>>> >>>>>> $ netstat -pn
>>>> >>>>>>
>>>> >>>>>> 4. See where that process is hanging:
>>>> >>>>>> $ strace -p <PID of the hanging process>
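Step 3 can be partly scripted. The Python sketch below scans saved `netstat -pn` output for TCP sockets whose local port matches the one found with lsof, and returns the `PID/Program name` column. It assumes the net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). That last column shows `-` when netstat lacks the privileges to see the owner. The sample lines are invented for illustration.

```python
from typing import Set


def owners_of_port(netstat_text: str, port: int) -> Set[str]:
    """Return the 'PID/Program name' entries of TCP sockets on a local port."""
    owners = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            owners.add(fields[6])
    return owners


if __name__ == "__main__":
    # Illustrative output only (addresses and PIDs are invented).
    sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.5:389            10.0.0.9:43122          ESTABLISHED 1234/ns-slapd
tcp        0      0 10.0.0.5:88             10.0.0.7:51000          ESTABLISHED 2345/krb5kdc
"""
    print(owners_of_port(sample, 389))
```

The PID it returns is the one to pass to strace in step 4.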
>>>> >>>>>>
>>>> >>>>>> I hope it helps.
>>>> >>>>>>
>>>> >>>>>> Petr^2 Spacek
>>>> >>>>>>
>>>> >>>>>>> On Fri, Aug 19, 2016 at 7:24 PM, Rakesh Rajasekharan <rakesh.rajasekharan at gmail.com> wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Yes, there seems to be something worrying. I faced this today as
>>>> >>>>>>>> well.
>>>> >>>>>>>> There are around 280 hosts left, and when I try adding them to IPA,
>>>> >>>>>>>> the slowness begins.
>>>> >>>>>>>>
>>>> >>>>>>>> All the ipa commands, such as ipa user-find, become very slow to
>>>> >>>>>>>> respond.
>>>> >>>>>>>>
>>>> >>>>>>>> There are not many connections in SYN_RECV, though: around 80-90,
>>>> >>>>>>>> and only around 20 today.
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> For now, I have increased tcp_max_syn_backlog to 5000.
>>>> >>>>>>>> The slowness seems to have gone for the moment, but I will try adding
>>>> >>>>>>>> the clients again tomorrow and see how it goes.
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks
>>>> >>>>>>>> Rakesh
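To watch the SYN_RECV count for the Kerberos port (88) while clients are being added, you can read /proc/net/tcp directly. In that file, the local port is the hex number after the colon in the local_address column, and state 03 means SYN_RECV. IPv6 sockets are listed separately in /proc/net/tcp6 with the same layout. Here is a minimal Python sketch; the function name is hypothetical.

```python
from pathlib import Path

SYN_RECV = "03"  # TCP state code for SYN_RECV in /proc/net/tcp


def count_syn_recv(proc_net_tcp: str, port: int) -> int:
    """Count sockets in SYN_RECV whose local port equals `port`."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXADDR:HEXPORT"; fields[3] is the state code.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == SYN_RECV:
            count += 1
    return count


if __name__ == "__main__":
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            print(name, count_syn_recv(path.read_text(), 88))
```

Running it in a loop while adding a batch of clients shows whether the count grows toward tcp_max_syn_backlog.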
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> On Fri, Aug 19, 2016 at 12:58 PM, Petr Spacek <pspacek at redhat.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> On 18.8.2016 17:23, Rakesh Rajasekharan wrote:
>>>> >>>>>>>>>> Hi
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I am migrating from OpenLDAP to FreeIPA and have around 4000 clients.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I had opened another thread on that, but chose to start a new one
>>>> >>>>>>>>>> here as it's a separate issue.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I was able to change nsslapd-maxdescriptors by adding an LDIF file:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> cat nsslapd-modify.ldif
>>>> >>>>>>>>>> dn: cn=config
>>>> >>>>>>>>>> changetype: modify
>>>> >>>>>>>>>> replace: nsslapd-maxdescriptors
>>>> >>>>>>>>>> nsslapd-maxdescriptors: 17000
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> and running the ldapmodify command.
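A process cannot hold more open files than its RLIMIT_NOFILE allows, so a value like 17000 only helps if ns-slapd's own limit is at least that high. The Python sketch below reads the "Max open files" row from /proc/<pid>/limits and compares it with the requested value. Pass it the ns-slapd PID, which you can get with `pidof ns-slapd`, for example. The helper names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional, Tuple


def max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None


def fits(requested: int, limit: str) -> bool:
    """True if `requested` descriptors fit under a limit ('unlimited' allowed)."""
    return limit == "unlimited" or requested <= int(limit)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_fds.py <ns-slapd PID>
    limits = max_open_files(Path(f"/proc/{sys.argv[1]}/limits").read_text())
    if limits:
        soft, hard = limits
        print(f"soft={soft} hard={hard} 17000 fits soft limit: {fits(17000, soft)}")
```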
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I have now started moving clients from OpenLDAP to FreeIPA, and today I
>>>> >>>>>>>>>> moved close to 2000 clients.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> However, I have noticed that IPA hangs intermittently.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Running kinit admin returns the error below:
>>>> >>>>>>>>>> kinit: Generic error (see e-text) while getting initial credentials
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> In /var/log/messages, I see this entry:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>  prod-ipa-master-int kernel: [104090.315801] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies.  Check SNMP counters.
>>>> >>>>>>>>>
>>>> >>>>>>>>> I would be worried about this message. Maybe the kernel or a firewall
>>>> >>>>>>>>> is doing something fishy behind your back and blocking some
>>>> >>>>>>>>> connections.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Petr^2 Spacek
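The kernel's "Check SNMP counters" hint refers to counters such as SyncookiesSent, ListenOverflows and ListenDrops. Linux exposes these in the TcpExt section of /proc/net/netstat, and `netstat -s` also shows them. This minimal Python sketch parses the header/value line pairs of that file, so you can compare the counters before and after adding a batch of clients. The sample text in the test is abbreviated.

```python
from pathlib import Path
from typing import Dict


def parse_proc_netstat(text: str) -> Dict[str, int]:
    """Map 'Section.Counter' to its value for /proc/net/netstat-style text.

    The file alternates a header line ('TcpExt: Name1 Name2 ...') with a
    value line ('TcpExt: 1 2 ...') for each section.
    """
    counters: Dict[str, int] = {}
    lines = text.splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        section, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{section}.{name}"] = int(number)
    return counters


if __name__ == "__main__":
    path = Path("/proc/net/netstat")
    if path.exists():
        stats = parse_proc_netstat(path.read_text())
        for key in ("TcpExt.SyncookiesSent", "TcpExt.ListenOverflows", "TcpExt.ListenDrops"):
            print(key, stats.get(key))
```

If SyncookiesSent or ListenOverflows rise steadily during client enrollment, the listen queue on port 88 is overflowing. That would be consistent with the kernel message above.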
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Started Session 4885 of user root.
>>>> >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Starting Session 4885 of user root.
>>>> >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Started Session 4886 of user root.
>>>> >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Starting Session 4886 of user root.
>>>> >>>>>>>>>> Aug 18 13:02:40 prod-ipa-master-int python[28984]: ansible-command Invoked with creates=None executable=None shell=True args= removes=None warn=True chdir=None
>>>> >>>>>>>>>> Aug 18 13:04:37 prod-ipa-master-int sssd_be: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (KDC returned error string: PROCESS_TGS)
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Could this be due to the initial load of adding the clients, or is
>>>> >>>>>>>>>> there something else I need to take care of?
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>