[Freeipa-users] Freeipa 4.2.0 hangs intermittently

Mon Aug 29 17:48:12 UTC 2016

On 08/29/2016 10:53 AM, Rakesh Rajasekharan wrote:
> Hi Thierry,
>
> My machine has 30GB RAM, and the 389-ds version is 1.3.4.
>
> ldapsearch shows the values for nsslapd-cachememsize updated to 200MB.
>
> ldapsearch -LLL -o ldif-wrap=no -D "cn=directory manager" -w 
> 'mypassword' -b 'cn=userRoot,cn=ldbm 
> database,cn=plugins,cn=config'|grep nsslapd-cachememsize
> nsslapd-cachememsize: 209715200
>
>
> So the value seems to have been updated, though seeing that warning
> (WARNING: ipaca: entry cache size 10485760B is less than db size
> 11599872B) in the log confuses me a bit.
>
> There is one more entry from the ldapsearch output that looks a bit low:
>
> nsslapd-dncachememsize: 10485760
> maxdncachesize: 10485760
>
> Should I raise these to a higher value as well?
>
> At the time the issue happened, the memory usage as well as the
> overall load of the system was very low.
> I will try reproducing the issue, at least in my QA env, probably by
> simulating simultaneous parallel logins to a large number of hosts.

To monitor your cache sizes, please use the dbmon.sh tool provided with 
your distro.  If that is not available with your particular distro, see 
https://github.com/richm/scripts/wiki/dbmon.sh
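As a rough complement to dbmon.sh, the startup warnings quoted further down can be read mechanically to see which backends they actually refer to. The following is only a minimal sketch: the function name, the regular expression, and the idea of parsing the errors log this way are my own assumptions, not something 389-ds ships. The sample text is taken from the warnings quoted in this thread.

```python
import re

# Matches the 389-ds startup warning quoted in this thread.
# (Hypothetical helper; the pattern is derived only from those two lines.)
WARNING_RE = re.compile(
    r"WARNING: (?P<backend>\S+): entry cache size (?P<cache>\d+)B "
    r"is less than db size (?P<db>\d+)B"
)


def undersized_backends(log_text):
    """Return {backend: (cache_bytes, db_bytes)} for each warning found."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return {
        m.group("backend"): (int(m.group("cache")), int(m.group("db")))
        for m in WARNING_RE.finditer(flat)
    }


SAMPLE = """
[29/Aug/2016:04:34:37 +0000] - WARNING: ipaca: entry cache size
10485760B is less than db size 11599872B; We recommend to
increase the entry cache size nsslapd-cachememsize.
[29/Aug/2016:04:34:37 +0000] - WARNING: changelog: entry cache
size 2097152B is less than db size 441647104B; We recommend to
increase the entry cache size nsslapd-cachememsize.
"""

for backend, (cache, db) in undersized_backends(SAMPLE).items():
    print(f"{backend}: cache {cache / 2**20:.1f} MiB, db {db / 2**20:.1f} MiB")
```

On the quoted sample, only the ipaca and changelog backends are reported. userRoot does not appear, which is consistent with the 200MB value having taken effect there.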

>
>
> thanks
> Rakesh
>
>
>
>
> On Mon, Aug 29, 2016 at 8:16 PM, thierry bordaz <tbordaz at redhat.com> wrote:
>
>     Hi Rakesh,
>
>     Those tunings may depend on the memory available on your machine.
>     nsslapd-cachememsize allows the entry cache to consume up to 200MB,
>     but its actual memory footprint is known to go above that.
>     200MB for both looks pretty good to me. How large is your machine?
>     What is your version of 389-ds?
>
>     Those warnings do not change your settings. They just point out that
>     the entry caches of 'ipaca' and 'retrocl' are small, but that is fine.
>     The size of the entry cache matters mostly in userRoot.
>     You may double check the actual values, after restart, with
>     ldapsearch on 'cn=userRoot,cn=ldbm database,cn=plugins,cn=config'
>     and 'cn=config,cn=ldbm database,cn=plugins,cn=config'.
>
>     A next step is to measure the response time of DS, to find out
>     whether it is responsible for the hang or not.
>     The logs, and possibly pstack output taken during those intermittent
>     hangs, will help to determine that.
>
>     regards
>     thierry
>
>
>
>
>
>     On 08/29/2016 04:25 PM, Rakesh Rajasekharan wrote:
>>     I tried increasing the nsslapd-dbcachesize and
>>     nsslapd-cachememsize in my QA envs to 200MB.
>>
>>     However, in my log files, I still see this message
>>     [29/Aug/2016:04:34:37 +0000] - WARNING: ipaca: entry cache size
>>     10485760B is less than db size 11599872B; We recommend to
>>     increase the entry cache size nsslapd-cachememsize.
>>     [29/Aug/2016:04:34:37 +0000] - WARNING: changelog: entry cache
>>     size 2097152B is less than db size 441647104B; We recommend to
>>     increase the entry cache size nsslapd-cachememsize.
>>
>>     These are the LDIF files that I used to modify the values.
>>     Modify the entry cache size:
>>     cat modify-cache-mem-size.ldif
>>     dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
>>     changetype: modify
>>     replace: nsslapd-cachememsize
>>     nsslapd-cachememsize: 209715200
>>
>>     Modify the db cache size:
>>     cat modfy-db-cache-size.ldif
>>     dn: cn=config,cn=ldbm database,cn=plugins,cn=config
>>     changetype: modify
>>     replace: nsslapd-dbcachesize
>>     nsslapd-dbcachesize: 209715200
>>
>>     After modifying these, I restarted the IPA services.
>>
>>     Is there anything else that I need to take care of? The logs
>>     suggest it is still not picking up the updated values.
>>
>>     Thanks
>>     Rakesh
>>
>>     On Mon, Aug 29, 2016 at 6:07 PM, Rakesh Rajasekharan
>>     <rakesh.rajasekharan at gmail.com> wrote:
>>
>>         Hi Thierry,
>>
>>         Because of the issues, we had to revert to the previously
>>         running OpenLDAP in production.
>>
>>         I have now done a few TCP related changes in sysctl.conf and
>>         have also increased the nsslapd-dbcachesize and
>>         nsslapd-cachememsize to 200MB
>>
>>         I will again start migrating hosts back to IPA and see if I
>>         face the earlier issue.
>>
>>         I will update back once I have something
>>
>>
>>         Thanks,
>>         Rakesh
>>
>>
>>
>>         On Thu, Aug 25, 2016 at 2:17 PM, thierry bordaz
>>         <tbordaz at redhat.com> wrote:
>>
>>
>>
>>             On 08/25/2016 10:15 AM, Rakesh Rajasekharan wrote:
>>>             All of the troubleshooting seems fine.
>>>
>>>
>>>             However, running logconv.pl gives me this output:
>>>
>>>             ----- Recommendations -----
>>>
>>>              1.  You have unindexed components, this can be caused
>>>             from a search on an unindexed attribute, or your
>>>             returned results exceeded the allidsthreshold. Unindexed
>>>             components are not recommended. To refuse unindexed
>>>             searches, switch 'nsslapd-require-index' to 'on' under
>>>             your database entry (e.g. cn=UserRoot,cn=ldbm
>>>             database,cn=plugins,cn=config).
>>>
>>>              2.  You have a significant difference between binds and
>>>             unbinds.  You may want to investigate this difference.
>>>
>>>
>>>             I feel this could be a pointer to things going slow and
>>>             IPA hanging. I think I now have something that I can
>>>             use to try and nail down this issue.
>>>
>>>             On a side note, I was earlier running OpenLDAP and
>>>             migrated over to FreeIPA.
>>>
>>>             Thanks
>>>             Rakesh
>>>
>>>
>>>
>>>             On Wed, Aug 24, 2016 at 12:38 PM, Petr Spacek
>>>             <pspacek at redhat.com> wrote:
>>>
>>>                 On 23.8.2016 18:44, Rakesh Rajasekharan wrote:
>>>                 > I think there is something seriously wrong with my system.
>>>                 >
>>>                 > I am not able to run any IPA commands.
>>>                 >
>>>                 > klist
>>>                 > Ticket cache: KEYRING:persistent:0:0
>>>                 > Default principal: admin at XYZ.COM
>>>                 >
>>>                 > Valid starting      Expires       Service principal
>>>                 > 2016-08-23T16:26:36 2016-08-24T16:26:22 krbtgt/XYZ.COM at XYZ.COM
>>>                 >
>>>                 >
>>>                 > [root at prod-ipa-master-1a :~] ipactl status
>>>                 > Directory Service: RUNNING
>>>                 > krb5kdc Service: RUNNING
>>>                 > kadmin Service: RUNNING
>>>                 > ipa_memcached Service: RUNNING
>>>                 > httpd Service: RUNNING
>>>                 > pki-tomcatd Service: RUNNING
>>>                 > ipa-otpd Service: RUNNING
>>>                 > ipa: INFO: The ipactl command was successful
>>>                 >
>>>                 >
>>>                 >
>>>                 > [root at prod-ipa-master :~] ipa user-find p-testuser
>>>                 > ipa: ERROR: Kerberos error: ('Unspecified GSS failure.  Minor code may
>>>                 > provide more information', 851968)/("Cannot contact any KDC for realm
>>>                 > 'XYZ.COM'", -1765328228)
>>>
>>
>>             Hi Rakesh,
>>
>>                 Since you have a reproducible test case, would you rerun
>>                 the command above?
>>                 While it runs, you may monitor the DS process load
>>                 (top). If it is high, you may take some pstacks of it.
>>                 Also, would you attach the part of the DS access logs
>>                 recorded during the command?
>>
>>                 regards
>>                 thierry
>>
>>>                 >
>>>
>>>                 This is weird because the server seems to be up.
>>>
>>>                 Please follow
>>>                 http://www.freeipa.org/page/Troubleshooting#Authentication.2FKerberos
>>>
>>>                 Petr^2 Spacek
>>>
>>>                 >
>>>                 >
>>>                 > Thanks
>>>                 >
>>>                 > Rakesh
>>>                 >
>>>                 > On Tue, Aug 23, 2016 at 10:01 PM, Rakesh Rajasekharan <
>>>                 > rakesh.rajasekharan at gmail.com> wrote:
>>>                 >
>>>                 >> I changed the logging level to 4 by modifying
>>>                 >> nsslapd-accesslog-level.
>>>                 >>
>>>                 >> But the hang is still there, though I don't see the segfault now.
>>>                 >>
>>>                 >>
>>>                 >>
>>>                 >>
>>>                 >> On Tue, Aug 23, 2016 at 9:02 PM, Rakesh Rajasekharan <
>>>                 >> rakesh.rajasekharan at gmail.com> wrote:
>>>                 >>
>>>                 >>> My disk was getting filled too fast.
>>>                 >>>
>>>                 >>> The logs under /var/log/dirsrv grew to around 5 GB, quickly filling up
>>>                 >>> the disk.
>>>                 >>>
>>>                 >>> Is there a way to make the logging less verbose?
>>>                 >>>
>>>                 >>>
>>>                 >>>
>>>                 >>> On Tue, Aug 23, 2016 at 6:41 PM, Petr Spacek <pspacek at redhat.com> wrote:
>>>                 >>>
>>>                 >>>> On 23.8.2016 15:07, Rakesh Rajasekharan wrote:
>>>                 >>>>> I was able to fix that, maybe temporarily. When I checked the network,
>>>                 >>>>> there was another process running and consuming a lot of network
>>>                 >>>>> bandwidth (I have no idea who started it; I need to seriously start
>>>                 >>>>> restricting people's access to this machine).
>>>                 >>>>>
>>>                 >>>>> After killing that process, performance improved drastically.
>>>                 >>>>>
>>>                 >>>>> But now I have suddenly started experiencing the same hang.
>>>                 >>>>>
>>>                 >>>>> This time, I get the following errors when I check dmesg:
>>>                 >>>>>
>>>                 >>>>> [  301.236976] ns-slapd[3124]: segfault at 0 ip 00007f1de416951c sp 00007f1dee1dba70 error 4 in libcos-plugin.so[7f1de4166000+b000]
>>>                 >>>>> [ 1116.248431] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies.  Check SNMP counters.
>>>                 >>>>> [11831.397037] ns-slapd[22550]: segfault at 0 ip 00007f533d82251c sp 00007f5347894a70 error 4 in libcos-plugin.so[7f533d81f000+b000]
>>>                 >>>>> [11832.727989] ns-slapd[22606]: segfault at 0 ip 00007f6231eb951c sp 00007f623bf2ba70 error 4 in libcos-plugin.so[7f6231eb6000+b00
>>>                 >>>>
>>>                 >>>> Okay, this one is serious. The LDAP server crashed.
>>>                 >>>>
>>>                 >>>> 1. Make sure all your packages are up-to-date.
>>>                 >>>>
>>>                 >>>> Please see
>>>                 >>>> http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes
>>>                 >>>> for further instructions how to debug this.
>>>                 >>>>
>>>                 >>>> Petr^2 Spacek
>>>                 >>>>
>>>                 >>>>>
>>>                 >>>>> and in /var/log/dirsrv/example-com/errors
>>>                 >>>>>
>>>                 >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291138 (rc: 32)
>>>                 >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291139 (rc: 32)
>>>                 >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291140 (rc: 32)
>>>                 >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291141 (rc: 32)
>>>                 >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291142 (rc: 32)
>>>                 >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291143 (rc: 32)
>>>                 >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291144 (rc: 32)
>>>                 >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291145 (rc: 32)
>>>                 >>>>> [23/Aug/2016:12:49:50 +0000] - Retry count exceeded in delete
>>>                 >>>>> [23/Aug/2016:12:49:50 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3292734 (rc: 51)
>>>                 >>>>>
>>>                 >>>>>
>>>                 >>>>> Can I do something about this error? I tried to restart IPA a couple
>>>                 >>>>> of times, but that did not help.
>>>                 >>>>>
>>>                 >>>>> Thanks
>>>                 >>>>> Rakesh
>>>                 >>>>>
>>>                 >>>>> On Mon, Aug 22, 2016 at 2:27 PM, Petr Spacek <pspacek at redhat.com> wrote:
>>>                 >>>>>
>>>                 >>>>>> On 19.8.2016 19:32, Rakesh Rajasekharan wrote:
>>>                 >>>>>>> I am running my setup on the AWS cloud, and entropy is low, at around
>>>                 >>>>>>> 180.
>>>                 >>>>>>>
>>>                 >>>>>>> I plan to increase it by installing haveged. But could low entropy by
>>>                 >>>>>>> any chance cause this intermittent hang?
>>>                 >>>>>>> Also, the hang is mostly observed when registering around 20 clients
>>>                 >>>>>>> together.
>>>                 >>>>>>
>>>                 >>>>>> Possibly, I'm not sure. If you want to dig into this, I would do this:
>>>                 >>>>>> 1. look at which process hangs on the client (using the pstree command or so)
>>>                 >>>>>> $ pstree
>>>                 >>>>>>
>>>                 >>>>>> 2. look at which server and port the hanging client is connected to
>>>                 >>>>>> $ lsof -p <PID of the hanging process>
>>>                 >>>>>>
>>>                 >>>>>> 3. jump to the server and see which process is bound to the target port
>>>                 >>>>>> $ netstat -pn
>>>                 >>>>>>
>>>                 >>>>>> 4. see where the process is hanging
>>>                 >>>>>> $ strace -p <PID of the hanging process>
>>>                 >>>>>>
>>>                 >>>>>> I hope it helps.
>>>                 >>>>>>
>>>                 >>>>>> Petr^2 Spacek
>>>                 >>>>>>
>>>                 >>>>>>> On Fri, Aug 19, 2016 at 7:24 PM, Rakesh Rajasekharan <
>>>                 >>>>>>> rakesh.rajasekharan at gmail.com> wrote:
>>>                 >>>>>>>
>>>                 >>>>>>>> Yes, there seems to be something worrying. I have faced this today
>>>                 >>>>>>>> as well.
>>>                 >>>>>>>> There are a few hosts left, around 280, and when I try adding them
>>>                 >>>>>>>> to IPA, the slowness begins.
>>>                 >>>>>>>>
>>>                 >>>>>>>> All the ipa commands, like ipa user-find, become very slow to
>>>                 >>>>>>>> respond.
>>>                 >>>>>>>>
>>>                 >>>>>>>> There are not many SYN_RECV connections though, just around 80-90,
>>>                 >>>>>>>> and today there were only around 20.
>>>                 >>>>>>>>
>>>                 >>>>>>>>
>>>                 >>>>>>>> I have for now increased tcp_max_syn_backlog to 5000.
>>>                 >>>>>>>> For now the slowness seems to have gone, but I will try adding the
>>>                 >>>>>>>> clients again tomorrow and see how it goes.
>>>                 >>>>>>>>
>>>                 >>>>>>>> Thanks
>>>                 >>>>>>>> Rakesh
>>>                 >>>>>>>>
>>>                 >>>>>>>> The issues
>>>                 >>>>>>>>
>>>                 >>>>>>>> On Fri, Aug 19, 2016 at 12:58 PM, Petr Spacek <pspacek at redhat.com> wrote:
>>>                 >>>>>>>>
>>>                 >>>>>>>>> On 18.8.2016 17:23, Rakesh Rajasekharan wrote:
>>>                 >>>>>>>>>> Hi
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> I am migrating to FreeIPA from OpenLDAP and have around 4000 clients.
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> I had opened another thread on that, but chose to start a new one
>>>                 >>>>>>>>>> here as it's a separate issue.
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> I was able to change nsslapd-maxdescriptors by adding an LDIF file
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> cat nsslapd-modify.ldif
>>>                 >>>>>>>>>> dn: cn=config
>>>                 >>>>>>>>>> changetype: modify
>>>                 >>>>>>>>>> replace: nsslapd-maxdescriptors
>>>                 >>>>>>>>>> nsslapd-maxdescriptors: 17000
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> and running the ldapmodify command
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> I have now started moving clients running OpenLDAP to FreeIPA and
>>>                 >>>>>>>>>> have today moved close to 2000 clients.
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> However, I have noticed that IPA hangs intermittently.
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> Running kinit admin returns the error below:
>>>                 >>>>>>>>>> kinit: Generic error (see e-text) while getting initial credentials
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> From /var/log/messages, I see this entry:
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>>  prod-ipa-master-int kernel: [104090.315801] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies.  Check SNMP counters.
>>>                 >>>>>>>>>
>>>                 >>>>>>>>> I would be worried about this message. Maybe the kernel/firewall is
>>>                 >>>>>>>>> doing something fishy behind your back and blocking some connections
>>>                 >>>>>>>>> or so.
>>>                 >>>>>>>>>
>>>                 >>>>>>>>> Petr^2 Spacek
>>>                 >>>>>>>>>
>>>                 >>>>>>>>>
>>>                 >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Started Session 4885 of user root.
>>>                 >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Starting Session 4885 of user root.
>>>                 >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Started Session 4886 of user root.
>>>                 >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Starting Session 4886 of user root.
>>>                 >>>>>>>>>> Aug 18 13:02:40 prod-ipa-master-int python[28984]: ansible-command Invoked with creates=None executable=None shell=True args= removes=None warn=True chdir=None
>>>                 >>>>>>>>>> Aug 18 13:04:37 prod-ipa-master-int sssd_be: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (KDC returned error string: PROCESS_TGS)
>>>                 >>>>>>>>>>
>>>                 >>>>>>>>>> Could it be due to the initial load of adding the clients, or is
>>>                 >>>>>>>>>> there something else that I need to take care of?
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
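A note on the ns-slapd segfault lines from dmesg quoted above: subtracting the mapping base address (the value in square brackets) from the instruction pointer gives the offset inside libcos-plugin.so where the fault happened. The sketch below does only that arithmetic. The helper name and the regular expression are my own assumptions, and the sample contains just the two complete lines from the thread.

```python
import re

# Pattern for the kernel's segfault report as it appears in this thread.
# (Hypothetical helper; not part of any 389-ds or kernel tooling.)
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp [0-9a-f]+ error \d+ in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+"
)


def fault_offsets(dmesg_text):
    """Return [(library, hex offset of the faulting instruction)]."""
    # Collapse whitespace so that mail-wrapped lines still match.
    flat = " ".join(dmesg_text.split())
    return [
        (m.group("lib"), hex(int(m.group("ip"), 16) - int(m.group("base"), 16)))
        for m in SEGFAULT_RE.finditer(flat)
    ]


SAMPLE = """
[  301.236976] ns-slapd[3124]: segfault at 0 ip 00007f1de416951c sp
00007f1dee1dba70 error 4 in libcos-plugin.so[7f1de4166000+b000]
[11831.397037] ns-slapd[22550]: segfault at 0 ip 00007f533d82251c sp
00007f5347894a70 error 4 in libcos-plugin.so[7f533d81f000+b000]
"""

for lib, offset in fault_offsets(SAMPLE):
    print(lib, offset)
```

Both sample lines resolve to the same offset in the library, and "segfault at 0" means the faulting address was 0 (a null pointer access). That suggests a single repeatable crash site rather than random memory corruption, which is useful detail for a crash report.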

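The DSRetroclPlugin errors quoted above carry LDAP result codes: in RFC 4511, 32 is noSuchObject and 51 is busy. A small sketch for tallying them from an errors log excerpt follows. The function name and regular expression are assumptions of mine, and the sample is a shortened excerpt of the lines in this thread.

```python
import re
from collections import Counter

# Standard LDAP result codes (RFC 4511) seen in the quoted log.
LDAP_RESULT_NAMES = {32: "noSuchObject", 51: "busy"}

# Hypothetical pattern, derived only from the log lines in this thread.
RC_RE = re.compile(r"could not delete change record (\d+) \(rc: (\d+)\)")


def tally_retrocl_failures(errors_text):
    """Count failed retro changelog deletions per LDAP result code name."""
    # Collapse whitespace so that mail-wrapped lines still match.
    flat = " ".join(errors_text.split())
    return Counter(
        LDAP_RESULT_NAMES.get(int(rc), f"rc {rc}")
        for _record, rc in RC_RE.findall(flat)
    )


SAMPLE = """
[23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could
not delete change record 3291138 (rc: 32)
[23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could
not delete change record 3291139 (rc: 32)
[23/Aug/2016:12:49:50 +0000] - Retry count exceeded in delete
[23/Aug/2016:12:49:50 +0000] DSRetroclPlugin - delete_changerecord: could
not delete change record 3292734 (rc: 51)
"""

for name, count in tally_retrocl_failures(SAMPLE).items():
    print(f"{name}: {count}")
```

Grouping by result code separates "record already gone" failures (noSuchObject) from "server too busy" failures, which may point at different causes.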