[Freeipa-users] Freeipa 4.2.0 hangs intermittently

Mon Aug 29 14:46:30 UTC 2016

Hi Rakesh,

Those tunings may depend on the memory available on your machine.
nsslapd-cachememsize allows the entry cache to consume up to 200MB, but 
its actual memory footprint is known to go above that limit.
200MB for both settings looks pretty good to me. How large is your machine? 
What is your version of 389-ds?

Those warnings do not change your settings. They just point out that the 
entry caches of 'ipaca' and 'retrocl' are small, but that is fine. The size 
of the entry cache matters mostly for userRoot.
You may double-check the actual values, after restart, with ldapsearch 
on 'cn=userRoot,cn=ldbm database,cn=plugins,cn=config' and 
'cn=config,cn=ldbm database,cn=plugins,cn=config'.
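
As an illustration of that check, here is a minimal Python sketch that only 
builds and prints the two ldapsearch commands, one per config entry. The 
bind DN 'cn=Directory Manager' and the URI 'ldap://localhost' are 
assumptions about your setup, and build_ldapsearch_argv and bytes_to_mib 
are hypothetical helper names used for this example only.

```python
import shlex

# Config entries and attributes named in this thread.
CONFIG_ENTRIES = {
    "cn=userRoot,cn=ldbm database,cn=plugins,cn=config": "nsslapd-cachememsize",
    "cn=config,cn=ldbm database,cn=plugins,cn=config": "nsslapd-dbcachesize",
}


def build_ldapsearch_argv(base_dn, attribute,
                          bind_dn="cn=Directory Manager",  # assumption
                          uri="ldap://localhost"):         # assumption
    """Return an ldapsearch argv reading one attribute from one entry."""
    return [
        "ldapsearch", "-LLL", "-x",
        "-H", uri,
        "-D", bind_dn, "-W",   # -W prompts for the bind password
        "-s", "base",          # read only the entry itself
        "-b", base_dn,
        attribute,
    ]


def bytes_to_mib(value):
    """Convert a size in bytes (as stored in the attribute) to MiB."""
    return value / (1024 * 1024)


if __name__ == "__main__":
    for dn, attr in CONFIG_ENTRIES.items():
        print(shlex.join(build_ldapsearch_argv(dn, attr)))
    # The value used in the ldif files of this thread, expressed in MiB.
    print(bytes_to_mib(209715200))
```

If the value returned for userRoot is 209715200, the new setting was taken 
into account, whatever the warnings about the other backends say.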

A next step is to measure the response time of DS, to know whether it is 
responsible for the hang or not.
The logs, and possibly pstack output taken during those intermittent hangs, 
will help to determine that.
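
One possible way to get a first idea of the response time is to look at the 
etime value on the RESULT lines of the DS access log. The Python sketch 
below is only an illustration: the sample lines are made up in the usual 
access log layout (they are not taken from your server), the 1 second 
threshold is arbitrary, and slow_operations is a hypothetical helper name.

```python
import re

# Matches RESULT lines such as:
#   [...] conn=15 op=1 RESULT err=0 tag=101 nentries=250 etime=14
RESULT_RE = re.compile(
    r"conn=(\d+) op=(-?\d+) RESULT .*?etime=(\d+(?:\.\d+)?)"
)


def slow_operations(lines, threshold_seconds=1.0):
    """Return (conn, op, etime) for RESULT lines at or above the threshold."""
    slow = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # not a RESULT line (BIND, SRCH, UNBIND, ...)
        etime = float(match.group(3))
        if etime >= threshold_seconds:
            slow.append((int(match.group(1)), int(match.group(2)), etime))
    return slow


# Made-up sample lines, for illustration only.
SAMPLE = [
    "[29/Aug/2016:04:34:37 +0000] conn=12 op=3 RESULT err=0 tag=101 nentries=1 etime=0",
    "[29/Aug/2016:04:34:52 +0000] conn=15 op=1 RESULT err=0 tag=101 nentries=250 etime=14",
    "[29/Aug/2016:04:34:53 +0000] conn=15 op=2 UNBIND",
]

if __name__ == "__main__":
    for conn, op, etime in slow_operations(SAMPLE):
        print(f"conn={conn} op={op} etime={etime}")
```

To use it on a real log, pass the open access log file instead of SAMPLE. 
If no operation shows a high etime during a hang, DS is less likely to be 
the culprit.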

regards
thierry

On 08/29/2016 04:25 PM, Rakesh Rajasekharan wrote:
> I tried increasing the nsslapd-dbcachesize and nsslapd-cachememsize in 
> my QA envs to 200MB.
>
> However, in my log files, I still see this message
> [29/Aug/2016:04:34:37 +0000] - WARNING: ipaca: entry cache size 
> 10485760B is less than db size 11599872B; We recommend to increase the 
> entry cache size nsslapd-cachememsize.
> [29/Aug/2016:04:34:37 +0000] - WARNING: changelog: entry cache size 
> 2097152B is less than db size 441647104B; We recommend to increase the 
> entry cache size nsslapd-cachememsize.
>
> these are my ldif files that i used to modify the values
> modify entry cache size
> cat modify-cache-mem-size.ldif
> dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
> changetype: modify
> replace: nsslapd-cachememsize
> nsslapd-cachememsize: 209715200
>
> modify db cache size
> cat modfy-db-cache-size.ldif
> dn: cn=config,cn=ldbm database,cn=plugins,cn=config
> changetype: modify
> replace: nsslapd-dbcachesize
> nsslapd-dbcachesize: 209715200
>
> After modifying, I restarted IPA services
>
> Is there anything else that I need to take care of, as the logs 
> suggest it's still not getting the updated values?
>
> Thanks
> Rakesh
>
> On Mon, Aug 29, 2016 at 6:07 PM, Rakesh Rajasekharan 
> <rakesh.rajasekharan at gmail.com> 
> wrote:
>
>     Hi Thierry,
>
>     Because of the issues, we had to revert to the earlier running
>     openldap in production.
>
>     I have now done a few TCP related changes in sysctl.conf and have
>     also increased the nsslapd-dbcachesize and nsslapd-cachememsize to
>     200MB
>
>     I will again start migrating hosts back to IPA and see if I face
>     the earlier issue.
>
>     I will update back once I have something
>
>
>     Thanks,
>     Rakesh
>
>
>
>     On Thu, Aug 25, 2016 at 2:17 PM, thierry bordaz
>     <tbordaz at redhat.com> wrote:
>
>
>
>         On 08/25/2016 10:15 AM, Rakesh Rajasekharan wrote:
>>         All of the troubleshooting seems fine.
>>
>>
>>         However, running logconv.pl gives me this
>>         output
>>
>>         ----- Recommendations -----
>>
>>          1.  You have unindexed components, this can be caused from a
>>         search on an unindexed attribute, or your returned results
>>         exceeded the allidsthreshold. Unindexed components are not
>>         recommended. To refuse unindexed searches, switch
>>         'nsslapd-require-index' to 'on' under your database entry
>>         (e.g. cn=UserRoot,cn=ldbm database,cn=plugins,cn=config).
>>
>>          2.  You have a significant difference between binds and
>>         unbinds.  You may want to investigate this difference.
>>
>>
>>         I feel, this could be a pointer to things going slow.. and
>>         IPA hanging. I think i now have something that I can try and
>>         nail down this issue.
>>
>>         On a sidenote, I was earlier running openldap and migrated
>>         over to Freeipa,
>>
>>         Thanks
>>         Rakesh
>>
>>
>>
>>         On Wed, Aug 24, 2016 at 12:38 PM, Petr Spacek
>>         <pspacek at redhat.com> wrote:
>>
>>             On 23.8.2016 18:44, Rakesh Rajasekharan wrote:
>>             > I think there's something seriously wrong with my system
>>             >
>>             > not able to run any  IPA commands
>>             >
>>             > klist
>>             > Ticket cache: KEYRING:persistent:0:0
>>             > Default principal: admin at XYZ.COM
>>             >
>>             > Valid starting       Expires             Service principal
>>             > 2016-08-23T16:26:36 2016-08-24T16:26:22 
>>             krbtgt/XYZ.COM at XYZ.COM
>>             >
>>             >
>>             > [root at prod-ipa-master-1a :~] ipactl status
>>             > Directory Service: RUNNING
>>             > krb5kdc Service: RUNNING
>>             > kadmin Service: RUNNING
>>             > ipa_memcached Service: RUNNING
>>             > httpd Service: RUNNING
>>             > pki-tomcatd Service: RUNNING
>>             > ipa-otpd Service: RUNNING
>>             > ipa: INFO: The ipactl command was successful
>>             >
>>             >
>>             >
>>             > [root at prod-ipa-master :~] ipa user-find p-testuser
>>             > ipa: ERROR: Kerberos error: ('Unspecified GSS failure. 
>>             Minor code may
>>             > provide more information', 851968)/("Cannot contact any
>>             KDC for realm '
>>             > XYZ.COM'", -1765328228)
>>
>
>         Hi Rakesh,
>
>             Having a reproducible test case would you rerun the
>             command above.
>             During its processing you may monitor DS process load
>             (top). If it is high, you may get some pstacks of it.
>             Also would you attach the part of DS access logs taken
>             during the command.
>
>             regards
>             thierry
>
>>             >
>>
>>             This is weird because the server seems to be up.
>>
>>             Please follow
>>             http://www.freeipa.org/page/Troubleshooting#Authentication.2FKerberos
>>
>>             Petr^2 Spacek
>>
>>             >
>>             >
>>             > Thanks
>>             >
>>             > Rakesh
>>             >
>>             > On Tue, Aug 23, 2016 at 10:01 PM, Rakesh Rajasekharan <
>>             > rakesh.rajasekharan at gmail.com> wrote:
>>             >
>>             >> I changed the logging level to 4 by modifying
>>             nsslapd-accesslog-level
>>             >>
>>             >> But the hang is still there, though I don't see the
>>             segfault now
>>             >>
>>             >>
>>             >>
>>             >>
>>             >> On Tue, Aug 23, 2016 at 9:02 PM, Rakesh Rajasekharan <
>>             >> rakesh.rajasekharan at gmail.com> wrote:
>>             >>
>>             >>> My disk was getting filled too fast
>>             >>>
>>             >>> logs under /var/log/dirsrv was coming around 5 gb
>>             quickly filling up
>>             >>>
>>             >>> Is there a way to make the logging less verbose
>>             >>>
>>             >>>
>>             >>>
>>             >>> On Tue, Aug 23, 2016 at 6:41 PM, Petr Spacek
>>             <pspacek at redhat.com> wrote:
>>             >>>
>>             >>>> On 23.8.2016 15:07, Rakesh Rajasekharan wrote:
>>             >>>>> I was able to fix that may be temporarily... when i
>>             checked the
>>             >>>> network..
>>             >>>>> there was another process that was running and
>>             consuming a lot of
>>             >>>> network (
>>             >>>>> i have no idea who did that. I need to seriously
>>             start restricting
>>             >>>> people
>>             >>>>> access to this machine )
>>             >>>>>
>>             >>>>> after killing that, performance improved drastically
>>             >>>>>
>>             >>>>> But now, suddenly I started experiencing the same hang.
>>             >>>>>
>>             >>>>> This time, I got the following error when I checked
>>             dmesg
>>             >>>>>
>>             >>>>> [ 301.236976] ns-slapd[3124]: segfault at 0 ip
>>             00007f1de416951c sp
>>             >>>>> 00007f1dee1dba70 error 4 in
>>             libcos-plugin.so[7f1de4166000+b000]
>>             >>>>> [ 1116.248431] TCP: request_sock_TCP: Possible SYN
>>             flooding on port 88.
>>             >>>>> Sending cookies.  Check SNMP counters.
>>             >>>>> [11831.397037] ns-slapd[22550]: segfault at 0 ip
>>             00007f533d82251c sp
>>             >>>>> 00007f5347894a70 error 4 in
>>             libcos-plugin.so[7f533d81f000+b000]
>>             >>>>> [11832.727989] ns-slapd[22606]: segfault at 0 ip
>>             00007f6231eb951c sp
>>             >>>>> 00007f623bf2ba70 error 4 in
>>             libcos-plugin.so[7f6231eb6000+b00
>>             >>>>
>>             >>>> Okay, this one is serious. The LDAP server crashed.
>>             >>>>
>>             >>>> 1. Make sure all your packages are up-to-date.
>>             >>>>
>>             >>>> Please see
>>             >>>>
>>             >>>> http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes
>>             >>>> for further instructions how to debug this.
>>             >>>>
>>             >>>> Petr^2 Spacek
>>             >>>>
>>             >>>>>
>>             >>>>> and in /var/log/dirsrv/example-com/errors
>>             >>>>>
>>             >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3291138 (rc: 32)
>>             >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3291139 (rc: 32)
>>             >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3291140 (rc: 32)
>>             >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3291141 (rc: 32)
>>             >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3291142 (rc: 32)
>>             >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3291143 (rc: 32)
>>             >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3291144 (rc: 32)
>>             >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3291145 (rc: 32)
>>             >>>>> [23/Aug/2016:12:49:50 +0000] - Retry count exceeded
>>             in delete
>>             >>>>> [23/Aug/2016:12:49:50 +0000] DSRetroclPlugin -
>>             delete_changerecord:
>>             >>>> could
>>             >>>>> not delete change record 3292734 (rc: 51)
>>             >>>>>
>>             >>>>>
>>             >>>>> Can I do something about this error? I tried to
>>             restart ipa a couple
>>             >>>> of
>>             >>>>> times but that did not help
>>             >>>>>
>>             >>>>> Thanks
>>             >>>>> Rakesh
>>             >>>>>
>>             >>>>> On Mon, Aug 22, 2016 at 2:27 PM, Petr Spacek
>>             <pspacek at redhat.com>
>>             >>>> wrote:
>>             >>>>>
>>             >>>>>> On 19.8.2016 19:32, Rakesh Rajasekharan wrote:
>>             >>>>>>> I am running my set up on AWS cloud, and entropy
>>             is low at around
>>             >>>> 180 .
>>             >>>>>>>
>>             >>>>>>> I plan to increase it by installing haveged.
>>             But, would low entropy
>>             >>>> by
>>             >>>>>> any
>>             >>>>>>> chance cause this issue of intermittent hang .
>>             >>>>>>> Also, the hang is mostly observed when
>>             registering around 20 clients
>>             >>>>>>> together
>>             >>>>>>
>>             >>>>>> Possibly, I'm not sure. If you want to dig into
>>             this, I would do this:
>>             >>>>>> 1. look what process hangs on client (using pstree
>>             command or so)
>>             >>>>>> $ pstree
>>             >>>>>>
>>             >>>>>> 2. look to what server and port is the hanging
>>             client connected to
>>             >>>>>> $ lsof -p <PID of the hanging process>
>>             >>>>>>
>>             >>>>>> 3. jump to server and see what process is bound to
>>             the target port
>>             >>>>>> $ netstat -pn
>>             >>>>>>
>>             >>>>>> 4. see where the process is hanging
>>             >>>>>> $ strace -p <PID of the hanging process>
>>             >>>>>>
>>             >>>>>> I hope it helps.
>>             >>>>>>
>>             >>>>>> Petr^2 Spacek
>>             >>>>>>
>>             >>>>>>> On Fri, Aug 19, 2016 at 7:24 PM, Rakesh
>>             Rajasekharan <
>>             >>>>>>> rakesh.rajasekharan at gmail.com> wrote:
>>             >>>>>>>
>>             >>>>>>>> yes there seems to be something thats worrying..
>>             I have faced this
>>             >>>> today
>>             >>>>>>>> as well.
>>             >>>>>>>> There are few hosts around 280 odd left and when
>>             i try adding them
>>             >>>> to
>>             >>>>>> IPA
>>             >>>>>>>> , the slowness begins..
>>             >>>>>>>>
>>             >>>>>>>> all the ipa commands like ipa user-find.. etc
>>             becomes very slow in
>>             >>>>>>>> responding.
>>             >>>>>>>>
>>             >>>>>>>> the SYN_RECV connections are not many though, just around
>>             80-90 and today that
>>             >>>> was
>>             >>>>>>>> around 20 only
>>             >>>>>>>>
>>             >>>>>>>>
>>             >>>>>>>> I have for now increased tcp_max_syn_backlog to
>>             5000.
>>             >>>>>>>> For now the slowness seems to have gone.. but I
>>             will do a try
>>             >>>> adding the
>>             >>>>>>>> clients again tomorrow and see how it goes
>>             >>>>>>>>
>>             >>>>>>>> Thanks
>>             >>>>>>>> Rakesh
>>             >>>>>>>>
>>             >>>>>>>> The issues
>>             >>>>>>>>
>>             >>>>>>>> On Fri, Aug 19, 2016 at 12:58 PM, Petr Spacek
>>             <pspacek at redhat.com>
>>             >>>>>> wrote:
>>             >>>>>>>>
>>             >>>>>>>>> On 18.8.2016 17:23, Rakesh Rajasekharan wrote:
>>             >>>>>>>>>> Hi
>>             >>>>>>>>>>
>>             >>>>>>>>>> I am migrating to freeipa from openldap and
>>             have around 4000
>>             >>>> clients
>>             >>>>>>>>>>
>>             >>>>>>>>>> I had opened another thread on that, but
>>             chose to start a new
>>             >>>> one
>>             >>>>>>>>> here
>>             >>>>>>>>>> as its a separate issue
>>             >>>>>>>>>>
>>             >>>>>>>>>> I was able to change the
>>             nsslapd-maxdescriptors by adding an ldif
>>             >>>> file
>>             >>>>>>>>>>
>>             >>>>>>>>>> cat nsslapd-modify.ldif
>>             >>>>>>>>>> dn: cn=config
>>             >>>>>>>>>> changetype: modify
>>             >>>>>>>>>> replace: nsslapd-maxdescriptors
>>             >>>>>>>>>> nsslapd-maxdescriptors: 17000
>>             >>>>>>>>>>
>>             >>>>>>>>>> and running the ldapmodify command
>>             >>>>>>>>>>
>>             >>>>>>>>>> I have now started moving clients running an
>>             openldap to Freeipa
>>             >>>> and
>>             >>>>>>>>> have
>>             >>>>>>>>>> today moved close to 2000 clients
>>             >>>>>>>>>>
>>             >>>>>>>>>> However, I have noticed that IPA hangs
>>             intermittently.
>>             >>>>>>>>>>
>>             >>>>>>>>>> running a kinit admin returns the below error
>>             >>>>>>>>>> kinit: Generic error (see e-text) while
>>             getting initial
>>             >>>> credentials
>>             >>>>>>>>>>
>>             >>>>>>>>>> from the /var/log/messages, I see this entry
>>             >>>>>>>>>>
>>             >>>>>>>>>> prod-ipa-master-int kernel: [104090.315801] TCP:
>>             >>>> request_sock_TCP:
>>             >>>>>>>>>> Possible SYN flooding on port 88. Sending
>>             cookies.  Check SNMP
>>             >>>>>> counters.
>>             >>>>>>>>>
>>             >>>>>>>>> I would be worried about this message. Maybe
>>             kernel/firewall is
>>             >>>> doing
>>             >>>>>>>>> something fishy behind your back and blocking
>>             some connections or
>>             >>>> so.
>>             >>>>>>>>>
>>             >>>>>>>>> Petr^2 Spacek
>>             >>>>>>>>>
>>             >>>>>>>>>
>>             >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int
>>             systemd[1]: Started Session
>>             >>>> 4885
>>             >>>>>> of
>>             >>>>>>>>>> user root.
>>             >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int
>>             systemd[1]: Starting Session
>>             >>>> 4885
>>             >>>>>> of
>>             >>>>>>>>>> user root.
>>             >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int
>>             systemd[1]: Started Session
>>             >>>> 4886
>>             >>>>>> of
>>             >>>>>>>>>> user root.
>>             >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int
>>             systemd[1]: Starting Session
>>             >>>> 4886
>>             >>>>>> of
>>             >>>>>>>>>> user root.
>>             >>>>>>>>>> Aug 18 13:02:40 prod-ipa-master-int
>>             python[28984]: ansible-command
>>             >>>>>>>>> Invoked
>>             >>>>>>>>>> with creates=None executable=None shell=True
>>             args= removes=None
>>             >>>>>>>>> warn=True
>>             >>>>>>>>>> chdir=None
>>             >>>>>>>>>> Aug 18 13:04:37 prod-ipa-master-int sssd_be:
>>             GSSAPI Error:
>>             >>>> Unspecified
>>             >>>>>>>>> GSS
>>             >>>>>>>>>> failure.  Minor code may provide more
>>             information (KDC returned
>>             >>>> error
>>             >>>>>>>>>> string: PROCESS_TGS)
>>             >>>>>>>>>>
>>             >>>>>>>>>> Could it be possible that its due to the
>>             initial load of adding
>>             >>>> the
>>             >>>>>>>>> clients
>>             >>>>>>>>>> or is there something else that I need to take
>>             care of.
>>
>>
>>
>>
>
>
>
