[Freeipa-users] Freeipa 4.2.0 hangs intermittently
Rich Megginson
rmeggins at redhat.com
Mon Aug 29 17:48:12 UTC 2016
On 08/29/2016 10:53 AM, Rakesh Rajasekharan wrote:
> Hi Thierry,
>
> My machine has 30GB of RAM, and the 389-ds version is 1.3.4.
>
> ldapsearch shows the values for nsslapd-cachememsize updated to 200MB.
>
> ldapsearch -LLL -o ldif-wrap=no -D "cn=directory manager" -w
> 'mypassword' -b 'cn=userRoot,cn=ldbm
> database,cn=plugins,cn=config'|grep nsslapd-cachememsize
> nsslapd-cachememsize: 209715200
>
>
> So it seems to have been updated, though seeing that warning (WARNING:
> ipaca: entry cache size 10485760B is less than db size 11599872B) in
> the log confuses me a bit.
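For what it's worth, the warning names the ipaca backend, not userRoot, and
nsslapd-cachememsize is set per backend, so it does not conflict with the
200MB shown above for userRoot. If you want to pull every such warning out of
an errors log and see which backends are involved, here is a minimal Python
sketch. The function name and the regular expression are my own (hypothetical),
written against the two warning lines quoted further down in this thread, so
treat it as an illustration rather than a supported tool.

```python
import re

# Pattern for the 389-ds startup warning quoted in this thread, e.g.
# "WARNING: ipaca: entry cache size 10485760B is less than db size 11599872B".
# Assumption: each warning sits on a single line in the errors log.
WARNING_RE = re.compile(
    r"WARNING: (?P<backend>[^:\s]+): entry cache size (?P<cache>\d+)B "
    r"is less than db size (?P<db>\d+)B"
)


def undersized_backends(log_text):
    """Return {backend: (cache_bytes, db_bytes)} for each warning found."""
    found = {}
    for match in WARNING_RE.finditer(log_text):
        found[match.group("backend")] = (
            int(match.group("cache")),
            int(match.group("db")),
        )
    return found


if __name__ == "__main__":
    # The two warnings reported in this thread, joined onto single lines.
    sample = (
        "[29/Aug/2016:04:34:37 +0000] - WARNING: ipaca: entry cache size "
        "10485760B is less than db size 11599872B; We recommend to increase "
        "the entry cache size nsslapd-cachememsize.\n"
        "[29/Aug/2016:04:34:37 +0000] - WARNING: changelog: entry cache size "
        "2097152B is less than db size 441647104B; We recommend to increase "
        "the entry cache size nsslapd-cachememsize.\n"
    )
    for backend, (cache, db) in undersized_backends(sample).items():
        print(f"{backend}: cache {cache} B, db {db} B, short by {db - cache} B")
```

With the two lines from this thread, only ipaca and changelog show up, which
matches the point that the userRoot setting itself was applied.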
>
> There's one more entry from the ldapsearch that looks a bit low:
>
> nsslapd-dncachememsize: 10485760
> maxdncachesize: 10485760
>
> Should I update these to a higher value as well?
>
> At the time the issue happened, the memory usage as well as the
> overall load of the system was very low.
> I will try reproducing the issue, at least in my QA env, probably by
> trying to mock simultaneous parallel logins to a large number of hosts.
To monitor your cache sizes, please use the dbmon.sh tool provided with
your distro. If that is not available with your particular distro, see
https://github.com/richm/scripts/wiki/dbmon.sh
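If you only want a quick look without dbmon.sh, I believe the same kind of
numbers can be read from the backend monitor entry with ldapsearch. Below is a
rough Python sketch of that idea. Assumptions on my part: the monitor entry
lives at cn=monitor under the backend entry, the attribute names are as I
recall them for 389-ds (maxdncachesize does appear in your output above), and
-x (simple bind) suits your setup. The helper names are hypothetical.

```python
import shlex


def monitor_search_argv(backend="userRoot", bind_dn="cn=directory manager"):
    """Build an ldapsearch command line for one backend's monitor entry."""
    # Assumed location of the per-backend monitor entry.
    base = f"cn=monitor,cn={backend},cn=ldbm database,cn=plugins,cn=config"
    return [
        "ldapsearch", "-LLL", "-o", "ldif-wrap=no",
        "-x", "-D", bind_dn, "-W",  # -W prompts for the bind password
        "-b", base, "-s", "base",
        # Attribute names assumed from the 389-ds backend monitor entry.
        "currententrycachesize", "maxentrycachesize", "entrycachehitratio",
        "currentdncachesize", "maxdncachesize", "dncachehitratio",
    ]


def parse_ldif_attrs(ldif_text):
    """Parse single-valued 'attr: value' LDIF lines into a dict."""
    attrs = {}
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name != "dn":
            attrs[name.lower()] = value.strip()
    return attrs


def cache_fill_percent(attrs, kind="entry"):
    """Percentage of the entry or dn cache in use (kind: 'entry' or 'dn')."""
    current = int(attrs[f"current{kind}cachesize"])
    maximum = int(attrs[f"max{kind}cachesize"])
    return 100.0 * current / maximum


if __name__ == "__main__":
    # Print the command rather than running it; paste it into a shell on
    # the server, then feed its output to parse_ldif_attrs().
    print(shlex.join(monitor_search_argv()))
```

A cache that sits close to 100% full together with a low hit ratio is the
usual hint that it is worth raising; a cache that is mostly empty is not.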
>
>
> thanks
> Rakesh
>
>
>
>
> On Mon, Aug 29, 2016 at 8:16 PM, thierry bordaz <tbordaz at redhat.com
> <mailto:tbordaz at redhat.com>> wrote:
>
> Hi Rakesh,
>
> That tuning may depend on the memory available on your machine.
> nsslapd-cachememsize allows the entry cache to consume up to 200MB,
> but its memory footprint is known to go above that.
> 200MB for both looks pretty good to me. How large is your machine?
> What is your version of 389-ds?
>
> Those warnings do not change your settings. They just point out that
> the entry caches of 'ipaca' and 'retrocl' are small, but that is fine.
> The size of the entry cache is important mostly in userRoot.
> You may double check the actual values, after restart, with
> ldapsearch on 'cn=userRoot,cn=ldbm database,cn=plugins,cn=config'
> and 'cn=config,cn=ldbm database,cn=plugins,cn=config'.
>
> A next step is to measure the response time of DS, to know whether
> it is responsible for the hang or not.
> The logs, and possibly pstack output taken during those intermittent
> hangs, will help to determine that.
>
> regards
> thierry
>
>
>
>
>
> On 08/29/2016 04:25 PM, Rakesh Rajasekharan wrote:
>> I tried increasing the nsslapd-dbcachesize and
>> nsslapd-cachememsize in my QA envs to 200MB.
>>
>> However, in my log files, I still see this message
>> [29/Aug/2016:04:34:37 +0000] - WARNING: ipaca: entry cache size
>> 10485760B is less than db size 11599872B; We recommend to
>> increase the entry cache size nsslapd-cachememsize.
>> [29/Aug/2016:04:34:37 +0000] - WARNING: changelog: entry cache
>> size 2097152B is less than db size 441647104B; We recommend to
>> increase the entry cache size nsslapd-cachememsize.
>>
>> These are the ldif files that I used to modify the values.
>> Modify the entry cache size:
>> cat modify-cache-mem-size.ldif
>> dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
>> changetype: modify
>> replace: nsslapd-cachememsize
>> nsslapd-cachememsize: 209715200
>>
>> Modify the db cache size:
>> cat modfy-db-cache-size.ldif
>> dn: cn=config,cn=ldbm database,cn=plugins,cn=config
>> changetype: modify
>> replace: nsslapd-dbcachesize
>> nsslapd-dbcachesize: 209715200
>>
>> After modifying, I restarted the IPA services.
>>
>> Is there anything else that I need to take care of, as the logs
>> suggest it's still not getting the updated values?
>>
>> Thanks
>> Rakesh
>>
>> On Mon, Aug 29, 2016 at 6:07 PM, Rakesh Rajasekharan
>> <rakesh.rajasekharan at gmail.com
>> <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>
>> Hi Thierry,
>>
>> Because of the issues, we had to revert to the previously running
>> openldap in production.
>>
>> I have now done a few TCP related changes in sysctl.conf and
>> have also increased the nsslapd-dbcachesize and
>> nsslapd-cachememsize to 200MB
>>
>> I will again start migrating hosts back to IPA and see if I
>> face the earlier issue.
>>
>> I will update back once I have something
>>
>>
>> Thanks,
>> Rakesh
>>
>>
>>
>> On Thu, Aug 25, 2016 at 2:17 PM, thierry bordaz
>> <tbordaz at redhat.com <mailto:tbordaz at redhat.com>> wrote:
>>
>>
>>
>> On 08/25/2016 10:15 AM, Rakesh Rajasekharan wrote:
>>> All of the troubleshooting seems fine.
>>>
>>>
>>> However, running logconv.pl gives me
>>> this output:
>>>
>>> ----- Recommendations -----
>>>
>>> 1. You have unindexed components, this can be caused
>>> from a search on an unindexed attribute, or your
>>> returned results exceeded the allidsthreshold. Unindexed
>>> components are not recommended. To refuse unindexed
>>> searches, switch 'nsslapd-require-index' to 'on' under
>>> your database entry (e.g. cn=UserRoot,cn=ldbm
>>> database,cn=plugins,cn=config).
>>>
>>> 2. You have a significant difference between binds and
>>> unbinds. You may want to investigate this difference.
>>>
>>>
>>> I feel this could be a pointer to things going slow
>>> and IPA hanging. I think I now have something that I can
>>> try, to nail down this issue.
>>>
>>> On a side note, I was earlier running openldap and
>>> migrated over to Freeipa.
>>>
>>> Thanks
>>> Rakesh
>>>
>>>
>>>
>>> On Wed, Aug 24, 2016 at 12:38 PM, Petr Spacek
>>> <pspacek at redhat.com <mailto:pspacek at redhat.com>> wrote:
>>>
>>> On 23.8.2016 18:44, Rakesh Rajasekharan wrote:
>>> > I think there's something seriously wrong with my system.
>>> >
>>> > I'm not able to run any IPA commands.
>>> >
>>> > klist
>>> > Ticket cache: KEYRING:persistent:0:0
>>> > Default principal: admin at XYZ.COM
>>> >
>>> > Valid starting Expires Service principal
>>> > 2016-08-23T16:26:36 2016-08-24T16:26:22 krbtgt/XYZ.COM at XYZ.COM
>>> >
>>> >
>>> > [root at prod-ipa-master-1a :~] ipactl status
>>> > Directory Service: RUNNING
>>> > krb5kdc Service: RUNNING
>>> > kadmin Service: RUNNING
>>> > ipa_memcached Service: RUNNING
>>> > httpd Service: RUNNING
>>> > pki-tomcatd Service: RUNNING
>>> > ipa-otpd Service: RUNNING
>>> > ipa: INFO: The ipactl command was successful
>>> >
>>> >
>>> >
>>> > [root at prod-ipa-master :~] ipa user-find p-testuser
>>> > ipa: ERROR: Kerberos error: ('Unspecified GSS failure. Minor code may
>>> > provide more information', 851968)/("Cannot contact any KDC for realm
>>> > 'XYZ.COM'", -1765328228)
>>>
>>
>> Hi Rakesh,
>>
>> Since you have a reproducible test case, would you rerun the
>> command above?
>> During its processing you may monitor the DS process load
>> (top). If it is high, you may get some pstacks of it.
>> Also, would you attach the part of the DS access logs
>> taken during the command?
>>
>> regards
>> thierry
>>
>>> >
>>>
>>> This is weird because the server seems to be up.
>>>
>>> Please follow
>>> http://www.freeipa.org/page/Troubleshooting#Authentication.2FKerberos
>>>
>>> Petr^2 Spacek
>>>
>>> >
>>> >
>>> > Thanks
>>> >
>>> > Rakesh
>>> >
>>> > On Tue, Aug 23, 2016 at 10:01 PM, Rakesh
>>> Rajasekharan <
>>> > rakesh.rajasekharan at gmail.com
>>> <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>> >
>>> >> I changed the logging level to 4 by modifying
>>> >> nsslapd-accesslog-level.
>>> >>
>>> >> But the hang is still there, though I don't see the segfault now.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Aug 23, 2016 at 9:02 PM, Rakesh
>>> Rajasekharan <
>>> >> rakesh.rajasekharan at gmail.com
>>> <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>> >>
>>> >>> My disk was getting filled too fast.
>>> >>>
>>> >>> Logs under /var/log/dirsrv were growing to around 5 GB,
>>> >>> quickly filling it up.
>>> >>>
>>> >>> Is there a way to make the logging less verbose?
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Aug 23, 2016 at 6:41 PM, Petr Spacek
>>> <pspacek at redhat.com <mailto:pspacek at redhat.com>> wrote:
>>> >>>
>>> >>>> On 23.8.2016 15:07, Rakesh Rajasekharan wrote:
>>> >>>>> I was able to fix that, maybe temporarily. When I checked the
>>> >>>>> network, there was another process running and consuming a lot
>>> >>>>> of network bandwidth (I have no idea who did that. I need to
>>> >>>>> seriously start restricting people's access to this machine).
>>> >>>>>
>>> >>>>> After killing that, performance improved drastically.
>>> >>>>>
>>> >>>>> But now, suddenly, I started experiencing the same hang.
>>> >>>>>
>>> >>>>> This time, I get the following error when I check dmesg:
>>> >>>>>
>>> >>>>> [ 301.236976] ns-slapd[3124]: segfault at 0
>>> ip 00007f1de416951c sp
>>> >>>>> 00007f1dee1dba70 error 4 in
>>> libcos-plugin.so[7f1de4166000+b000]
>>> >>>>> [ 1116.248431] TCP: request_sock_TCP: Possible
>>> SYN flooding on port 88.
>>> >>>>> Sending cookies. Check SNMP counters.
>>> >>>>> [11831.397037] ns-slapd[22550]: segfault at 0
>>> ip 00007f533d82251c sp
>>> >>>>> 00007f5347894a70 error 4 in
>>> libcos-plugin.so[7f533d81f000+b000]
>>> >>>>> [11832.727989] ns-slapd[22606]: segfault at 0
>>> ip 00007f6231eb951c sp
>>> >>>>> 00007f623bf2ba70 error 4 in
>>> libcos-plugin.so[7f6231eb6000+b00
>>> >>>>
>>> >>>> Okay, this one is serious. The LDAP server crashed.
>>> >>>>
>>> >>>> 1. Make sure all your packages are up-to-date.
>>> >>>>
>>> >>>> Please see
>>> >>>> http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes
>>> >>>> for further instructions on how to debug this.
>>> >>>>
>>> >>>> Petr^2 Spacek
>>> >>>>
>>> >>>>>
>>> >>>>> and in /var/log/dirsrv/example-com/errors
>>> >>>>>
>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3291138 (rc: 32)
>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3291139 (rc: 32)
>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3291140 (rc: 32)
>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3291141 (rc: 32)
>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3291142 (rc: 32)
>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3291143 (rc: 32)
>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3291144 (rc: 32)
>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3291145 (rc: 32)
>>> >>>>> [23/Aug/2016:12:49:50 +0000] - Retry count
>>> exceeded in delete
>>> >>>>> [23/Aug/2016:12:49:50 +0000] DSRetroclPlugin -
>>> delete_changerecord:
>>> >>>> could
>>> >>>>> not delete change record 3292734 (rc: 51)
>>> >>>>>
>>> >>>>>
>>> >>>>> Can I do something about this error? I tried to restart ipa
>>> >>>>> a couple of times but that did not help.
>>> >>>>>
>>> >>>>> Thanks
>>> >>>>> Rakesh
>>> >>>>>
>>> >>>>> On Mon, Aug 22, 2016 at 2:27 PM, Petr Spacek
>>> <pspacek at redhat.com <mailto:pspacek at redhat.com>>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>>> On 19.8.2016 19:32, Rakesh Rajasekharan wrote:
>>> >>>>>>> I am running my setup on the AWS cloud, and entropy is low,
>>> >>>>>>> at around 180.
>>> >>>>>>>
>>> >>>>>>> I plan to increase it by installing haveged. But would low
>>> >>>>>>> entropy by any chance cause this issue of intermittent hangs?
>>> >>>>>>> Also, the hang is mostly observed when registering around 20
>>> >>>>>>> clients together.
>>> >>>>>>
>>> >>>>>> Possibly, I'm not sure. If you want to dig into this, I would
>>> >>>>>> do this:
>>> >>>>>> 1. look at what process hangs on the client (using the pstree
>>> >>>>>> command or so)
>>> >>>>>> $ pstree
>>> >>>>>>
>>> >>>>>> 2. look at what server and port the hanging client is
>>> >>>>>> connected to
>>> >>>>>> $ lsof -p <PID of the hanging process>
>>> >>>>>>
>>> >>>>>> 3. jump to the server and see what process is bound to the
>>> >>>>>> target port
>>> >>>>>> $ netstat -pn
>>> >>>>>>
>>> >>>>>> 4. see where the process is hanging
>>> >>>>>> $ strace -p <PID of the hanging process>
>>> >>>>>>
>>> >>>>>> I hope it helps.
>>> >>>>>>
>>> >>>>>> Petr^2 Spacek
>>> >>>>>>
>>> >>>>>>> On Fri, Aug 19, 2016 at 7:24 PM, Rakesh
>>> Rajasekharan <
>>> >>>>>>> rakesh.rajasekharan at gmail.com
>>> <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>> >>>>>>>
>>> >>>>>>>> Yes, there seems to be something that's worrying. I have
>>> >>>>>>>> faced this today as well.
>>> >>>>>>>> There are a few hosts left, around 280 odd, and when I try
>>> >>>>>>>> adding them to IPA, the slowness begins.
>>> >>>>>>>>
>>> >>>>>>>> All the ipa commands, like ipa user-find etc., become very
>>> >>>>>>>> slow in responding.
>>> >>>>>>>>
>>> >>>>>>>> The SYN_RECV connections are not many though, just around
>>> >>>>>>>> 80-90, and today that was around 20 only.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> I have for now increased tcp_max_syn_backlog to 5000.
>>> >>>>>>>> For now the slowness seems to have gone, but I will try
>>> >>>>>>>> adding the clients again tomorrow and see how it goes.
>>> >>>>>>>>
>>> >>>>>>>> Thanks
>>> >>>>>>>> Rakesh
>>> >>>>>>>>
>>> >>>>>>>> The issues
>>> >>>>>>>>
>>> >>>>>>>> On Fri, Aug 19, 2016 at 12:58 PM, Petr
>>> Spacek <pspacek at redhat.com <mailto:pspacek at redhat.com>>
>>> >>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> On 18.8.2016 17:23, Rakesh Rajasekharan wrote:
>>> >>>>>>>>>> Hi
>>> >>>>>>>>>>
>>> >>>>>>>>>> I am migrating to freeipa from openldap and have around
>>> >>>>>>>>>> 4000 clients.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I had opened another thread on that, but chose to start a
>>> >>>>>>>>>> new one here as it's a separate issue.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I was able to change nsslapd-maxdescriptors by adding an
>>> >>>>>>>>>> ldif file
>>> >>>>>>>>>>
>>> >>>>>>>>>> cat nsslapd-modify.ldif
>>> >>>>>>>>>> dn: cn=config
>>> >>>>>>>>>> changetype: modify
>>> >>>>>>>>>> replace: nsslapd-maxdescriptors
>>> >>>>>>>>>> nsslapd-maxdescriptors: 17000
>>> >>>>>>>>>>
>>> >>>>>>>>>> and running the ldapmodify command
>>> >>>>>>>>>>
>>> >>>>>>>>>> I have now started moving clients running openldap to
>>> >>>>>>>>>> Freeipa and have today moved close to 2000 clients.
>>> >>>>>>>>>>
>>> >>>>>>>>>> However, I have noticed that IPA hangs intermittently.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Running kinit admin returns the below error:
>>> >>>>>>>>>> kinit: Generic error (see e-text) while getting initial
>>> >>>>>>>>>> credentials
>>> >>>>>>>>>>
>>> >>>>>>>>>> In /var/log/messages, I see this entry:
>>> >>>>>>>>>>
>>> >>>>>>>>>> prod-ipa-master-int kernel:
>>> [104090.315801] TCP:
>>> >>>> request_sock_TCP:
>>> >>>>>>>>>> Possible SYN flooding on port 88. Sending
>>> cookies. Check SNMP
>>> >>>>>> counters.
>>> >>>>>>>>>
>>> >>>>>>>>> I would be worried about this message. Maybe the
>>> >>>>>>>>> kernel/firewall is doing something fishy behind your back
>>> >>>>>>>>> and blocking some connections or so.
>>> >>>>>>>>>
>>> >>>>>>>>> Petr^2 Spacek
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int
>>> systemd[1]: Started Session
>>> >>>> 4885
>>> >>>>>> of
>>> >>>>>>>>>> user root.
>>> >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int
>>> systemd[1]: Starting Session
>>> >>>> 4885
>>> >>>>>> of
>>> >>>>>>>>>> user root.
>>> >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int
>>> systemd[1]: Started Session
>>> >>>> 4886
>>> >>>>>> of
>>> >>>>>>>>>> user root.
>>> >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int
>>> systemd[1]: Starting Session
>>> >>>> 4886
>>> >>>>>> of
>>> >>>>>>>>>> user root.
>>> >>>>>>>>>> Aug 18 13:02:40 prod-ipa-master-int
>>> python[28984]: ansible-command
>>> >>>>>>>>> Invoked
>>> >>>>>>>>>> with creates=None executable=None
>>> shell=True args= removes=None
>>> >>>>>>>>> warn=True
>>> >>>>>>>>>> chdir=None
>>> >>>>>>>>>> Aug 18 13:04:37 prod-ipa-master-int
>>> sssd_be: GSSAPI Error:
>>> >>>> Unspecified
>>> >>>>>>>>> GSS
>>> >>>>>>>>>> failure. Minor code may provide more
>>> information (KDC returned
>>> >>>> error
>>> >>>>>>>>>> string: PROCESS_TGS)
>>> >>>>>>>>>>
>>> >>>>>>>>>> Could it be possible that it's due to the initial load of
>>> >>>>>>>>>> adding the clients, or is there something else that I need
>>> >>>>>>>>>> to take care of?
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>