[Freeipa-users] Freeipa 4.2.0 hangs intermittently

thierry bordaz tbordaz at redhat.com
Mon Sep 5 11:18:32 UTC 2016



On 09/05/2016 12:05 PM, Rakesh Rajasekharan wrote:
> Hi Thierry,
>
>
> I was getting the hang issue while running ipa-client-install 
> simultaneously on few clients..
> However, today, I am not able to replicate that.
>
> I could not get a gdb . But i will try getting that the next time I 
> face this issue.
>
> The CPU does not stay high.. it just momentarily touches a high value 
> and then drops down to around 2-7%
>
> One question I have is , is it ok to set it nsslapd-threadnumber to a 
> very high value .
> I have around 4000 clients and with nsslapd-maxthreadsperconn set to 
> 5..So, can I set nsslapd-threadnumber to around 25000.

Hello,

I know of some users running in production with several hundred threads 
(>600) without problems.

I do not recall having suggested increasing that number, or for what 
reason.
Usually 30 workers is a good enough value. It can become a bottleneck if, 
for some reason, each operation takes very long to satisfy and the 
workers are all exhausted. You can monitor the work queue:

    ldapsearch -D "cn=directory manager" -w xxx -LLL -b "cn=monitor" -s base opsinitiated opscompleted


If opsinitiated - opscompleted (the operations started but not yet 
finished) remains close to nsslapd-threadnumber, then yes, it would be 
worth increasing it.
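
As a rough illustration of that check, here is a minimal Python sketch that parses the LDIF printed by the ldapsearch above and computes the backlog. The function names, the sample counter values and the 0.9 "close to" ratio are assumptions made for this example; none of them come from 389-ds itself.

```python
def parse_monitor(ldif_text):
    """Extract opsinitiated / opscompleted from 'cn=monitor' LDIF output."""
    counters = {}
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        if sep and name in ("opsinitiated", "opscompleted"):
            counters[name] = int(value.strip())
    return counters


def backlog(counters):
    """Operations that were started but have not completed yet."""
    return counters["opsinitiated"] - counters["opscompleted"]


def workers_saturated(samples, threadnumber, ratio=0.9):
    """True if every sampled backlog stays close to the worker count.

    'ratio' is an arbitrary, illustrative cut-off for "close to".
    """
    return bool(samples) and all(b >= ratio * threadnumber for b in samples)


# Made-up sample output, only to show the expected shape of the input.
sample = """dn: cn=monitor
opsinitiated: 120345
opscompleted: 120316
"""
print(backlog(parse_monitor(sample)))  # 29
```

A single sample says little. The idea is to collect the backlog several times during the load test and only treat the workers as exhausted if it stays near the thread number throughout.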

The computation #client * #async_op_per_client sounds like overkill. Even 
if all clients send all their requests at the exact same time, it is 
very likely that some common resource (db page, log, allocator...) will 
serialize them. If monitoring shows a need for more workers, you would 
for example set it to 50, then monitor, then set it to 100, then 
monitor... until you find a good enough value.
Note that increasing the number of threads increases the memory footprint, 
which will reduce the efficiency of the file system cache and can increase 
the response time.
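
The "set it, monitor, set it higher" loop could be sketched as below. This is only an illustration: the helper names are hypothetical, the step list (30, 50, 100) simply mirrors the values mentioned above, and the LDIF follows the same shape as the nsslapd-maxdescriptors change on cn=config quoted further down in this thread. Applying it with ldapmodify, and whether a restart is needed afterwards, is not covered here.

```python
def threadnumber_ldif(value):
    """Build an LDIF modify that sets nsslapd-threadnumber on cn=config."""
    if value < 1:
        raise ValueError("thread number must be positive")
    return (
        "dn: cn=config\n"
        "changetype: modify\n"
        "replace: nsslapd-threadnumber\n"
        f"nsslapd-threadnumber: {value}\n"
    )


def next_threadnumber(current, saturated, steps=(30, 50, 100)):
    """Pick the next value to try; stay put unless monitoring showed saturation."""
    if not saturated:
        return current
    for step in steps:
        if step > current:
            return step
    return current  # past the last step: decide by hand


# Starting from 30 workers with a persistently full work queue.
print(threadnumber_ldif(next_threadnumber(30, saturated=True)), end="")
```

The point of stepping gradually rather than jumping to a very large value is that each step can be checked against the work queue and against memory usage before going further.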


best regards
thierry

>
> Thanks
>
> On Mon, Sep 5, 2016 at 1:03 PM, thierry bordaz <tbordaz at redhat.com>
> wrote:
>
>
>     Hi Rakesh,
>
>     Were you able to get a pstack or full stack with gdb
>     (http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-crashes) when
>     the server hangs ?
>
>     If it happens with 500 threads as well as with 30, using 30
>     threads is a better choice to debug this issue.
>     I will try to reproduce using 150 parallel 'ipa user-find
>     p-testipa' commands
>
>     Something I am unsure about is whether the CPU consumption stays high
>     (you mentioned 340% CPU usage) for as long as the hang lasts, or whether
>     after a sudden jump to 340% (which marks the beginning of the hang)
>     it drops while the server stays hung ?
>
>     thanks
>     thierry
>
>     On 09/04/2016 08:40 PM, Rakesh Rajasekharan wrote:
>>     strace on the slapd process actually had this in the output..
>>     FUTEX_WAIT_PRIVATE
>>
>>     and checking for the number of threads slapd had.. there were
>>     5015 threads
>>
>>     ps -efL|grep slapd|wc -l
>>     5015
>>
>>     strace on most of the threads gave this output
>>
>>     strace -p 67411
>>     Process 67411 attached
>>     futex(0x7f3f0226b41c, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EAGAIN
>>     (Resource temporarily unavailable)
>>     futex(0x7f3f0226b41c, FUTEX_WAIT_PRIVATE, 2, NULL^CProcess 67411
>>     detached
>>
>>
>>
>>
>>
>>     On Sun, Sep 4, 2016 at 5:34 PM, Rakesh Rajasekharan
>>     <rakesh.rajasekharan at gmail.com
>>     <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>
>>         I have again got the issue of IPA hanging.. The issue came up
>>         when i tried to run ipa-client-install on 142 clients
>>         simultaneously
>>
>>
>>         None of the IPA commands are responding,  and I see this error
>>
>>         ipa user-find p-testipa
>>         ipa: ERROR: Insufficient access: SASL(-1): generic failure:
>>         GSSAPI Error: Unspecified GSS failure.  Minor code may
>>         provide more information (KDC returned error string: PROCESS_TGS)
>>
>>          KRB5_TRACE=/dev/stdout kinit admin
>>         [41178] 1472984115.233214: Getting initial credentials for
>>         admin at XYZ.COM
>>         [41178] 1472984115.235257: Sending request (167 bytes) to
>>         XYZ.COM
>>         [41178] 1472984115.235419: Initiating TCP connection to
>>         stream 10.1.3.36:88
>>         [41178] 1472984115.235685: Sending TCP request to stream
>>         10.1.3.36:88
>>         [41178] 1472984120.238914: Received answer (174 bytes) from
>>         stream 10.1.3.36:88
>>         [41178] 1472984120.238925: Terminating TCP connection to
>>         stream 10.1.3.36:88
>>         [41178] 1472984120.238993: Response was from master KDC
>>         [41
>>
>>
>>         Running an ldapsearch to see the db.. does not give any
>>         results and just hangs there
>>
>>         ldapsearch -x -D 'cn=Directory Manager' -W -s one -b
>>         'cn=kerberos,dc=xyz,dc=com'
>>         Enter LDAP Password:
>>
>>         even an ldapsearch -x does not respond
>>         At this point, am sure that slapd is the one causing issues
>>
>>         Running an strace against the hung slapd itself seems to get
>>         stuck does not proceed after saying "attaching to process"
>>
>>         From some others posts I read Thierry suggesting to increase
>>         the nsslapd-threadnumber value
>>
>>         It was set to 30, I think that might be too low.
>>
>>         I have raised it to  500
>>
>>         Now after restarting the service .. ldapsearch starts responding.
>>         But running the test to add a sudden high number of clients
>>         again left ns-slapd to hung state
>>
>>         When i attempted adding the clients.. the ns-slapd cpu usage
>>         shot up to 340% and after that ns-slapd stopped responding
>>
>>         So now, atleast I know what might be causing the issue and I
>>         can now easily reproduce it.
>>
>>         Is there a way I can make ns-slapd handle a sudden bump in
>>         incoming request for ipa-client-install
>>
>>         Thanks
>>         Rakesh
>>
>>
>>
>>
>>
>>
>>         On Mon, Aug 29, 2016 at 11:18 PM, Rich Megginson
>>         <rmeggins at redhat.com <mailto:rmeggins at redhat.com>> wrote:
>>
>>             On 08/29/2016 10:53 AM, Rakesh Rajasekharan wrote:
>>>             Hi Thierry,
>>>
>>>             My machine has 30GB RAM ..and  389-ds version is 1.3.4
>>>
>>>             ldapsearch shows the values for nsslapd-cachememsize
>>>             updated to 200MB.
>>>
>>>             ldapsearch -LLL -o ldif-wrap=no -D "cn=directory
>>>             manager" -w 'mypassword' -b 'cn=userRoot,cn=ldbm
>>>             database,cn=plugins,cn=config'|grep nsslapd-cachememsize
>>>             nsslapd-cachememsize: 209715200
>>>
>>>
>>>             So, it seems to have updated though seeing that
>>>             warning(WARNING: ipaca: entry cache size 10485760B is
>>>             less than db size 11599872B) in the log confuses me a bit.
>>>
>>>             There's one more entry that I found from the ldapsearch to
>>>             be a bit low
>>>
>>>             nsslapd-dncachememsize: 10485760
>>>             maxdncachesize: 10485760
>>>
>>>             Should I update these as well to a higher value
>>>
>>>             At the time when the issue happened, the memory usage as
>>>             well as the overall load of the system was very low .
>>>             I will try reproducing the issue atleast in my QA
>>>             env..probably by trying to mock  simultaneous parallel
>>>             logins to a large number of hosts
>>
>>             To monitor your cache sizes, please use the dbmon.sh tool
>>             provided with your distro.  If that is not available with
>>             your particular distro, see
>>             https://github.com/richm/scripts/wiki/dbmon.sh
>>             <https://github.com/richm/scripts/wiki/dbmon.sh>
>>
>>
>>>
>>>
>>>             thanks
>>>             Rakesh
>>>
>>>
>>>
>>>
>>>             On Mon, Aug 29, 2016 at 8:16 PM, thierry bordaz
>>>             <tbordaz at redhat.com <mailto:tbordaz at redhat.com>> wrote:
>>>
>>>                 Hi Rakesh,
>>>
>>>                 Those tunings may depend on the memory available on
>>>                 your machine.
>>>                 nsslapd-cachememsize allows the entry cache to
>>>                 consume up to 200Mb but its memory footprint is
>>>                 known to go above that.
>>>                 200Mb for both looks pretty good to me. How large is
>>>                 your machine ? What is your version of 389-ds ?
>>>
>>>                 Those warnings do not change your settings. They just
>>>                 point out that the entry caches of 'ipaca' and 'retrocl'
>>>                 are small, but that is fine. The size of the entry cache
>>>                 is important mostly in userRoot.
>>>                 You may double check the actual values, after
>>>                 restart, with ldapsearch on 'cn=userRoot,cn=ldbm
>>>                 database,cn=plugins,cn=config' and
>>>                 'cn=config,cn=ldbm database,cn=plugins,cn=config'.
>>>
>>>                 A first step is to measure the response time of DS
>>>                 to know whether it is responsible for the hang or not.
>>>                 The logs and possibly pstack during those
>>>                 intermittent hangs will help to determine that.
>>>
>>>                 regards
>>>                 thierry
>>>
>>>
>>>
>>>
>>>
>>>                 On 08/29/2016 04:25 PM, Rakesh Rajasekharan wrote:
>>>>                 I tried increasing the nsslapd-dbcachesize and
>>>>                 nsslapd-cachememsize in my QA envs to 200MB.
>>>>
>>>>                 However, in my log files, I still see this message
>>>>                 [29/Aug/2016:04:34:37 +0000] - WARNING: ipaca:
>>>>                 entry cache size 10485760B is less than db size
>>>>                 11599872B; We recommend to increase the entry cache
>>>>                 size nsslapd-cachememsize.
>>>>                 [29/Aug/2016:04:34:37 +0000] - WARNING: changelog:
>>>>                 entry cache size 2097152B is less than db size
>>>>                 441647104B; We recommend to increase the entry
>>>>                 cache size nsslapd-cachememsize.
>>>>
>>>>                 these are my ldif files that i used to modify the
>>>>                 values
>>>>                 modify entry cache size
>>>>                 cat modify-cache-mem-size.ldif
>>>>                 dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
>>>>                 changetype: modify
>>>>                 replace: nsslapd-cachememsize
>>>>                 nsslapd-cachememsize: 209715200
>>>>
>>>>                 modify db cache size
>>>>                 cat modfy-db-cache-size.ldif
>>>>                 dn: cn=config,cn=ldbm database,cn=plugins,cn=config
>>>>                 changetype: modify
>>>>                 replace: nsslapd-dbcachesize
>>>>                 nsslapd-dbcachesize: 209715200
>>>>
>>>>                 After modifying , i restarted IPA services
>>>>
>>>>                 Is there anything else that  I need to take care of
>>>>                 as the logs suggest its still not getting the
>>>>                 updated values
>>>>
>>>>                 Thanks
>>>>                 Rakesh
>>>>
>>>>                 On Mon, Aug 29, 2016 at 6:07 PM, Rakesh
>>>>                 Rajasekharan <rakesh.rajasekharan at gmail.com
>>>>                 <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>>>
>>>>                     Hi Thierry,
>>>>
>>>>                     Coz of the issues we had to revert back to
>>>>                     earlier running openldap in production.
>>>>
>>>>                     I have now done a few TCP related changes in
>>>>                     sysctl.conf and have also increased the
>>>>                     nsslapd-dbcachesize and nsslapd-cachememsize to
>>>>                     200MB
>>>>
>>>>                     I will again start migrating hosts back to IPA
>>>>                     and see if I face the earlier issue.
>>>>
>>>>                     I will update back once I have something
>>>>
>>>>
>>>>                     Thanks,
>>>>                     Rakesh
>>>>
>>>>
>>>>
>>>>                     On Thu, Aug 25, 2016 at 2:17 PM, thierry bordaz
>>>>                     <tbordaz at redhat.com
>>>>                     <mailto:tbordaz at redhat.com>> wrote:
>>>>
>>>>
>>>>
>>>>                         On 08/25/2016 10:15 AM, Rakesh Rajasekharan
>>>>                         wrote:
>>>>>                         All of the troubleshooting seems fine.
>>>>>
>>>>>
>>>>>                         However, running logconv.pl
>>>>>                         gives me this output
>>>>>
>>>>>                         ----- Recommendations -----
>>>>>
>>>>>                          1.  You have unindexed components, this
>>>>>                         can be caused from a search on an
>>>>>                         unindexed attribute, or your returned
>>>>>                         results exceeded the allidsthreshold.
>>>>>                         Unindexed components are not recommended.
>>>>>                         To refuse unindexed searches, switch
>>>>>                         'nsslapd-require-index' to 'on' under your
>>>>>                         database entry (e.g. cn=UserRoot,cn=ldbm
>>>>>                         database,cn=plugins,cn=config).
>>>>>
>>>>>                          2.  You have a significant difference
>>>>>                         between binds and unbinds. You may want to
>>>>>                         investigate this difference.
>>>>>
>>>>>
>>>>>                         I feel, this could be a pointer to things
>>>>>                         going slow.. and IPA hanging. I think i
>>>>>                         now have something that I can try and nail
>>>>>                         down this issue.
>>>>>
>>>>>                         On a sidenote, I was earlier running
>>>>>                         openldap and migrated over to Freeipa,
>>>>>
>>>>>                         Thanks
>>>>>                         Rakesh
>>>>>
>>>>>
>>>>>
>>>>>                         On Wed, Aug 24, 2016 at 12:38 PM, Petr
>>>>>                         Spacek <pspacek at redhat.com
>>>>>                         <mailto:pspacek at redhat.com>> wrote:
>>>>>
>>>>>                             On 23.8.2016 18:44, Rakesh
>>>>>                             Rajasekharan wrote:
>>>>>                             > I think thers something seriously
>>>>>                             wrong with my system
>>>>>                             >
>>>>>                             > not able to run any IPA commands
>>>>>                             >
>>>>>                             > klist
>>>>>                             > Ticket cache: KEYRING:persistent:0:0
>>>>>                             > Default principal: admin at XYZ.COM
>>>>>                             <mailto:admin at XYZ.COM>
>>>>>                             >
>>>>>                             > Valid starting  Expires Service
>>>>>                             principal
>>>>>                             > 2016-08-23T16:26:36
>>>>>                             2016-08-24T16:26:22
>>>>>                             krbtgt/XYZ.COM at XYZ.COM
>>>>>                             <mailto:XYZ.COM at XYZ.COM>
>>>>>                             >
>>>>>                             >
>>>>>                             > [root at prod-ipa-master-1a :~] ipactl
>>>>>                             status
>>>>>                             > Directory Service: RUNNING
>>>>>                             > krb5kdc Service: RUNNING
>>>>>                             > kadmin Service: RUNNING
>>>>>                             > ipa_memcached Service: RUNNING
>>>>>                             > httpd Service: RUNNING
>>>>>                             > pki-tomcatd Service: RUNNING
>>>>>                             > ipa-otpd Service: RUNNING
>>>>>                             > ipa: INFO: The ipactl command was
>>>>>                             successful
>>>>>                             >
>>>>>                             >
>>>>>                             >
>>>>>                             > [root at prod-ipa-master :~] ipa
>>>>>                             user-find p-testuser
>>>>>                             > ipa: ERROR: Kerberos error:
>>>>>                             ('Unspecified GSS failure. Minor code may
>>>>>                             > provide more information',
>>>>>                             851968)/("Cannot contact any KDC for
>>>>>                             realm '
>>>>>                             > XYZ.COM <http://XYZ.COM>'", -1765328228)
>>>>>
>>>>
>>>>                         Hi Rakesh,
>>>>
>>>>                             Having a reproducible test case would
>>>>                             you rerun the command above.
>>>>                             During its processing you may monitor
>>>>                             DS process load (top). If it is high,
>>>>                             you may get some pstacks of it.
>>>>                             Also would you attach the part of DS
>>>>                             access logs taken during the command.
>>>>
>>>>                             regards
>>>>                             thierry
>>>>
>>>>>                             >
>>>>>
>>>>>                             This is weird because the server seems
>>>>>                             to be up.
>>>>>
>>>>>                             Please follow
>>>>>                             http://www.freeipa.org/page/Troubleshooting#Authentication.2FKerberos
>>>>>
>>>>>                             Petr^2 Spacek
>>>>>
>>>>>                             >
>>>>>                             >
>>>>>                             > Thanks
>>>>>                             >
>>>>>                             > Rakesh
>>>>>                             >
>>>>>                             > On Tue, Aug 23, 2016 at 10:01 PM,
>>>>>                             Rakesh Rajasekharan <
>>>>>                             > rakesh.rajasekharan at gmail.com
>>>>>                             <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>>>>                             >
>>>>>                             >> i changed the loggin level to 4 .
>>>>>                             Modifying nsslapd-accesslog-level
>>>>>                             >>
>>>>>                             >> But, the hang is still there.
>>>>>                             though I dont see the sigfault now
>>>>>                             >>
>>>>>                             >>
>>>>>                             >>
>>>>>                             >>
>>>>>                             >> On Tue, Aug 23, 2016 at 9:02 PM,
>>>>>                             Rakesh Rajasekharan <
>>>>>                             >> rakesh.rajasekharan at gmail.com
>>>>>                             <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>>>>                             >>
>>>>>                             >>> My disk was getting filled too fast
>>>>>                             >>>
>>>>>                             >>> logs under /var/log/dirsrv was
>>>>>                             coming around 5 gb quickly filling up
>>>>>                             >>>
>>>>>                             >>> Is there a way to make the logging
>>>>>                             less verbose
>>>>>                             >>>
>>>>>                             >>>
>>>>>                             >>>
>>>>>                             >>> On Tue, Aug 23, 2016 at 6:41 PM,
>>>>>                             Petr Spacek <pspacek at redhat.com
>>>>>                             <mailto:pspacek at redhat.com>> wrote:
>>>>>                             >>>
>>>>>                             >>>> On 23.8.2016 15:07, Rakesh
>>>>>                             Rajasekharan wrote:
>>>>>                             >>>>> I was able to fix that may be
>>>>>                             temporarily... when i checked the
>>>>>                             >>>> network..
>>>>>                             >>>>> there was another process that
>>>>>                             was running and consuming a lot of
>>>>>                             >>>> network (
>>>>>                             >>>>> i have no idea who did that. I
>>>>>                             need to seriously start restricting
>>>>>                             >>>> people
>>>>>                             >>>>> access to this machine )
>>>>>                             >>>>>
>>>>>                             >>>>> after killing that perfomance
>>>>>                             improved drastically
>>>>>                             >>>>>
>>>>>                             >>>>> But now, suddenly I started
>>>>>                             experiencing the same hang.
>>>>>                             >>>>>
>>>>>                             >>>>> This time , I gert the following
>>>>>                             error when checked dmesg
>>>>>                             >>>>>
>>>>>                             >>>>> [  301.236976] ns-slapd[3124]:
>>>>>                             segfault at 0 ip 00007f1de416951c sp
>>>>>                             >>>>> 00007f1dee1dba70 error 4 in
>>>>>                             libcos-plugin.so[7f1de4166000+b000]
>>>>>                             >>>>> [ 1116.248431] TCP:
>>>>>                             request_sock_TCP: Possible SYN
>>>>>                             flooding on port 88.
>>>>>                             >>>>> Sending cookies. Check SNMP
>>>>>                             counters.
>>>>>                             >>>>> [11831.397037] ns-slapd[22550]:
>>>>>                             segfault at 0 ip 00007f533d82251c sp
>>>>>                             >>>>> 00007f5347894a70 error 4 in
>>>>>                             libcos-plugin.so[7f533d81f000+b000]
>>>>>                             >>>>> [11832.727989] ns-slapd[22606]:
>>>>>                             segfault at 0 ip 00007f6231eb951c sp
>>>>>                             >>>>> 00007f623bf2ba70 error 4 in
>>>>>                             libcos-plugin.so[7f6231eb6000+b00
>>>>>                             >>>>
>>>>>                             >>>> Okay, this one is serious. The
>>>>>                             LDAP server crashed.
>>>>>                             >>>>
>>>>>                             >>>> 1. Make sure all your packages
>>>>>                             are up-to-date.
>>>>>                             >>>>
>>>>>                             >>>> Please see
>>>>>                             >>>>
>>>>>                             http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#d
>>>>>                             >>>> ebugging-crashes
>>>>>                             >>>> for further instructions how to
>>>>>                             debug this.
>>>>>                             >>>>
>>>>>                             >>>> Petr^2 Spacek
>>>>>                             >>>>
>>>>>                             >>>>>
>>>>>                             >>>>> and in
>>>>>                             /var/log/dirsrv/example-com/errors
>>>>>                             >>>>>
>>>>>                             >>>>> [23/Aug/2016:12:49:36 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3291138
>>>>>                             (rc: 32)
>>>>>                             >>>>> [23/Aug/2016:12:49:36 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3291139
>>>>>                             (rc: 32)
>>>>>                             >>>>> [23/Aug/2016:12:49:36 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3291140
>>>>>                             (rc: 32)
>>>>>                             >>>>> [23/Aug/2016:12:49:36 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3291141
>>>>>                             (rc: 32)
>>>>>                             >>>>> [23/Aug/2016:12:49:36 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3291142
>>>>>                             (rc: 32)
>>>>>                             >>>>> [23/Aug/2016:12:49:36 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3291143
>>>>>                             (rc: 32)
>>>>>                             >>>>> [23/Aug/2016:12:49:36 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3291144
>>>>>                             (rc: 32)
>>>>>                             >>>>> [23/Aug/2016:12:49:36 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3291145
>>>>>                             (rc: 32)
>>>>>                             >>>>> [23/Aug/2016:12:49:50 +0000] -
>>>>>                             Retry count exceeded in delete
>>>>>                             >>>>> [23/Aug/2016:12:49:50 +0000]
>>>>>                             DSRetroclPlugin - delete_changerecord:
>>>>>                             >>>> could
>>>>>                             >>>>> not delete change record 3292734
>>>>>                             (rc: 51)
>>>>>                             >>>>>
>>>>>                             >>>>>
>>>>>                             >>>>> Can  i do something about this
>>>>>                             error.. I treid to restart ipa a couple
>>>>>                             >>>> of
>>>>>                             >>>>> time but that did not help
>>>>>                             >>>>>
>>>>>                             >>>>> Thanks
>>>>>                             >>>>> Rakesh
>>>>>                             >>>>>
>>>>>                             >>>>> On Mon, Aug 22, 2016 at 2:27 PM,
>>>>>                             Petr Spacek <pspacek at redhat.com
>>>>>                             <mailto:pspacek at redhat.com>>
>>>>>                             >>>> wrote:
>>>>>                             >>>>>
>>>>>                             >>>>>> On 19.8.2016 19:32, Rakesh
>>>>>                             Rajasekharan wrote:
>>>>>                             >>>>>>> I am running my set up on AWS
>>>>>                             cloud, and entropy is low at around
>>>>>                             >>>> 180 .
>>>>>                             >>>>>>>
>>>>>                             >>>>>>> I plan to increase it bu
>>>>>                             installing haveged . But, would low
>>>>>                             entropy
>>>>>                             >>>> by
>>>>>                             >>>>>> any
>>>>>                             >>>>>>> chance cause this issue of
>>>>>                             intermittent hang .
>>>>>                             >>>>>>> Also, the hang is mostly
>>>>>                             observed when registering around 20
>>>>>                             clients
>>>>>                             >>>>>>> together
>>>>>                             >>>>>>
>>>>>                             >>>>>> Possibly, I'm not sure. If you
>>>>>                             want to dig into this, I would do this:
>>>>>                             >>>>>> 1. look what process hangs on
>>>>>                             client (using pstree command or so)
>>>>>                             >>>>>> $ pstree
>>>>>                             >>>>>>
>>>>>                             >>>>>> 2. look to what server and port
>>>>>                             is the hanging client connected to
>>>>>                             >>>>>> $ lsof -p <PID of the hanging
>>>>>                             process>
>>>>>                             >>>>>>
>>>>>                             >>>>>> 3. jump to server and see what
>>>>>                             process is bound to the target port
>>>>>                             >>>>>> $ netstat -pn
>>>>>                             >>>>>>
>>>>>                             >>>>>> 4. see where the process if hanging
>>>>>                             >>>>>> $ strace -p <PID of the hanging
>>>>>                             process>
>>>>>                             >>>>>>
>>>>>                             >>>>>> I hope it helps.
>>>>>                             >>>>>>
>>>>>                             >>>>>> Petr^2 Spacek
>>>>>                             >>>>>>
>>>>>                             >>>>>>> On Fri, Aug 19, 2016 at 7:24
>>>>>                             PM, Rakesh Rajasekharan <
>>>>>                             >>>>>>> rakesh.rajasekharan at gmail.com
>>>>>                             <mailto:rakesh.rajasekharan at gmail.com>> wrote:
>>>>>                             >>>>>>>
>>>>>                             >>>>>>>> yes there seems to be
>>>>>                             something thats worrying.. I have
>>>>>                             faced this
>>>>>                             >>>> today
>>>>>                             >>>>>>>> as well.
>>>>>                             >>>>>>>> There are few hosts around
>>>>>                             280 odd left and when i try adding them
>>>>>                             >>>> to
>>>>>                             >>>>>> IPA
>>>>>                             >>>>>>>> , the slowness begins..
>>>>>                             >>>>>>>>
>>>>>                             >>>>>>>> all the ipa commands like ipa
>>>>>                             user-find.. etc becomes very slow in
>>>>>                             >>>>>>>> responding.
>>>>>                             >>>>>>>>
>>>>>                             >>>>>>>> the SYN_RECV are not many
>>>>>                             though just around 80-90 and today that
>>>>>                             >>>> was
>>>>>                             >>>>>>>> around 20 only
>>>>>                             >>>>>>>>
>>>>>                             >>>>>>>>
>>>>>                             >>>>>>>> I have for now increased
>>>>>                             tcp_max_syn_backlog to 5000.
>>>>>                             >>>>>>>> For now the slowness seems to
>>>>>                             have gone.. but I will do a try
>>>>>                             >>>> adding the
>>>>>                             >>>>>>>> clients again tomorrow and
>>>>>                             see how it goes
>>>>>                             >>>>>>>>
>>>>>                             >>>>>>>> Thanks
>>>>>                             >>>>>>>> Rakesh
>>>>>                             >>>>>>>>
>>>>>                             >>>>>>>> The issues
>>>>>                             >>>>>>>>
>>>>>                             >>>>>>>> On Fri, Aug 19, 2016 at 12:58
>>>>>                             PM, Petr Spacek <pspacek at redhat.com
>>>>>                             <mailto:pspacek at redhat.com>>
>>>>>                             >>>>>> wrote:
>>>>>                             >>>>>>>>
>>>>>                             >>>>>>>>> On 18.8.2016 17:23, Rakesh
>>>>>                             Rajasekharan wrote:
>>>>>                             >>>>>>>>>> Hi
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> I am migrating to freeipa
>>>>>                             from openldap and have around 4000
>>>>>                             >>>> clients
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> I had opened another
>>>>>                             thread on that, but chose to start a new
>>>>>                             >>>> one
>>>>>                             >>>>>>>>> here
>>>>>                             >>>>>>>>>> as it's a separate issue
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> I was able to change the
>>>>>                             nsslapd-maxdescriptors by adding an ldif
>>>>>                             >>>> file
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> cat nsslapd-modify.ldif
>>>>>                             >>>>>>>>>> dn: cn=config
>>>>>                             >>>>>>>>>> changetype: modify
>>>>>                             >>>>>>>>>> replace: nsslapd-maxdescriptors
>>>>>                             >>>>>>>>>> nsslapd-maxdescriptors: 17000
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> and running the ldapmodify
>>>>>                             command
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> I have now started moving
>>>>>                             clients running an openldap to Freeipa
>>>>>                             >>>> and
>>>>>                             >>>>>>>>> have
>>>>>                             >>>>>>>>>> today moved close to 2000
>>>>>                             clients
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> However, I have noticed
>>>>>                             that IPA hangs intermittently.
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> running a kinit admin
>>>>>                             returns the below error
>>>>>                             >>>>>>>>>> kinit: Generic error (see
>>>>>                             e-text) while getting initial
>>>>>                             >>>> credentials
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> from the /var/log/messages,
>>>>>                             I see this entry
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> prod-ipa-master-int kernel:
>>>>>                             [104090.315801] TCP:
>>>>>                             >>>> request_sock_TCP:
>>>>>                             >>>>>>>>>> Possible SYN flooding on
>>>>>                             port 88. Sending cookies. Check SNMP
>>>>>                             >>>>>> counters.
>>>>>                             >>>>>>>>>
>>>>>                             >>>>>>>>> I would be worried about
>>>>>                             this message. Maybe kernel/firewall is
>>>>>                             >>>> doing
>>>>>                             >>>>>>>>> something fishy behind your
>>>>>                             back and blocking some connections or
>>>>>                             >>>> so.
>>>>>                             >>>>>>>>>
>>>>>                             >>>>>>>>> Petr^2 Spacek
>>>>>                             >>>>>>>>>
>>>>>                             >>>>>>>>>
>>>>>                             >>>>>>>>>> Aug 18 13:00:01
>>>>>                             prod-ipa-master-int systemd[1]:
>>>>>                             Started Session
>>>>>                             >>>> 4885
>>>>>                             >>>>>> of
>>>>>                             >>>>>>>>>> user root.
>>>>>                             >>>>>>>>>> Aug 18 13:00:01
>>>>>                             prod-ipa-master-int systemd[1]:
>>>>>                             Starting Session
>>>>>                             >>>> 4885
>>>>>                             >>>>>> of
>>>>>                             >>>>>>>>>> user root.
>>>>>                             >>>>>>>>>> Aug 18 13:01:01
>>>>>                             prod-ipa-master-int systemd[1]:
>>>>>                             Started Session
>>>>>                             >>>> 4886
>>>>>                             >>>>>> of
>>>>>                             >>>>>>>>>> user root.
>>>>>                             >>>>>>>>>> Aug 18 13:01:01
>>>>>                             prod-ipa-master-int systemd[1]:
>>>>>                             Starting Session
>>>>>                             >>>> 4886
>>>>>                             >>>>>> of
>>>>>                             >>>>>>>>>> user root.
>>>>>                             >>>>>>>>>> Aug 18 13:02:40
>>>>>                             prod-ipa-master-int python[28984]:
>>>>>                             ansible-command
>>>>>                             >>>>>>>>> Invoked
>>>>>                             >>>>>>>>>> with creates=None
>>>>>                             executable=None shell=True args=
>>>>>                             removes=None
>>>>>                             >>>>>>>>> warn=True
>>>>>                             >>>>>>>>>> chdir=None
>>>>>                             >>>>>>>>>> Aug 18 13:04:37
>>>>>                             prod-ipa-master-int sssd_be: GSSAPI Error:
>>>>>                             >>>> Unspecified
>>>>>                             >>>>>>>>> GSS
>>>>>                             >>>>>>>>>> failure. Minor code may
>>>>>                             provide more information (KDC returned
>>>>>                             >>>> error
>>>>>                             >>>>>>>>>> string: PROCESS_TGS)
>>>>>                             >>>>>>>>>>
>>>>>                             >>>>>>>>>> Could it be possible that
>>>>>                             it's due to the initial load of adding
>>>>>                             >>>> the
>>>>>                             >>>>>>>>> clients
>>>>>                             >>>>>>>>>> or is there something else
>>>>>                             that I need to take care of.
