<div dir="ltr"><div><div><div><div><div><div>Hi Thierry,<br><br></div>Because of the issues, we had to revert production to the OpenLDAP setup we were running earlier.<br><br></div>I have now made a few TCP-related changes in sysctl.conf and have also increased nsslapd-dbcachesize and nsslapd-cachememsize to 200MB.<br><br></div>I will start migrating hosts back to IPA again and see whether I hit the earlier issue.<br><br></div>I will report back once I have something.<br><br><br></div>Thanks,<br></div>Rakesh<br><div><div><div><div><br><br></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 25, 2016 at 2:17 PM, thierry bordaz <span dir="ltr">&lt;<a href="mailto:tbordaz@redhat.com" target="_blank">tbordaz@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><div><div class="h5">
    <br>
    <br>
    <div>On 08/25/2016 10:15 AM, Rakesh
      Rajasekharan wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">
        <div>All of the troubleshooting seems fine.<br>
          <br>
          <br>
        </div>
        <div>However, running logconv.pl gives me this output:<br>
          <br>
          ----- Recommendations -----<br>
          <br>
           1.  You have unindexed components, this can be caused from a
          search on an unindexed attribute, or your returned results
          exceeded the allidsthreshold.  Unindexed components are not
          recommended. To refuse unindexed searches, switch
          'nsslapd-require-index' to 'on' under your database entry
          (e.g. cn=UserRoot,cn=ldbm database,cn=plugins,cn=config)<wbr>.<br>
          <br>
           2.  You have a significant difference between binds and
          unbinds.  You may want to investigate this difference.<br>
          <br>
        </div>
        <div><br>
        </div>
        <div>I feel this could be a pointer to why things are going slow and
          IPA is hanging. I think I now have something I can use to try to
          nail down this issue.<br>
          <br>
          On a side note, I was earlier running OpenLDAP and migrated
          over to FreeIPA.<br>
          <br>
        </div>
        <div>Thanks<br>
        </div>
        <div>Rakesh<br>
        </div>
        <div><br>
          <br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Aug 24, 2016 at 12:38 PM, Petr
          Spacek <span dir="ltr">&lt;<a href="mailto:pspacek@redhat.com" target="_blank">pspacek@redhat.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 23.8.2016 18:44, Rakesh Rajasekharan wrote:<br>
              > I think there's something seriously wrong with my
              system.<br>
              ><br>
              > I am not able to run any IPA commands.<br>
              ><br>
              > klist<br>
              > Ticket cache: KEYRING:persistent:0:0<br>
              > Default principal: <a href="mailto:admin@XYZ.COM" target="_blank">admin@XYZ.COM</a><br>
              ><br>
              > Valid starting       Expires              Service
              principal<br>
              > 2016-08-23T16:26:36  2016-08-24T16:26:22  krbtgt/XYZ.COM@XYZ.COM<br>
              ><br>
              ><br>
              > [root@prod-ipa-master-1a :~] ipactl status<br>
              > Directory Service: RUNNING<br>
              > krb5kdc Service: RUNNING<br>
              > kadmin Service: RUNNING<br>
              > ipa_memcached Service: RUNNING<br>
              > httpd Service: RUNNING<br>
              > pki-tomcatd Service: RUNNING<br>
              > ipa-otpd Service: RUNNING<br>
              > ipa: INFO: The ipactl command was successful<br>
              ><br>
              ><br>
              ><br>
              > [root@prod-ipa-master :~] ipa user-find p-testuser<br>
              > ipa: ERROR: Kerberos error: ('Unspecified GSS failure.  Minor code may provide more information', 851968)/("Cannot contact any KDC for realm 'XYZ.COM'", -1765328228)<br>
            </span></blockquote>
        </div>
      </div>
    </blockquote>
    <br></div></div>
    Hi Rakesh,<br>
    <br>
    <blockquote>Since you have a reproducible test case, could you rerun the
      command above?<br>
      While it is being processed, you can monitor the DS process load (top). If it
      is high, you can capture some pstacks of it.<br>
      Could you also attach the part of the DS access logs taken during the
      command?<br>
      <br>
      regards<br>
      thierry<br>
    </blockquote><div><div class="h5">
    <blockquote type="cite">
      <div class="gmail_extra">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>
              ><br>
              <br>
            </span>This is weird because the server seems to be up.<br>
            <br>
            Please follow<br>
            <a href="http://www.freeipa.org/page/Troubleshooting#Authentication.2FKerberos" rel="noreferrer" target="_blank">http://www.freeipa.org/page/Tr<wbr>oubleshooting#Authentication.<wbr>2FKerberos</a><br>
            <br>
            Petr^2 Spacek<br>
            <div>
              <div><br>
                ><br>
                ><br>
                > Thanks<br>
                ><br>
                > Rakesh<br>
                ><br>
                > On Tue, Aug 23, 2016 at 10:01 PM, Rakesh
                Rajasekharan <<br>
                > <a href="mailto:rakesh.rajasekharan@gmail.com" target="_blank">rakesh.rajasekharan@gmail.com</a>>
                wrote:<br>
                ><br>
                >> I changed the logging level to 4 by modifying
                nsslapd-accesslog-level.<br>
                >><br>
                >> But the hang is still there, though I don't see
                the segfault now.<br>
                >><br>
                >><br>
                >><br>
                >><br>
                >> On Tue, Aug 23, 2016 at 9:02 PM, Rakesh
                Rajasekharan <<br>
                >> <a href="mailto:rakesh.rajasekharan@gmail.com" target="_blank">rakesh.rajasekharan@gmail.com</a>>
                wrote:<br>
                >><br>
                >>> My disk was filling up too fast.<br>
                >>><br>
                >>> The logs under /var/log/dirsrv were growing to around 5 GB and quickly filling it up.<br>
                >>><br>
                >>> Is there a way to make the logging less verbose?<br>
                >>><br>
                >>><br>
                >>><br>
                >>> On Tue, Aug 23, 2016 at 6:41 PM, Petr
                Spacek <<a href="mailto:pspacek@redhat.com" target="_blank">pspacek@redhat.com</a>>
                wrote:<br>
                >>><br>
                >>>> On 23.8.2016 15:07, Rakesh Rajasekharan
                wrote:<br>
                >>>>> I was able to fix that, maybe temporarily. When I checked the network,<br>
                >>>>> there was another process running and consuming a lot of network bandwidth<br>
                >>>>> (I have no idea who started it. I need to seriously start restricting<br>
                >>>>> people's access to this machine).<br>
                >>>>><br>
                >>>>> After killing that, performance improved drastically.<br>
                >>>>><br>
                >>>>> But now I have suddenly started experiencing the same hang.<br>
                >>>>><br>
                >>>>> This time, I get the following error when I check dmesg:<br>
                >>>>><br>
                >>>>> [  301.236976] ns-slapd[3124]: segfault at 0 ip 00007f1de416951c sp 00007f1dee1dba70 error 4 in libcos-plugin.so[7f1de4166000+<wbr>b000]<br>
                >>>>> [ 1116.248431] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies.  Check SNMP counters.<br>
                >>>>> [11831.397037] ns-slapd[22550]: segfault at 0 ip 00007f533d82251c sp 00007f5347894a70 error 4 in libcos-plugin.so[7f533d81f000+<wbr>b000]<br>
                >>>>> [11832.727989] ns-slapd[22606]: segfault at 0 ip 00007f6231eb951c sp 00007f623bf2ba70 error 4 in libcos-plugin.so[7f6231eb6000+<wbr>b00<br>
                >>>><br>
                >>>> Okay, this one is serious. The LDAP
                server crashed.<br>
                >>>><br>
                >>>> 1. Make sure all your packages are
                up-to-date.<br>
                >>>><br>
                >>>> Please see<br>
                >>>> <a href="http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes" rel="noreferrer" target="_blank">http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes</a><br>
                >>>> for further instructions how to debug
                this.<br>
                >>>><br>
                >>>> Petr^2 Spacek<br>
                >>>><br>
                >>>>><br>
                >>>>> and in /var/log/dirsrv/example-com/er<wbr>rors<br>
                >>>>><br>
                >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291138 (rc: 32)<br>
                >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291139 (rc: 32)<br>
                >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291140 (rc: 32)<br>
                >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291141 (rc: 32)<br>
                >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291142 (rc: 32)<br>
                >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291143 (rc: 32)<br>
                >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291144 (rc: 32)<br>
                >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291145 (rc: 32)<br>
                >>>>> [23/Aug/2016:12:49:50 +0000] - Retry count exceeded in delete<br>
                >>>>> [23/Aug/2016:12:49:50 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3292734 (rc: 51)<br>
                >>>>><br>
                >>>>><br>
                >>>>> Can I do something about this error? I tried restarting IPA a couple of<br>
                >>>>> times, but that did not help.<br>
                >>>>><br>
                >>>>> Thanks<br>
                >>>>> Rakesh<br>
                >>>>><br>
                >>>>> On Mon, Aug 22, 2016 at 2:27 PM,
                Petr Spacek <<a href="mailto:pspacek@redhat.com" target="_blank">pspacek@redhat.com</a>><br>
                >>>> wrote:<br>
                >>>>><br>
                >>>>>> On 19.8.2016 19:32, Rakesh
                Rajasekharan wrote:<br>
                >>>>>>> I am running my setup on the AWS cloud, and entropy is low, at around 180.<br>
                >>>>>>><br>
                >>>>>>> I plan to increase it by installing haveged. But could low entropy by any<br>
                >>>>>>> chance cause this intermittent hang?<br>
                >>>>>>> Also, the hang is mostly observed when registering around 20 clients<br>
                >>>>>>> together.<br>
                >>>>>><br>
                >>>>>> Possibly, I'm not sure. If you want to dig into this, I would do this:<br>
                >>>>>> 1. look at which process hangs on the client (using the pstree command or so)<br>
                >>>>>> $ pstree<br>
                >>>>>><br>
                >>>>>> 2. look at which server and port the hanging client is connected to<br>
                >>>>>> $ lsof -p &lt;PID of the hanging process&gt;<br>
                >>>>>><br>
                >>>>>> 3. jump to the server and see which process is bound to the target port<br>
                >>>>>> $ netstat -pn<br>
                >>>>>><br>
                >>>>>> 4. see where the process is hanging<br>
                >>>>>> $ strace -p &lt;PID of the hanging process&gt;<br>
                >>>>>><br>
                >>>>>> I hope it helps.<br>
                >>>>>><br>
                >>>>>> Petr^2 Spacek<br>
                >>>>>><br>
                >>>>>>> On Fri, Aug 19, 2016 at
                7:24 PM, Rakesh Rajasekharan <<br>
                >>>>>>> <a href="mailto:rakesh.rajasekharan@gmail.com" target="_blank">rakesh.rajasekharan@gmail.com</a>>
                wrote:<br>
                >>>>>>><br>
                >>>>>>>> Yes, there seems to be something that's worrying. I have faced this today<br>
                >>>>>>>> as well.<br>
                >>>>>>>> There are a few hosts left, around 280, and when I try adding them to IPA,<br>
                >>>>>>>> the slowness begins.<br>
                >>>>>>>><br>
                >>>>>>>> All the ipa commands, like ipa user-find, become very slow in<br>
                >>>>>>>> responding.<br>
                >>>>>>>><br>
                >>>>>>>> The SYN_RECV connections are not many, though, just around 80-90, and today there were<br>
                >>>>>>>> only around 20.<br>
                >>>>>>>><br>
                >>>>>>>><br>
                >>>>>>>> I have for now increased tcp_max_syn_backlog to 5000.<br>
                >>>>>>>> For now the slowness seems to have gone, but I will try adding the<br>
                >>>>>>>> clients again tomorrow and see how it goes.<br>
                >>>>>>>><br>
                >>>>>>>> Thanks<br>
                >>>>>>>> Rakesh<br>
                >>>>>>>><br>
                >>>>>>>> The issues<br>
                >>>>>>>><br>
                >>>>>>>> On Fri, Aug 19, 2016 at
                12:58 PM, Petr Spacek <<a href="mailto:pspacek@redhat.com" target="_blank">pspacek@redhat.com</a>><br>
                >>>>>> wrote:<br>
                >>>>>>>><br>
                >>>>>>>>> On 18.8.2016 17:23,
                Rakesh Rajasekharan wrote:<br>
                >>>>>>>>>> Hi<br>
                >>>>>>>>>><br>
                >>>>>>>>>> I am migrating
                to freeipa from openldap and have around 4000<br>
                >>>> clients<br>
                >>>>>>>>>><br>
                >>>>>>>>>> I had opened another thread on that, but chose to start a new one here<br>
                >>>>>>>>>> as it's a separate issue.<br>
                >>>>>>>>>><br>
                >>>>>>>>>> I was able to change nsslapd-maxdescriptors by adding an ldif file<br>
                >>>>>>>>>><br>
                >>>>>>>>>> cat nsslapd-modify.ldif<br>
                >>>>>>>>>> dn: cn=config<br>
                >>>>>>>>>> changetype: modify<br>
                >>>>>>>>>> replace: nsslapd-maxdescriptors<br>
                >>>>>>>>>> nsslapd-maxdescriptors: 17000<br>
                >>>>>>>>>><br>
                >>>>>>>>>> and running the
                ldapmodify command<br>
                >>>>>>>>>><br>
                >>>>>>>>>> I have now started moving clients from OpenLDAP to FreeIPA and have<br>
                >>>>>>>>>> today moved close to 2000 clients.<br>
                >>>>>>>>>><br>
                >>>>>>>>>> However, I have
                noticed that IPA hangs intermittently.<br>
                >>>>>>>>>><br>
                >>>>>>>>>> Running kinit admin returns the error below:<br>
                >>>>>>>>>> kinit: Generic error (see e-text) while getting initial credentials<br>
                >>>>>>>>>><br>
                >>>>>>>>>> In /var/log/messages, I see this entry:<br>
                >>>>>>>>>><br>
                >>>>>>>>>> prod-ipa-master-int kernel: [104090.315801] TCP: request_sock_TCP:<br>
                >>>>>>>>>> Possible SYN flooding on port 88. Sending cookies.  Check SNMP counters.<br>
                >>>>>>>>><br>
                >>>>>>>>> I would be worried about this message. Maybe the kernel/firewall is doing<br>
                >>>>>>>>> something fishy behind your back and blocking some connections or so.<br>
                >>>>>>>>><br>
                >>>>>>>>> Petr^2 Spacek<br>
                >>>>>>>>><br>
                >>>>>>>>><br>
                >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Started Session 4885 of user root.<br>
                >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Starting Session 4885 of user root.<br>
                >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Started Session 4886 of user root.<br>
                >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Starting Session 4886 of user root.<br>
                >>>>>>>>>> Aug 18 13:02:40 prod-ipa-master-int python[28984]: ansible-command Invoked with creates=None executable=None shell=True args= removes=None warn=True chdir=None<br>
                >>>>>>>>>> Aug 18 13:04:37 prod-ipa-master-int sssd_be: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (KDC returned error string: PROCESS_TGS)<br>
                >>>>>>>>>><br>
                >>>>>>>>>> Could this be due to the initial load of adding the clients,<br>
                >>>>>>>>>> or is there something else that I need to take care of?<br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <br>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>