[Freeipa-users] Timeout (?) issues

Mon Sep 23 15:59:02 UTC 2013

I'm pretty sure this is the root of my problem (not confirmed yet, but it's
AIX -- that's always the problem):

http://www-01.ibm.com/support/docview.wss?uid=swg21212940

The takeaway is this:

"The first query (184) is a normal IPV4 lookup for "ldap.austin.texas.com",
which returns "192.168.1.255". But then an IPV6 lookup is done for the same
name. Because there is no IPV6 address for ldap.austin.texas.com, it
continues searching every search domain in the resolv.conf file (
example.austin.texas.com austin.texas.com texas.com) trying to find one."

On Fri, Sep 20, 2013 at 3:07 AM, Petr Spacek <pspacek at redhat.com> wrote:

> On 20.9.2013 01:24, KodaK wrote:
>
>> This is ridiculous, right?
>>
>> IPA server 1:
>>
>> # for i in $(ls access*); do echo -n  $i:\  ;grep err=32 $i | wc -l; done
>> access: 248478
>> access.20130916-043207: 302774
>> access.20130916-123642: 272572
>> access.20130916-201516: 294308
>> access.20130917-081053: 295060
>> access.20130917-144559: 284498
>> access.20130917-231435: 281035
>> access.20130918-091611: 291165
>> access.20130918-154945: 275792
>> access.20130919-014322: 296113
>>
>> IPA server 2:
>>
>> access: 4313
>> access.20130909-200216: 4023
>> access.20130910-200229: 4161
>> access.20130911-200239: 4182
>> access.20130912-200249: 5069
>> access.20130913-200258: 3833
>> access.20130914-200313: 4208
>> access.20130915-200323: 4702
>> access.20130916-200332: 4532
>>
>>
>> IPA server 3:
>>
>> access: 802
>> access.20130910-080737: 3876
>> access.20130911-080748: 3902
>> access.20130912-080802: 3678
>> access.20130913-080810: 3765
>> access.20130914-080826: 3524
>> access.20130915-080907: 4142
>> access.20130916-080916: 4930
>> access.20130917-080926: 4769
>> access.20130918-081005: 2879
>>
>> IPA server 4:
>>
>> access: 2812
>> access.20130910-003051: 4095
>> access.20130911-003105: 3623
>> access.20130912-003113: 3606
>> access.20130913-003125: 3581
>> access.20130914-003135: 3758
>> access.20130915-003150: 3935
>> access.20130916-003159: 4184
>> access.20130917-003210: 3859
>> access.20130918-003221: 5110
>>
>>
>> The vast majority of the err=32 messages are DNS entries.
>>
>
> It depends on your setup. Bind-dyndb-ldap does LDAP search for each
> non-existent name to verify that the name wasn't added to LDAP in
> meanwhile. If you have clients doing 1M queries for non-existing names per
> day, then you will see 1M LDAP queries with err=32 per day.
>
> Next major version of bind-dyndb-ldap will have reworked internal database
> and it will support negative caching, so number of err=32 should drop
> significantly.
>
>
>  Here are some samples:
>>
>> [19/Sep/2013:18:19:51 -0500] conn=9 op=169764 SRCH base="idnsName=xxx.com
>> ,idnsname=unix.xxx.com,cn=dns,**dc=unix,dc=xxx,dc=com" scope=0
>> filter="(objectClass=**idnsRecord)" attrs=ALL
>> [19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101
>> nentries=0 etime=0
>>
>
> This is interesting, because this LDAP query is equal to DNS query for "
> xxx.com.unix.xxx.com." Are your clients that crazy? :-)
>
>
>  [19/Sep/2013:18:19:51 -0500] conn=9 op=169774 SRCH base="idnsName=
>> slpoxacl01.unix.xxx.com,**idnsname=unix.xxx.com,cn=dns,**
>> dc=unix,dc=xxx,dc=com"
>> scope=0 filter="(objectClass=**idnsRecord)" attrs=ALL
>> [19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101
>> nentries=0 etime=0
>>
>
> This is equivalent to DNS query for "slpoxacl01.unix.xxx.com.unix.**
> xxx.com <http://slpoxacl01.unix.xxx.com.unix.xxx.com>.".
>
>
>  [19/Sep/2013:18:19:51 -0500] conn=9 op=169770 SRCH base="idnsName=
>> sla400q1.unix.xxx.com,**idnsname=unix.xxx.com,cn=dns,**
>> dc=unix,dc=xxx,dc=com"
>> scope=0 filter="(objectClass=**idnsRecord)" attrs=ALL
>> [19/Sep/2013:18:19:51 -0500] conn=9 op=169770 RESULT err=32 tag=101
>> nentries=0 etime=0
>>
>
> And this is "sla400q1.unix.xxx.com.unix.**xxx.com<http://sla400q1.unix.xxx.com.unix.xxx.com>
> .".
>
>
>  [19/Sep/2013:18:19:51 -0500] conn=9 op=169772 SRCH base="idnsName=
>> magellanhealth.com,idnsname=un**ix.magellanhealth.com<http://unix.magellanhealth.com>
>> ,cn=dns,**dc=unix,dc=magellanhealth,dc=**com"
>> scope=0 filter="(objectClass=**idnsRecord)" attrs=ALL
>> [19/Sep/2013:18:19:51 -0500] conn=9 op=169772 RESULT err=32 tag=101
>> nentries=0 etime=0
>>
>> So far today there are over half a million of these.  That can't be right.
>>
>
> I would recommend you to use network sniffer and check which clients sends
> these crazy queries.
>
> My guess is that your resolver library (libc?) causes this.
>
> On my Linux system with glibc-2.17-14.fc19.x86_64 it behaves in this way:
>
> client query = nonexistent.example.com.
> (I used $ "ping nonexistent.example.com.")
> search domain in /etc/resolv.conf = brq.redhat.com.
>
> DNS query #1: nonexistent.example.com. => NXDOMAIN
> DNS query #2: nonexistent.example.com.brq.**redhat.com<http://nonexistent.example.com.brq.redhat.com>.
> => NXDOMAIN
> DNS query #3: nonexistent.example.com.**redhat.com<http://nonexistent.example.com.redhat.com>.
> => NXDOMAIN
>
>
>  On Thu, Sep 19, 2013 at 3:05 PM, KodaK <sakodak at gmail.com> wrote:
>>
>>  I didn't realize that DNS created one connection.  I thought it was one
>>> connection spanning several days.
>>>
>>
> In theory, there should be 2-4 LDAP connections from each DNS server and
> those connections should live until DNS or LDAP server restarts/crashes.
>
> Petr^2 Spacek
>
>  On Thu, Sep 19, 2013 at 2:51 PM, Rich Megginson <rmeggins at redhat.com
>>> >wrote:
>>>
>>>    On 09/19/2013 12:57 PM, KodaK wrote:
>>>>
>>>> Well, this is awkward:
>>>>
>>>>   [root at slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
>>>> 5453936
>>>> [root at slpidml01 slapd-UNIX-xxx-COM]#
>>>>
>>>>
>>>> Why is it awkward?
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Sep 19, 2013 at 1:48 PM, KodaK <sakodak at gmail.com> wrote:
>>>>
>>>>  Thanks.  I've been running that against my logs, and this has to be
>>>>> abnormal:
>>>>>
>>>>>   err=32               129274    No Such Object
>>>>> err=0                 10952    Successful Operations
>>>>> err=14                  536    SASL Bind in Progress
>>>>> err=53                   39    Unwilling To Perform
>>>>> err=49                    3    Invalid Credentials (Bad Password)
>>>>>
>>>>>   I'm still trying to figure out why there are so many error 32s.  Are
>>>>> there any usual suspects I should know about?  (That's just the current
>>>>> access log, btw.)
>>>>>
>>>>>
>>>>> On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson <rmeggins at redhat.com
>>>>> >wrote:
>>>>>
>>>>>     On 09/16/2013 07:57 PM, Dmitri Pal wrote:
>>>>>>
>>>>>> On 09/16/2013 12:02 PM, KodaK wrote:
>>>>>>
>>>>>> Yet another AIX related problem:
>>>>>>
>>>>>>   The AIX LDAP client is called secldapclntd (sure, they could make it
>>>>>> more awkward, but the budget ran out.)  I'm running into the issue
>>>>>> detailed
>>>>>> here:
>>>>>>
>>>>>>   http://www-01.ibm.com/support/**docview.wss?uid=isg1IV11344<http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344>
>>>>>>
>>>>>>   "If an LDAP server fails to answer an LDAP query, secldapclntd
>>>>>> caches
>>>>>> the non-answered query negatively. This may happen if the LDAP server
>>>>>> is down for example. After the LDAP server is back again secldapclntd
>>>>>> will
>>>>>> use the negative cache entry and the application initiating the
>>>>>> original
>>>>>> query will still fail until the cache entry expires."
>>>>>>
>>>>>>   IBM is working on porting the fix to our specific TL and SP levels.
>>>>>>
>>>>>>   What I'm concerned with here, though, is *why* is it timing out?  I
>>>>>> don't know what the current timeout values are (AIX sucks, etc.)
>>>>>>
>>>>>>   I don't see timeout issues on my Linux boxes, which leads me to
>>>>>> believe that either the sssd timouts are longer or that sssd is just
>>>>>> more
>>>>>> robust when dealing with timeouts.
>>>>>>
>>>>>>   I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
>>>>>> because I occasionally have to re-run sudo commands because they
>>>>>> initially
>>>>>> fail (and I know I'm using the right passwords.)  However, sudo
>>>>>> doesn't
>>>>>> appear to have a cache (or it handles caching better.)
>>>>>>
>>>>>>   Does anyone have any troubleshooting suggestions?  Any general
>>>>>> "speed
>>>>>> things up" suggestions on the IPA side?
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>   --Jason
>>>>>>
>>>>>>   --
>>>>>> The government is going to read our mail anyway, might as well make it
>>>>>> tough for them.  GPG Public key ID:  B6A1A7C6
>>>>>>
>>>>>>
>>>>>> ______________________________**_________________
>>>>>> Freeipa-users mailing listFreeipa-users at redhat.**comhttps://
>>>>>> www.redhat.com/**mailman/listinfo/freeipa-users<http://www.redhat.com/mailman/listinfo/freeipa-users>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Is the server FreeIPA?
>>>>>> Can see in the server logs what is actually happening is it the server
>>>>>> that really takes time or there is a network connectivity issue or FW
>>>>>> is
>>>>>> dropping packets?
>>>>>> I would really start with the server side logs.
>>>>>>
>>>>>>
>>>>>>   As far as 389 goes, run logconv.pl against the access logs in
>>>>>> /var/log/dirsrv/slapd-DOMAIN-**COM
>>>>>>
>>>>>
> ______________________________**_________________
> Freeipa-users mailing list
> Freeipa-users at redhat.com
> https://www.redhat.com/**mailman/listinfo/freeipa-users<https://www.redhat.com/mailman/listinfo/freeipa-users>
>

-- 
The government is going to read our mail anyway, might as well make it
tough for them.  GPG Public key ID:  B6A1A7C6
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/freeipa-users/attachments/20130923/ea86be4e/attachment.htm>