[Freeipa-users] Timeout (?) issues

Petr Spacek pspacek at redhat.com
Fri Sep 20 08:07:04 UTC 2013


On 20.9.2013 01:24, KodaK wrote:
> This is ridiculous, right?
>
> IPA server 1:
>
> # for i in $(ls access*); do echo -n  $i:\  ;grep err=32 $i | wc -l; done
> access: 248478
> access.20130916-043207: 302774
> access.20130916-123642: 272572
> access.20130916-201516: 294308
> access.20130917-081053: 295060
> access.20130917-144559: 284498
> access.20130917-231435: 281035
> access.20130918-091611: 291165
> access.20130918-154945: 275792
> access.20130919-014322: 296113
>
> IPA server 2:
>
> access: 4313
> access.20130909-200216: 4023
> access.20130910-200229: 4161
> access.20130911-200239: 4182
> access.20130912-200249: 5069
> access.20130913-200258: 3833
> access.20130914-200313: 4208
> access.20130915-200323: 4702
> access.20130916-200332: 4532
>
>
> IPA server 3:
>
> access: 802
> access.20130910-080737: 3876
> access.20130911-080748: 3902
> access.20130912-080802: 3678
> access.20130913-080810: 3765
> access.20130914-080826: 3524
> access.20130915-080907: 4142
> access.20130916-080916: 4930
> access.20130917-080926: 4769
> access.20130918-081005: 2879
>
> IPA server 4:
>
> access: 2812
> access.20130910-003051: 4095
> access.20130911-003105: 3623
> access.20130912-003113: 3606
> access.20130913-003125: 3581
> access.20130914-003135: 3758
> access.20130915-003150: 3935
> access.20130916-003159: 4184
> access.20130917-003210: 3859
> access.20130918-003221: 5110
>
>
> The vast majority of the err=32 messages are DNS entries.

It depends on your setup. Bind-dyndb-ldap performs an LDAP search for each 
non-existent name to verify that the name wasn't added to LDAP in the meantime. 
If your clients do 1M queries for non-existent names per day, then you 
will see 1M LDAP queries with err=32 per day.

The next major version of bind-dyndb-ldap will have a reworked internal database 
with support for negative caching, so the number of err=32 results should drop 
significantly.

> Here are some samples:
>
> [19/Sep/2013:18:19:51 -0500] conn=9 op=169764 SRCH base="idnsName=xxx.com
> ,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com" scope=0
> filter="(objectClass=idnsRecord)" attrs=ALL
> [19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101
> nentries=0 etime=0

This is interesting, because this LDAP query is equivalent to a DNS query for 
"xxx.com.unix.xxx.com." Are your clients really that crazy? :-)
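The mapping from the LDAP search base back to the DNS query name can be sketched in Python (a naive parse that ignores RDN escaping; it simply concatenates the idnsName values, relative owner name first, zone second):

```python
def dn_to_dns_name(dn):
    """Rebuild the DNS name a bind-dyndb-ldap LDAP search corresponds to.

    The first idnsName RDN holds the relative owner name and the second
    holds the zone, so the effective query name is their concatenation.
    """
    rdns = [part.split("=", 1) for part in dn.split(",")]
    labels = [value for key, value in rdns if key.strip().lower() == "idnsname"]
    return ".".join(labels) + "."

print(dn_to_dns_name(
    "idnsName=xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com"))
# -> xxx.com.unix.xxx.com.
```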

> [19/Sep/2013:18:19:51 -0500] conn=9 op=169774 SRCH base="idnsName=
> slpoxacl01.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com"
> scope=0 filter="(objectClass=idnsRecord)" attrs=ALL
> [19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101
> nentries=0 etime=0

This is equivalent to a DNS query for "slpoxacl01.unix.xxx.com.unix.xxx.com.".

> [19/Sep/2013:18:19:51 -0500] conn=9 op=169770 SRCH base="idnsName=
> sla400q1.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com"
> scope=0 filter="(objectClass=idnsRecord)" attrs=ALL
> [19/Sep/2013:18:19:51 -0500] conn=9 op=169770 RESULT err=32 tag=101
> nentries=0 etime=0

And this is "sla400q1.unix.xxx.com.unix.xxx.com.".

> [19/Sep/2013:18:19:51 -0500] conn=9 op=169772 SRCH base="idnsName=
> magellanhealth.com,idnsname=unix.magellanhealth.com,cn=dns,dc=unix,dc=magellanhealth,dc=com"
> scope=0 filter="(objectClass=idnsRecord)" attrs=ALL
> [19/Sep/2013:18:19:51 -0500] conn=9 op=169772 RESULT err=32 tag=101
> nentries=0 etime=0
>
> So far today there are over half a million of these.  That can't be right.

I would recommend using a network sniffer to check which clients send these 
crazy queries.
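Before reaching for a sniffer, the access log itself can point at the worst offender. A minimal sketch (the sample lines below are hypothetical stand-ins for a real 389-ds access log):

```shell
# Hypothetical sample standing in for /var/log/dirsrv/slapd-*/access
cat > /tmp/access.sample <<'EOF'
[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101 nentries=0 etime=0
[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101 nentries=0 etime=0
[19/Sep/2013:18:20:02 -0500] conn=12 op=42 RESULT err=0 tag=101 nentries=1 etime=0
EOF

# Count err=32 results per LDAP connection; the busiest conn= value
# identifies the client (or DNS server) to inspect first.
grep 'err=32' /tmp/access.sample | awk '{print $3}' | sort | uniq -c | sort -rn
```

From there, the earlier-logged BIND for that conn= gives you the source, and a packet capture on the DNS side (e.g. "tcpdump -n port 53") can show which resolver is generating the queries.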

My guess is that your resolver library (libc?) causes this.

On my Linux system with glibc-2.17-14.fc19.x86_64 it behaves in this way:

client query = nonexistent.example.com.
(I used "ping nonexistent.example.com.")
search domain in /etc/resolv.conf = brq.redhat.com.

DNS query #1: nonexistent.example.com. => NXDOMAIN
DNS query #2: nonexistent.example.com.brq.redhat.com. => NXDOMAIN
DNS query #3: nonexistent.example.com.redhat.com. => NXDOMAIN

> On Thu, Sep 19, 2013 at 3:05 PM, KodaK <sakodak at gmail.com> wrote:
>
>> I didn't realize that DNS created one connection.  I thought it was one
>> connection spanning several days.

In theory, there should be 2-4 LDAP connections from each DNS server, and those 
connections should live until the DNS or LDAP server restarts or crashes.

Petr^2 Spacek

>> On Thu, Sep 19, 2013 at 2:51 PM, Rich Megginson <rmeggins at redhat.com>wrote:
>>
>>>   On 09/19/2013 12:57 PM, KodaK wrote:
>>>
>>> Well, this is awkward:
>>>
>>>   [root at slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
>>> 5453936
>>> [root at slpidml01 slapd-UNIX-xxx-COM]#
>>>
>>>
>>> Why is it awkward?
>>>
>>>
>>>
>>>
>>> On Thu, Sep 19, 2013 at 1:48 PM, KodaK <sakodak at gmail.com> wrote:
>>>
>>>> Thanks.  I've been running that against my logs, and this has to be
>>>> abnormal:
>>>>
>>>>   err=32    129274    No Such Object
>>>>   err=0      10952    Successful Operations
>>>>   err=14       536    SASL Bind in Progress
>>>>   err=53        39    Unwilling To Perform
>>>>   err=49         3    Invalid Credentials (Bad Password)
>>>>
>>>>   I'm still trying to figure out why there are so many error 32s.  Are
>>>> there any usual suspects I should know about?  (That's just the current
>>>> access log, btw.)
>>>>
>>>>
>>>> On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson <rmeggins at redhat.com>wrote:
>>>>
>>>>>    On 09/16/2013 07:57 PM, Dmitri Pal wrote:
>>>>>
>>>>> On 09/16/2013 12:02 PM, KodaK wrote:
>>>>>
>>>>> Yet another AIX related problem:
>>>>>
>>>>>   The AIX LDAP client is called secldapclntd (sure, they could make it
>>>>> more awkward, but the budget ran out.)  I'm running into the issue detailed
>>>>> here:
>>>>>
>>>>>   http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344
>>>>>
>>>>>   "If an LDAP server fails to answer an LDAP query, secldapclntd caches
>>>>> the non-answered query negatively. This may happen if the LDAP server
>>>>> is down for example. After the LDAP server is back again secldapclntd will
>>>>> use the negative cache entry and the application initiating the original
>>>>> query will still fail until the cache entry expires."
>>>>>
>>>>>   IBM is working on porting the fix to our specific TL and SP levels.
>>>>>
>>>>>   What I'm concerned with here, though, is *why* is it timing out?  I
>>>>> don't know what the current timeout values are (AIX sucks, etc.)
>>>>>
>>>>>   I don't see timeout issues on my Linux boxes, which leads me to
>>>>> believe that either the sssd timeouts are longer or that sssd is just more
>>>>> robust when dealing with timeouts.
>>>>>
>>>>>   I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
>>>>> because I occasionally have to re-run sudo commands because they initially
>>>>> fail (and I know I'm using the right passwords.)  However, sudo doesn't
>>>>> appear to have a cache (or it handles caching better.)
>>>>>
>>>>>   Does anyone have any troubleshooting suggestions?  Any general "speed
>>>>> things up" suggestions on the IPA side?
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>   --Jason
>>>>>
>>>>>   --
>>>>> The government is going to read our mail anyway, might as well make it
>>>>> tough for them.  GPG Public key ID:  B6A1A7C6
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Freeipa-users mailing list
>>>>> Freeipa-users at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/freeipa-users
>>>>>
>>>>>
>>>>> Is the server FreeIPA?
>>>>> Can you see in the server logs what is actually happening? Is it the
>>>>> server that is really taking the time, or is there a network connectivity
>>>>> issue, or is a firewall dropping packets?
>>>>> I would really start with the server-side logs.
>>>>>
>>>>>
>>>>>   As far as 389 goes, run logconv.pl against the access logs in
>>>>> /var/log/dirsrv/slapd-DOMAIN-COM




More information about the Freeipa-users mailing list