[Freeipa-users] dirsrv access logs flooded from single connection id

Thu Jul 2 14:37:34 UTC 2015

On 07/02/2015 04:14 PM, Andrew E. Bruno wrote:
> On Thu, Jul 02, 2015 at 11:04:00AM +0200, thierry bordaz wrote:
>> On 06/29/2015 06:34 PM, Andrew E. Bruno wrote:
>>> On Mon, Jun 29, 2015 at 10:29:24AM -0600, Rich Megginson wrote:
>>>> On 06/29/2015 10:13 AM, Andrew E. Bruno wrote:
>>>>> Our dirsrv access logs on our freeipa master server are getting flooded
>>>>> with this:
>>>>>
>>>>> [29/Jun/2015:12:02:09 -0400] conn=215758 op=1355326784 SRCH
>>>>> base="cn=u2,cn=groups,cn=accounts,dc=ccr,dc=buffalo,dc=edu" scope=0
>>>>> filter="(objectClass=*)" attrs="objectClass posixgroup cn userPassword
>>>>> gidNumber member ipaNTSecurityIdentifier modifyTimestamp entryusn uid"
>>>>>
>>>>> [29/Jun/2015:12:08:08 -0400] conn=215758 op=1356545457 RESULT err=0
>>>>> tag=101 nentries=0 etime=0 notes=P
>>>>>
>>>>> All from the same conn=215758. Logs get rotated every minute.
>>>>>
>>>>> logconv.pl is showing
>>>>>
>>>>> Searches:     265803        (3322.54/sec) (199352.25/min)
>>>>>
>>>>>
>>>>> How can I figure out which ip address this query is coming from? Is
>>>>> there a way to fetch the ip using the connection id? conn=215758?
>>>> grep "conn=215758 fd=" /var/log/dirsrv/slapd-INST/access*
>>>>
>>>> Unfortunately, if it has been rotated away, you won't be able to get the
>>>> information from the access log.
>>>>
>>> No luck .. looks like it has been rotated away. Any other thoughts?
>>>
>>> Is it correct to assume this is all coming from a single host? My
>>> thinking is that if I can kill the query coming from the host that it
>>> would solve the problem.
>>>
>> Hi,
>>
>> This is looking like bug https://fedorahosted.org/389/ticket/48192.
>> Because a ldap client (likely SSSD ?) keeps sending page results requests
>> although 0 entries are returned.
>> A condition for this is that the search has been abandonned but it is
>> difficult to very this as the log file has rotated.
>>
>> This is fixed in 6.7 and 7.1.z
>>
> Thanks Thierry. We get bit by this sporadically every few days.
> Load/network traffic from ns-slapd spikes up on one of our FreeIPA
> servers. Access logs getting hammered from a single host. We then login
> to the host in question and see the "sssd_be" process consuming higher
> than avg cpu load (20%). netstat shows two connections to the FreeIPA
> server:
>
> tcp  0      456   sssd_be
> tcp  0        0   sssd_be
>
>
> Simply restarting sssd on the host always fixes the problem.
That is a good way tor recover, as it closes the connection SSSD->DS and 
then SSSD will send new requests.
The problem occurs because DS can sometime be too slow to respond and 
then SSSD issue an ABANDON triggered by an exception.
DS was expecting that after the ABANDON the current page result search 
was dropped by SSSD and SSSD will stop by itself using the cookie.
Unfortunately SSSD keep using the cookie and then starts this infinite loop.

WIth the patch, DS is sending on ABANDON a end of result cookie.

> We tried
> backtracing the sssd_be process when it's in this state but didn't see
> any obvious clues:
>
> (gdb) bt
> #0  0x00000036b5ee8ed3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:82
> #1  0x00000036bae08d9c in epoll_event_loop (ev=<value optimized out>, location=<value optimized out>) at ../tevent_epoll.c:650
> #2  epoll_event_loop_once (ev=<value optimized out>, location=<value optimized out>) at ../tevent_epoll.c:931
> #3  0x00000036bae072e6 in std_event_loop_once (ev=0xd86590, location=0x36be23d4f5 "src/util/server.c:602") at ../tevent_standard.c:112
> #4  0x00000036bae0349d in _tevent_loop_once (ev=0xd86590, location=0x36be23d4f5 "src/util/server.c:602") at ../tevent.c:530
> #5  0x00000036bae0351b in tevent_common_loop_wait (ev=0xd86590, location=0x36be23d4f5 "src/util/server.c:602") at ../tevent.c:634
> #6  0x00000036bae07256 in std_event_loop_wait (ev=0xd86590, location=0x36be23d4f5 "src/util/server.c:602") at ../tevent_standard.c:138
> #7  0x00000036be22b963 in server_loop (main_ctx=0xd87900) at src/util/server.c:602
> #8  0x000000000040a286 in main (argc=4, argv=<value optimized out>) at src/providers/data_provider_be.c:2856
>
>
> Could this also be an issue with sssd? Is it normal for the sssd process
> to keep sending page results requests like this? So far this behavior
> has only happened on centos 6.6 clients running sssd-1.11.6-30.
>
> Would setting a low connection timeout on the dirsrv help to alleviate
> this problem until the patches from #48192 make it into mainstream
> centos 7.1 updates?
I do not see a workaround on DS side. The connection timeout will not 
help as the connection is very active.
On SSSD side, I do not know exactly.
I believe it can be configure to not use page result 
(ldap_disable_paging ?) or could it be tune to avoid ABANDON (with 
higher timeout).
>
> Many thanks!
>
> --Andrew