[Freeipa-users] Excessive CPU usage by ns-slapd

Rich Megginson rmeggins at redhat.com
Thu Feb 19 19:47:34 UTC 2015


On 02/19/2015 12:11 PM, Jim Richard wrote:
> Hi Rich, here’s what all 4 of my IPA servers look like right now. You 
> can see that SSO-107’s CPU usage is much higher than the other 3, and 
> it often spikes to over 100%. Over time, the rising CPU usage moves 
> between two of my four servers: one drops off as the other increases, 
> and with each cycle the CPU usage on the spiking server climbs a 
> little higher.
>
> The two servers that show this behavior are SSO-107 and SSO-109.

SSO-107 is almost entirely idle except for 1 thread doing replication 
updates, and the poll thread.  There are 255 descriptors it is polling on.

SSO-109 is entirely idle except for the poll thread.  There are 230 
descriptors it is polling on.

You might try running top against the ns-slapd process with the 'H' 
(threads mode) toggle.  It would be very interesting to see the CPU 
usage breakdown by thread.
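Threads mode also works non-interactively with batch output. The following is a sketch: against a live ns-slapd you would substitute the pid from the dirsrv pid file (e.g. /var/run/dirsrv/slapd-PLACEIQ-NET.pid); here the current shell's pid stands in so the command runs anywhere:

```shell
# 'top -H' lists each thread of the target process on its own row,
# so a busy poll thread stands out from idle worker threads.
# Against ns-slapd you would use:
#   top -H -p $(cat /var/run/dirsrv/slapd-PLACEIQ-NET.pid)
# Demonstrated here in batch mode (-b, one iteration) on this shell:
top -b -H -n 1 -p $$
```

The thread whose TIME+ keeps growing while the others stay flat is the one to match against the stack traces.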

If it is indeed the poll thread that is consuming all of the CPU, 
there's not much that can be done.  CPU usage in the poll thread is a 
function of the number of connections, but since there is not much 
difference between 230 and 255, I would not expect a large CPU usage 
difference between 107 and 109 based solely on the number of connections.

Are you seeing timeouts or application failures or poor performance that 
seems to be due to high CPU usage?  If so, and these are virtual 
machines, you might consider adding more virtual CPUs to give the server 
more processing power for the worker threads to compensate for 
monopolization by the poll thread.
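If top's threads mode is not convenient, the same per-thread CPU accounting can be read directly from /proc. This is a sketch under the assumption that fields 14 and 15 of /proc/&lt;pid&gt;/task/&lt;tid&gt;/stat (utime and stime, in clock ticks) are what you want to compare; the ns-slapd pid would come from the dirsrv pid file, with the current shell's pid standing in here:

```shell
# Print cumulative user/system CPU ticks for every thread of a process.
# For ns-slapd: PID=$(cat /var/run/dirsrv/slapd-PLACEIQ-NET.pid)
PID=$$
for TID in /proc/"$PID"/task/*; do
  # stat fields: 1=tid, 14=utime, 15=stime (clock ticks).
  # Field numbering assumes the comm field (field 2, in parentheses)
  # contains no spaces, which is true for ns-slapd.
  awk '{printf "tid %s: utime=%s stime=%s\n", $1, $14, $15}' "$TID/stat"
done
```

Sampling this twice, a few seconds apart, shows which thread is actually accumulating ticks.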

>
> I’ve attached some more detailed stack trace as well.
>
> Here’s what my replication agreements look like:
>
> [root@sso-107 (NY) ~]$ ipa-replica-manage list
> sso-108.nym1.placeiq.net: master
> sso-110.nym1.placeiq.net: master
> sso-107.nym1.placeiq.net: master
> sso-109.nym1.placeiq.net: master
>
> [root@sso-107 (NY) ~]$ ipa-replica-manage list sso-107.nym1.placeiq.net
> sso-108.nym1.placeiq.net: replica
> sso-110.nym1.placeiq.net: replica
>
> [root@sso-107 (NY) ~]$ ipa-replica-manage list sso-108.nym1.placeiq.net
> sso-107.nym1.placeiq.net: replica
> sso-109.nym1.placeiq.net: replica
>
> [root@sso-107 (NY) ~]$ ipa-replica-manage list sso-109.nym1.placeiq.net
> sso-108.nym1.placeiq.net: replica
> sso-110.nym1.placeiq.net: replica
>
> [root@sso-107 (NY) ~]$ ipa-replica-manage list sso-110.nym1.placeiq.net
> sso-107.nym1.placeiq.net: replica
> sso-109.nym1.placeiq.net: replica
>
> SSO-107
>
> top - 15:58:08 up 2 days, 10:00,  1 user,  load average: 0.00, 0.03, 0.06
> Tasks:   1 total,   0 running,   1 sleeping, 0 stopped,   0 zombie
> Cpu(s): 12.2%us,  1.1%sy,  0.0%ni, 86.7%id,  0.1%wa,  0.0%hi,  0.0%si, 
>  0.0%st
> Mem:   2952788k total,  2160216k used, 792572k free,   182584k buffers
> Swap:  4094972k total,        0k used,  4094972k free,   678292k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 11615 dirsrv    20   0 2063m 843m  19m S 25.5 29.3 403:53.56 ns-slapd
>
> [root@sso-107 (NY) /var/log/dirsrv/slapd-PLACEIQ-NET]$ ls -al 
> /proc/`cat /var/run/dirsrv/slapd-PLACEIQ-NET.pid`/fd|grep socket|wc -l
> 245
>
> SSO-108
>
> top - 15:57:26 up 3 days, 17:25,  1 user,  load average: 0.03, 0.03, 0.00
> Tasks:   1 total,   0 running,   1 sleeping, 0 stopped,   0 zombie
> Cpu(s):  0.3%us,  0.2%sy,  0.0%ni, 99.4%id,  0.1%wa,  0.0%hi,  0.0%si, 
>  0.0%st
> Mem:   2952788k total,  2200792k used, 751996k free,   182084k buffers
> Swap:  4094972k total,        0k used,  4094972k free,   713848k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 24399 dirsrv    20   0 2055m 819m  19m S  0.8 28.4  54:48.53 ns-slapd
>
> [root@sso-108 (NY) /var/log/dirsrv/slapd-PLACEIQ-NET]$ ls -al 
> /proc/`cat /var/run/dirsrv/slapd-PLACEIQ-NET.pid`/fd|grep socket|wc -l
> 232
>
>
> SSO-109
>
> top - 16:00:05 up 4 days,  9:10,  1 user,  load average: 0.06, 0.32, 0.35
> Tasks:   1 total,   0 running,   1 sleeping, 0 stopped,   0 zombie
> Cpu(s):  0.7%us,  0.3%sy,  0.0%ni, 98.9%id,  0.2%wa,  0.0%hi,  0.0%si, 
>  0.0%st
> Mem:   2952788k total,  2422572k used, 530216k free,   235472k buffers
> Swap:  4094972k total,        0k used,  4094972k free,   906080k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 22522 dirsrv    20   0 2065m 772m  19m S  1.2 26.8 308:13.07 ns-slapd
>
> [root@sso-109 (NY) ~]$ ls -al /proc/`cat 
> /var/run/dirsrv/slapd-PLACEIQ-NET.pid`/fd|grep socket|wc -l
> 219
>
> SSO-110
>
> top - 16:07:54 up 14 days, 18:03,  1 user,  load average: 0.00, 0.01, 0.00
> Tasks:   1 total,   0 running,   1 sleeping, 0 stopped,   0 zombie
> Cpu(s):  2.0%us,  1.0%sy,  0.0%ni, 96.7%id,  0.3%wa,  0.0%hi,  0.0%si, 
>  0.0%st
> Mem:   2952788k total,  2304556k used, 648232k free,   155216k buffers
> Swap:  4094972k total,       64k used,  4094908k free,   748972k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2401 dirsrv    20   0 2074m 839m  18m S  4.8 29.1  48:25.58 ns-slapd
> [root@sso-110 (NY) /var/log/dirsrv/slapd-PLACEIQ-NET]$ ls -al 
> /proc/`cat /var/run/dirsrv/slapd-PLACEIQ-NET.pid`/fd|grep socket|wc -l
> 257
>
> Jim Richard | PlaceIQ | Systems Administrator | jrichard at placeiq.com | +1 (646) 338-8905
>
>> On Feb 19, 2015, at 9:33 AM, Rich Megginson <rmeggins at redhat.com 
>> <mailto:rmeggins at redhat.com>> wrote:
>>
>> On 02/18/2015 11:05 PM, Jatin Nansi wrote:
>>> Check the ns-slapd access and error logs of the DS instance hosting 
>>> the IPA instance. The strace output indicates that ns-slapd spent 
>>> most of its time waiting for network activity to happen (poll), 
>>> which is normal for ns-slapd.
>>
>> The number of open connections correlates to the CPU usage.  Do this:
>>
>> # ls -al /proc/`cat /var/run/dirsrv/slapd-MY-DOMAIN.pid`/fd|grep 
>> socket|wc -l
>>
>> How many socket connections do you have?
>>
>> Also, it will be very useful to get some stack traces of the running 
>> server to see what the various threads are doing.  See 
>> http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-hangs
>>
>>>
>>> Jatin
>>> On 19/02/15 15:52, Jim Richard wrote:
>>>> I’ve got 4 Red Hat IdM masters in a multi-master config: IPA 
>>>> version 3.0.0-42.el6.centos, 389-ds-base version 
>>>> 1.2.11.15-48.el6_6, on CentOS 6.6.
>>>>
>>>> Monitoring established connections on port 389 and dsInOps over 
>>>> time shows a consistent, even level of activity; however, 2 of the 
>>>> 4 IPA servers show ever-increasing CPU usage by ns-slapd. One 
>>>> ns-slapd process will show increased CPU for a time, then drop off 
>>>> as another increases. This cycle continues, with each switch seeing 
>>>> more and more total CPU usage by ns-slapd.
>>>>
>>>> strace timing for the offending ns-slapd looks like the following:
>>>>
>>>>
>>>> % time     seconds  usecs/call   calls    errors syscall
>>>> ------ ----------- ----------- --------- --------- ----------------
>>>>  96.12    9.342272        1133    8243           poll
>>>>   3.86    0.375457          53    7066        41 futex
>>>>   0.01    0.000668           0    8244      8244 getpeername
>>>>   0.00    0.000374           0     929           close
>>>>   0.00    0.000368           0    3201           read
>>>>   0.00    0.000151           0     882           setsockopt
>>>>   0.00    0.000095           2      42           access
>>>>   0.00    0.000033           0    1365           fcntl
>>>>   0.00    0.000000           0      42           open
>>>>   0.00    0.000000           0      39           stat
>>>>   0.00    0.000000           0      42           fstat
>>>>   0.00    0.000000           0       1           madvise
>>>>   0.00    0.000000           0     441           accept
>>>>   0.00    0.000000           0     441           getsockname
>>>>   0.00    0.000000           0       1           restart_syscall
>>>> ------ ----------- ----------- --------- --------- ----------------
>>>> 100.00    9.719418   30979      8285 total
>>>>
>>>>
>>>> I have carefully reviewed cn=config settings on all four master 
>>>> servers to confirm that they match.
>>>>
>>>> Based on this strace output, can you point me in the right 
>>>> direction and give me a clue about what I should be looking at?
>>>>
>>>> Here’s a screen shot of my Zabbix reporting to help describe the 
>>>> problem. Note the graph in the bottom right corner.
>>>>
>>>> The problem is most certainly related to replication but I just 
>>>> don’t know what specifically to look at.
>>>>
>>>> <Mail Attachment.png>
>>>>
>>>>
>>>>
>>>> Thanks in advance for any clues you can provide.
>>>>
>>
>> -- 
>> Manage your subscription for the Freeipa-users mailing list:
>> https://www.redhat.com/mailman/listinfo/freeipa-users
>> Go To http://freeipa.org for more info on the project
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/freeipa-users/attachments/20150219/d571a0b4/attachment.htm>


More information about the Freeipa-users mailing list