Re: [Linux-cluster] High system CPU usage in one of a two node cluster
Marco Lusini
marco.lusini at governo.it
Fri Jan 5 09:38:41 UTC 2007
>
> System load averages are the average of the number of
> processes on the run queue over the past 1, 5, and 15
> minutes. It doesn't generally trend upwards over time; if
> that were the case, I'd be in trouble:
>
I am in trouble, then :-)
As I said in my first mail, as system (i.e. kernel) CPU
usage grows, so does the system load (1, 5, and 15 minute averages).
To better show what I see in my clusters, I am sending
more graphs (on a yearly time base) that illustrate how system load
trends upwards as kernel CPU usage grows.
The graphs were produced by Cacti polling the snmpd daemon running on the nodes.
Again, note how the trend swaps from node to node on reboots.
>
> However, it is a little odd that you had 10 hours of runtime
> for clurgmgrd and over 6 for dlm_recvd. Just taking a wild
> guess, but it looks like the locks were all mastered on frascati.
>
How can I get more info on this? I checked /proc/cluster/dlm_locks
on both nodes and it is empty.
Here is the output of cat /proc/cluster/dlm_stats:
[root at estestest ~]# cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)
Lock operations: 1688738
Unlock operations: 838064
Convert operations: 0
Completion ASTs: 2526802
Blocking ASTs: 0
Lockqueue num waittime ave
[root at frascati ~]# cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)
Lock operations: 1122141
Unlock operations: 556623
Convert operations: 0
Completion ASTs: 1678764
Blocking ASTs: 0
Lockqueue        num  waittime   ave
WAIT_RSB           6         3     0
WAIT_GRANT   1122141  32507056    28
WAIT_UNLOCK   556623    316924     0
Total        1678770  32823983    19
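
To make the numbers above easier to compare between the nodes, here is a rough sketch that parses the dlm_stats text and derives two figures. It is only an illustration under assumptions: reading "Lock operations" minus "Unlock operations" as the number of locks not yet released is my interpretation, and "ave" being the integer quotient waittime / num is simply what the frascati figures are consistent with, not something I have confirmed in the DLM source. The function names are mine.

```python
import re

# Output of /proc/cluster/dlm_stats on frascati, as pasted above.
FRASCATI = """\
DLM stats (HZ=1000)
Lock operations: 1122141
Unlock operations: 556623
Convert operations: 0
Completion ASTs: 1678764
Blocking ASTs: 0
Lockqueue num waittime ave
WAIT_RSB 6 3 0
WAIT_GRANT 1122141 32507056 28
WAIT_UNLOCK 556623 316924 0
Total 1678770 32823983 19
"""

COUNTER = re.compile(r"^(.+?):\s+(\d+)$")
QUEUE_ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)$")


def parse_dlm_stats(text):
    """Split dlm_stats text into {counter: value} and {queue: (num, waittime, ave)}."""
    counters, queues = {}, {}
    for line in text.splitlines():
        line = line.strip()
        counter = COUNTER.match(line)
        row = QUEUE_ROW.match(line)
        if counter:
            counters[counter.group(1)] = int(counter.group(2))
        elif row:
            queues[row.group(1)] = tuple(int(row.group(i)) for i in (2, 3, 4))
    return counters, queues


def unreleased_locks(counters):
    """Lock operations minus unlock operations (assumed: locks still outstanding)."""
    return counters["Lock operations"] - counters["Unlock operations"]


def averages_consistent(queues):
    """Check that every 'ave' column equals waittime // num (assumed formula)."""
    return all(num == 0 or waittime // num == ave
               for num, waittime, ave in queues.values())


counters, queues = parse_dlm_stats(FRASCATI)
print("lock minus unlock operations:", unreleased_locks(counters))
print("ave == waittime // num for all rows:", averages_consistent(queues))
```

Running the same functions on the estestest output and comparing the lock minus unlock difference over a few days would show whether that gap keeps widening, which is what one would expect if locks are leaking.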
>
> How many services are you running?
>
At the moment I have 3 services on estestest (a Sybase SQL server, a tomcat5
application and an apache web site) and 2 services on frascati (another
tomcat5 application and a PostgreSQL server).
> Also, take a look at:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634
>
> The RPMs there might solve the problem with dlm_recvd.
> Rgmanager in some situations causes a strange leak of NL
> locks in the DLM. If dlm_recvd has to traverse lock lists
> and that list is ever-growing (total speculation here), it
> could explain the amount of consumed system time.
>
If I install those RPMs now, will the same patches also be included in
RHCS 4.5? (I think so, but just to be sure...)
Thanks,
Marco
-------------- next part --------------
Attachments (non-text, scrubbed by the list archive; all image/jpeg):
frascati_yearly_CPU_Usage.jpg (29334 bytes)
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20070105/5a281359/attachment.jpg>
frascati_yearly_System_Load.jpg (27710 bytes)
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20070105/5a281359/attachment-0001.jpg>
estestest_yearly_CPU_Usage.jpg (30518 bytes)
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20070105/5a281359/attachment-0002.jpg>
estestest_yearly_System_Load.jpg (26103 bytes)
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20070105/5a281359/attachment-0003.jpg>