Re: [Linux-cluster] High system CPU usage in one of a two node cluster
Marco Lusini
marco.lusini at governo.it
Fri Jan 5 10:49:46 UTC 2007
Thanks Patrick,
I have tried to get the locks for Magma on both nodes,
and I get the same error as in
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634:
cat: /proc/cluster/dlm_locks: Cannot allocate memory
I will try to install the RPMs from Lon if I can and
see if that solves the problem...
Marco
> -----Original message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On behalf of
> Patrick Caulfield
> Sent: Friday, 5 January 2007 11:13
> To: linux clustering
> Subject: Re: [Linux-cluster] High system CPU usage in one of
> a two node cluster
>
>
> Lon Hohberger wrote:
> > On Wed, 2007-01-03 at 12:35 +0100, Marco Lusini wrote:
> >> Hi all,
> >>
> >> I have 3 2-node clusters, running just cluster suite, without gfs,
> >> each one updated with the latest packages released by RHN.
> >>
> >> In each cluster one of the two nodes has a steadily growing system
> >> CPU usage, which seems to be consumed by clurgmgrd and dlm_recvd.
> >> As an example here is the running time accumulated on one cluster
> >> since 20 December when it was rebooted:
> >>
> >> [root at estestest ~]# ps axo pid,start,time,args
> >> PID STARTED TIME COMMAND
> >> ...
> >> 10221 Dec 20 10:37:05 clurgmgrd
> >> 11169 Dec 20 06:48:24 [dlm_recvd]
> >> ...
> >>
> >> [root at frascati ~]# ps axo pid,start,time,args
> >> PID STARTED TIME COMMAND
> >> ...
> >> 6226 Dec 20 00:04:17 clurgmgrd
> >> 8249 Dec 20 00:00:19 [dlm_recvd]
> >> ...
>
> I suspect these two being at the top are related. If
> clurgmgrd is taking out locks then dlm_recvd will also be busy.
>
> >> I attach two graphs made with RRD which show that the system CPU
> >> usage is steadily growing:
> >> note how the trend changed after the reboot on 20 December.
> >
> >> Of course as the system usage increases so does the system load and I
> >> am afraid of what will happen after 1-2 months of uptime...
> >
> > System load averages are the average of the number of processes on the
> > run queue over the past 1, 5, and 15 minutes. It doesn't generally
> > trend upwards over time; if that were the case, I'd be in trouble:
> >
> > ...
> > 28204 15:11:11 01:04:19 /usr/lib/firefox-1.5.0.9/firefox-bin -UILocale en-US ...
> >
> > However, it is a little odd that you had 10 hours of runtime for
> > clurgmgrd and over 6 for dlm_recvd. Just taking a wild guess, but it
> > looks like the locks were all mastered on frascati.
> >
> > How many services are you running?
> >
> > Also, take a look at:
> >
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634
> >
> > The RPMs there might solve the problem with dlm_recvd. Rgmanager in
> > some situations causes a strange leak of NL locks in the DLM. If
> > dlm_recvd has to traverse lock lists and that list is ever-growing
> > (total speculation here), it could explain the amount of consumed
> > system time.
> >
>
>
> Yes, DLM will do a lot of traversing lock lists if there are
> a lot of similar locks on one resource. VMS has an
> optimisation for this known as the group grant and conversion
> grant modes that we don't currently implement.
>
>
> > How can I get more info on this? I checked /proc/cluster/dlm_locks on
> > both nodes and it is empty.
>
> /proc/cluster/dlm_locks needs to be told which lockspace to
> use. Just catting that file after bootup will show nothing.
> What you need to do is to echo the lockspace name into that
> file, then look at it. You can get the lockspace names with
> the "cman_tool services" command, e.g.:
>
> # cman_tool services
>
> Service          Name                    GID LID State     Code
> Fence Domain:    "default"                 1   2 run       -
> [1 2]
>
> DLM Lock Space:  "clvmd"                   2   3 run       -
> [1 2]
>
> # echo "clvmd" > /proc/cluster/dlm_locks
> # cat /proc/cluster/dlm_locks
>
> This shows locks held by clvmd. If you want to look at
> another lockspace just echo the other name into the /proc file.
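
The two steps described above (echo a lockspace name into the file, then read it back) could be looped over every lockspace roughly as follows. This is only a sketch: the function names and the DLM_LOCKS_FILE variable are made up for illustration, and it assumes each lockspace line in the "cman_tool services" output starts with `DLM Lock Space:` followed by the quoted name, as in the sample output.

```shell
#!/bin/sh
# Sketch only: function names and DLM_LOCKS_FILE are illustrative,
# not part of cman or the DLM tools.

# Proc file that selects and shows a lockspace; overridable for testing.
DLM_LOCKS_FILE="${DLM_LOCKS_FILE:-/proc/cluster/dlm_locks}"

# Read "cman_tool services" output on stdin and print each DLM
# lockspace name (the quoted string after "DLM Lock Space:").
list_lockspaces() {
    sed -n 's/^DLM Lock Space:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Select one lockspace by writing its name, then print its locks.
dump_lockspace() {
    echo "$1" > "$DLM_LOCKS_FILE" || return 1
    cat "$DLM_LOCKS_FILE"
}

# Dump the locks of every lockspace that cman_tool reports.
dump_all_lockspaces() {
    cman_tool services | list_lockspaces | while read -r name; do
        echo "== lockspace: $name =="
        dump_lockspace "$name"
    done
}
```

Running `dump_all_lockspaces` as root on a cluster node would print one section per lockspace. If the cat fails with "Cannot allocate memory", as in the bug report referenced in this thread, the loop simply reports that error for the affected lockspace.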
> --
>
> patrick
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>