[Linux-cluster] Re: rgmanager or clustat problem

Wed Apr 11 16:24:57 UTC 2007

It seems that rgmanager is failing to respond to clustat because it is busy.
Since I am running 20 to 25 services on each node, perhaps I can increase the
status poll interval from 30 seconds to one minute in
/usr/share/cluster/script.sh.

Maybe the cluster suite is not properly configured for so many services
on each node.

<actions>
    <action name="start" timeout="0"/>
    <action name="stop" timeout="0"/>

    <!-- This is just a wrapper for LSB init scripts, so monitor
         and status can't have a timeout, nor do they do any extra
         work regardless of the depth -->
    <action name="status" interval="30s" timeout="0"/>
    <action name="monitor" interval="30s" timeout="0"/>

    <action name="meta-data" timeout="0"/>
    <action name="verify-all" timeout="0"/>
</actions>
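
For reference, this is roughly the change I have in mind. It is an untested
sketch: I am assuming that "60s" is accepted in the same way as the existing
"30s" value, and only the status and monitor lines differ from the block above.

```xml
<!-- Sketch only: poll every 60 seconds instead of every 30.
     The "60s" spelling is an assumption modelled on the existing "30s". -->
<action name="status" interval="60s" timeout="0"/>
<action name="monitor" interval="60s" timeout="0"/>
```

Assuming one script resource per service, 20 to 25 services per node means
roughly 40 to 50 status checks per minute on each node at the current interval,
and 20 to 25 per minute with the longer one.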

Do you think that this might help clustat to report?

Thank you for your help.

On 4/9/07, David M <diggercheer at gmail.com> wrote:
>
>
> I am running a four node GFS cluster with about 20 services per node.  All
> four nodes belong to the same failover domain, and they each have a priority
> of 1.  My shared storage is an iSCSI SAN.
>
> After rgmanager has been running for a couple of days, clustat produces
> the following result on all four nodes:
>
> Timed out waiting for a response from Resource Group Manager
> Member Status: Quorate
>
>   Member Name                              Status
>   ------ ----                              ------
>   node01                                   Online, rgmanager
>   node02                                   Online, Local, rgmanager
>   node03                                   Online, rgmanager
>   node04                                   Online, rgmanager
>
> I also get a timeout when I try to determine the status of a particular
> service with "clustat -s servicename".
>
> All of the services seem to be up and running, but clustat does not work.
> Is there something wrong?  Is there a way for me to increase the timeout?
>
> clurgmgrd and dlm_recvd seem to be using a lot of CPU cycles on node02,
> about 40 and 60 percent, respectively.
>
> Thank you for your help.
>