[Linux-cluster] Odd cluster problems

Jay Leafey jleafey at utmem.edu
Tue Jul 31 15:48:44 UTC 2007


I've got a 3-node cluster running CentOS 4.5 and I cannot communicate 
with the resource group manager.  When I use the clustat command I get a 
timeout:

> [root@rapier ~]# clustat
> Timed out waiting for a response from Resource Group Manager
> Member Status: Quorate
> 
>   Member Name                              Status
>   ------ ----                              ------
>   rapier.utmem.edu                         Online, Local, rgmanager
>   thorax.utmem.edu                         Offline
>   cyclops.utmem.edu                        Online, rgmanager

I've got rgmanager 1.9.68-1 installed, along with the following 
"relevant" packages:

kernel-2.6.9-55.EL.x86_64
ccs-1.0.10-0.x86_64
cman-1.0.17-0.x86_64
cman-kernel-2.6.9-50.2.x86_64
dlm-1.0.3-1.x86_64
dlm-kernel-2.6.9-46.16.x86_64
fence-1.32.45-1.0.1.x86_64
GFS-6.1.14-0.x86_64
GFS-kernel-2.6.9-72.2.x86_64
gulm-1.0.10-0.x86_64
lvm2-cluster-2.02.21-7.el4.x86_64
magma-1.0.7-1.x86_64
magma-plugins-1.0.12-0.x86_64
rgmanager-1.9.68-1.x86_64
system-config-cluster-1.0.45-1.0.noarch

I checked the archives and saw similar reports, but they all seem to 
reference an older version of rgmanager.

I did some poking around and found that two entries in the "cman_tool 
services" output are in a state other than "run": the "default" fence 
domain and the "usrm::manager" service.  Here's the anomalous output:

> [root@rapier ~]# cman_tool services
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           2   2 recover 4 -
> [1 2]
> 
> <SNIP>
> 
> User:            "usrm::manager"                    10  10 recover 2 -
> [1 2]
> 
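
For anyone who wants to pick out the non-"run" entries mechanically, here 
is a rough Python sketch that scans saved "cman_tool services" output.  It 
assumes the column layout shown above (type, quoted name, GID, LID, state); 
the function name and regex are mine, not part of any cluster tool, and 
the sample text is just the two lines quoted in this message.

```python
import re

# Assumed layout, taken from the output quoted above:
#   <Type>:  "<name>"  <GID> <LID> <state> [code ...]
LINE_RE = re.compile(
    r'^(?P<type>[^:"]+):\s+"(?P<name>[^"]+)"\s+'
    r'(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>\S+)'
)


def services_not_running(text):
    """Return (type, name, state) for every service whose state is not 'run'."""
    stuck = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("state") != "run":
            stuck.append((m.group("type"), m.group("name"), m.group("state")))
    return stuck


# The two service lines from this message (the <SNIP> portion is omitted).
SAMPLE = '''\
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 recover 4 -
[1 2]

User:            "usrm::manager"                    10  10 recover 2 -
[1 2]
'''

if __name__ == "__main__":
    for svc_type, name, state in services_not_running(SAMPLE):
        print(f"{svc_type} {name!r} is in state {state!r}")
```

The header row and the "[1 2]" member lists do not match the pattern and 
are skipped, so only real service lines are reported.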

The services handled by rgmanager are all running, but any attempt to 
update the cluster.conf file via ccs_tool update 
"/etc/cluster/cluster.conf" is ineffective.  The file gets updated, but 
the config version shown by "cman_tool status" does not change.
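
To make that mismatch easy to spot, here is a small Python sketch that 
compares the config_version attribute in cluster.conf with the version 
reported by "cman_tool status".  It assumes the status output contains a 
line of the form "Config version: N"; the helper names are mine, and the 
sample strings below use made-up values rather than output from my nodes.

```python
import re
import xml.etree.ElementTree as ET


def file_config_version(conf_xml):
    """Read config_version from the root <cluster> element of cluster.conf."""
    root = ET.fromstring(conf_xml)
    return int(root.attrib["config_version"])


def running_config_version(status_text):
    """Find 'Config version: N' in saved status output (assumed format)."""
    m = re.search(r"^Config version:\s*(\d+)", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


# Hypothetical inputs for illustration only; not taken from the cluster.
SAMPLE_CONF = '<cluster name="example" config_version="12"></cluster>'
SAMPLE_STATUS = "Protocol version: 5.0.1\nConfig version: 11\n"

if __name__ == "__main__":
    on_disk = file_config_version(SAMPLE_CONF)
    in_memory = running_config_version(SAMPLE_STATUS)
    if on_disk != in_memory:
        print(f"cluster.conf is at {on_disk}, cman reports {in_memory}")
    else:
        print(f"both at version {on_disk}")
```

In practice the two inputs would be the contents of 
/etc/cluster/cluster.conf and the captured status output from each node.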

Any thoughts on how to proceed with troubleshooting this?
-- 
Jay Leafey - University of Tennessee
E-Mail:  jleafey at utmem.edu  Phone:  901-448-6534  FAX:  901-448-8199
