[Linux-cluster] weird rgmanager

frederic randriamora frederic at ovsg.univ-ag.fr
Mon Jan 25 11:26:42 UTC 2010


I have a four-node cluster running RHEL 5.4, connected to an FC SAN.

clustat says:
node 1 and node 3 are online with rgmanager
node 2 and node 4 are offline

The cluster remains quorate because of the qdiskd running on each node.


BUT node 4, which is offline according to both clustat and cman_tool 
nodes, is still reported by clustat as running services (those services 
are actually dead).
On the two live nodes (node 1 and node 3) I get:

cman_tool services
type             level name       id       state       
fence            0     default    00010004 FAIL_ALL_STOPPED
[1 2 3 4]
dlm              1     clvmd      00020004 LEAVE_STOP_WAIT
[1 2 3 4]
dlm              1     rgmanager  00030004 FAIL_ALL_STOPPED
[1 3 4]

Nodes 1 and 3 are running their services (Xen VMs, NFS and DNS) OK.
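
To make the mismatch easier to see, here is a minimal Python sketch (not part of any cluster tool) that parses the cman_tool services output pasted above and lists groups that are not settled or that still count an offline node as a member. The helper names parse_services and stuck_groups are hypothetical, and treating "none" as the idle state is my assumption about how a settled group is reported.

```python
import re

# Output copied from the post; in practice this would be captured from the command.
SAMPLE = """\
type             level name       id       state
fence            0     default    00010004 FAIL_ALL_STOPPED
[1 2 3 4]
dlm              1     clvmd      00020004 LEAVE_STOP_WAIT
[1 2 3 4]
dlm              1     rgmanager  00030004 FAIL_ALL_STOPPED
[1 3 4]
"""


def parse_services(text):
    """Turn the two-line-per-group listing into a list of dicts."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("type"):
            continue  # skip blank lines and the column header
        if line.startswith("["):
            # A bracketed line lists the members of the preceding group row.
            if groups:
                groups[-1]["members"] = [int(n) for n in re.findall(r"\d+", line)]
            continue
        fields = line.split()
        if len(fields) >= 5:
            groups.append({
                "type": fields[0],
                "level": int(fields[1]),
                "name": fields[2],
                "id": fields[3],
                "state": fields[4],
                "members": [],
            })
    return groups


def stuck_groups(groups, online):
    """Return (name, state, stale_members) for groups that look unsettled.

    Assumption: a settled group shows the state "none".
    """
    report = []
    for g in groups:
        stale = sorted(set(g["members"]) - set(online))
        if g["state"] != "none" or stale:
            report.append((g["name"], g["state"], stale))
    return report


if __name__ == "__main__":
    # Nodes 1 and 3 are the ones clustat reports as online.
    for name, state, stale in stuck_groups(parse_services(SAMPLE), online={1, 3}):
        print(f"{name}: state={state}, offline members still listed={stale}")
```

Run against the output above, this flags all three groups, and for rgmanager it shows node 4 as the only offline node still listed as a member, which matches what clustat is reporting.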

The two dead nodes (they run neither ccs nor cman, nor anything else 
cluster-related) can still access the SAN, as multipath -ll shows.
I know I can restart the whole cluster, but I would like to know why 
this is happening.

Can someone please help?

