[Linux-cluster] cman startup after update to 5.3

Tue Feb 10 17:24:10 UTC 2009

On Fri, 30 Jan 2009 14:56:53 +0100 Gunther Schlegel wrote:
> Rolling back to openais-0.80.3-15.el5 worked for me as well.

Hello,
with the same approach I was only able to partially solve the problem.
I have two nodes with this in cluster.conf:
<cluster alias="oracs" config_version="37" name="oracs">
        <cman expected_votes="3" two_node="0"/>
and
<quorumd device="/dev/mapper/mpath3" interval="3" label="acsquorum"
log_level="4" tko="5" votes="1">
                <heuristic interval="2" program="ping -c1 -w1 10.4.5.250" score="1" tko="3"/>
        </quorumd>
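For reference, here is a rough sketch of the vote arithmetic for this configuration. It assumes cman's usual rule that quorum is expected_votes / 2 + 1 (integer division) and that each node carries the default of 1 vote. The <clusternode> entries are not shown above, so the ones below are hypothetical, and the heuristic child of <quorumd> is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed cluster.conf: the <cman> and <quorumd> attributes come
# from the snippet above; the two <clusternode> entries are assumed (1 vote each).
CLUSTER_CONF = """<cluster alias="oracs" config_version="37" name="oracs">
  <cman expected_votes="3" two_node="0"/>
  <clusternodes>
    <clusternode name="node01" nodeid="1" votes="1"/>
    <clusternode name="node02" nodeid="2" votes="1"/>
  </clusternodes>
  <quorumd device="/dev/mapper/mpath3" interval="3" label="acsquorum"
           log_level="4" tko="5" votes="1"/>
</cluster>"""


def quorum_summary(conf_text):
    """Return the vote totals implied by a cluster.conf fragment."""
    root = ET.fromstring(conf_text)
    expected = int(root.find("cman").get("expected_votes"))
    # A clusternode without a votes attribute is assumed to count as 1 vote.
    node_votes = [int(n.get("votes", "1")) for n in root.iter("clusternode")]
    qdisk = root.find("quorumd")
    qdisk_votes = int(qdisk.get("votes", "0")) if qdisk is not None else 0
    # Assumed cman rule: quorum is a strict majority of the expected votes.
    quorum = expected // 2 + 1
    return {
        "expected_votes": expected,
        "quorum": quorum,
        "total_votes": sum(node_votes) + qdisk_votes,
        # Worst case for a single surviving node plus the quorum disk.
        "one_node_plus_qdisk": min(node_votes) + qdisk_votes,
    }


if __name__ == "__main__":
    summary = quorum_summary(CLUSTER_CONF)
    print(summary)
    print("one node + qdisk quorate:",
          summary["one_node_plus_qdisk"] >= summary["quorum"])
```

With these numbers a single node plus the quorum disk holds 2 of 3 votes. That matches the "Member Status: Quorate" that node01 reports while node02 is offline.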

After updating to U3 I got the same error as in the initial post.
After downgrading openais to 0.80.3-15.el5, I can start up either node
on its own while the other one is powered off,
but as soon as I start up the second node, when it reaches

starting fence....
cman dies on the node that was started first (node02 in this example),
with this in /var/log/messages:
Feb 10 18:13:18 oracs2 openais[5894]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart
Feb 10 18:13:18 oracs2 dlm_controld[5960]: cluster is down, exiting
Feb 10 18:13:18 oracs2 kernel: dlm: closing connection to node 1
Feb 10 18:13:18 oracs2 kernel: dlm: closing connection to node 2
Feb 10 18:13:18 oracs2 gfs_controld[5966]: cluster is down, exiting
Feb 10 18:13:18 oracs2 fenced[5954]: cluster is down, exiting
Feb 10 18:13:19 oracs2 qdiskd[5937]: <err> cman_dispatch: Host is down
Feb 10 18:13:19 oracs2 qdiskd[5937]: <err> Halting qdisk operations
Feb 10 18:13:29 oracs2 kernel: dlm: clvmd: remove fr 0 ID 2
Feb 10 18:13:29 oracs2 last message repeated 3 times
Feb 10 18:13:43 oracs2 ccsd[5883]: Unable to connect to cluster infrastructure after 30 seconds.
Feb 10 18:14:13 oracs2 ccsd[5883]: Unable to connect to cluster infrastructure after 60 seconds.
Feb 10 18:14:43 oracs2 ccsd[5883]: Unable to connect to cluster infrastructure after 90 seconds.
Feb 10 18:15:13 oracs2 ccsd[5883]: Unable to connect to cluster infrastructure after 120 seconds.

The node started first keeps the services, but:
[root ~]# clustat -l
Could not connect to CMAN: Connection refused

while the second one (node01) does not take over the services, and gives:
[root@oracs1 ~]# clustat -l
Cluster Status for oracs @ Tue Feb 10 18:22:06 2009
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 node01                                      1 Online, Local
 node02                                      2 Offline
 /dev/dm-5                                   0 Online, Quorum Disk

Very strange...
Any hints?

Gianluca