[Linux-cluster] cman startup after update to 5.3
Gianluca Cecchi
gianluca.cecchi at gmail.com
Tue Feb 10 17:24:10 UTC 2009
On Fri, 30 Jan 2009 14:56:53 +0100 Gunther Schlegel wrote:
> Rolling back to openais-0.80.3-15.el5 worked for me as well.
Hello,
with the same strategy I was only able to partially solve the problem.
Two nodes, with this in cluster.conf:
<cluster alias="oracs" config_version="37" name="oracs">
<cman expected_votes="3" two_node="0"/>
and
<quorumd device="/dev/mapper/mpath3" interval="3" label="acsquorum" log_level="4" tko="5" votes="1">
    <heuristic interval="2" program="ping -c1 -w1 10.4.5.250" score="1" tko="3"/>
</quorumd>
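(For context, the clusternodes section is just the usual two entries, sketched below with the fence configuration trimmed; the explicit votes="1" is my assumption, matching the default. One vote per node plus one qdisk vote gives the expected_votes="3" above, so quorum is 2 and a single node plus the quorum disk stays quorate.)

<clusternodes>
    <clusternode name="node01" nodeid="1" votes="1"/>
    <clusternode name="node02" nodeid="2" votes="1"/>
</clusternodes>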
After updating to U3 I got the same error as in my initial post.
After downgrading openais to 0.80.3-15.el5 I can start either node on its own, with the other one powered off.
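(For reference, the rollback itself was just the openais package, along these lines; the exact rpm filename and arch here are an assumption on my part:

rpm -Uvh --oldpackage openais-0.80.3-15.el5.x86_64.rpm

followed by a restart of cman on that node.)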
But as soon as I start up the second node, when its startup reaches the "Starting fencing..." step, cman dies on the node that was started first (node02 in this example), with this in /var/log/messages:
Feb 10 18:13:18 oracs2 openais[5894]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart
Feb 10 18:13:18 oracs2 dlm_controld[5960]: cluster is down, exiting
Feb 10 18:13:18 oracs2 kernel: dlm: closing connection to node 1
Feb 10 18:13:18 oracs2 kernel: dlm: closing connection to node 2
Feb 10 18:13:18 oracs2 gfs_controld[5966]: cluster is down, exiting
Feb 10 18:13:18 oracs2 fenced[5954]: cluster is down, exiting
Feb 10 18:13:19 oracs2 qdiskd[5937]: <err> cman_dispatch: Host is down
Feb 10 18:13:19 oracs2 qdiskd[5937]: <err> Halting qdisk operations
Feb 10 18:13:29 oracs2 kernel: dlm: clvmd: remove fr 0 ID 2
Feb 10 18:13:29 oracs2 last message repeated 3 times
Feb 10 18:13:43 oracs2 ccsd[5883]: Unable to connect to cluster infrastructure after 30 seconds.
Feb 10 18:14:13 oracs2 ccsd[5883]: Unable to connect to cluster infrastructure after 60 seconds.
Feb 10 18:14:43 oracs2 ccsd[5883]: Unable to connect to cluster infrastructure after 90 seconds.
Feb 10 18:15:13 oracs2 ccsd[5883]: Unable to connect to cluster infrastructure after 120 seconds.
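At that point cman is really gone on node02; for what it's worth, this is roughly how I checked the state on each node:

cman_tool status    # fails on node02, since cman is no longer running
cman_tool nodes     # membership as cman sees it
group_tool ls       # fence/dlm/gfs group state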
The node started first keeps its services running, but:
[root ~]# clustat -l
Could not connect to CMAN: Connection refused
while the second one (node01) doesn't become active for the services, but gives:
[root@oracs1 ~]# clustat -l
Cluster Status for oracs @ Tue Feb 10 18:22:06 2009
Member Status: Quorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 node01                                1 Online, Local
 node02                                2 Offline
 /dev/dm-5                             0 Online, Quorum Disk
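One thing I still have to rule out (just my guess: a leftover version mismatch between the two nodes could be what triggers the "rejoined the cluster without a full restart" kill) is that both nodes really run identical packages and the same configuration version, e.g.:

rpm -q openais cman rgmanager
cman_tool version   # in-use config version, should be 37 on both nodes
ccs_tool lsnode     # cluster.conf nodes as ccsd sees them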
Very strange...
Any hints?
Gianluca