[Linux-cluster] Why does my cluster stop working when one node is down?
Tiago Cruz
tiagocruz at forumgdh.net
Wed Apr 2 15:08:53 UTC 2008
Hello guys,
I have a cluster of two machines running RHEL 5.1 x86_64.
The storage device was imported using GNBD and formatted with GFS2, so it
can be mounted on both nodes:
[root@teste-spo-la-v1 ~]# gnbd_import -v -l
Device name : cluster
----------------------
Minor # : 0
sysfs name : /block/gnbd0
Server : gnbdserv
Port : 14567
State : Open Connected Clear
Readonly : No
Sectors : 20971520
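For reference, the export on the GNBD server side was done roughly like
this (sketch only; /dev/sdb1 is a placeholder for the real backing
device, which I have not shown):

# gnbd_serv -v
# gnbd_export -v -e cluster -d /dev/sdb1

and on each node the device was imported from the server:

# gnbd_import -v -i gnbdserv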
# gfs2_mkfs -p lock_dlm -t mycluster:export1 -j 2 /dev/gnbd/cluster
# mount /dev/gnbd/cluster /mnt/
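(I mount it by hand for now; for boot-time mounting the usual /etc/fstab
entry would be something like the line below, where _netdev delays the
mount until networking is up. This is a sketch, not part of my setup:

/dev/gnbd/cluster  /mnt  gfs2  defaults,_netdev  0 0
)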
Everything works gracefully, until one node goes down (shutdown, network
stop, xm destroy...):
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering GATHER state from 0.
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Creating commit token because I am the rep.
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Saving state aru 46 high seq received 46
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Storing new sequence id for ring 4c
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering COMMIT state.
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering RECOVERY state.
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] position [0] member 10.25.0.251:
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] previous ring seq 72 rep 10.25.0.251
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] aru 46 high delivered 46 received flag 1
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Did not need to originate any messages in recovery.
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Sending initial ORF token
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE
Apr 2 12:00:07 teste-spo-la-v1 kernel: dlm: closing connection to node 3
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration:
Apr 2 12:00:07 teste-spo-la-v1 clurgmgrd[3557]: <emerg> #1: Quorum Dissolved
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251)
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left:
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.252)
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined:
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CMAN ] quorum lost, blocking activity
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration:
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251)
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left:
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined:
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [SYNC ] This node is within the primary component and will provide service.
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering OPERATIONAL state.
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] got nodejoin message 10.25.0.251
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CPG ] got joinlist message from node 2
Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
Apr 2 12:00:16 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
Apr 2 12:00:17 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
Apr 2 12:00:22 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
So my GFS mount point breaks: the terminal freezes when I try to
access the directory "/mnt", and it only comes back once the second node
has rejoined the cluster.
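While the cluster is stuck like this, the quorum state can be inspected
with cman_tool (the exact output varies):

# cman_tool status
# cman_tool nodes

cman_tool status reports Expected votes, Total votes and Quorum, which
matches the "quorum lost, blocking activity" message above.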
Here is my cluster.conf:
<?xml version="1.0"?>
<cluster name="mycluster" config_version="2">
  <cman expected_votes="1">
  </cman>
  <fence_daemon post_join_delay="60">
  </fence_daemon>
  <clusternodes>
    <clusternode name="node1.mycluster.com" nodeid="2">
      <fence>
        <method name="single">
          <device name="gnbd" ipaddr="10.25.0.251"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.mycluster.com" nodeid="3">
      <fence>
        <method name="single">
          <device name="gnbd" ipaddr="10.25.0.252"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="gnbd" agent="fence_gnbd"/>
  </fencedevices>
</cluster>
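By the way, the examples I have seen for two-node clusters set the
two_node flag on the cman tag, something like this (I am not using it;
could that be what I am missing?):

<cman two_node="1" expected_votes="1">
</cman>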
Thanks!
--
Tiago Cruz
http://everlinux.com
Linux User #282636