[Linux-cluster] Why my cluster stop to work when one node down?
Tiago Cruz
tiagocruz at forumgdh.net
Wed Apr 2 15:59:37 UTC 2008
Nice, Gordan!!!
It works now!! :-p
Is "quorum" the minimum number of nodes that the cluster needs to keep working?
[root at teste-spo-la-v1 ~]# cman_tool status
Version: 6.0.1
Config Version: 3
Cluster Name: mycluster
Cluster Id: 56756
Cluster Member: Yes
Cluster Generation: 140
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 8
Flags: 2node
Ports Bound: 0 11 177
Node name: node1.mycluster.com
Node ID: 1
Multicast addresses: 239.192.221.146
Node addresses: 10.25.0.251
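
(If I'm reading the output above right: without the fix, quorum would be
floor(total_votes / 2) + 1 = floor(2 / 2) + 1 = 2, so losing one node
dissolves the cluster, exactly as in the logs below. The "2node" flag
seems to be the special case that holds Quorum at 1, so a single
surviving node can carry on alone.)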
Many thanks!!
On Wed, 2008-04-02 at 16:16 +0100, gordan at bobich.net wrote:
> Replace:
>
> <cman expected_votes="1">
> </cman>
>
> with
>
> <cman two_node="1" expected_votes="1"/>
>
> in cluster.conf.
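
(For the archives: to make this take effect I also bumped config_version
in cluster.conf from 2 to 3 and pushed it out to both nodes, something
along these lines, with both nodes still up:

# ccs_tool update /etc/cluster/cluster.conf
# cman_tool version -r 3

That matches the "Config Version: 3" shown in the cman_tool status
output above.)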
>
> Gordan
>
> On Wed, 2 Apr 2008, Tiago Cruz wrote:
>
> > Hello guys,
> >
> > I have one cluster with two machines, running RHEL 5.1 x86_64.
> > The storage device was imported using GNBD and formatted with GFS2, to
> > be mounted on both nodes:
> >
> > [root at teste-spo-la-v1 ~]# gnbd_import -v -l
> > Device name : cluster
> > ----------------------
> > Minor # : 0
> > sysfs name : /block/gnbd0
> > Server : gnbdserv
> > Port : 14567
> > State : Open Connected Clear
> > Readonly : No
> > Sectors : 20971520
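
(For completeness: the export side on "gnbdserv" was set up along these
lines, where /dev/sdb1 stands in for whatever the real backing device is:

# gnbd_serv
# gnbd_export -v -e cluster -d /dev/sdb1

"cluster" is the device name that shows up in the gnbd_import listing
above.)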
> >
> > # gfs2_mkfs -p lock_dlm -t mycluster:export1 -j 2 /dev/gnbd/cluster
> > # mount /dev/gnbd/cluster /mnt/
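
(A note on those flags for the archives: the -t argument is
<clustername>:<fsname>, and the cluster name part has to match
name="mycluster" in cluster.conf below, or the mount will be refused;
-j 2 creates one journal per node. If a third node ever joins, I
understand something like this adds the extra journal on the mounted
filesystem first:

# gfs2_jadd -j 1 /mnt
)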
> >
> > Everything works gracefully, until one node goes down (shutdown, network
> > stop, xm destroy...)
> >
> >
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering GATHER state from 0.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Creating commit token because I am the rep.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Saving state aru 46 high seq received 46
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Storing new sequence id for ring 4c
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering COMMIT state.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering RECOVERY state.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] position [0] member 10.25.0.251:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] previous ring seq 72 rep 10.25.0.251
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] aru 46 high delivered 46 received flag 1
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Did not need to originate any messages in recovery.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Sending initial ORF token
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE
> > Apr 2 12:00:07 teste-spo-la-v1 kernel: dlm: closing connection to node 3
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration:
> > Apr 2 12:00:07 teste-spo-la-v1 clurgmgrd[3557]: <emerg> #1: Quorum Dissolved
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251)
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.252)
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CMAN ] quorum lost, blocking activity
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251)
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [SYNC ] This node is within the primary component and will provide service.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering OPERATIONAL state.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] got nodejoin message 10.25.0.251
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CPG ] got joinlist message from node 2
> > Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
> > Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
> > Apr 2 12:00:16 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
> > Apr 2 12:00:17 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
> > Apr 2 12:00:22 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
> >
> >
> > So then, my GFS mount point hangs: the terminal freezes when I try to
> > access the "/mnt" directory, and it only comes back when the second node
> > rejoins the cluster.
> >
> >
> > Follow the cluster.conf:
> >
> > <?xml version="1.0"?>
> > <cluster name="mycluster" config_version="2">
> >
> > <cman expected_votes="1">
> > </cman>
> >
> > <fence_daemon post_join_delay="60">
> > </fence_daemon>
> >
> > <clusternodes>
> > <clusternode name="node1.mycluster.com" nodeid="2">
> > <fence>
> > <method name="single">
> > <device name="gnbd" ipaddr="10.25.0.251"/>
> > </method>
> > </fence>
> > </clusternode>
> > <clusternode name="node2.mycluster.com" nodeid="3">
> > <fence>
> > <method name="single">
> > <device name="gnbd" ipaddr="10.25.0.252"/>
> > </method>
> > </fence>
> > </clusternode>
> > </clusternodes>
> >
> > <fencedevices>
> > <fencedevice name="gnbd" agent="fence_gnbd"/>
> > </fencedevices>
> > </cluster>
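
(To close the loop for anyone finding this thread later: the only changes
in my now-working cluster.conf are the version bump and the <cman>
element, i.e.:

<cluster name="mycluster" config_version="3">

<cman two_node="1" expected_votes="1"/>

The rest is exactly as quoted above.)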
> >
> >
> > Thanks!
> >
> > --
> > Tiago Cruz
> > http://everlinux.com
> > Linux User #282636
> >
> >
>
--
Tiago Cruz
http://everlinux.com
Linux User #282636