[Linux-cluster] Why my cluster stop to work when one node down?
Tiago Cruz
tiagocruz at forumgdh.net
Wed Apr 2 15:59:37 UTC 2008
Nice, Gordan!!!
It works now!! :-p
Is "quorum" the minimum number of nodes that the cluster needs to keep working?
[root at teste-spo-la-v1 ~]# cman_tool status
Version: 6.0.1
Config Version: 3
Cluster Name: mycluster
Cluster Id: 56756
Cluster Member: Yes
Cluster Generation: 140
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 8
Flags: 2node
Ports Bound: 0 11 177
Node name: node1.mycluster.com
Node ID: 1
Multicast addresses: 239.192.221.146
Node addresses: 10.25.0.251
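
(If I'm reading the output above right: without the fix, quorum would be
floor(total_votes / 2) + 1 = floor(2 / 2) + 1 = 2, so losing one node
dissolves the cluster, exactly as in the logs below. The "2node" flag
seems to be the special case that holds Quorum at 1, so a single
surviving node can carry on alone.)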
Many thanks!!
On Wed, 2008-04-02 at 16:16 +0100, gordan at bobich.net wrote:
> Replace:
>
> <cman expected_votes="1">
> </cman>
>
> with
>
> <cman two_node="1" expected_votes="1"/>
>
> in cluster.conf.
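
(For the archives: to make this take effect I also bumped config_version
in cluster.conf from 2 to 3 and pushed it out to both nodes, something
along these lines, with both nodes still up:

# ccs_tool update /etc/cluster/cluster.conf
# cman_tool version -r 3

That matches the "Config Version: 3" shown in the cman_tool status
output above.)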
>
> Gordan
>
> On Wed, 2 Apr 2008, Tiago Cruz wrote:
>
> > Hello guys,
> >
> > I have one cluster with two machines, running RHEL 5.1 x86_64.
> > The storage device was imported using GNBD and formatted with GFS2, to
> > be mounted on both nodes:
> >
> > [root at teste-spo-la-v1 ~]# gnbd_import -v -l
> > Device name : cluster
> > ----------------------
> > Minor # : 0
> > sysfs name : /block/gnbd0
> > Server : gnbdserv
> > Port : 14567
> > State : Open Connected Clear
> > Readonly : No
> > Sectors : 20971520
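
(For completeness: the export side on "gnbdserv" was set up along these
lines, where /dev/sdb1 stands in for whatever the real backing device is:

# gnbd_serv
# gnbd_export -v -e cluster -d /dev/sdb1

"cluster" is the device name that shows up in the gnbd_import listing
above.)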
> >
> > # gfs2_mkfs -p lock_dlm -t mycluster:export1 -j 2 /dev/gnbd/cluster
> > # mount /dev/gnbd/cluster /mnt/
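
(A note on those flags for the archives: the -t argument is
<clustername>:<fsname>, and the cluster name part has to match
name="mycluster" in cluster.conf below, or the mount will be refused;
-j 2 creates one journal per node. If a third node ever joins, I
understand something like this adds the extra journal on the mounted
filesystem first:

# gfs2_jadd -j 1 /mnt
)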
> >
> > Everything works gracefully, until one node goes down (shutdown, network
> > stop, xm destroy...)
> >
> >
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering GATHER state from 0.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Creating commit token because I am the rep.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Saving state aru 46 high seq received 46
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Storing new sequence id for ring 4c
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering COMMIT state.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering RECOVERY state.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] position [0] member 10.25.0.251:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] previous ring seq 72 rep 10.25.0.251
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] aru 46 high delivered 46 received flag 1
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Did not need to originate any messages in recovery.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Sending initial ORF token
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE
> > Apr 2 12:00:07 teste-spo-la-v1 kernel: dlm: closing connection to node 3
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration:
> > Apr 2 12:00:07 teste-spo-la-v1 clurgmgrd[3557]: <emerg> #1: Quorum Dissolved
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251)
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.252)
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CMAN ] quorum lost, blocking activity
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251)
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined:
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [SYNC ] This node is within the primary component and will provide service.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering OPERATIONAL state.
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] got nodejoin message 10.25.0.251
> > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CPG ] got joinlist message from node 2
> > Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
> > Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
> > Apr 2 12:00:16 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
> > Apr 2 12:00:17 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
> > Apr 2 12:00:22 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection.
> >
> >
> > So then, my GFS mount point hangs: the terminal freezes when I try to
> > access the "/mnt" directory, and it only comes back when the second node
> > rejoins the cluster.
> >
> >
> > Follow the cluster.conf:
> >
> > <?xml version="1.0"?>
> > <cluster name="mycluster" config_version="2">
> >
> > <cman expected_votes="1">
> > </cman>
> >
> > <fence_daemon post_join_delay="60">
> > </fence_daemon>
> >
> > <clusternodes>
> > <clusternode name="node1.mycluster.com" nodeid="2">
> > <fence>
> > <method name="single">
> > <device name="gnbd" ipaddr="10.25.0.251"/>
> > </method>
> > </fence>
> > </clusternode>
> > <clusternode name="node2.mycluster.com" nodeid="3">
> > <fence>
> > <method name="single">
> > <device name="gnbd" ipaddr="10.25.0.252"/>
> > </method>
> > </fence>
> > </clusternode>
> > </clusternodes>
> >
> > <fencedevices>
> > <fencedevice name="gnbd" agent="fence_gnbd"/>
> > </fencedevices>
> > </cluster>
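
(To close the loop for anyone finding this thread later: the only changes
in my now-working cluster.conf are the version bump and the <cman>
element, i.e.:

<cluster name="mycluster" config_version="3">

<cman two_node="1" expected_votes="1"/>

The rest is exactly as quoted above.)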
> >
> >
> > Thanks!
> >
> > --
> > Tiago Cruz
> > http://everlinux.com
> > Linux User #282636
> >
> >
>
--
Tiago Cruz
http://everlinux.com
Linux User #282636