[Linux-cluster] cluster latest cvs does not fence dead nodes automatically
Fajar A. Nugraha
fajar at telkom.co.id
Tue Feb 15 08:34:44 UTC 2005
David Teigland wrote:
>Above the names were "hosting-cl02-01" and "hosting-cl02-02". Could you
>clear that up and if there are still problems send your cluster.conf file?
>Thanks
Here is the current setup, with the new hostnames and cluster.conf (the
blade center's IP address and community string have been removed):
==================================
<?xml version="1.0"?>
<cluster name="cluster" config_version="3">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="cluster-node2" votes="1">
      <fence>
        <method name="single">
          <device name="ibmblade" port="7"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cluster-node1" votes="1">
      <fence>
        <method name="single">
          <device name="ibmblade" port="6"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ibmblade" agent="fence_ibmblade"
                 ipaddr="IP_ADDRESS_HERE" community="COMMUNITY_HERE"/>
  </fencedevices>
</cluster>
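Fencing only works if each <clusternode> points, through <fence>/<method>/<device>, at a <fencedevice> that is defined by name. The following is a minimal Python sketch for checking that wiring in the configuration above. It is not part of the cluster suite: `fence_map` is a hypothetical helper, and the embedded config is the posted one with the placeholder IP address and community string.

```python
import xml.etree.ElementTree as ET

# cluster.conf as posted (IP address and community string are placeholders).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="cluster" config_version="3">
  <cman two_node="1" expected_votes="1"></cman>
  <clusternodes>
    <clusternode name="cluster-node2" votes="1">
      <fence><method name="single"><device name="ibmblade" port="7"/></method></fence>
    </clusternode>
    <clusternode name="cluster-node1" votes="1">
      <fence><method name="single"><device name="ibmblade" port="6"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ibmblade" agent="fence_ibmblade"
                 ipaddr="IP_ADDRESS_HERE" community="COMMUNITY_HERE"/>
  </fencedevices>
</cluster>"""


def fence_map(conf_text):
    """Map each clusternode name to its list of (device, port) pairs.

    Raises ValueError if a node has no fence device or references a
    device name that is not declared under <fencedevices>.
    """
    root = ET.fromstring(conf_text)
    # Names of all declared <fencedevice> elements.
    defined = {fd.get("name") for fd in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        node_name = node.get("name")
        pairs = []
        for dev in node.iter("device"):
            dev_name = dev.get("name")
            if dev_name not in defined:
                raise ValueError(f"{node_name}: undefined fence device {dev_name!r}")
            pairs.append((dev_name, dev.get("port")))
        if not pairs:
            raise ValueError(f"{node_name}: no fence device configured")
        result[node_name] = pairs
    return result


if __name__ == "__main__":
    for node, devices in fence_map(CLUSTER_CONF).items():
        print(node, devices)
```

For the posted config, this maps cluster-node2 to blade port 7 and cluster-node1 to blade port 6, both through the single "ibmblade" device. That suggests the config itself is internally consistent.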
===========================================
Commands and their output (console or syslog):
# modprobe gfs
# modprobe lock_dlm
Feb 15 15:10:04 cluster-node1 Lock_Harness <CVS> (built Feb 15 2005 12:00:38) installed
Feb 15 15:10:04 cluster-node1 GFS <CVS> (built Feb 15 2005 12:00:52) installed
Feb 15 15:10:08 cluster-node1 CMAN <CVS> (built Feb 15 2005 12:00:31) installed
Feb 15 15:10:08 cluster-node1 NET: Registered protocol family 30
Feb 15 15:10:08 cluster-node1 DLM <CVS> (built Feb 15 2005 12:00:34) installed
Feb 15 15:10:08 cluster-node1 Lock_DLM (built Feb 15 2005 12:00:39) installed
dm-mod is built into the kernel (not a module).
# ccsd -V
ccsd DEVEL.1108443619 (built Feb 15 2005 12:01:01)
Copyright (C) Red Hat, Inc. 2004 All rights reserved.
# ccsd -4
Feb 15 15:10:58 cluster-node1 ccsd[8556]: Starting ccsd DEVEL.1108443619:
Feb 15 15:10:58 cluster-node1 ccsd[8556]: Built: Feb 15 2005 12:01:01
Feb 15 15:10:58 cluster-node1 ccsd[8556]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Feb 15 15:10:58 cluster-node1 ccsd[8556]: IP Protocol:: IPv4 only
# cman_tool join
Feb 15 15:12:27 cluster-node1 ccsd[8556]: cluster.conf (cluster name = cluster, version = 3) found.
Feb 15 15:12:28 cluster-node1 CMAN: Waiting to join or form a Linux-cluster
Feb 15 15:12:28 cluster-node1 ccsd[8558]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1
Feb 15 15:12:28 cluster-node1 ccsd[8558]: Initial status:: Inquorate
Feb 15 15:13:00 cluster-node1 CMAN: forming a new cluster
Feb 15 15:13:00 cluster-node1 CMAN: quorum regained, resuming activity
Feb 15 15:13:00 cluster-node1 ccsd[8558]: Cluster is quorate. Allowing connections.
# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 0
Node name: cluster-node1
Node addresses: 192.168.192.146
# cman_tool nodes
Node Votes Exp Sts Name
   1     1   1   M cluster-node1
# fence_tool join
Feb 15 15:14:26 cluster-node1 fenced[8847]: cluster-node2 not a cluster member after 6 sec post_join_delay
Feb 15 15:14:26 cluster-node1 fenced[8847]: fencing node "cluster-node2"
Feb 15 15:14:32 cluster-node1 fenced[8847]: fence "cluster-node2" success
At this point cluster-node2 was fenced and automatically rebooted,
which is good.
Now I join cluster-node2 to the cluster:
# modprobe gfs
# modprobe lock_dlm
# cman_tool join
# fence_tool join
Feb 15 15:18:30 cluster-node2 ccsd[8376]: Starting ccsd DEVEL.1108443619:
Feb 15 15:18:30 cluster-node2 ccsd[8376]: Built: Feb 15 2005 12:01:01
Feb 15 15:18:30 cluster-node2 ccsd[8376]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Feb 15 15:18:30 cluster-node2 ccsd[8376]: IP Protocol:: IPv4 only
Feb 15 15:18:34 cluster-node2 ccsd[8376]: cluster.conf (cluster name = cluster, version = 3) found.
Feb 15 15:18:34 cluster-node2 ccsd[8376]: Remote copy of cluster.conf is from quorate node.
Feb 15 15:18:34 cluster-node2 ccsd[8376]: Local version # : 3
Feb 15 15:18:34 cluster-node2 ccsd[8376]: Remote version #: 3
Feb 15 15:18:41 cluster-node2 Lock_Harness <CVS> (built Feb 15 2005 12:00:38) installed
Feb 15 15:18:41 cluster-node2 GFS <CVS> (built Feb 15 2005 12:00:52) installed
Feb 15 15:18:44 cluster-node2 CMAN <CVS> (built Feb 15 2005 12:00:31) installed
Feb 15 15:18:44 cluster-node2 NET: Registered protocol family 30
Feb 15 15:18:44 cluster-node2 DLM <CVS> (built Feb 15 2005 12:00:34) installed
Feb 15 15:18:44 cluster-node2 Lock_DLM (built Feb 15 2005 12:00:39) installed
Feb 15 15:18:47 cluster-node2 ccsd[8376]: Remote copy of cluster.conf is from quorate node.
Feb 15 15:18:47 cluster-node2 ccsd[8376]: Local version # : 3
Feb 15 15:18:47 cluster-node2 ccsd[8376]: Remote version #: 3
Feb 15 15:18:47 cluster-node2 CMAN: Waiting to join or form a Linux-cluster
Feb 15 15:18:48 cluster-node2 ccsd[8378]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1
Feb 15 15:18:48 cluster-node2 ccsd[8378]: Initial status:: Inquorate
Feb 15 15:18:50 cluster-node2 CMAN: sending membership request
Feb 15 15:18:50 cluster-node2 CMAN: got node cluster-node1
Feb 15 15:18:50 cluster-node2 CMAN: quorum regained, resuming activity
Feb 15 15:18:50 cluster-node2 ccsd[8378]: Cluster is quorate. Allowing connections.
On node 1:
# clvmd
Feb 15 15:24:56 cluster-node1 CMAN: WARNING no listener for port 11 on node cluster-node2
On node 2:
# clvmd
Feb 15 15:25:03 cluster-node2 clvmd: Cluster LVM daemon started - connected to CMAN
On node 1:
# cman_tool nodes
Node Votes Exp Sts Name
   1     1   1   M cluster-node1
   2     1   1   M cluster-node2
# cman_tool services
Service          Name       GID LID State Code
Fence Domain:    "default"    1   2 run   -
[1 2]
DLM Lock Space:  "clvmd"      3   3 run   -
[1 2]
# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146
Now I shut down node 2's network interface.
On node 2:
# ifconfig eth0 down
On node 1:
Feb 15 15:29:50 cluster-node1 CMAN: removing node cluster-node2 from the cluster : Missed too many heartbeats
# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146
# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146
# cman_tool nodes
Node Votes Exp Sts Name
   1     1   1   M cluster-node1
   2     1   1   X cluster-node2
# cman_tool services
Service          Name       GID LID State Code
Fence Domain:    "default"    1   2 run   -
[1 2]
DLM Lock Space:  "clvmd"      3   3 run   -
[1 2]
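Nothing about fencing appears in the log, and node 2 is not automatically
rebooted, even though cman_tool nodes marks it X and cman_tool services
still lists [1 2] in the fence domain. Shouldn't node 2 be fenced here?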
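To spot this state mechanically, here is a minimal Python sketch that cross-checks the two outputs quoted above. It makes two assumptions: an X in the Sts column of `cman_tool nodes` marks a node that has left the membership, and the bracketed list under a service in `cman_tool services` holds member node IDs. The helper names (`dead_nodes`, `fence_domain_members`, `dead_but_unfenced`) are hypothetical and not part of cman_tool.

```python
import re

# Output as posted after node 2's interface was taken down.
NODES_OUTPUT = """Node Votes Exp Sts Name
1 1 1 M cluster-node1
2 1 1 X cluster-node2"""

SERVICES_OUTPUT = """Service Name GID LID State Code
Fence Domain: "default" 1 2 run -
[1 2]
DLM Lock Space: "clvmd" 3 3 run -
[1 2]"""


def dead_nodes(nodes_text):
    """Return {node_id: name} for rows whose Sts column is 'X' (assumed: dead)."""
    dead = {}
    for line in nodes_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 5 and fields[3] == "X":
            dead[int(fields[0])] = fields[4]
    return dead


def fence_domain_members(services_text):
    """Return the node IDs in the bracketed line after 'Fence Domain:'."""
    lines = services_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Fence Domain:") and i + 1 < len(lines):
            match = re.fullmatch(r"\[([\d ]*)\]", lines[i + 1].strip())
            if match:
                return {int(n) for n in match.group(1).split()}
    return set()


def dead_but_unfenced(nodes_text, services_text):
    """Nodes reported dead that the fence domain still lists as members."""
    members = fence_domain_members(services_text)
    return {nid: name for nid, name in dead_nodes(nodes_text).items()
            if nid in members}


if __name__ == "__main__":
    print(dead_but_unfenced(NODES_OUTPUT, SERVICES_OUTPUT))
```

With the posted output, this flags node 2 (cluster-node2). That is the combination in which one would expect fenced to act.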
Regards,
Fajar