[Linux-cluster] cluster latest cvs does not fence dead nodes automatically

Fajar A. Nugraha fajar at telkom.co.id
Tue Feb 15 08:34:44 UTC 2005


David Teigland wrote:

>Above the names were "hosting-cl02-01" and "hosting-cl02-02".  Could you
>clear that up and if there are still problems send your cluster.conf file?
>Thanks
Here's how it is now, using new hostnames. Current cluster.conf (blade center IP address and community string removed):
==================================
<?xml version="1.0"?>
<cluster name="cluster" config_version="3">

<cman two_node="1" expected_votes="1">
</cman>

<clusternodes>
        <clusternode name="cluster-node2" votes="1">
                <fence>
                        <method name="single">
                                <device name="ibmblade" port="7"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="cluster-node1" votes="1">
                <fence>
                        <method name="single">
                                <device name="ibmblade" port="6"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>

<fencedevices>
        <fencedevice name="ibmblade" agent="fence_ibmblade" ipaddr="IP_ADDRESS_HERE" community="COMMUNITY_HERE"/>
</fencedevices>

</cluster>
===========================================
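
(For completeness: the agent can also be tested by hand. As far as I know, fenced simply passes the combined <fencedevice>/<device> attributes to the agent as key=value lines on stdin, so something like the following should power-cycle blade 7. The option names below are just the ones from my cluster.conf, so check the fence_ibmblade man page to confirm it accepts them:)

# fence_ibmblade <<EOF
ipaddr=IP_ADDRESS_HERE
community=COMMUNITY_HERE
port=7
EOF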

Commands and their output (console or syslog):

# modprobe gfs
# modprobe lock_dlm

Feb 15 15:10:04 cluster-node1 Lock_Harness <CVS> (built Feb 15 2005 12:00:38) installed
Feb 15 15:10:04 cluster-node1 GFS <CVS> (built Feb 15 2005 12:00:52) installed
Feb 15 15:10:08 cluster-node1 CMAN <CVS> (built Feb 15 2005 12:00:31) installed
Feb 15 15:10:08 cluster-node1 NET: Registered protocol family 30
Feb 15 15:10:08 cluster-node1 DLM <CVS> (built Feb 15 2005 12:00:34) installed
Feb 15 15:10:08 cluster-node1 Lock_DLM (built Feb 15 2005 12:00:39) installed

dm-mod is built into the kernel (not a module)

# ccsd -V
ccsd DEVEL.1108443619 (built Feb 15 2005 12:01:01)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.

# ccsd -4
Feb 15 15:10:58 cluster-node1 ccsd[8556]: Starting ccsd DEVEL.1108443619:
Feb 15 15:10:58 cluster-node1 ccsd[8556]:  Built: Feb 15 2005 12:01:01
Feb 15 15:10:58 cluster-node1 ccsd[8556]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Feb 15 15:10:58 cluster-node1 ccsd[8556]:   IP Protocol:: IPv4 only

# cman_tool join
Feb 15 15:12:27 cluster-node1 ccsd[8556]: cluster.conf (cluster name = cluster, version = 3) found.
Feb 15 15:12:28 cluster-node1 CMAN: Waiting to join or form a Linux-cluster
Feb 15 15:12:28 cluster-node1 ccsd[8558]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1
Feb 15 15:12:28 cluster-node1 ccsd[8558]: Initial status:: Inquorate
Feb 15 15:13:00 cluster-node1 CMAN: forming a new cluster
Feb 15 15:13:00 cluster-node1 CMAN: quorum regained, resuming activity
Feb 15 15:13:00 cluster-node1 ccsd[8558]: Cluster is quorate.  Allowing connections.

# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 0
Node name: cluster-node1
Node addresses: 192.168.192.146

# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    1   M   cluster-node1

# fence_tool join
Feb 15 15:14:26 cluster-node1 fenced[8847]: cluster-node2 not a cluster member after 6 sec post_join_delay
Feb 15 15:14:26 cluster-node1 fenced[8847]: fencing node "cluster-node2"
Feb 15 15:14:32 cluster-node1 fenced[8847]: fence "cluster-node2" success

At this point "cluster-node2" was fenced and automatically rebooted, which is good.
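
(Side note: if I read fenced's documentation correctly, the 6-second post_join_delay in the log above is its default and can be overridden from cluster.conf with a fence_daemon element, something like the following, which is not in my config above. post_fail_delay is, as I understand it, the equivalent wait before fencing a node that has failed:)

<fence_daemon post_join_delay="6" post_fail_delay="0"/>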

Now I join cluster-node2 to the cluster:
# modprobe gfs
# modprobe lock_dlm
# cman_tool join
# fence_tool join

Feb 15 15:18:30 cluster-node2 ccsd[8376]: Starting ccsd DEVEL.1108443619:
Feb 15 15:18:30 cluster-node2 ccsd[8376]:  Built: Feb 15 2005 12:01:01
Feb 15 15:18:30 cluster-node2 ccsd[8376]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Feb 15 15:18:30 cluster-node2 ccsd[8376]:   IP Protocol:: IPv4 only
Feb 15 15:18:34 cluster-node2 ccsd[8376]: cluster.conf (cluster name = cluster, version = 3) found.
Feb 15 15:18:34 cluster-node2 ccsd[8376]: Remote copy of cluster.conf is from quorate node.
Feb 15 15:18:34 cluster-node2 ccsd[8376]:  Local version # : 3
Feb 15 15:18:34 cluster-node2 ccsd[8376]:  Remote version #: 3
Feb 15 15:18:41 cluster-node2 Lock_Harness <CVS> (built Feb 15 2005 12:00:38) installed
Feb 15 15:18:41 cluster-node2 GFS <CVS> (built Feb 15 2005 12:00:52) installed
Feb 15 15:18:44 cluster-node2 CMAN <CVS> (built Feb 15 2005 12:00:31) installed
Feb 15 15:18:44 cluster-node2 NET: Registered protocol family 30
Feb 15 15:18:44 cluster-node2 DLM <CVS> (built Feb 15 2005 12:00:34) installed
Feb 15 15:18:44 cluster-node2 Lock_DLM (built Feb 15 2005 12:00:39) installed
Feb 15 15:18:47 cluster-node2 ccsd[8376]: Remote copy of cluster.conf is from quorate node.
Feb 15 15:18:47 cluster-node2 ccsd[8376]:  Local version # : 3
Feb 15 15:18:47 cluster-node2 ccsd[8376]:  Remote version #: 3
Feb 15 15:18:47 cluster-node2 CMAN: Waiting to join or form a Linux-cluster
Feb 15 15:18:48 cluster-node2 ccsd[8378]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1
Feb 15 15:18:48 cluster-node2 ccsd[8378]: Initial status:: Inquorate
Feb 15 15:18:50 cluster-node2 CMAN: sending membership request
Feb 15 15:18:50 cluster-node2 CMAN: got node cluster-node1
Feb 15 15:18:50 cluster-node2 CMAN: quorum regained, resuming activity
Feb 15 15:18:50 cluster-node2 ccsd[8378]: Cluster is quorate.  Allowing connections.

On node 1:
# clvmd
Feb 15 15:24:56 cluster-node1 CMAN: WARNING no listener for port 11 on node cluster-node2

On node 2:
# clvmd
Feb 15 15:25:03 cluster-node2 clvmd: Cluster LVM daemon started - connected to CMAN

On node 1:
# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    1   M   cluster-node1
   2    1    1   M   cluster-node2

# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2]

DLM Lock Space:  "clvmd"                             3   3 run       -
[1 2]

# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146

Now I shut down node2's network interface.

On node 2:
# ifconfig eth0 down

On node 1:
Feb 15 15:29:50 cluster-node1 CMAN: removing node cluster-node2 from the cluster : Missed too many heartbeats

# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146

Running it again a bit later:

# cman_tool status
Protocol version: 5.0.1
Config version: 3
Cluster name: cluster
Cluster ID: 13364
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 3
Node name: cluster-node1
Node addresses: 192.168.192.146

# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    1   M   cluster-node1
   2    1    1   X   cluster-node2

# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2]

DLM Lock Space:  "clvmd"                             3   3 run       -
[1 2]

No mention of fencing whatsoever, and node 2 is not automatically rebooted.
Shouldn't node 2 be fenced here?
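
(In case it helps narrow this down, the state that cman_tool reads can also be dumped straight from /proc. I'm assuming this CVS build still exposes the /proc/cluster files:)

# cat /proc/cluster/status
# cat /proc/cluster/nodes
# cat /proc/cluster/services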

Regards,

Fajar



