[Linux-cluster] Waiting for fenced to join the fence group
Wolfgang Hotwagner
listener at may.co.at
Tue Jul 21 11:51:22 UTC 2009
Hello,
I am not able to get a GFS2 cluster running on a DRBD device. It always gets
stuck joining the fence group. I am using Debian stable (lenny). There is
also a CTDB service on eth0, which brings up 2 additional IP addresses.
Could someone help me get it working?
Greetings
Wolfgang
dslin1:
eth0: 172.30.50.83
eth1: 10.13.13.2
/etc/hosts:
127.0.0.1 localhost
172.30.50.83 dslin1
172.30.50.84 dslin2
10.13.13.2 node1
10.13.13.3 node2
/proc/drbd:
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil at fat-tyre, 2008-11-12 16:40:33
0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
ns:0 nr:12288 dw:12288 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:765 misses:3 starving:0 dirty:0 changed:3
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
syslog:
Jul 21 13:38:27 dslin1 ccsd[14975]: Starting ccsd 2.03.09:
Jul 21 13:38:27 dslin1 ccsd[14975]: Built: Nov 3 2008 18:22:21
Jul 21 13:38:27 dslin1 ccsd[14975]: Copyright (C) Red Hat, Inc. 2004-2008 All rights reserved.
Jul 21 13:38:28 dslin1 ccsd[14975]: /etc/cluster/cluster.conf (cluster name = cluster, version = 1) found.
Jul 21 13:38:31 dslin1 ccsd[14975]: Initial status:: Quorate
Jul 21 13:38:35 dslin1 openais[14980]: cman killed by node 2 because we rejoined the cluster without a full restart
Jul 21 13:38:35 dslin1 groupd[14984]: cman_get_nodes error -1 104
Jul 21 13:38:35 dslin1 gfs_controld[14992]: cluster is down, exiting
Jul 21 13:39:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 30 seconds.
Jul 21 13:39:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 60 seconds.
Jul 21 13:40:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 90 seconds.
Jul 21 13:40:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 120 seconds.
and so on...
dslin2:
eth0: 172.30.50.84
eth1: 10.13.13.3
/etc/hosts:
127.0.0.1 localhost
172.30.50.83 dslin1
172.30.50.84 dslin2
10.13.13.2 node1
10.13.13.3 node2
/proc/drbd:
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil at fat-tyre, 2008-11-12 16:40:33
0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
ns:12292 nr:0 dw:0 dr:12296 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
resync: used:0/61 hits:765 misses:3 starving:0 dirty:0 changed:3
act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
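GFS2 on DRBD needs both nodes in the Primary role at the same time (dual-primary), and the /proc/drbd output from both nodes shows that (st:Primary/Primary). A small Python sketch for checking this from the status line; the helper names are hypothetical, and the regex assumes the DRBD 8.0 line format shown above:

```python
import re

# Matches the per-device summary line of DRBD 8.0's /proc/drbd, e.g.
# "0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---"
# (assumption: the format shown in the output above).
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+st:(?P<local>\w+)/(?P<peer>\w+)"
    r"\s+ds:(?P<ldisk>\w+)/(?P<pdisk>\w+)"
)


def parse_drbd_status(text):
    """Return {minor: {cs, local, peer, ldisk, pdisk}} for each device line."""
    devices = {}
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            devices[int(m.group("minor"))] = m.groupdict()
    return devices


def ready_for_gfs2(dev):
    """GFS2 on both nodes needs Primary/Primary; this sketch also
    requires a Connected link and UpToDate disks on both sides."""
    return (
        dev["cs"] == "Connected"
        and dev["local"] == dev["peer"] == "Primary"
        and dev["ldisk"] == dev["pdisk"] == "UpToDate"
    )


if __name__ == "__main__":
    sample = """version: 8.0.14 (api:86/proto:86)
 0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:12288 dw:12288 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0
"""
    for minor, dev in parse_drbd_status(sample).items():
        # prints: 0 Connected Primary/Primary gfs2-ready: True
        print(minor, dev["cs"], f"{dev['local']}/{dev['peer']}",
              "gfs2-ready:", ready_for_gfs2(dev))
```

On a live node the same functions can be fed the contents of `/proc/drbd` directly.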
syslog:
Jul 21 13:38:27 dslin1 ccsd[14975]: Starting ccsd 2.03.09:
Jul 21 13:38:27 dslin1 ccsd[14975]: Built: Nov 3 2008 18:22:21
Jul 21 13:38:27 dslin1 ccsd[14975]: Copyright (C) Red Hat, Inc. 2004-2008 All rights reserved.
Jul 21 13:38:28 dslin1 ccsd[14975]: /etc/cluster/cluster.conf (cluster name = cluster, version = 1) found.
Jul 21 13:38:31 dslin1 ccsd[14975]: Initial status:: Quorate
Jul 21 13:38:35 dslin1 openais[14980]: cman killed by node 2 because we rejoined the cluster without a full restart
Jul 21 13:38:35 dslin1 groupd[14984]: cman_get_nodes error -1 104
Jul 21 13:38:35 dslin1 gfs_controld[14992]: cluster is down, exiting
Jul 21 13:39:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 30 seconds.
Jul 21 13:39:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 60 seconds.
Jul 21 13:40:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 90 seconds.
Jul 21 13:40:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 120 seconds.
Jul 21 13:41:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 150 seconds.
Jul 21 13:41:30 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 180 seconds.
Jul 21 13:42:00 dslin1 ccsd[14975]: Unable to connect to cluster infrastructure after 210 seconds.
and so on...
/etc/cluster/cluster.conf:
<?xml version="1.0"?>
<cluster name="cluster" config_version="1">
  <!-- post_join_delay: number of seconds the daemon will wait before
       fencing any victims after a node joins the domain
       post_fail_delay: number of seconds the daemon will wait before
       fencing any victims after a domain member fails
       clean_start: prevent any startup fencing the daemon might do.
       It indicates that the daemon should assume all nodes
       are in a clean state to start. -->
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="dslin1" votes="1" nodeid="1">
      <fence>
        <!-- Handle fencing manually -->
        <method name="human">
          <device name="human" nodename="dslin1" ipaddr="10.13.13.2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="dslin2" votes="1" nodeid="2">
      <fence>
        <!-- Handle fencing manually -->
        <method name="human">
          <device name="human" nodename="dslin2" ipaddr="10.13.13.3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <!-- cman two-node specification -->
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <!-- Define manual fencing -->
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
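The clusternode names in this cluster.conf (dslin1, dslin2) resolve through /etc/hosts to the eth0 addresses, while the fence devices list the eth1 addresses. cman in this generation generally picks the interface for cluster traffic from the address the clusternode name resolves to, so here that would be eth0, the interface that also carries the CTDB addresses. A small Python sketch that cross-checks the two files; the helper names are hypothetical and it only reads the `name` and `ipaddr` attributes, it does not query cman itself:

```python
import xml.etree.ElementTree as ET


def parse_hosts(text):
    """Map each name in /etc/hosts-style text to its address (first entry wins)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            mapping.setdefault(name, addr)
    return mapping


def node_addresses(conf_xml, hosts_text):
    """For each <clusternode>, return (name, address the name resolves to
    in hosts_text, ipaddr attributes on its fence <device> elements)."""
    hosts = parse_hosts(hosts_text)
    root = ET.fromstring(conf_xml)
    report = []
    for node in root.iter("clusternode"):
        name = node.get("name")
        fence_ips = [d.get("ipaddr") for d in node.iter("device") if d.get("ipaddr")]
        report.append((name, hosts.get(name), fence_ips))
    return report


if __name__ == "__main__":
    hosts_text = """127.0.0.1 localhost
172.30.50.83 dslin1
172.30.50.84 dslin2
10.13.13.2 node1
10.13.13.3 node2
"""
    # Trimmed copy of the cluster.conf above (comments and unrelated elements removed).
    conf_xml = """<cluster name="cluster" config_version="1">
  <clusternodes>
    <clusternode name="dslin1" votes="1" nodeid="1">
      <fence><method name="human">
        <device name="human" nodename="dslin1" ipaddr="10.13.13.2"/>
      </method></fence>
    </clusternode>
    <clusternode name="dslin2" votes="1" nodeid="2">
      <fence><method name="human">
        <device name="human" nodename="dslin2" ipaddr="10.13.13.3"/>
      </method></fence>
    </clusternode>
  </clusternodes>
</cluster>"""
    for name, resolved, fence_ips in node_addresses(conf_xml, hosts_text):
        print(f"{name}: resolves to {resolved}, fence device ipaddr {fence_ips}")
```

With the files posted above this reports 172.30.50.83 and 172.30.50.84 for the node names against 10.13.13.2 and 10.13.13.3 on the fence devices, which makes the eth0/eth1 split easy to spot.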