[Linux-cluster] New Cluster Installation Starts Partitioned
Tim Spaulding
tspauld98 at yahoo.com
Wed Oct 19 19:49:01 UTC 2005
Hi All,
I have a couple of machines that I'm trying to cluster. Both are freshly installed FC4 machines
that have been fully updated and are running the latest kernel. They are configured to use lvm2
by default, so lvm2 and dm were already installed. I'm following the directions in usage.txt from
Red Hat's web site. I compiled the cluster tarball, ran depmod, and started ccsd without issue.
When I run cman_tool join -w on each node, both nodes start cman and join the cluster, but the
cluster is apparently partitioned (i.e. each node sees the cluster and is joined to it, but
neither node can see that the other has joined). I've searched around and haven't found anything
specific to this symptom. I have a feeling that it has something to do with my network
configuration. Any help would be appreciated.
Both machines are i686 archs with dual NICs. The NICs are connected to networks that do not route
to each other. One network (eth0 on both machines) is a development network. The other network
(eth1) is our corporate network. I'm trying to configure the cluster to use the dev network
(eth0).
Here's the output from uname:
Linux ctclinux1.clam.com 2.6.13-1.1526_FC4 #1 Wed Sep 28 19:15:10 EDT 2005 i686 i686 i386 GNU/Linux
Linux ctclinux2.clam.com 2.6.13-1.1526_FC4 #1 Wed Sep 28 19:15:10 EDT 2005 i686 i686 i386 GNU/Linux
Here's the network configuration on ctclinux1:
ifconfig -a
eth0      Link encap:Ethernet HWaddr 00:01:03:26:5C:C9
          inet addr:192.168.36.200 Bcast:192.168.36.255 Mask:255.255.255.0
          inet6 addr: fe80::201:3ff:fe26:5cc9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:7260 errors:0 dropped:0 overruns:0 frame:0
          TX packets:350 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:449183 (438.6 KiB) TX bytes:27853 (27.2 KiB)
          Interrupt:10 Base address:0xec00

eth1      Link encap:Ethernet HWaddr 00:B0:D0:41:0F:65
          inet addr:10.10.10.200 Bcast:10.10.255.255 Mask:255.255.0.0
          inet6 addr: fe80::2b0:d0ff:fe41:f65/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:57450 errors:0 dropped:0 overruns:1 frame:0
          TX packets:12957 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:10040767 (9.5 MiB) TX bytes:1962029 (1.8 MiB)
          Interrupt:5 Base address:0xe880

eth1:1    Link encap:Ethernet HWaddr 00:B0:D0:41:0F:65
          inet addr:10.10.10.204 Bcast:10.10.255.255 Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:5 Base address:0xe880

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:17568 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17568 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3692600 (3.5 MiB) TX bytes:3692600 (3.5 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP MTU:1480 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.36.0    *               255.255.255.0   U     0      0        0 eth0
10.74.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.72.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.75.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.73.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.10.0.0       *               255.255.0.0     U     0      0        0 eth1
169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         10.10.1.1       0.0.0.0         UG    0      0        0 eth1
cat /etc/hosts
10.10.10.200 ctclinux1-svc
192.168.36.200 ctclinux1-cls
192.168.36.201 ctclinux2-cls
10.10.10.201 ctclinux2-svc
Here's the network configuration on ctclinux2:
ifconfig -a
eth0      Link encap:Ethernet HWaddr 00:01:03:D4:80:7C
          inet addr:192.168.36.201 Bcast:192.168.36.255 Mask:255.255.255.0
          inet6 addr: fe80::201:3ff:fed4:807c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:7702 errors:0 dropped:0 overruns:1 frame:0
          TX packets:282 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:477769 (466.5 KiB) TX bytes:22444 (21.9 KiB)
          Interrupt:10 Base address:0xec00

eth1      Link encap:Ethernet HWaddr 00:B0:D0:41:0F:9B
          inet addr:10.10.10.201 Bcast:10.10.255.255 Mask:255.255.0.0
          inet6 addr: fe80::2b0:d0ff:fe41:f9b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:53846 errors:0 dropped:0 overruns:1 frame:0
          TX packets:7759 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5733713 (5.4 MiB) TX bytes:1155588 (1.1 MiB)
          Interrupt:5 Base address:0xe880

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:17912 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3401868 (3.2 MiB) TX bytes:3401868 (3.2 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP MTU:1480 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.36.0    *               255.255.255.0   U     0      0        0 eth0
10.74.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.72.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.75.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.73.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.10.0.0       *               255.255.0.0     U     0      0        0 eth1
169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         10.10.1.1       0.0.0.0         UG    0      0        0 eth1
cat /etc/hosts
10.10.10.201 ctclinux2-svc
192.168.36.201 ctclinux2-cls
192.168.36.200 ctclinux1-cls
10.10.10.200 ctclinux1-svc
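Since the cluster node names have to resolve to addresses on the dev network, here is a minimal sketch that checks this against the /etc/hosts entries above. The helper name parse_hosts and the constants are only illustrative, and the eth0 network is taken from the ifconfig output (192.168.36.x with mask 255.255.255.0).

```python
import ipaddress

# /etc/hosts entries as posted above (same content on both nodes, different order).
HOSTS = """\
10.10.10.200 ctclinux1-svc
192.168.36.200 ctclinux1-cls
192.168.36.201 ctclinux2-cls
10.10.10.201 ctclinux2-svc
"""

# eth0 network, built from the address and netmask shown by ifconfig.
ETH0_NET = ipaddress.ip_network("192.168.36.0/255.255.255.0")


def parse_hosts(text):
    """Return {hostname: address} for every name in hosts-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table[name] = ipaddress.ip_address(addr)
    return table


hosts = parse_hosts(HOSTS)
for node in ("ctclinux1-cls", "ctclinux2-cls"):
    # Prints the name, its address, and whether that address is on eth0's network.
    print(node, hosts[node], hosts[node] in ETH0_NET)
print("broadcast:", ETH0_NET.broadcast_address)
```

Both -cls names land on 192.168.36.0/24, so name resolution at least points cman at eth0 on each node.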
Here's the cluster configuration file:
<?xml version="1.0"?>
<cluster name="cl_tic" config_version="1">
  <cman>
  </cman>
  <clusternodes>
    <clusternode name="ctclinux1-cls">
      <fence>
        <method name="single">
          <device name="human" nodename="ctclinux1-cls"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="ctclinux2-cls">
      <fence>
        <method name="single">
          <device name="human" nodename="ctclinux2-cls"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fence_devices>
    <fence_device name="human" agent="fence_manual"/>
  </fence_devices>
</cluster>
Here's the cluster information from ctclinux1 after the cluster is started and joined:
cman_tool -d join -w
nodename ctclinux1.clam.com not found
nodename ctclinux1 (truncated) not found
nodename ctclinux1 doesn't match ctclinux1-cls (ctclinux1-cls in cluster.conf)
nodename ctclinux1 doesn't match ctclinux2-cls (ctclinux2-cls in cluster.conf)
nodename localhost (if lo) not found
selected nodename ctclinux1-cls
setup up interface for address: ctclinux1-cls
Broadcast address for c824a8c0 is ff24a8c0
cman_tool status
Protocol version: 5.0.1
Config version: 1
Cluster name: cl_tic
Cluster ID: 6429
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 2
Total_votes: 1
Quorum: 2 Activity blocked
Active subsystems: 0
Node name: ctclinux1-cls
Node addresses: 192.168.36.200
cman_tool nodes
Node Votes Exp Sts Name
1 1 2 M ctclinux1-cls
Here's the cluster information from ctclinux2 after the cluster is started and joined:
cman_tool -d join -w
nodename ctclinux2.clam.com not found
nodename ctclinux2 (truncated) not found
nodename ctclinux2 doesn't match ctclinux1-cls (ctclinux1-cls in cluster.conf)
nodename ctclinux2 doesn't match ctclinux2-cls (ctclinux2-cls in cluster.conf)
nodename localhost (if lo) not found
selected nodename ctclinux2-cls
setup up interface for address: ctclinux2-cls
Broadcast address for c924a8c0 is ff24a8c0
cman_tool status
Protocol version: 5.0.1
Config version: 1
Cluster name: cl_tic
Cluster ID: 6429
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 2
Total_votes: 1
Quorum: 2 Activity blocked
Active subsystems: 0
Node name: ctclinux2-cls
Node addresses: 192.168.36.201
cman_tool nodes
Node Votes Exp Sts Name
1 1 2 M ctclinux2-cls
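The hex words in the "Broadcast address for ..." debug lines can be decoded to confirm which interface cman picked. This sketch assumes the word is a network-byte-order IPv4 address printed as a native integer on little-endian x86, which is consistent with the node addresses reported above. The function name decode_cman_hex is made up for illustration.

```python
import socket
import struct


def decode_cman_hex(hexword):
    """Decode an 8-digit hex word from the debug output into a dotted quad.

    Assumption: the address is held in network byte order but was printed as
    a native integer on a little-endian host, so the bytes appear reversed.
    """
    value = int(hexword, 16)
    # Packing as little-endian restores the original network-order bytes.
    return socket.inet_ntoa(struct.pack("<I", value))


for word in ("c824a8c0", "c924a8c0", "ff24a8c0"):
    print(word, "->", decode_cman_hex(word))
# c824a8c0 -> 192.168.36.200
# c924a8c0 -> 192.168.36.201
# ff24a8c0 -> 192.168.36.255
```

Under that assumption both nodes are broadcasting to 192.168.36.255 from their eth0 addresses, i.e. on the dev network as intended.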
Let me know if there is more information that I need to provide. As an aside, I've tried reducing
the quorum count, with no difference in behavior, and I've tried using multicast, which fails at
cman_tool join with an "Unknown Host" error. I'm open to any other suggestions.
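For reference, the status numbers above fit a simple majority rule. The sketch below assumes quorum is floor(expected_votes / 2) + 1; that is an assumption that happens to match the values shown (Expected_votes 2, Quorum 2), not something taken from the cman source. The function names are illustrative.

```python
def majority_quorum(expected_votes):
    """Votes needed for a simple majority (assumed rule, see above)."""
    return expected_votes // 2 + 1


def is_quorate(total_votes, expected_votes):
    """True when the votes this node can see reach the majority threshold."""
    return total_votes >= majority_quorum(expected_votes)


# Values from cman_tool status on either node: each sees only its own vote.
print(majority_quorum(2))   # 2
print(is_quorate(1, 2))     # False, matching "Activity blocked"
# If the nodes could see each other, total votes would be 2.
print(is_quorate(2, 2))     # True
```

Under this rule the quorum setting only decides whether a node that sees too few votes may proceed. It does not affect whether the nodes discover each other, which would explain why changing it made no difference to the partition.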
Thanks,
tims