[Linux-cluster] New Cluster Installation Starts Partitioned

Tim Spaulding tspauld98 at yahoo.com
Wed Oct 19 19:49:01 UTC 2005


Hi All,

I have a couple of machines that I'm trying to cluster.  Both are freshly installed FC4 machines
that have been fully updated and are running the latest kernel.  They use lvm2 by default, so
lvm2 and dm were already installed.  I'm following the directions in the usage.txt from Red Hat's
web site.  I compiled the cluster tarball, ran depmod, and started ccsd without issue.  When I run
cman_tool join -w on each node, both nodes start cman and join the cluster, but the cluster is
apparently partitioned (i.e. each node sees the cluster and is joined to it, but neither node can
see that the other has joined).  I've searched around and haven't found anything specific to this
symptom.  I suspect it has something to do with my network configuration.  Any help would be
appreciated.

Both machines are i686 boxes with dual NICs.  The two NICs are connected to networks that do not
route to each other.  One network (eth0 on both machines) is a development network.  The other
network (eth1) is our corporate network.  I'm trying to configure the cluster to use the dev
network (eth0).

Here's the output from uname:

Linux ctclinux1.clam.com 2.6.13-1.1526_FC4 #1 Wed Sep 28 19:15:10 EDT 2005 i686 i686 i386
GNU/Linux
Linux ctclinux2.clam.com 2.6.13-1.1526_FC4 #1 Wed Sep 28 19:15:10 EDT 2005 i686 i686 i386
GNU/Linux

Here's the network configuration on ctclinux1:

ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:01:03:26:5C:C9
          inet addr:192.168.36.200  Bcast:192.168.36.255  Mask:255.255.255.0
          inet6 addr: fe80::201:3ff:fe26:5cc9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7260 errors:0 dropped:0 overruns:0 frame:0
          TX packets:350 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:449183 (438.6 KiB)  TX bytes:27853 (27.2 KiB)
          Interrupt:10 Base address:0xec00

eth1      Link encap:Ethernet  HWaddr 00:B0:D0:41:0F:65
          inet addr:10.10.10.200  Bcast:10.10.255.255  Mask:255.255.0.0
          inet6 addr: fe80::2b0:d0ff:fe41:f65/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:57450 errors:0 dropped:0 overruns:1 frame:0
          TX packets:12957 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:10040767 (9.5 MiB)  TX bytes:1962029 (1.8 MiB)
          Interrupt:5 Base address:0xe880

eth1:1    Link encap:Ethernet  HWaddr 00:B0:D0:41:0F:65
          inet addr:10.10.10.204  Bcast:10.10.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:5 Base address:0xe880

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:17568 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17568 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3692600 (3.5 MiB)  TX bytes:3692600 (3.5 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.36.0    *               255.255.255.0   U     0      0        0 eth0
10.74.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.72.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.75.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.73.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.10.0.0       *               255.255.0.0     U     0      0        0 eth1
169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         10.10.1.1       0.0.0.0         UG    0      0        0 eth1

cat /etc/hosts
10.10.10.200    ctclinux1-svc
192.168.36.200  ctclinux1-cls
192.168.36.201  ctclinux2-cls
10.10.10.201    ctclinux2-svc

Here's the network configuration on ctclinux2:

ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:01:03:D4:80:7C
          inet addr:192.168.36.201  Bcast:192.168.36.255  Mask:255.255.255.0
          inet6 addr: fe80::201:3ff:fed4:807c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7702 errors:0 dropped:0 overruns:1 frame:0
          TX packets:282 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:477769 (466.5 KiB)  TX bytes:22444 (21.9 KiB)
          Interrupt:10 Base address:0xec00

eth1      Link encap:Ethernet  HWaddr 00:B0:D0:41:0F:9B
          inet addr:10.10.10.201  Bcast:10.10.255.255  Mask:255.255.0.0
          inet6 addr: fe80::2b0:d0ff:fe41:f9b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:53846 errors:0 dropped:0 overruns:1 frame:0
          TX packets:7759 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5733713 (5.4 MiB)  TX bytes:1155588 (1.1 MiB)
          Interrupt:5 Base address:0xe880

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:17912 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3401868 (3.2 MiB)  TX bytes:3401868 (3.2 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.36.0    *               255.255.255.0   U     0      0        0 eth0
10.74.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.72.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.75.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.73.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
10.10.0.0       *               255.255.0.0     U     0      0        0 eth1
169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         10.10.1.1       0.0.0.0         UG    0      0        0 eth1

cat /etc/hosts
10.10.10.201    ctclinux2-svc
192.168.36.201  ctclinux2-cls
192.168.36.200  ctclinux1-cls
10.10.10.200    ctclinux1-svc
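
As a quick sanity check on the addressing above, the two cluster-facing (eth0) addresses can be
tested for sharing a subnet and a broadcast address.  This is only a minimal sketch in Python:
the addresses and netmask are copied from the ifconfig output, and the helper name network_of is
made up for illustration.  It says nothing about whether packets actually reach the other node.

```python
import ipaddress

# Cluster-facing (eth0) addresses and netmask, copied from the ifconfig output above.
NODES = {
    "ctclinux1-cls": "192.168.36.200",
    "ctclinux2-cls": "192.168.36.201",
}
NETMASK = "255.255.255.0"


def network_of(addr, netmask):
    """Return the IPv4 network that an interface address belongs to (hypothetical helper)."""
    return ipaddress.ip_interface(f"{addr}/{netmask}").network


networks = {name: network_of(addr, NETMASK) for name, addr in NODES.items()}

# Both nodes are on the same subnet if they map to exactly one distinct network.
same_subnet = len(set(networks.values())) == 1
broadcast = next(iter(networks.values())).broadcast_address

for name, net in networks.items():
    print(f"{name}: {net}")
print("same subnet:", same_subnet)
print("broadcast:", broadcast)
```

If both names map to the same network, the addressing itself is consistent and the problem is
more likely in how broadcast traffic is delivered or filtered between the nodes.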

Here's the cluster configuration file:

<?xml version="1.0"?>
<cluster name="cl_tic" config_version="1">
        <cman>
        </cman>

        <clusternodes>
                <clusternode name="ctclinux1-cls">
                        <fence>
                                <method name="single">
                                        <device name="human" nodename="ctclinux1-cls"/>
                                </method>
                        </fence>
                </clusternode>

                <clusternode name="ctclinux2-cls">
                        <fence>
                                <method name="single">
                                        <device name="human" nodename="ctclinux2-cls"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>

        <fence_devices>
                <fence_device name="human" agent="fence_manual"/>
        </fence_devices>
</cluster>
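
One thing worth ruling out is a mismatch between the node names in this file and the names in
/etc/hosts.  The sketch below is illustrative only: it embeds a trimmed copy of the configuration
(fence sections omitted) and the -cls entries from the hosts files above, and the helper name
node_names is made up.  It does not read the real files or validate the configuration against
any schema.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster configuration above (fence sections omitted for brevity).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="cl_tic" config_version="1">
        <clusternodes>
                <clusternode name="ctclinux1-cls"/>
                <clusternode name="ctclinux2-cls"/>
        </clusternodes>
</cluster>
"""

# Cluster-facing entries copied from the /etc/hosts listings above.
HOSTS = {
    "ctclinux1-cls": "192.168.36.200",
    "ctclinux2-cls": "192.168.36.201",
}


def node_names(conf_text):
    """Return the clusternode names in document order (hypothetical helper)."""
    root = ET.fromstring(conf_text)
    return [node.get("name") for node in root.findall("./clusternodes/clusternode")]


names = node_names(CLUSTER_CONF)
missing = [name for name in names if name not in HOSTS]

for name in names:
    print(name, "->", HOSTS.get(name, "NOT IN HOSTS"))
print("names missing from hosts:", missing)
```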

Here's the cluster information from ctclinux1 after the cluster is started and joined:

cman_tool -d join -w
nodename ctclinux1.clam.com not found
nodename ctclinux1 (truncated) not found
nodename ctclinux1 doesn't match ctclinux1-cls (ctclinux1-cls in cluster.conf)
nodename ctclinux1 doesn't match ctclinux2-cls (ctclinux2-cls in cluster.conf)
nodename localhost (if lo) not found
selected nodename ctclinux1-cls
setup up interface for address: ctclinux1-cls
Broadcast address for c824a8c0 is ff24a8c0

cman_tool status
Protocol version: 5.0.1
Config version: 1
Cluster name: cl_tic
Cluster ID: 6429
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 2
Total_votes: 1
Quorum: 2  Activity blocked
Active subsystems: 0
Node name: ctclinux1-cls
Node addresses: 192.168.36.200

cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    2   M   ctclinux1-cls

Here's the cluster information from ctclinux2 after the cluster is started and joined:

cman_tool -d join -w
nodename ctclinux2.clam.com not found
nodename ctclinux2 (truncated) not found
nodename ctclinux2 doesn't match ctclinux1-cls (ctclinux1-cls in cluster.conf)
nodename ctclinux2 doesn't match ctclinux2-cls (ctclinux2-cls in cluster.conf)
nodename localhost (if lo) not found
selected nodename ctclinux2-cls
setup up interface for address: ctclinux2-cls
Broadcast address for c924a8c0 is ff24a8c0
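
The hex words in the "Broadcast address for ..." lines can be decoded to dotted-quad form to
confirm which interface was selected.  This is a minimal sketch that assumes the tool prints the
address as a 32-bit integer read on a little-endian host; that is an inference from the values
lining up with the eth0 addresses, not something taken from the cman documentation.  The function
name decode_le_ipv4 is made up.

```python
import socket
import struct


def decode_le_ipv4(hex_word):
    """Decode an 8-digit hex word as an IPv4 address stored little-endian (assumed layout)."""
    value = int(hex_word, 16)
    # Packing as little-endian restores the original network byte order.
    return socket.inet_ntoa(struct.pack("<I", value))


# Values copied from the debug output of cman_tool -d join -w on both nodes.
for word in ("c824a8c0", "c924a8c0", "ff24a8c0"):
    print(word, "->", decode_le_ipv4(word))
```

Under that assumption, both nodes picked their eth0 address and the same broadcast address on
the 192.168.36.0 network.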

cman_tool status
Protocol version: 5.0.1
Config version: 1
Cluster name: cl_tic
Cluster ID: 6429
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 2
Total_votes: 1
Quorum: 2  Activity blocked
Active subsystems: 0
Node name: ctclinux2-cls
Node addresses: 192.168.36.201

cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    2   M   ctclinux2-cls
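
The "Activity blocked" state in both status outputs follows from the vote counts.  The sketch
below assumes a simple majority rule (quorum is half the expected votes, rounded down, plus one),
which matches the numbers shown (Expected_votes 2, Quorum 2) but is not taken from the cman
source; any special two-node settings are not modeled.  The function names are made up.

```python
def quorum_for(expected_votes):
    """Votes needed for quorum under an assumed simple-majority rule."""
    return expected_votes // 2 + 1


def is_quorate(total_votes, expected_votes):
    """True when the votes currently seen reach the assumed quorum threshold."""
    return total_votes >= quorum_for(expected_votes)


# Values from cman_tool status above: each node sees only its own vote.
expected_votes = 2
total_votes = 1

print("quorum needed:", quorum_for(expected_votes))
print("quorate now:", is_quorate(total_votes, expected_votes))
print("quorate if both nodes were seen:", is_quorate(2, expected_votes))
```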

Let me know if there is more information that I need to provide.  As an aside, I've tried reducing
the quorum count, with no difference in behavior, and I've tried using multicast, which fails on
the cman_tool join with an "Unknown Host" error.  I'm open to any other suggestions.

Thanks,

tims


	
		



