[Linux-cluster] New Cluster Installation Starts Partitioned

Mark Hlawatschek hlawatschek at atix.de
Wed Oct 19 22:28:41 UTC 2005


Hi Tim,

Make sure that the cman instances on both nodes can talk to each other. I
observed this problem when iptables wasn't configured correctly. If you
have an active iptables configuration, shut it down and try again.
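
To rule out an addressing problem before blaming the firewall, it can help
to decode the hex values in the "Broadcast address for ..." lines of the
cman_tool -d output quoted below. Here is a minimal Python sketch, assuming
the tool prints the raw 32-bit address as a little-endian (i686) host sees
it. The helper names decode_cman_addr and same_subnet are illustrative only
and are not part of any cluster tool:

```python
import ipaddress
import socket


def decode_cman_addr(hex_word: str) -> str:
    """Turn a hex word such as 'c824a8c0' into a dotted-quad IPv4 address.

    Assumption: the word is the in_addr value printed on a little-endian
    host, so its bytes appear in reverse of network order.
    """
    raw = bytes.fromhex(hex_word)
    if len(raw) != 4:
        raise ValueError("expected exactly 8 hex digits")
    # Reverse the bytes to get network order, then format as a.b.c.d.
    return socket.inet_ntoa(raw[::-1])


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if addr_b lies in the network of addr_a under netmask."""
    network = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.ip_address(addr_b) in network


if __name__ == "__main__":
    # Hex values copied from the two debug outputs in the quoted message.
    node1 = decode_cman_addr("c824a8c0")
    node2 = decode_cman_addr("c924a8c0")
    bcast = decode_cman_addr("ff24a8c0")
    print("node1:", node1)
    print("node2:", node2)
    print("broadcast:", bcast)
    print("same subnet:", same_subnet(node1, node2, "255.255.255.0"))
```

If that assumption holds, both nodes decode to eth0 addresses on
192.168.36.0/24 with the same broadcast address, so cman seems to have
picked the intended interface on each side. That would point at packet
filtering rather than interface selection.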

Hope that helps ...

Mark 

On Wed, 2005-10-19 at 12:49 -0700, Tim Spaulding wrote:
> Hi All,
> 
> I have a couple of machines that I'm trying to cluster.  The machines are freshly installed FC4
> machines that have been fully updated and are running the latest kernel.  They are configured to
> use lvm2 by default, so lvm2 and dm were already installed.  I'm following the directions in
> usage.txt from Red Hat's web site.  I compile the cluster tarball, run depmod, and start ccsd
> without issue.  When I do a cman_tool join -w on each node, both nodes start cman and join the
> cluster, but the cluster is apparently partitioned (i.e. each node sees the cluster and has joined
> it, but neither node can see that the other has joined).  I've searched around and
> haven't found anything specific to this symptom.  I have a feeling that it has something to do with
> my network configuration.  Any help would be appreciated.
> 
> Both machines are i686 archs with dual NICs.  The NICs are connected to networks that do not route
> to each other.  One network (eth0 on both machines) is a development network.  The other network
> (eth1) is our corporate network.  I'm trying to configure the cluster to use the dev network
> (eth0).
> 
> Here's the output from uname:
> 
> Linux ctclinux1.clam.com 2.6.13-1.1526_FC4 #1 Wed Sep 28 19:15:10 EDT 2005 i686 i686 i386 GNU/Linux
> Linux ctclinux2.clam.com 2.6.13-1.1526_FC4 #1 Wed Sep 28 19:15:10 EDT 2005 i686 i686 i386 GNU/Linux
> 
> Here's the network configuration on ctclinux1:
> 
> eth0      Link encap:Ethernet  HWaddr 00:01:03:26:5C:C9
>           inet addr:192.168.36.200  Bcast:192.168.36.255  Mask:255.255.255.0
>           inet6 addr: fe80::201:3ff:fe26:5cc9/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:7260 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:350 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:449183 (438.6 KiB)  TX bytes:27853 (27.2 KiB)
>           Interrupt:10 Base address:0xec00
> 
> eth1      Link encap:Ethernet  HWaddr 00:B0:D0:41:0F:65
>           inet addr:10.10.10.200  Bcast:10.10.255.255  Mask:255.255.0.0
>           inet6 addr: fe80::2b0:d0ff:fe41:f65/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:57450 errors:0 dropped:0 overruns:1 frame:0
>           TX packets:12957 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:10040767 (9.5 MiB)  TX bytes:1962029 (1.8 MiB)
>           Interrupt:5 Base address:0xe880
> 
> eth1:1    Link encap:Ethernet  HWaddr 00:B0:D0:41:0F:65
>           inet addr:10.10.10.204  Bcast:10.10.255.255  Mask:255.255.0.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           Interrupt:5 Base address:0xe880
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:17568 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:17568 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:3692600 (3.5 MiB)  TX bytes:3692600 (3.5 MiB)
> 
> sit0      Link encap:IPv6-in-IPv4
>           NOARP  MTU:1480  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
> 
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 192.168.36.0    *               255.255.255.0   U     0      0        0 eth0
> 10.74.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
> 10.72.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
> 10.75.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
> 10.73.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
> 10.10.0.0       *               255.255.0.0     U     0      0        0 eth1
> 169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
> default         10.10.1.1       0.0.0.0         UG    0      0        0 eth1
> 
> cat /etc/hosts
> 10.10.10.200    ctclinux1-svc
> 192.168.36.200  ctclinux1-cls
> 192.168.36.201  ctclinux2-cls
> 10.10.10.201    ctclinux2-svc
> 
> Here's the network configuration on ctclinux2:
> 
> ifconfig -a
> eth0      Link encap:Ethernet  HWaddr 00:01:03:D4:80:7C
>           inet addr:192.168.36.201  Bcast:192.168.36.255  Mask:255.255.255.0
>           inet6 addr: fe80::201:3ff:fed4:807c/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:7702 errors:0 dropped:0 overruns:1 frame:0
>           TX packets:282 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:477769 (466.5 KiB)  TX bytes:22444 (21.9 KiB)
>           Interrupt:10 Base address:0xec00
> 
> eth1      Link encap:Ethernet  HWaddr 00:B0:D0:41:0F:9B
>           inet addr:10.10.10.201  Bcast:10.10.255.255  Mask:255.255.0.0
>           inet6 addr: fe80::2b0:d0ff:fe41:f9b/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:53846 errors:0 dropped:0 overruns:1 frame:0
>           TX packets:7759 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:5733713 (5.4 MiB)  TX bytes:1155588 (1.1 MiB)
>           Interrupt:5 Base address:0xe880
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:17912 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:17912 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:3401868 (3.2 MiB)  TX bytes:3401868 (3.2 MiB)
> 
> sit0      Link encap:IPv6-in-IPv4
>           NOARP  MTU:1480  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
> 
> route
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 192.168.36.0    *               255.255.255.0   U     0      0        0 eth0
> 10.74.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
> 10.72.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
> 10.75.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
> 10.73.0.0       192.168.36.10   255.255.255.0   UG    0      0        0 eth0
> 10.10.0.0       *               255.255.0.0     U     0      0        0 eth1
> 169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
> default         10.10.1.1       0.0.0.0         UG    0      0        0 eth1
> 
> cat /etc/hosts
> 10.10.10.201    ctclinux2-svc
> 192.168.36.201  ctclinux2-cls
> 192.168.36.200  ctclinux1-cls
> 10.10.10.200    ctclinux1-svc
> 
> Here's the cluster configuration file:
> 
> <?xml version="1.0"?>
> <cluster name="cl_tic" config_version="1">
>         <cman>
>         </cman>
> 
>         <clusternodes>
>                 <clusternode name="ctclinux1-cls">
>                         <fence>
>                                 <method name="single">
>                                         <device name="human" nodename="ctclinux1-cls"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
> 
>                 <clusternode name="ctclinux2-cls">
>                         <fence>
>                                 <method name="single">
>                                         <device name="human" nodename="ctclinux2-cls"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
> 
>         <fence_devices>
>                 <fence_device name="human" agent="fence_manual"/>
>         </fence_devices>
> </cluster>
> 
> Here's the cluster information from ctclinux1 after the cluster is started and joined:
> 
> cman_tool -d join -w
> nodename ctclinux1.clam.com not found
> nodename ctclinux1 (truncated) not found
> nodename ctclinux1 doesn't match ctclinux1-cls (ctclinux1-cls in cluster.conf)
> nodename ctclinux1 doesn't match ctclinux2-cls (ctclinux2-cls in cluster.conf)
> nodename localhost (if lo) not found
> selected nodename ctclinux1-cls
> setup up interface for address: ctclinux1-cls
> Broadcast address for c824a8c0 is ff24a8c0
> 
> cman_tool status
> Protocol version: 5.0.1
> Config version: 1
> Cluster name: cl_tic
> Cluster ID: 6429
> Cluster Member: Yes
> Membership state: Cluster-Member
> Nodes: 1
> Expected_votes: 2
> Total_votes: 1
> Quorum: 2  Activity blocked
> Active subsystems: 0
> Node name: ctclinux1-cls
> Node addresses: 192.168.36.200
> 
> cman_tool nodes
> Node  Votes Exp Sts  Name
>    1    1    2   M   ctclinux1-cls
> 
> Here's the cluster information from ctclinux2 after the cluster is started and joined:
> 
> cman_tool -d join -w
> nodename ctclinux2.clam.com not found
> nodename ctclinux2 (truncated) not found
> nodename ctclinux2 doesn't match ctclinux1-cls (ctclinux1-cls in cluster.conf)
> nodename ctclinux2 doesn't match ctclinux2-cls (ctclinux2-cls in cluster.conf)
> nodename localhost (if lo) not found
> selected nodename ctclinux2-cls
> setup up interface for address: ctclinux2-cls
> Broadcast address for c924a8c0 is ff24a8c0
> 
> cman_tool status
> Protocol version: 5.0.1
> Config version: 1
> Cluster name: cl_tic
> Cluster ID: 6429
> Cluster Member: Yes
> Membership state: Cluster-Member
> Nodes: 1
> Expected_votes: 2
> Total_votes: 1
> Quorum: 2  Activity blocked
> Active subsystems: 0
> Node name: ctclinux2-cls
> Node addresses: 192.168.36.201
> 
> cman_tool nodes
> Node  Votes Exp Sts  Name
>    1    1    2   M   ctclinux2-cls
> 
> Let me know if there is more information that I need to provide.  As an aside, I've tried reducing
> the quorum count with no difference in behavior and I've tried using multicast which fails on the
> cman_tool join with an "Unknown Host" error.  I'm open to any other suggestions.
> 
> Thanks,
> 
> tims
> 

-- 
Mark Hlawatschek <hlawatschek at atix.de>



