[Linux-cluster] having problems trying to setup a two node cluster
Rick Stevens
rstevens at vitalstream.com
Wed Dec 1 18:39:49 UTC 2004
vahram wrote:
> Rick Stevens wrote:
>
>>
>> I had a similar issue. The problem was with the multicast routing.
>> I was using two NICs on each node...one public (eth0) and one private
>> (eth1), with the default gateway going out eth0.
>>
>> The route for the multicast (224.x.x.x) was going out the default
>> gateway and not reaching the other machine. By putting in a fixed route
>> in for multicast:
>>
>> route add -net 224.0.0.0/8 dev eth1
>>
>> it all started working. This was my fix, it may not work for you.
>> Also, I use the CVS code from http://sources.redhat.com/cluster and
>> not the source RPMs from where you specified.
>> ----------------------------------------------------------------------
>> - Rick Stevens, Senior Systems Engineer rstevens at vitalstream.com -
>> - VitalStream, Inc. http://www.vitalstream.com -
>> - -
>> - Veni, Vidi, VISA: I came, I saw, I did a little shopping. -
>> ----------------------------------------------------------------------
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> http://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> Yep, both boxes have two NICs: eth0 is public and eth1 is private
> (192.168.2.x). I tried adding the route, but that didn't fix it. I've
> also tried disabling the private NIC and running with just the public
> NIC, and that didn't fix it either. One other interesting thing I
> noticed: when I run cman_tool join on nodeA, netstat shows ccsd trying
> to do this:
>
> tcp        0      0 127.0.0.1:50006     127.0.0.1:739     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:738     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:737     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:736     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:743     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:742     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:741     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:740     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:727     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:731     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:730     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:729     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:728     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:735     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:734     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:733     TIME_WAIT   -
> tcp        0      0 127.0.0.1:50006     127.0.0.1:732     TIME_WAIT   -
>
>
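Those are all loopback connections piling up in TIME_WAIT against ccsd's
port 50006. If you'd rather tally them from a netstat dump than eyeball
it, here's a rough sketch (the field layout and the port number are
assumptions based on the listing above):

```python
# Rough sketch: tally TIME_WAIT connections to a given local port from
# `netstat -tn`-style output. Port 50006 (ccsd, per the listing above)
# and the column layout are assumptions, not guaranteed on every system.

def count_time_wait(netstat_output, port=50006):
    """Count TIME_WAIT entries whose local address ends in :port."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state
        if len(fields) >= 6 and fields[0] == "tcp" and fields[5] == "TIME_WAIT":
            if fields[3].endswith(":%d" % port):
                count += 1
    return count

sample = """tcp 0 0 127.0.0.1:50006 127.0.0.1:739 TIME_WAIT
tcp 0 0 127.0.0.1:50006 127.0.0.1:738 TIME_WAIT
tcp 0 0 127.0.0.1:22 10.0.0.5:40000 ESTABLISHED"""

print(count_time_wait(sample))  # 2 for this sample
```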
Looking back at your cluster.conf, I see you're using broadcast. I used
multicast because, in the first CVS checkout I did, broadcast didn't
work properly. It's possible your SRPMs have the same flaw. Why not
try multicast and see if that works? Add the route I mentioned, and
here's my cluster.conf, which you can crib from:
<?xml version="1.0"?>
<cluster name="test" config_version="1">
  <cman two-node="1" expected_votes="1">
    <multicast addr="224.0.0.1"/>
  </cman>
  <nodes>
    <node name="gfs-01-001" votes="1">
      <multicast addr="224.0.0.1" interface="eth1"/>
      <fence>
        <method name="single">
          <device name="human" ipaddr="gfs-01-001"/>
        </method>
      </fence>
    </node>
    <node name="gfs-01-002" votes="1">
      <multicast addr="224.0.0.1" interface="eth1"/>
      <fence>
        <method name="single">
          <device name="human" ipaddr="gfs-01-002"/>
        </method>
      </fence>
    </node>
  </nodes>
  <fence_devices>
    <device name="human" agent="fence_manual"/>
  </fence_devices>
</cluster>
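If you adapt this, it's worth making sure your edited file still parses
and that the per-node multicast addresses agree with the one under
<cman>. A quick standalone sanity check (just a sketch, not part of the
cluster tools; the embedded config mirrors the one above):

```python
# Sanity-check a cluster.conf like the one above: confirm it parses as
# XML and that each node's multicast addr matches the <cman> one.
# Standalone sketch only; the fence elements are omitted for brevity.
import xml.etree.ElementTree as ET

conf = """<?xml version="1.0"?>
<cluster name="test" config_version="1">
  <cman two-node="1" expected_votes="1">
    <multicast addr="224.0.0.1"/>
  </cman>
  <nodes>
    <node name="gfs-01-001" votes="1">
      <multicast addr="224.0.0.1" interface="eth1"/>
    </node>
    <node name="gfs-01-002" votes="1">
      <multicast addr="224.0.0.1" interface="eth1"/>
    </node>
  </nodes>
</cluster>"""

root = ET.fromstring(conf)
cman_addr = root.find("cman/multicast").get("addr")
mismatches = [
    node.get("name")
    for node in root.findall("nodes/node")
    if node.find("multicast").get("addr") != cman_addr
]
print("cman multicast:", cman_addr)
print("nodes with a different addr:", mismatches)
```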
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer rstevens at vitalstream.com -
- VitalStream, Inc. http://www.vitalstream.com -
- -
- What's small, yellow and very, VERY dangerous? The root canary! -
----------------------------------------------------------------------