[Linux-cluster] Strange error returned by openais

Wed Mar 3 09:33:49 UTC 2010

Christine Caulfield wrote:
> On 03/03/10 09:02, carlopmart wrote:
>> martijn at tenheuvel.net wrote:
>>>> Hi all,
>>>>
>>>> I am trying to set up a RHEL 5.4 cluster with only two nodes, but I can't
>>>> get it working. Under /var/log/messages I can see a lot of errors like these:
>>>>
>>>> These nodes have two network interfaces: one on a shared network for
>>>> cluster operation and another on a different subnet, like this:
>>>>
>>>> Node01: 172.16.1.1 (eth0) and 192.168.35.1 (eth1)
>>>> Node02: 172.16.1.2 (eth0) and 172.26.50.1 (eth1)
>>>>
>>>> The default gateways point to 192.168.35.20 on node01 and to
>>>> 172.26.50.30 on node02... maybe this is the problem?
>>>>
>>>> I have added IP routing rules on both nodes, but the problem persists.
>>>> How can I fix this?
>>>
>>> I had exactly the same errors and eventually found what was wrong.
>>> The problem seems to be the VLANs: switches that block the multicast
>>> traffic. For now I'm using a crossover cable.
>>>
>>> So check with the network engineers; they should be able to assist you,
>>> and you can use the crossover cable to convince them they're blocking you.
>>>
>>> regards,
>>> Martijn
>>>
>>>
>>>
>>
>> Maybe you are right, Martijn. I copied cluster.conf manually from
>> node02 to node01 and everything works OK (node01 joins the cluster). But
>> if multicast is the problem, why does node01 join the cluster when its
>> cluster.conf is at the same version as the one on node02?
>>
>> My problem only occurs when the cluster.conf version differs between
>> nodes...
> 
> 
> Well, that's exactly your problem! cman expects the cluster.conf to be 
> the same version on all nodes. ccsd is meant to synchronise these in 
> RHEL5 but it has problems with a two node cluster where quorum cannot be 
> established.
> 
> What you need to do is either use two_node="1" mode in cluster.conf or 
> use a quorum disk to maintain quorum while a single node is up.
> 
> Chrissie
> 
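For the second option Chrissie mentions, a quorum disk, the relevant part of cluster.conf might look roughly like the fragment below. This is a sketch under assumptions: the label "middleearth_qdisk" is hypothetical, and the interval/tko values are illustrative only. With a quorum disk, two_node mode is switched off and expected_votes also counts the disk's vote (1 + 1 + 1 = 3).

```xml
<!-- Hypothetical fragment: quorum disk instead of two_node mode.
     two_node is disabled, and expected_votes = node01 (1) + node02 (1)
     + quorum disk (1). The label "middleearth_qdisk" is a placeholder
     for whatever label the shared disk was given with mkqdisk. -->
<cman expected_votes="3" two_node="0">
        <multicast addr="239.192.11.25"/>
</cman>
<quorumd interval="1" tko="10" votes="1" label="middleearth_qdisk"/>
```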

But I am using two_node="1" in my cluster.conf. Here it is:

<?xml version="1.0"?>
<cluster alias="MiddleEarth" config_version="12" name="MiddleEarth">
         <fence_daemon post_fail_delay="0" post_join_delay="3" clean_start="1"/>
         <clusternodes>
                 <clusternode name="mgmtnode01.hpulabs.org" nodeid="1" votes="1">
                         <multicast addr="239.192.11.25" interface="eth1"/>
                         <fence>
                                 <method name="1">
                                         <device name="last-resort" nodename="mgmtnode01.hpulabs.org"/>
                                 </method>
                         </fence>
                 </clusternode>
                 <clusternode name="mgmtnode02.hpulabs.org" nodeid="2" votes="1">
                         <multicast addr="239.192.11.25" interface="eth1"/>
                         <fence>
                                 <method name="1">
                                         <device name="last-resort" nodename="mgmtnode02.hpulabs.org"/>
                                 </method>
                         </fence>
                 </clusternode>
         </clusternodes>
         <cman expected_votes="1" two_node="1">
                 <multicast addr="239.192.11.25"/>
         </cman>
         <fencedevices>
                 <fencedevice agent="fence_manual" name="last-resort"/>
         </fencedevices>
         <rm log_facility="local4" log_level="7"/>
</cluster>
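One thing worth checking in the configuration above: both clusternode entries bind multicast to eth1, but according to the addresses given at the start of the thread, the eth1 interfaces sit on different subnets (192.168.35.1 on node01, 172.26.50.1 on node02); only eth0 (172.16.1.0/24) is shared. A minimal sketch, assuming cluster traffic is meant to stay on eth0, would point the per-node multicast entries there. It also assumes (not confirmed in this thread) that the node names resolve to the 172.16.1.x addresses on both machines, and the change would still need a raised config_version.

```xml
<!-- Sketch: bind cluster multicast to eth0, the only interface the two
     nodes share (172.16.1.0/24). Assumes mgmtnode01/02.hpulabs.org
     resolve to 172.16.1.1 and 172.16.1.2 on both nodes, e.g. through
     /etc/hosts. Everything else is unchanged from the posted config. -->
<clusternodes>
        <clusternode name="mgmtnode01.hpulabs.org" nodeid="1" votes="1">
                <multicast addr="239.192.11.25" interface="eth0"/>
                <fence>
                        <method name="1">
                                <device name="last-resort" nodename="mgmtnode01.hpulabs.org"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="mgmtnode02.hpulabs.org" nodeid="2" votes="1">
                <multicast addr="239.192.11.25" interface="eth0"/>
                <fence>
                        <method name="1">
                                <device name="last-resort" nodename="mgmtnode02.hpulabs.org"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>
```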

I have another two-node cluster configured like this (except that its nodes
have only one interface each), and everything works OK there. When I change
cluster.conf on one node, the change is replicated automatically to the
other... Why doesn't the same happen on this two-node cluster?
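Since cman expects the same config_version on every node, one manual route after each edit is to raise config_version in the root element and push the file from the node that holds the new copy. The commands in the comment are the usual RHEL 5 tools as far as I know, but the exact syntax is an assumption here; check ccs_tool(8) and cman_tool(8) on your release before relying on it.

```xml
<!-- After every edit, raise config_version (here 12 -> 13) so the nodes
     can tell the copies apart, then propagate from the edited node.
     Assumed RHEL 5 commands: ccs_tool update /etc/cluster/cluster.conf
     followed by: cman_tool version -r 13 -->
<cluster alias="MiddleEarth" config_version="13" name="MiddleEarth">
        <!-- rest of the configuration unchanged -->
</cluster>
```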

Thanks.

-- 
CL Martinez
carlopmart {at} gmail {d0t} com