[Linux-cluster] Strange error returned by openais

Wed Mar 3 18:18:26 UTC 2010

carlopmart wrote:
> Christine Caulfield wrote:
>> On 03/03/10 09:02, carlopmart wrote:
>>> martijn at tenheuvel.net wrote:
>>>>> Hi all,
>>>>>
>>>>> I am trying to set up a RHEL 5.4 cluster with only two nodes, but I can't.
>>>>> Under /var/log/messages I can see a lot of errors like these:
>>>>>
>>>>> These nodes have two network interfaces, one on the same network for
>>>>> cluster operation and another on a different subnet. Like this:
>>>>>
>>>>> Node01: 172.16.1.1 (eth0) and 192.168.35.1 (eth1)
>>>>> Node02: 172.16.1.2 (eth0) and 172.26.50.1 (eth1)
>>>>>
>>>>> The default gateway is 192.168.35.20 on node01 and 172.26.50.30 on
>>>>> node02 ... maybe this is the problem??
>>>>>
>>>>> I have put IP routing rules on both nodes but the problem continues ...
>>>>> How can I fix this??
>>>>
>>>> I've had exactly the same errors, and eventually found what was wrong.
>>>> The problem seems to be the VLANs: switches that block the multicast
>>>> traffic. For now I'm using a crossover cable.
>>>>
>>>> So, check with the network engineers, they should be able to assist you,
>>>> and you can use the crossover cable to convince them they're blocking you.
>>>>
>>>> regards,
>>>> Martijn
>>>>
>>>>
>>>>
>>>
>>> Maybe you are right, Martijn. I have manually copied cluster.conf from
>>> node02 to node01 and everything works (node01 joins the cluster). But if
>>> multicast is the problem, why does node01 join the cluster when its
>>> cluster.conf is at the same version as on node02??
>>>
>>> My problem only occurs when cluster.conf version is different between
>>> nodes ...
>>
>>
>> Well, that's exactly your problem! cman expects the cluster.conf to be 
>> the same version on all nodes. ccsd is meant to synchronise these in 
>> RHEL5 but it has problems with a two node cluster where quorum cannot 
>> be established.
>>
>> What you need to do is either use two_node="1" mode in cluster.conf or 
>> use a quorum disk to maintain quorum while a single node is up.
>>
>> Chrissie
>>
> 
> But I am using two_node=1 in my cluster.conf. Here it is:
> 
> <?xml version="1.0"?>
> <cluster alias="MiddleEarth" config_version="12" name="MiddleEarth">
>         <fence_daemon post_fail_delay="0" post_join_delay="3" clean_start="1"/>
>         <clusternodes>
>                 <clusternode name="mgmtnode01.hpulabs.org" nodeid="1" votes="1">
>                         <multicast addr="239.192.11.25" interface="eth1"/>
>                         <fence>
>                                 <method name="1">
>                                         <device name="last-resort" nodename="mgmtnode01.hpulabs.org"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="mgmtnode02.hpulabs.org" nodeid="2" votes="1">
>                         <multicast addr="239.192.11.25" interface="eth1"/>
>                         <fence>
>                                 <method name="1">
>                                         <device name="last-resort" nodename="mgmtnode02.hpulabs.org"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1">
>                 <multicast addr="239.192.11.25"/>
>         </cman>
>         <fencedevices>
>                 <fencedevice agent="fence_manual" name="last-resort"/>
>         </fencedevices>
>         <rm log_facility="local4" log_level="7"/>
> </cluster>
> 
> I have another two-node cluster configured like this (except that its
> nodes have only one interface) and everything works. When I make
> changes to cluster.conf on one node, they are replicated automatically to the
> other ... Why doesn't the same happen on this two-node cluster??
> 
> Thanks.
> 

Any ideas please??
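
Since the failure only shows up when config_version differs between the
nodes, a quick way to confirm whether the two copies of cluster.conf are
out of step is to compare that attribute directly. Below is a minimal
sketch, assuming the contents of each node's cluster.conf have already
been read into strings; the function names are mine and are not part of
cman or ccsd.

```python
import xml.etree.ElementTree as ET


def config_version(xml_text):
    """Return config_version from the <cluster> root element as an int."""
    root = ET.fromstring(xml_text)
    if root.tag != "cluster":
        raise ValueError("root element is %r, expected 'cluster'" % root.tag)
    return int(root.attrib["config_version"])


def versions_match(xml_by_node):
    """Return ({node: version}, True if every node has the same version)."""
    versions = {node: config_version(text) for node, text in xml_by_node.items()}
    return versions, len(set(versions.values())) <= 1
```

Passing a dict such as {"node01": text1, "node02": text2} gives back the
version seen on each node and a flag telling whether they agree. It only
reports the mismatch; it does not explain why the file is not propagated.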

-- 
CL Martinez
carlopmart {at} gmail {d0t} com