[Linux-cluster] Re: Node2 kills node1 when it is booting ...

Tue Jan 27 10:30:55 UTC 2009

Stewart Walters wrote:
> carlopmart wrote:
>> Stewart Walters wrote:
>>> carlopmart wrote:
>>>> carlopmart wrote:
>>>>> Hi all,
>>>>>
>>>>>  I need to set up another RHCS cluster today with two nodes. But every 
>>>>> time I start the second node, node1 returns this error:
>>>>>
>>>>> cman killed by node 2 because we rejoined the cluster without a 
>>>>> full restart
>>>>>
>>>>>  .. and cman stops on node1. Why?? I didn't find any solution under 
>>>>> http://sources.redhat.com/cluster/wiki/FAQ/
>>>>>
>>>>>  My nodes run RHEL 5.3.
>>>>>
>>>>>  Many thanks.
>>>>>
>>>>
>>>> Please, I need your help ... Any ideas???
>>>>
>>>
>>> Sounds like node1 fenced node2, and node2 hasn't been rebooted since 
>>> being fenced. Either that, or node2 uses manual fencing and you 
>>> haven't yet manually acknowledged that it was rebooted.
>>>
>>> Check your logs in /var/log/messages on node1, I'm pretty sure you'll 
>>> see a reference there that node2 has been fenced.
>>>
>>> You'll probably also see somewhere in the logs on node1, that it 
>>> detected node2 did not leave the cluster after being fenced, and as a 
>>> result node1 itself has decided to stop itself to prevent data 
>>> corruption (the message will be something like that anyway).
>>>
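The log check suggested above can be sketched as a small filter. This is only a minimal sketch: the keyword pattern and the node labels ("node2", "node 2") are assumptions, not the exact wording fenced or cman write to /var/log/messages, so adjust them to what your logs actually contain. The function names are hypothetical.

```python
import re

# Assumed keywords: anything starting with "fenc" (fence, fenced, fencing)
# or the word "cman". Real log wording may differ between releases.
FENCE_PATTERN = re.compile(r"\b(fenc\w*|cman)\b", re.IGNORECASE)


def find_fence_events(lines, node_labels):
    """Return lines that mention fencing/cman and any of the node labels."""
    hits = []
    for line in lines:
        if not FENCE_PATTERN.search(line):
            continue
        if any(label in line for label in node_labels):
            hits.append(line.rstrip("\n"))
    return hits


def scan_log(path, node_labels):
    """Scan a syslog-style file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_fence_events(handle, node_labels)
```

Run as root on node1, `scan_log("/var/log/messages", ("node2", "node 2"))` would list the candidate lines to read through; it does not interpret them.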
>>> If you are using manual fencing on node2, after you reboot it you 
>>> need to run "fence_ack_manual -n <node2>" from node1.  Do this only 
>>> after you've restarted node2 but before cman starts back up on it in 
>>> the next boot sequence.  At this point node1 will stop fencing node2 
>>> and both nodes should be able to join the cluster successfully.
>>>
>>> Manual fencing is evil :-)
>>>
>>> Try to avoid it if you can - as you'll get this scenario on your 
>>> cluster every time a node is fenced.  This is the reason why Red Hat 
>>> write in their documentation numerous times that manual fencing is 
>>> not supported in Production clusters (it's almost as if they're 
>>> trying to tell us something...). ;-)
>>>
>>> Also, you mentioned that the solution was not found in the FAQ.  
>>> While it might not include a reference to these specific symptoms, 
>>> I'm pretty sure the FAQ, the man pages for fence_manual and the RHCS 
>>> documentation from Red Hat all cover the requirement to manually 
>>> acknowledge nodes that use manual fencing.  If you do in fact employ 
>>> manual fencing in your cluster, you might want to go over this 
>>> documentation again.
>>>
>>> If you don't use manual fencing, please accept my apologies for 
>>> expressing my general distaste for manual fencing instead of actually 
>>> helping you!! :-)
>>>
>>> Kind Regards,
>>>
>>> Stewart
>>>
>>> -- 
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> Many thanks for your help, Stewart, but I don't use manual fencing as 
>> the fence device in this cluster. I am using gnbd for fencing.
>>
>> I am posting my cluster.conf.
>>
>> ------------------------------------------------------------------------
>>
> Silly question then: have you actually restarted (i.e. fully 
> rebooted) node1?
> 
> Regards,
> 
> Stewart
> 
Yes, and then it works. But when I need to do an ordered shutdown (node1 
first), the fenced daemon on node2 doesn't stop ....


-- 
CL Martinez
carlopmart {at} gmail {d0t} com