[Linux-cluster] Re: Node2 kills node1 when it is booting ...
stewart at epits.com.au
Tue Jan 27 10:26:10 UTC 2009
> Stewart Walters wrote:
>> carlopmart wrote:
>>> carlopmart wrote:
>>>> Hi all,
>>>> I need to set up another RHCS cluster with two nodes today. But every time
>>>> I start the second node, node1 returns this error:
>>>> cman killed by node 2 because we rejoined the cluster without a
>>>> full restart
>>>> ...and cman stops on node1. Why?? I didn't find any solution in the FAQ.
>>>> My nodes are RHEL 5.3.
>>>> Many thanks.
>>> Please, I need your help ... Any ideas???
>> Sounds like node1 fenced node2, and node2 hasn't been rebooted since
>> being fenced. Either that, or node2 uses manual fencing and you
>> haven't yet manually acknowledged that it was rebooted.
>> Check your logs in /var/log/messages on node1, I'm pretty sure you'll
>> see a reference there that node2 has been fenced.
>> You'll probably also see somewhere in the logs on node1 that it
>> detected node2 did not leave the cluster after being fenced, and as a
>> result node1 has decided to stop itself to prevent data
>> corruption (the message will be something like that, anyway).
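A minimal sketch of that log check in Python, assuming the log is a plain text file such as /var/log/messages. The helper names (`fencing_lines`, `scan_log`) and the search patterns are hypothetical: only the "rejoined the cluster without a full restart" wording comes from this thread, and the exact fenced/cman messages vary between releases, so a match on "fenc" is a rough filter rather than proof that a node was fenced.

```python
import re
from typing import Iterable, List

# Hypothetical patterns. Only the second string is quoted from the error in
# this thread; the real fenced/cman wording varies by release.
FENCE_PATTERNS = [
    re.compile(r"fenc", re.IGNORECASE),  # catches "fenced", "fencing", ...
    re.compile(r"rejoined the cluster without a full restart", re.IGNORECASE),
]


def fencing_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any fencing-related pattern, in order."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in FENCE_PATTERNS)
    ]


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Read a syslog file (usually needs root) and return the matching lines."""
    with open(path, errors="replace") as log:
        return fencing_lines(log)
```

On node1, `print("\n".join(scan_log()))` would list the candidate lines in file order, which makes it easier to see whether a fencing event for node2 came before the cman shutdown.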
>> If you are using manual fencing on node2, after you reboot it you
>> need to run "fence_ack_manual -n <node2>" from node1. Do this only
>> after you've restarted node2 but before cman starts back up on it in
>> the next boot sequence. At this point node1 will stop fencing node2
>> and both nodes should be able to join the cluster successfully.
>> Manual fencing is evil :-)
>> Try to avoid it if you can - as you'll get this scenario on your
>> cluster every time a node is fenced. This is the reason why Red Hat
>> write in their documentation numerous times that manual fencing is
>> not supported in Production clusters (it's almost as if they're
>> trying to tell us something...). ;-)
>> Also, you mentioned that the solution was not found in the FAQ.
>> While it might not refer to these specific symptoms, I'm
>> pretty sure the FAQ, the man pages for fence_manual and the RHCS
>> documentation from Red Hat all cover the requirement to
>> manually acknowledge nodes that use manual fencing. If you do in
>> fact employ manual fencing in your cluster, you might want to go over
>> this documentation again.
>> If you don't use manual fencing, please accept my apologies for
>> expressing my general distaste for manual fencing instead of actually
>> helping you!! :-)
>> Kind Regards,
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
> Many thanks for your help, Stewart, but I don't use manual fencing as the
> fence device in this cluster. I am using gnbd to do this.
> I'm posting my cluster.conf.
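Since the question is which fence agent each node really uses, a minimal Python sketch that reads it out of cluster.conf may help. It assumes the usual RHCS layout, where each `<clusternode>` lists `<fence>/<method>/<device name="...">` entries that refer to a `<fencedevice name="..." agent="...">` under `<fencedevices>`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def fence_agents_by_node(xml_text: str) -> Dict[str, List[str]]:
    """Map each clusternode name to the fence agents its methods reference."""
    root = ET.fromstring(xml_text)
    # Device name -> agent, e.g. {"gnbd1": "fence_gnbd"}
    agents = {dev.get("name"): dev.get("agent") for dev in root.iter("fencedevice")}
    result: Dict[str, List[str]] = {}
    for node in root.iter("clusternode"):
        # Every <device> under <fence>/<method> names a <fencedevice>;
        # a reference to an undeclared device is flagged instead of dropped.
        result[node.get("name")] = [
            agents.get(device.get("name"), "<undefined device>")
            for device in node.findall("./fence/method/device")
        ]
    return result


def uses_manual_fencing(xml_text: str) -> bool:
    """True if any node is fenced through the fence_manual agent."""
    return any(
        "fence_manual" in node_agents
        for node_agents in fence_agents_by_node(xml_text).values()
    )
```

For example, `fence_agents_by_node(open("/etc/cluster/cluster.conf").read())` would show each node's agents; a node listing `fence_manual` still needs the manual acknowledgement, and a node with an empty list has no fence method configured at all.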
Silly question, then: have you actually restarted (i.e. actually
rebooted) cluster node1?