[Linux-cluster] all nodes halt when one lose connection

Wed May 27 23:48:36 UTC 2009

It sounds like they're fencing themselves.  We got around this issue on a
two-node cluster by adding the other node's internal IP address to the
/etc/hosts file on both hosts and using a cross-over cable for the service
network, with the private IP addresses assigned to that network.  If you're
trying to get them to monitor each other via the public network, in theory
this could be done with a backup fencing method, but we weren't able to get
this to work, since the heartbeat functions only happen on the network that
the node names are defined to use.
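
Since the heartbeat follows whatever network the node names resolve to, one
quick sanity check is to resolve each node name on each host and see which
subnet the address lands in.  Below is a rough sketch of that check, not
anything taken from the cluster tools.  The subnets and labels in NETWORKS
are made up for illustration, and classify_address / network_for_node are
hypothetical helper names; substitute your own private and public ranges.
It only reports what the local resolver returns (normally /etc/hosts is
consulted first, but that depends on your nsswitch.conf); it does not
inspect the running cluster.

```python
import ipaddress
import socket
import sys

# Hypothetical subnets for illustration only; replace with your own.
NETWORKS = {
    "private (cross-over)": "10.0.0.0/24",
    "public": "192.168.1.0/24",
}


def classify_address(ip, networks=NETWORKS):
    """Return the label of the first network containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for label, cidr in networks.items():
        if addr in ipaddress.ip_network(cidr):
            return label
    return None


def network_for_node(node_name, networks=NETWORKS):
    """Resolve node_name with the local resolver and classify the result."""
    ip = socket.gethostbyname(node_name)
    return ip, classify_address(ip, networks)


if __name__ == "__main__":
    # Pass the node names exactly as they appear in the cluster configuration.
    for name in sys.argv[1:]:
        ip, label = network_for_node(name)
        print(f"{name} -> {ip} ({label or 'unknown network'})")
```

Run it on both hosts with both node names as arguments.  If a name comes
back on the public subnet, that is the network the heartbeat will use for
that node.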

On Mon, May 25, 2009 at 5:28 AM, ESGLinux <esggrupos at gmail.com> wrote:

> Hi,
> I think this is not my problem because fencing works fine. The nodes get
> fenced immediately, but I think they are being fenced when they shouldn't be.
>
> Greetings,
>
> ESG
>
> 2009/5/22 jorge sanchez <xsanch at gmail.com>
>
>> Hi,
>>
>> Also try disabling ACPI if it is running; see the following:
>>
>>
>> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/s1-acpi-CA.html
>>
>>
>> Regards,
>>
>> Jorge Sanchez
>>
>>
>> On Thu, May 21, 2009 at 5:34 PM, ESGLinux <esggrupos at gmail.com> wrote:
>>
>>>
>>>
>>> 2009/5/21 Jonathan Brassow <jbrassow at redhat.com>
>>>
>>>>
>>>> On May 21, 2009, at 9:57 AM, ESGLinux wrote:
>>>>
>>>>  Hello,
>>>>>
>>>>> these are the logs I get:
>>>>>
>>>>> In node1:
>>>>>
>>>>> May 21 11:33:44 NODE1 fenced[3840]: NODE2 not a cluster member after 5
>>>>> sec post_fail_delay
>>>>> May 21 11:33:44 NODE1 fenced[3840]: fencing node "NODE2"
>>>>> May 21 11:33:44 NODE1 shutdown[5448]: shutting down for system halt
>>>>>
>>>>> in node2:
>>>>>
>>>>> May 21 11:33:45 NODE2 fenced[3843]: NODE1 not a cluster member after 5
>>>>> sec post_fail_delay
>>>>> May 21 11:33:45 NODE2 fenced[3843]: fencing node "NODE1"
>>>>> May 21 11:33:45 NODE2 shutdown[5923]: shutting down for system halt
>>>>>
>>>>>
>>>>> What I don't know is why they lose the connection with the cluster;
>>>>> they are still connected (I only unplugged a cable from the service network).
>>>>>
>>>>
>>>> That may be something worth chasing down, as it appears that your
>>>> cluster communication is on a network you don't expect?
>>>>
>>>
>>> How can I be sure which network the nodes are using for
>>> communication? I think they use the network I have configured for
>>> that....
>>>
>>>
>>>>
>>>> Also, are the nodes simply "shutting down", or are they being forcibly
>>>> rebooted?  If it is a casual shutdown, then it would appear that both nodes
>>>> are trying to shut down simultaneously.
>>>>
>>>
>>> They simply shut down. They do not reboot.
>>>
>>> This is what I get every time I unplug the network cable from eth0 of either
>>> of the two nodes. (They communicate through eth1...)
>>>
>>> Greetings,
>>>
>>> ESG
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>>  brassow
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>