[Linux-cluster] all nodes halt when one lose connection

Kaerka Phillips kbphillips80 at gmail.com
Wed May 27 23:52:12 UTC 2009


One thing we did not try, but which might have worked, would be to bond two
network interfaces together, use VLAN tagging on top of the bond interface to
create a VLAN across it to the other node, and then point the cluster at the
VLAN interfaces, which should stay up even after the loss of one network
interface or one switch.
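
A rough, untested sketch of what that could look like with RHEL 5 style
ifcfg files. The interface names (eth0, eth2), the VLAN ID 100, and the
10.0.0.1 address are placeholders made up for illustration, and the bonding
mode is just one reasonable choice, not something we verified:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0
# (repeat for the second slave, e.g. eth2; names are hypothetical)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-bond0
# No IP address here; the VLAN interface on top carries it.
# Assumes the bonding driver is loaded for bond0 (on RHEL 5 typically
# an "alias bond0 bonding" line in /etc/modprobe.conf).
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-bond0.100
# VLAN 100 on top of the bond; ID and address are placeholders.
DEVICE=bond0.100
VLAN=yes
BOOTPROTO=none
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
```

The node names would then have to resolve (for example via /etc/hosts) to the
addresses on bond0.100, since the heartbeat only runs on the network the node
names are defined to use.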

On Wed, May 27, 2009 at 7:48 PM, Kaerka Phillips <kbphillips80 at gmail.com> wrote:

> It sounds like they're fencing themselves.  We got around this issue on a
> two-node cluster by adding the other node's internal IP address to the
> /etc/hosts file on both hosts and running a cross-over cable for the service
> network, with the private IP addresses assigned to that network.  If you're
> trying to get them to monitor each other via the public network, in theory
> this could be done with a backup fencing method, but we weren't able to get
> this to work, since the heartbeat functions only happen on the network that
> the node names are defined to use.
>
>
> On Mon, May 25, 2009 at 5:28 AM, ESGLinux <esggrupos at gmail.com> wrote:
>
>> Hi,
>> I think this is not my problem, because fencing works fine. The nodes get
>> fenced immediately, but I think they are being fenced when they should not be.
>>
>> Greetings,
>>
>> ESG
>>
>> 2009/5/22 jorge sanchez <xsanch at gmail.com>
>>
>>> Hi,
>>>
>>> Also try disabling ACPI if it is running; see the following:
>>>
>>>
>>> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/s1-acpi-CA.html
>>>
>>>
>>> Regards,
>>>
>>> Jorge Sanchez
>>>
>>>
>>> On Thu, May 21, 2009 at 5:34 PM, ESGLinux <esggrupos at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> 2009/5/21 Jonathan Brassow <jbrassow at redhat.com>
>>>>
>>>>>
>>>>> On May 21, 2009, at 9:57 AM, ESGLinux wrote:
>>>>>
>>>>>  Hello,
>>>>>>
>>>>>> these are the logs I get:
>>>>>>
>>>>>> In node1:
>>>>>>
>>>>>> May 21 11:33:44 NODE1 fenced[3840]: NODE2 not a cluster member after 5 sec post_fail_delay
>>>>>> May 21 11:33:44 NODE1 fenced[3840]: fencing node "NODE2"
>>>>>> May 21 11:33:44 NODE1 shutdown[5448]: shutting down for system halt
>>>>>>
>>>>>> in node2:
>>>>>>
>>>>>> May 21 11:33:45 NODE2 fenced[3843]: NODE1 not a cluster member after 5 sec post_fail_delay
>>>>>> May 21 11:33:45 NODE2 fenced[3843]: fencing node "NODE1"
>>>>>> May 21 11:33:45 NODE2 shutdown[5923]: shutting down for system halt
>>>>>>
>>>>>>
>>>>>> What I don't know is why they lose the connection with the cluster;
>>>>>> they are still connected (I only unplugged a cable from the service network).
>>>>>>
>>>>>
>>>>> That may be something worth chasing down, as it appears that your
>>>>> cluster communication is on a network you don't expect?
>>>>>
>>>>
>>>> How can I be sure which network the nodes are using for
>>>> communication? I think they use the network I have configured for
>>>> that...
>>>>
>>>>
>>>>>
>>>>> Also, are the nodes simply "shutting down", or are they being forcibly
>>>>> rebooted?  If it is a casual shutdown, then it would appear that both nodes
>>>>> are trying to shut down simultaneously.
>>>>>
>>>>
>>>> They simply shut down. They do not reboot.
>>>>
>>>> This is what I get every time I unplug the network cable from eth0 of
>>>> either of the two nodes. (They communicate through eth1...)
>>>>
>>>> Greetings,
>>>>
>>>> ESG
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>  brassow
>>>>>
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> Linux-cluster at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>>
>>>>
>>>
>>
>
>