[Linux-cluster] all nodes halt when one loses connection

ESGLinux esggrupos at gmail.com
Mon Jun 8 07:55:41 UTC 2009


Thanks for your answers,
I have used separate networks for management and service traffic, with two
switches, and now it works fine.
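For anyone checking the same thing: as Kaerka points out further down, the heartbeat only runs on the network that the cluster node names are defined on. One way to verify this is to resolve each node name and see which local subnet the address falls in. The following is a minimal Python sketch under stated assumptions. The interface names, the addresses in `INTERFACES`, and the helper names `interface_for` and `check_node` are hypothetical placeholders. Replace them with the node's real values, for example from `ip -4 addr show`.

```python
import ipaddress
import socket
import sys

# Hypothetical interface-to-address map (assumption): replace with the
# real addresses and prefixes configured on this node.
INTERFACES = {
    "eth0": ipaddress.ip_interface("192.168.1.10/24"),  # service network (assumed)
    "eth1": ipaddress.ip_interface("10.0.0.1/24"),      # cluster network (assumed)
}


def interface_for(address, interfaces):
    """Return the name of the interface whose subnet contains `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, iface in interfaces.items():
        if ip in iface.network:
            return name
    return None


def check_node(node_name, interfaces, resolve=socket.gethostbyname):
    """Resolve a node name (via /etc/hosts or DNS) and map it to a local interface."""
    address = resolve(node_name)
    return address, interface_for(address, interfaces)


if __name__ == "__main__":
    # Pass the node names used in cluster.conf as arguments.
    for node in sys.argv[1:]:
        try:
            addr, iface = check_node(node, INTERFACES)
        except socket.gaierror as exc:
            print(f"{node}: cannot resolve ({exc})")
            continue
        print(f"{node} -> {addr} via {iface or 'no local subnet'}")
```

Run it with the cluster.conf node names as arguments. If a name maps to the service interface rather than the dedicated cluster interface, the heartbeat is likely riding on the service network. Unplugging that cable would then cause exactly the mutual fencing seen in the logs below.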

Thanks again,

ESG

2009/5/28 Kaerka Phillips <kbphillips80 at gmail.com>

> One thing we did not try, but which might have worked, would be to bond two
> network interfaces together, use VLAN tagging on top of the bond interface
> to create a VLAN across it to the other node, and then point the cluster at
> the VLAN interfaces, which should stay up even after the loss of one network
> interface or one switch.
>
>
> On Wed, May 27, 2009 at 7:48 PM, Kaerka Phillips <kbphillips80 at gmail.com> wrote:
>
>> It sounds like they're fencing themselves.  We got around this issue on a
>> two-node cluster by adding the other node's internal IP address to the
>> /etc/hosts file on both hosts and using a cross-over cable for the service
>> network, with the private IP addresses assigned to that network.  If you're
>> trying to get them to monitor each other via the public network, in theory
>> this could be done with a backup fencing method, but we weren't able to get
>> this to work, since the heartbeat functions only happen on the network that
>> the node names are defined to use.
>>
>>
>> On Mon, May 25, 2009 at 5:28 AM, ESGLinux <esggrupos at gmail.com> wrote:
>>
>>> Hi,
>>> I don't think this is my problem, because fencing works fine. The nodes get
>>> fenced immediately, but I think they are fencing when they shouldn't.
>>>
>>> Greetings,
>>>
>>> ESG
>>>
>>> 2009/5/22 jorge sanchez <xsanch at gmail.com>
>>>
>>>> Hi,
>>>>
>>>> Also try disabling ACPI if it is running; see the following:
>>>>
>>>>
>>>> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/s1-acpi-CA.html
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Jorge Sanchez
>>>>
>>>>
>>>> On Thu, May 21, 2009 at 5:34 PM, ESGLinux <esggrupos at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> 2009/5/21 Jonathan Brassow <jbrassow at redhat.com>
>>>>>
>>>>>>
>>>>>> On May 21, 2009, at 9:57 AM, ESGLinux wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> these are the logs I get:
>>>>>>>
>>>>>>> In node1:
>>>>>>>
>>>>>>> May 21 11:33:44 NODE1 fenced[3840]: NODE2 not a cluster member after 5 sec post_fail_delay
>>>>>>> May 21 11:33:44 NODE1 fenced[3840]: fencing node "NODE2"
>>>>>>> May 21 11:33:44 NODE1 shutdown[5448]: shutting down for system halt
>>>>>>>
>>>>>>> in node2:
>>>>>>>
>>>>>>> May 21 11:33:45 NODE2 fenced[3843]: NODE1 not a cluster member after 5 sec post_fail_delay
>>>>>>> May 21 11:33:45 NODE2 fenced[3843]: fencing node "NODE1"
>>>>>>> May 21 11:33:45 NODE2 shutdown[5923]: shutting down for system halt
>>>>>>>
>>>>>>>
>>>>>>> What I don't know is why they lose their connection to the cluster;
>>>>>>> they are still connected (I only unplug a cable from the service network).
>>>>>>>
>>>>>>
>>>>>> That may be something worth chasing down, as it appears that your
>>>>>> cluster communication is on a network you don't expect?
>>>>>>
>>>>>
>>>>> How can I be sure which network the nodes are using for
>>>>> communication? I think they use the network I configured for
>>>>> that....
>>>>>
>>>>>
>>>>>>
>>>>>> Also, are the nodes simply "shutting down", or are they being forcibly
>>>>>> rebooted?  If it is a casual shutdown, then it would appear that both nodes
>>>>>> are trying to shut down simultaneously.
>>>>>>
>>>>>
>>>>> They simply shut down. They don't reboot.
>>>>>
>>>>> This is what I get every time I unplug the network cable from eth0 on
>>>>> either of the two nodes (they communicate through eth1...).
>>>>>
>>>>> Greetings,
>>>>>
>>>>> ESG
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>  brassow
>>>>>>
>>>>>> --
>>>>>> Linux-cluster mailing list
>>>>>> Linux-cluster at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>

