[Linux-cluster] fence start-up issue
Celso K. Webber
celso at webbertek.com.br
Sat Sep 13 13:54:23 UTC 2008
Subhendu,
I remember previously seeing a recommendation in the Cluster FAQ that is
exactly the opposite: do not use the same network for integrated fencing
(iLO, DRAC, IPMI) and heartbeat.
Did this change recently, or with Cluster Suite v5? I'm sure that in v4 I
had to put them on separate networks.
Thank you.
Celso.
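A quick way to see which layout a node actually has is to ask the kernel which interface it would use for each destination. The sketch below is a hypothetical helper (not part of Cluster Suite), assuming a Linux host with iproute2 installed; the function names are illustrative. It reads the `dev` field of `ip route get` output for the peer's heartbeat address and for the peer's iLO address and compares them.

```python
import subprocess


def outgoing_dev(route_output: str) -> str:
    """Return the interface name that follows the 'dev' keyword in
    `ip route get` output, e.g. '10.0.0.2 dev eth1 src 10.0.0.1'."""
    tokens = route_output.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == "dev":
            return tokens[i + 1]
    raise ValueError("no 'dev' field in route output: %r" % route_output)


def route_dev(addr: str) -> str:
    """Ask the kernel which interface it would use to reach `addr`."""
    out = subprocess.run(["ip", "route", "get", addr],
                         capture_output=True, text=True, check=True).stdout
    return outgoing_dev(out)


def same_path(peer_addr: str, fence_addr: str) -> bool:
    """True if heartbeat traffic to the peer and traffic to its fence
    device (e.g. iLO) leave this host through the same interface."""
    return route_dev(peer_addr) == route_dev(fence_addr)
```

For example, `same_path("10.0.0.2", "192.168.1.50")`, with hypothetical addresses standing for the peer's heartbeat IP and its iLO IP, returns True only when both leave through the same NIC, which is the layout Subhendu recommends.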
Subhendu Ghosh wrote:
> Eric Ritchie wrote:
>> I sometimes run into an issue when a node in my 2-node cluster is
>> rebooting and hangs on fenced. It seems it can't communicate with the
>> other node, and after post_join_delay expires, it fences the other node.
>> This happened again today, and when the second node rebooted after
>> being fenced, the two nodes were in a split-brain configuration.
>> I saw in the cluster FAQ (cman section, question 6) that the cluster
>> communication network should be the same network as the one used by the
>> fencing device. I think this may be my problem, but I don't understand
>> why. I'm using HP iLO for fencing, and I set up cross-connect cables for
>> the cluster communication between the 2 nodes. Why would having
>> cluster communication and fencing on different networks be an issue?
>>
>> Thanks for your time
>>
>
> Having distinct heartbeat and fencing networks creates the possibility
> of a race condition, which you seem to be running into.
>
> The cluster communication may not have stabilized within the
> post_join_delay time frame for any number of reasons, including a
> network outage. If both share one path, fencing from the node that is
> starting up then fails as well, because it uses the same path to reach
> the fence device as to reach the other cluster member.
>
> With the two separated, fencing can succeed while cluster communication
> fails.
>
> My recommendation is for cluster communication and iLO reachability
> to go through the same NIC on the host.
>
> -regards
> Subhendu
>
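Subhendu's argument can be illustrated with a toy model of the start-up case. This is a simplified, hypothetical sketch, not fenced's actual logic: it ignores timing and only asks, once post_join_delay has expired, whether the joining node saw its peer and whether it can reach the peer's fence device. `Paths`, `paths_for` and `startup_outcome` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Paths:
    heartbeat_up: bool  # joining node can reach the peer on the cluster network
    fence_up: bool      # joining node can reach the peer's fence device (e.g. iLO)


def paths_for(topology: str, cluster_nic_up: bool, mgmt_nic_up: bool = True) -> Paths:
    """Hypothetical model of the two layouts discussed in the thread.

    'shared':   heartbeat and iLO traffic leave the host through the same NIC,
                so a single link failure takes out both paths together.
    'separate': heartbeat runs over a direct cross-connect cable and iLO is
                reached through another NIC, so the paths fail independently.
    """
    if topology == "shared":
        return Paths(heartbeat_up=cluster_nic_up, fence_up=cluster_nic_up)
    if topology == "separate":
        return Paths(heartbeat_up=cluster_nic_up, fence_up=mgmt_nic_up)
    raise ValueError("unknown topology: " + topology)


def startup_outcome(p: Paths) -> str:
    """Simplified decision of a joining node once post_join_delay has expired."""
    if p.heartbeat_up:
        return "peer seen: joins the cluster, no fencing"
    if p.fence_up:
        return "peer not seen: fences the healthy peer"
    return "peer not seen: fencing fails, healthy peer keeps running"


# Joining node whose cluster NIC is down (bad cable, link not yet up at boot, etc.).
for topology in ("shared", "separate"):
    outcome = startup_outcome(paths_for(topology, cluster_nic_up=False))
    print(f"{topology:>8}: {outcome}")
```

With the cluster NIC down, the shared layout leaves the healthy peer running, while the separate layout lets a node that cannot even talk to the cluster power off the node that was working, which is the race described in the reply.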
--
*Celso Kopp Webber*
celso at webbertek.com.br
*Webbertek - Opensource Knowledge*
(41) 8813-1919 - mobile
(41) 4063-8448, ext. 102 - landline