[Linux-cluster] fence start-up issue

Celso K. Webber celso at webbertek.com.br
Sat Sep 13 13:54:23 UTC 2008


Subhendu,

I remember previously seeing a recommendation in the Cluster FAQ that was 
exactly the opposite: do not use the same network for integrated 
fencing (iLO, DRAC, IPMI) and heartbeat.

Did this change recently, or in Cluster Suite v5? I'm sure that in v4 I had 
to put them on separate networks.

Thank you.

Celso.

Subhendu Ghosh wrote:
> Eric Ritchie wrote:
>>    I sometimes run into an issue when a node in my 2-node cluster is 
>> rebooting and hangs on fenced. It seems it can't communicate with the 
>> other node and after the post_join_delay, it fences the other node. 
>> This happened again today, and when the second node rebooted after the 
>> fence, they were in a split-brain configuration.
>>    I saw in the Cluster FAQ, in the cman section, question 6, that the 
>> cluster communication network should be the same network as the 
>> fencing device. I think this may be my problem but I don't understand 
>> why. I'm using HP iLO for fencing and I set up cross-connect cables for 
>> the cluster communication between the 2 nodes. Why would having 
>> cluster communication and fencing on different networks be an issue?
>>
>> Thanks for your time
>>
> 
> Having distinct heartbeat and fencing networks creates the possibility 
> of a race condition, which you seem to be running into.
> 
> Cluster communication may not have stabilized within the post_join_delay 
> window for any number of reasons, including a network outage. If both 
> share one path, fencing from the node starting up would also fail, 
> because the path to the fence device is the same as the path to the 
> other cluster member.
> 
> With the two separated, fencing can succeed while cluster communication 
> fails.
> 
> My recommendation is for cluster communication and iLO reachability 
> to go through the same NIC on the host.
> 
> -regards
> Subhendu
> 
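
The argument quoted above can be written out as a small truth table. This is 
only a toy model of that reasoning, not of what fenced actually does 
internally: the function name, its parameters, and the outcome strings are all 
made up for illustration, and it assumes a 2-node cluster where the peer is 
healthy and only the links may be down. It does not settle which FAQ 
recommendation is the right one.

```python
from itertools import product


def startup_outcome(shared_path: bool, heartbeat_up: bool, fence_path_up: bool) -> str:
    """Toy model: what a booting node does once post_join_delay has expired.

    Hypothetical helper, not part of any cluster tool.
    """
    if shared_path:
        # Same NIC for both: the fence device is reachable only when
        # the heartbeat network is reachable too.
        fence_path_up = heartbeat_up
    if heartbeat_up:
        return "joins cluster, no fencing"
    if fence_path_up:
        # Peer is alive but unseen, and the fence device can be reached.
        return "fences healthy peer"
    return "fence attempt fails"


if __name__ == "__main__":
    # Print every combination of layout and link state.
    for shared, hb, fence in product([True, False], repeat=3):
        layout = "shared" if shared else "separate"
        print(f"{layout:8} heartbeat_up={hb!s:5} fence_path_up={fence!s:5} "
              f"-> {startup_outcome(shared, hb, fence)}")
```

In this model the only row that ends in "fences healthy peer" is the one with 
separate paths, heartbeat down, and the fence path up, which matches the 
cross-connect plus iLO setup Eric describes.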

-- 
*Celso Kopp Webber*

celso at webbertek.com.br

*Webbertek - Opensource Knowledge*
(41) 8813-1919 - mobile
(41) 4063-8448, ext. 102 - landline

