[Linux-cluster] Network hiccup + power-fencing = both nodes go down (redhat cluster 4)
Jeff Harr
jharr at opsource.net
Tue Jan 17 11:48:42 UTC 2006
Hi all, it has been a while since I posted anything. Once again, I'd
appreciate any input on this latest issue.
Basically, we have a situation where a "network hiccup" leaves both
nodes suddenly unable to reach each other, and each begins trying to
fence the other (power fencing). Then the network returns and they
power each other off. My need: make Red Hat Cluster robust enough not
to do this. It could be that my configurations are wrong, so I have
attached them.
My idea/solution: I THINK I could increase post_fail_delay (the
fence_daemon setting in cluster.conf) to a number higher than 0, so
that a node waits to see if things "come back up" before fencing.
Perhaps I make one node wait, say, 2 minutes for the other to come
back, and have the other node wait zero seconds, thus ensuring that
the two nodes never fence at the same time?
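
To make the reasoning behind the staggered delay concrete, here is a toy
model of the race, not a description of how fenced actually behaves. The
node names db1/db2 are guessed from the attachment names, and Node,
powered_off_after_outage, and fence_seconds are hypothetical names
invented for this sketch. It assumes that a node starts fencing only if
the outage outlasts its delay and it is still powered on, and that a
fence already in flight always completes.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    post_fail_delay: float  # seconds to wait after losing contact before fencing


def powered_off_after_outage(a: Node, b: Node,
                             outage_seconds: float,
                             fence_seconds: float) -> List[str]:
    """Return the names of the nodes that end up powered off (toy model).

    Both nodes lose contact at t=0 and regain it at t=outage_seconds.
    A power fence takes fence_seconds from start to power-off.
    """
    powered_off = set()
    # The node with the shorter delay gets the first chance to fence.
    first, second = sorted((a, b), key=lambda n: n.post_fail_delay)

    second_off_at = math.inf
    if outage_seconds > first.post_fail_delay:
        # Network is still down when the first delay expires: fence the peer.
        second_off_at = first.post_fail_delay + fence_seconds
        powered_off.add(second.name)

    # The slower node fences only if the network is still down when its
    # delay expires AND it has not already been powered off by then.
    if (outage_seconds > second.post_fail_delay
            and second.post_fail_delay < second_off_at):
        powered_off.add(first.name)

    return sorted(powered_off)


if __name__ == "__main__":
    # Both delays 0: each node fences the other (the reported failure).
    print(powered_off_after_outage(Node("db1", 0), Node("db2", 0), 30, 5))
    # Staggered 0 s / 120 s: only one node is fenced during a long outage.
    print(powered_off_after_outage(Node("db1", 0), Node("db2", 120), 300, 5))
    # Both delays longer than a 3 s hiccup: nobody is fenced.
    print(powered_off_after_outage(Node("db1", 10), Node("db2", 130), 3, 5))
```

In this model the stagger only helps if the gap between the two delays
is longer than the time a fence takes to complete. A short hiccup is
ridden out only if both delays exceed it, so a zero-second delay on one
node would still fence the other during a brief outage.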
Some small proof that the dual power-off happened:
I know that each box fenced the other and "succeeded", and my iLO
event logs show both servers being powered off.
Thanks a lot,
Jeff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster_db2.conf
Type: application/octet-stream
Size: 1392 bytes
Desc: cluster_db2.conf
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060117/482de11e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster_db1.conf
Type: application/octet-stream
Size: 1392 bytes
Desc: cluster_db1.conf
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060117/482de11e/attachment-0001.obj>