[Linux-cluster] Repeated fencing
Carlos Maiolino
cmaiolino at redhat.com
Wed Feb 24 18:43:02 UTC 2010
On Wed, Feb 24, 2010 at 08:54:38AM -0600, Doug Tucker wrote:
> Thanks to you and Carlos. I understand a bit better now what you are
> referring to, however, I don't believe that is the issue. The reason we
> went to the crossover cable was to avoid this issue, as we had a switch
> die once, and both then thought they were master and tried to fence the
> other. In my situation, there is no reason for the missed heartbeat
> that I can find. The interfaces have not gone down. We ran a test
> where I started a ping between the 2 that wrote out to a file until a
> "heartbeat" missed and a reboot occurred. There was not a single missed
> ping between the 2 nodes prior to the event. Also in a split brain,
> both machines should recognize the other one "gone" and try to become
> master. In this case, only 1 of the nodes at a time is seeing a "missed
> heartbeat" and then attempting to fence the other. We have replaced all
> hardware, including the cables, to rule that out. This appears
> to be a software bug of some sort. Again, we have another 2 node cluster
> that this doesn't occur on, but, they are running a different kernel and
> gfs module.
>
Doug, have you checked whether there are any known bugs in the NIC module you are using? It may also be worth looking at the kernel changelog to see whether that module has changed.
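
A rough sketch of how that check could look, assuming an rpm-based system and that the interconnect is on eth1 (both are assumptions, adjust to your setup). The function names and the SYSFS_NET variable are made up for illustration; rpm, modinfo, readlink and the sysfs layout are the standard ones:

```shell
#!/bin/sh
# Sketch only: find the kernel module behind a NIC and look for it in the
# kernel package changelog. Function names and SYSFS_NET are hypothetical.

# Print the driver (module) name bound to interface $1, read from sysfs.
# SYSFS_NET defaults to /sys/class/net; it is overridable only for testing.
nic_driver() {
    link="${SYSFS_NET:-/sys/class/net}/$1/device/driver"
    [ -e "$link" ] || return 1          # no such interface, or a virtual one
    basename "$(readlink -f "$link")"
}

# Print changelog lines of the running kernel package that mention module $1.
# Assumes an rpm-based system where the package is named kernel-$(uname -r).
changelog_hits() {
    rpm -q --changelog "kernel-$(uname -r)" | grep -i -- "$1"
}

# Example (eth1 as the crossover interface is an assumption):
#   drv=$(nic_driver eth1) && modinfo "$drv" | grep -i '^version'
#   changelog_hits "$drv"
```

Running the same thing on the cluster that does not show the problem would give you the two module versions to compare.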
cya
--
Best Regards
Carlos Eduardo Maiolino
Support engineer
Red Hat - Global Support Services