[Linux-cluster] Repeated fencing

Carlos Maiolino cmaiolino at redhat.com
Wed Feb 24 18:43:02 UTC 2010


On Wed, Feb 24, 2010 at 08:54:38AM -0600, Doug Tucker wrote:
> Thanks to you and Carlos.  I understand a bit better now what you are
> referring to; however, I don't believe that is the issue.  The reason we
> went to the crossover cable was to avoid this issue, as we had a switch
> die once, and both nodes then thought they were master and tried to fence
> the other.  In my situation, there is no reason for the missed heartbeat
> that I can find.  The interfaces have not gone down.  We ran a test
> where I started a ping between the two nodes that wrote out to a file
> until a "heartbeat" was missed and a reboot occurred.  There was not a
> single missed ping between the two nodes prior to the event.  Also, in a
> split brain, both machines should see the other one as "gone" and try to
> become master.  In this case, only one of the nodes at a time is seeing a
> "missed heartbeat" and then attempting to fence the other.  We have
> replaced all of the hardware, even the cables, to rule that out.  This
> appears to be a software bug of some sort.  Again, we have another
> two-node cluster where this doesn't occur, but it is running a different
> kernel and gfs module.
> 

Doug, have you checked whether there are any known bugs in the NIC driver module you are using? It may also be worth looking at the kernel changelog to see whether anything changed in those modules between the two kernel versions...
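
If it helps, here is a rough sketch of one way to find out which driver module each interface is bound to, by following the device/driver symlink under /sys/class/net. The function name nic_driver and the sysfs_root parameter are mine, for illustration only, and I am assuming a kernel that exposes that sysfs layout.

```python
import os


def nic_driver(iface, sysfs_root="/sys/class/net"):
    """Return the name of the kernel driver bound to iface, or None.

    Hypothetical helper: it resolves <sysfs_root>/<iface>/device/driver,
    which on Linux is a symlink to the driver's directory.
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        # Virtual interfaces such as lo have no backing device/driver link.
        return None
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    root = "/sys/class/net"
    # Only scan when the sysfs directory exists (i.e. on a Linux host).
    if os.path.isdir(root):
        for iface in sorted(os.listdir(root)):
            print(iface, nic_driver(iface, root))
```

With the driver name in hand, you can run modinfo on it to see its version, and on an RPM-based system "rpm -q --changelog kernel" lets you search the changelog for that name on both clusters.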


cya

-- 
---

Best Regards

Carlos Eduardo Maiolino
Support engineer
Red Hat - Global Support Services
