[Linux-cluster] Repeated fencing

Carlos Maiolino cmaiolino at redhat.com
Mon Feb 22 22:33:43 UTC 2010


On Mon, Feb 22, 2010 at 01:40:37PM -0800, Celso K. Webber wrote:
> I endorse Doug's opinion.
> 
> Although my opinion is empirical, I can affirm that a crossover cable (it can be a straight cable in the case of Gigabit Ethernet) is "more stable" than many Ethernet switches out there. Not to mention that sometimes the customer has only 100 Mbps ports, while with a crossover cable you'll have GigE connections.
> 
> 
> So I'd also like to ask: are there officially any known issues with using crossover cables instead of Ethernet switches for the private / heartbeat network?
> 
> Thanks, Celso.
> 
> 
> 
> ----- Original Message ----
> From: Doug Tucker <tuckerd at lyle.smu.edu>
> To: linux clustering <linux-cluster at redhat.com>
> Sent: Mon, February 22, 2010 4:53:29 PM
> Subject: Re: [Linux-cluster] Repeated fencing
> 
> We did.  It's problematic when you need to reboot a switch or it goes
> down.  They can't talk and try to fence each other.  Crossover cable is
> a direct connection, actually far more efficient for what you are trying
> to accomplish.
> 
> 
> On Mon, 2010-02-22 at 11:57 -0600, Paul M. Dyer wrote:
> > Crossover cable??????
> > 
> > With all the $$ spent, try putting a switch between the nodes.
> > 
> > Paul
> > 
> > ----- Original Message -----
> > From: "Doug Tucker" <tuckerd at lyle.smu.edu>
> > To: linux-cluster at redhat.com
> > Sent: Monday, February 22, 2010 10:15:49 AM (GMT-0600) America/Chicago
> > Subject: [Linux-cluster] Repeated fencing
> > 
> > We have a 2 4.x cluster that has developed an issue we are unable to
> > resolve.  Starting back in December, the nodes began fencing each other
> > randomly, and as frequently as once a day.  There is nothing at the
> > console prior to it happening, and nothing in the logs.  We have not
> > been able to develop any pattern to this point, the 2 nodes appear to be
> > functioning fine, and suddenly in the logs a message will appear about
> > "node x missed too many heartbeats" and the next thing you see is it
> > fencing the node.  Thinking we possibly had a hardware issue, we
> > replaced both nodes from scratch with new machines, the problem
> > persists.  The cluster communication is done via a crossover cable on
> > eth1 on both devices with private ip's.  We have a 2nd cluster that is
> > not having this issue, and both nodes have been up for over 160 days.
> > The configuration is basically identical to the problematic cluster.
> > The only difference between the 2 now is the newer hardware on the
> > problematic node (prior, that was identical), and the kernel.  The
> > non-problematic cluster is still running kernel 89.0.9 and the
> > problematic cluster is on 89.0.11.  We are afraid at this point to allow
> > our non-problematic cluster to upgrade to the latest packages.  Any insight
> > or advice would be greatly appreciated, we have exhausted our ideas
> > here.
> > 
> > Sincerely,
> > 
> > Doug
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > 
> 
> 

Hi Doug, maybe you can avoid this kind of problem by using a quorum disk partition. A two-node cluster is prone to split-brain, and a quorum disk lets you avoid split-brain situations, which are probably what is causing this behavior.
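
To illustrate why the extra vote helps, here is a rough sketch of the vote arithmetic in Python. It assumes a simple majority rule and one vote per node plus one for the quorum disk; the function names are hypothetical and this is not how the cluster software itself is implemented. A two-node cluster without a quorum disk is usually run with a special case where one vote is enough, modeled here as expected_votes=1.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def has_quorum(votes_seen: int, expected_votes: int) -> bool:
    """True if the votes a node can still see reach the majority threshold."""
    return votes_seen >= quorum_threshold(expected_votes)


# Assumed two-node special case: one vote is enough. When the interconnect
# drops, BOTH nodes still count as quorate, so each may try to fence the other.
both_sides_quorate = has_quorum(1, 1)

# Two nodes plus a quorum disk: three expected votes, threshold of two.
# Only the node that still holds the quorum disk vote stays quorate.
node_with_qdisk = has_quorum(2, 3)     # own vote + quorum disk vote
node_without_qdisk = has_quorum(1, 3)  # own vote only

print(both_sides_quorate, node_with_qdisk, node_without_qdisk)
```

With the third vote, at most one side of a split can reach the threshold, so the two nodes should no longer both believe they are entitled to fence.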

As for using a crossover (or straight) cable, I don't know of any issue with it, but check that the link is running in full-duplex mode. Half-duplex mode on machines linked by a crossover cable will probably cause heartbeat problems.
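
One way to check the duplex setting is to look at the "Speed:" and "Duplex:" lines that ethtool prints for the interface. The sketch below parses that text in Python; the helper names and the sample output are made up for illustration, and it assumes a Python 3 interpreter, which may not be what is installed on the cluster nodes.

```python
import re


def parse_link_settings(ethtool_output: str) -> dict:
    """Pull the Speed and Duplex fields out of ethtool-style text."""
    settings = {}
    for line in ethtool_output.splitlines():
        match = re.match(r"\s*(Speed|Duplex):\s*(\S+)", line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def is_full_duplex(ethtool_output: str) -> bool:
    """True only if a Duplex field is present and reads 'Full'."""
    duplex = parse_link_settings(ethtool_output).get("duplex", "")
    return duplex.lower() == "full"


# Illustrative sample only, not captured from the cluster in this thread.
SAMPLE = """Settings for eth1:
\tSpeed: 100Mb/s
\tDuplex: Half
\tLink detected: yes
"""

print(parse_link_settings(SAMPLE))
print(is_full_duplex(SAMPLE))
```

To try it on a real node, pass in the text printed by running ethtool eth1 on each side of the cable. Both ends should report the same speed and full duplex.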

cya..
-- 
---

Best Regards

Carlos Eduardo Maiolino
Support engineer
Red Hat - Global Support Services



