[Linux-cluster] Cman (and corosync) starting before network interface is ready

Facundo M. de la Cruz fmdlc.unix at gmail.com
Wed Sep 17 19:03:51 UTC 2014


On Sep 17, 2014, at 15:51, Rick Stevens <ricks at alldigital.com> wrote:

> On 09/17/2014 08:20 AM, Vallevand, Mark K issued this missive:
>> Tried replacing the switch with a crossover cable.  The problem goes
>> away.  It looks like there is some odd delay in the switch.  The NIC is
>> configured, but it takes 4 seconds for the link to go up.  Huh.
>> 
>> We have a dedicated network for all the cluster traffic.  Nothing else
>> uses it.  In the two-node case, we use a cable.  In larger clusters we
>> will use a switch.  First delivery is for two-node clusters.  But, I
>> worry about that slow switch.
> 
> Switches have to negotiate speeds and protocols, check for conflicting
> MACs and do several other things (depending on the switch/router). It is
> possible for that to take a couple of seconds.
> 
> I'll bet that if you unplug the cable from the switch, then plug it
> back in, you'll probably notice a slight delay in the port's link LED
> lighting up as well. Pretty common and not necessarily indicative of a
> problem.
> ----------------------------------------------------------------------
> - Rick Stevens, Systems Engineer, AllDigital    ricks at alldigital.com -
> - AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
> -                                                                    -
> -     Never put off 'til tomorrow what you can forget altogether!    -
> ----------------------------------------------------------------------
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

Hi everyone,

Let me ask one small thing.
Did you enable Spanning Tree Protocol (STP) on the interconnect switch?
STP is not compatible with TOTEM RRP. A port that has just come up passes through STP's non-forwarding states (blocking, listening, learning) before it starts forwarding, and STP can keep moving a port between the BLOCKING and FORWARDING states afterwards. While the port is not forwarding, TOTEM cannot transmit its heartbeat (token) packets. Once TOTEM hits four consecutive token errors (each about 238 ms plus overhead), the node can be fenced, or you can see problems like this one.
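To put the "four errors of about 238 ms" figure in perspective, here is the arithmetic as a small Python sketch. It assumes corosync's defaults of `token` = 1000 ms and `token_retransmits_before_loss_const` = 4, and assumes the retransmit interval is derived as token / (retransmits + 0.2), which is how the 238 ms default appears to come about; cman may configure a different `token` value, so check your own cluster.conf. The function names are hypothetical, and the 15 s forward delay is the classic IEEE 802.1D default.

```python
# Rough TOTEM loss-window arithmetic (sketch; all defaults are assumptions).


def token_retransmit_ms(token_ms: int = 1000, retransmits_before_loss: int = 4) -> int:
    """Retransmit interval, assuming token / (retransmits + 0.2), truncated to whole ms."""
    return int(token_ms / (retransmits_before_loss + 0.2))


def loss_window_ms(token_ms: int = 1000, retransmits_before_loss: int = 4) -> int:
    """Approximate time before the token is declared lost: N retransmit intervals.

    Protocol and scheduling overhead is ignored, as in the estimate above.
    """
    return retransmits_before_loss * token_retransmit_ms(token_ms, retransmits_before_loss)


def exceeds_loss_window(outage_ms: int, token_ms: int = 1000,
                        retransmits_before_loss: int = 4) -> bool:
    """True if a non-forwarding period of outage_ms lasts at least the loss window."""
    return outage_ms >= loss_window_ms(token_ms, retransmits_before_loss)


if __name__ == "__main__":
    # Classic 802.1D STP: listening + learning, one forward delay (default 15 s) each.
    stp_port_startup_ms = 2 * 15_000
    # The ~4 s link-up delay reported earlier in this thread.
    observed_link_delay_ms = 4_000

    print("retransmit interval:", token_retransmit_ms(), "ms")
    print("loss window:", loss_window_ms(), "ms")
    print("4 s link delay exceeds window:", exceeds_loss_window(observed_link_delay_ms))
    print("STP port startup exceeds window:", exceeds_loss_window(stp_port_startup_ms))
```

With these assumed defaults the window comes out to 4 × 238 = 952 ms, so both the 4 s link-up delay and a 30 s STP port startup are far longer than TOTEM will wait.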

Also, remember to configure all the interconnect ports for the same multicast group.
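Separately, for the symptom that started this thread (the NIC is configured, but the link takes about 4 s to come up), one workaround is to wait for carrier before starting cman. Below is a minimal Python sketch, assuming a Linux host where /sys/class/net/<iface>/carrier reads "1" once the link is up and may be unreadable while the interface is down. The function name, the interface name and the timeouts are hypothetical, and wiring it into the cman init sequence is left to you.

```python
import time
from pathlib import Path


def wait_for_carrier(iface: str, timeout_s: float = 30.0, poll_s: float = 0.5,
                     sysfs_root: str = "/sys/class/net") -> bool:
    """Poll <sysfs_root>/<iface>/carrier until it reads "1".

    Returns True as soon as the link reports carrier, False if timeout_s
    elapses first. sysfs_root is a parameter only so the function can be
    exercised against a fake directory tree.
    """
    carrier = Path(sysfs_root) / iface / "carrier"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if carrier.read_text().strip() == "1":
                return True
        except OSError:
            # Interface missing, or down so the kernel refuses the read.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For example, a wrapper could call `wait_for_carrier("eth1", timeout_s=60)` and start cman only when it returns True. Note that carrier only tells you the link is up; if STP holds the switch port in a non-forwarding state, the link can report carrier while cluster traffic is still dropped.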

Best regards.

-- 
Facundo M. de la Cruz (tty0)
Information Technology Specialist
Mobile: +54 911 56528301

http://codigounix.blogspot.com/
http://twitter.com/_tty0

GPG fingerprint: DF2F 514A 5167 00F5 C753 BF3B D797 C8E1 5726 0789

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning." - Rich Cook
