[Linux-cluster] Network switch problem

Nicolas Ross rossnick-lists at cybercat.ca
Fri Aug 19 19:48:30 UTC 2011


Hi!

We have a cluster of 8 nodes that are split between two 24-port gigabit 
network switches. Port 1 on each server is used for services, and port 2 for 
the "totem-ring", i.e. cluster communications.

The servers are split 4 per switch, with each port configured on the 
proper VLAN. We have a VLAN trunk between the switches.

I need to reboot one or both switches without interrupting the cluster 
services. In the past (i.e. before there were critical services), I 
rebooted a switch and the cluster lost quorum; all services stopped and 
restarted once quorum was regained. I can live with a minute or so without 
services while the switch reboots, but not 5 or 10 minutes while the 
services stop and start.
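
As far as I understand it, the quorum loss follows from simple majority 
arithmetic. The sketch below is only my own illustration, not anything taken 
from the cluster software: it assumes one vote per node, no quorum disk, and 
a threshold of (expected votes / 2) + 1. The function names are made up.

```python
# Illustrative quorum arithmetic. Assumptions: one vote per node,
# no quorum disk, and a simple-majority threshold. The function
# names are hypothetical and not part of any cluster tool.

def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(reachable_votes: int, expected_votes: int) -> bool:
    """True if the nodes that can still see each other hold quorum."""
    return reachable_votes >= quorum_threshold(expected_votes)


TOTAL_NODES = 8        # nodes in the cluster
NODES_PER_SWITCH = 4   # nodes attached to each switch

# Rebooting one switch cuts off its 4 nodes; the other 4 still see each other.
remaining = TOTAL_NODES - NODES_PER_SWITCH

print("votes needed for quorum:", quorum_threshold(TOTAL_NODES))
print("quorate with one switch down:", is_quorate(remaining, TOTAL_NODES))
# Moving the nodes to a temporary switch one at a time leaves 7 connected.
print("quorate with one node unplugged:", is_quorate(TOTAL_NODES - 1, TOTAL_NODES))
```

Under those assumptions, 4 of 8 votes is one short of the 5 needed, which 
matches what I saw, while unplugging a single node at a time should leave 
the other 7 quorate.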

Now, to reboot a switch, I plan on adding a temporary 3rd switch just for 
the cluster VLAN, and moving the cluster network interfaces over to that 
switch one by one.

So, if I disconnect the cluster network interface on a node, will that 
node be fenced immediately, or do I have some time, let's say 10 seconds, to 
complete the reconnect?

I also see that each node has a TCP connection to the other nodes. So, will 
the disconnect / reconnect sever that connection completely, or will it be 
retried?

Thanks for any insights. 



