[Linux-cluster] Cluster of XEN guests unstable when rebooting a node under CS5.1

Paolo Marini paolom at prisma-eng.it
Mon Dec 17 18:22:14 UTC 2007


A positive update to the situation ...

after working a lot on the cluster.conf file and on the ethernet setup, I 
found that the instability is present both with ethernet bonding (LACP) 
and with a single ethernet interface. The instability shows up when a 
fenced node comes up again and tries to rejoin the cluster: at that 
moment the whole cluster loses quorum. This happens for both the 
physical and the virtual cluster.
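
For reference, the bonding setup I was testing was along these lines (a 
minimal sketch for RHEL/CentOS 5; the interface names, the miimon value 
and the address are examples from my lab, not something required by the 
cluster suite):

# /etc/modprobe.conf: load the bonding driver in LACP (802.3ad) mode
alias bond0 bonding
options bond0 mode=802.3ad miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.15.151
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (same for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes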

Having seen that the instability is independent of the network 
configuration, I started playing with the configuration of the network 
switch, which in my case is a 3Com Baseline 2924-SFP, a Layer 2 managed 
24-port gigabit ethernet switch, updated to the latest available firmware.

I tried disabling the IGMP snooping feature and retested the cluster by 
fencing and restarting the nodes, one at a time.
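
In case someone wants to reproduce the test, each round was more or less 
the following (the node name changes each time; fence_node goes through 
whatever agent cluster.conf defines, while fence_xvm can also be invoked 
by hand from a physical host):

# fence one guest through the cluster's configured agent
fence_node c5g-thor.prisma

# or drive the fence_xvm agent directly from a dom0
fence_xvm -H c5g-thor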

This time the fenced node restarted and rejoined the remaining 
operational nodes without problems! I repeated the test several more 
times, removed the <totem token="30000"/> line from cluster.conf to go 
back to the default timeouts, and everything worked as expected.

So, the issue seems related to the switch I use! I thought that IGMP 
snooping on the switch was useful to limit the amount of multicast 
traffic on the network ports, but it seems that, when enabled, it does 
not merely limit that traffic but cuts it off entirely.
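
By the way, a quick way to check whether a switch really forwards 
multicast between two nodes, independently of openais, is a small test 
script like the one below. It is just a sketch; the group and port here 
are arbitrary examples, not necessarily the ones cman negotiated for 
this cluster:

#!/usr/bin/env python
# mcast-test.py: rough multicast reachability check between two hosts.
# Run "python mcast-test.py recv" on one node and "python mcast-test.py send"
# on another; if the receiver stays silent while the sender runs, multicast
# is being filtered somewhere between the two.
import socket, struct, sys, time

GROUP, PORT = '239.192.15.1', 5405   # arbitrary example group/port

if sys.argv[1:] == ['send']:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL > 1 in case the nodes sit behind a multicast router
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    while True:
        s.sendto('hello', (GROUP, PORT))
        time.sleep(1)
else:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('', PORT))
    # join the multicast group on all interfaces
    mreq = struct.pack('4sl', socket.inet_aton(GROUP), socket.INADDR_ANY)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, addr = s.recvfrom(1024)
        print 'got %r from %s' % (data, addr[0])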

Hope this note saves some time for other people working on the same 
configuration.

BR, Paolo


Paolo Marini wrote:
> After some more extensive testing, the problem is not solved.
>
> I fence one guest node from the luci interface (or with xm destroy from a
> physical node, it's the same). What I see in another node's log is:
>
> Dec 14 09:59:28 c5g-thor openais[1741]: [TOTEM] The token was lost in the OPERATIONAL state.
> Dec 14 09:59:28 c5g-thor openais[1741]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> Dec 14 09:59:28 c5g-thor openais[1741]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> Dec 14 09:59:28 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 2.
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 0.
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 11.
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 11.
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Creating commit token because I am the rep.
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Saving state aru 71 high seq received 71
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Storing new sequence id for ring 10ec
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering COMMIT state.
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering RECOVERY state.
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] position [0] member 192.168.15.152:
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] previous ring seq 4328 rep 192.168.15.151
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] aru 71 high delivered 71 received flag 1
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Did not need to originate any messages in recovery.
> Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Sending initial ORF token
> Dec 14 09:59:32 c5g-thor clurgmgrd[2386]: <emerg> #1: Quorum Dissolved
> Dec 14 09:59:32 c5g-thor kernel: dlm: closing connection to node 2
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 14 09:59:32 c5g-thor kernel: dlm: closing connection to node 3
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ] New Configuration:
> Dec 14 09:59:32 c5g-thor kernel: dlm: closing connection to node 4
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ]         r(0) ip(192.168.15.152)
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ] Members Left:
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ]         r(0) ip(192.168.15.151)
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ]         r(0) ip(192.168.15.153)
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ]         r(0) ip(192.168.15.154)
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ] Members Joined:
> Dec 14 09:59:32 c5g-thor openais[1741]: [CMAN ] quorum lost, blocking activity
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ] CLM CONFIGURATION CHANGE
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ] New Configuration:
> Dec 14 09:59:32 c5g-thor openais[1741]: [CLM  ]         r(0) ip(192.168.15.152)
> Dec 14 09:59:33 c5g-thor openais[1741]: [CLM  ] Members Left:
> Dec 14 09:59:33 c5g-thor openais[1741]: [CLM  ] Members Joined:
> Dec 14 09:59:33 c5g-thor openais[1741]: [SYNC ] This node is within the primary component and will provide service.
> Dec 14 09:59:33 c5g-thor openais[1741]: [TOTEM] entering OPERATIONAL state.
> Dec 14 09:59:33 c5g-thor openais[1741]: [CLM  ] got nodejoin message 192.168.15.152
> Dec 14 09:59:33 c5g-thor openais[1741]: [CPG  ] got joinlist message from node 1
> Dec 14 09:59:33 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 11.
> Dec 14 09:59:33 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 11.
> Dec 14 09:59:33 c5g-thor ccsd[1704]: Cluster is not quorate.  Refusing connection.
>
>
> The cluster.conf file looks like:
>
> <?xml version="1.0"?>
> <cluster alias="PESV" config_version="25" name="PESV">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="c5g-thor.prisma" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device domain="c5g-thor" name="c5g-thor-f"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="c5g-backup.prisma" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device domain="c5g-backup" name="c5g-backup-f"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="c5g-memo.prisma" nodeid="3" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device domain="c5g-memo" name="c5g-memo-f"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="c5g-steiner.prisma" nodeid="4" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device domain="c5g-steiner" name="c5g-steiner-f"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <fencedevices>
>                 <fencedevice agent="fence_xvm" name="c5g-backup-f"/>
>                 <fencedevice agent="fence_xvm" name="c5g-thor-f"/>
>                 <fencedevice agent="fence_xvm" name="c5g-memo-f"/>
>                 <fencedevice agent="fence_xvm" name="c5g-steiner-f"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains/>
>                 <resources/>
>         </rm>
>         <totem token="30000"/>
>         <cman/>
> </cluster>
>
>
>   
>> On Wed, 2007-12-12 at 19:23 +0100, Paolo Marini wrote:
>>     
>>> I repeat my request for help, hoping someone has run into (and
>>> hopefully solved) the same issues.
>>>
>>> I am building a cluster of XEN guests with the root file system
>>> residing on a file on a GFS filesystem (on iSCSI, actually).
>>>
>>> Each cluster node mounts a GFS file system residing on an iSCSI device.
>>>
>>> For performance reasons, both the iSCSI device and the physical nodes
>>> (which are also part of a cluster) use two gigabit ethernet interfaces
>>> with bonding and LACP. On the physical machines I had to insert a
>>> sleep 30 in the /etc/init.d/iscsi script before the iSCSI login, in
>>> order to wait for the bond interface to come up; otherwise the iSCSI
>>> devices are not seen and no GFS mount is possible.
>>>
>>> Then, going to the cluster of XEN guests: they work fine, and I am
>>> able to migrate each one to a different physical node without problems
>>> on the guest.
>>>
>>> When I reboot or fence one of the guests, the guest cluster breaks,
>>> i.e. quorum is dissolved and I have to fence ALL the nodes and reboot
>>> them in order for the cluster to restart.
>>>       
>> How many guests - and what are you using for fencing?
>>
>>     
>>> Does it have to do with the xen bridge going down and coming back up
>>> for longer than the heartbeat timeout?
>>>       
>> Not sure - it shouldn't be that big of a deal.  If you think that's the
>> problem, try adding:
>>
>>    <totem token="30000"/>
>>
>> to the vm cluster's cluster.conf
>>
>> -- Lon



