[Linux-cluster] Cluster of XEN guests unstable when rebooting a node

Paolo Marini paolom at prisma-eng.it
Sun Dec 9 08:01:41 UTC 2007


I am building a cluster of Xen guests whose root file systems reside in 
files on a GFS file system (on iSCSI, actually). Each cluster node 
mounts a GFS file system residing on an iSCSI device. For performance 
reasons, both the iSCSI device and the physical nodes (which are 
themselves part of a cluster) use two bonded gigabit Ethernet links with 
LACP.

On the physical machines, I had to insert a "sleep 30" into the 
/etc/init.d/iscsi script before the iSCSI login to wait for the bond 
interface to come up; otherwise the iSCSI devices are not seen and the 
GFS mount fails.
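
A fixed sleep is fragile if the bond takes longer than expected. A minimal 
sketch of an alternative is to poll the interface state and give up after 
a timeout. The function name wait_for_iface, the SYSFS_NET override and 
the interface name bond0 are assumptions for illustration, not part of 
the stock iscsi init script. Note also that "up" in operstate may not 
guarantee that LACP negotiation has finished.

```shell
# Hypothetical helper: wait until an interface reports operstate "up".
# SYSFS_NET is an illustrative override; it defaults to the real sysfs path.
wait_for_iface() {
    iface="$1"
    timeout="${2:-30}"          # seconds to wait before giving up
    net_root="${SYSFS_NET:-/sys/class/net}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # operstate is missing if the interface does not exist yet
        state=$(cat "$net_root/$iface/operstate" 2>/dev/null)
        if [ "$state" = "up" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1                    # interface never came up within the timeout
}
```

In the init script this could replace the sleep with something like 
"wait_for_iface bond0 60 || exit 1" placed before the iSCSI login.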

The cluster of Xen guests itself works fine: I can migrate each guest to 
a different physical node without any problem in the guest. However, 
when I reboot or fence a guest, the guest cluster breaks: quorum is 
dissolved and I have to fence ALL the nodes and reboot them before the 
cluster restarts. Could this be caused by the Xen bridge going down and 
coming back up for longer than the heartbeat timeout?

Is the following FAQ entry still valid, and is it therefore the solution 
to the problem I am seeing?

When I reboot a xen dom, I get cluster errors and it gets fenced. What's 
going on and how do I fix it?

As I understand it, the problem is due to the fact that xen nodes tear 
down and rebuild the ethernet nic after cluster suite has started. We're 
working on a more permanent solution. In the meantime, here is a 
workaround:

  1. Edit the file /etc/xen/xend-config.sxp. Locate the line that
     reads:

     (network-script network-bridge)

     Change that line to read:

     (network-script /bin/true)

  2. Create and/or edit file /etc/sysconfig/network-scripts/ifcfg-eth0
     to look something like:

     DEVICE=eth0
     ONBOOT=yes
     BRIDGE=xenbr0
     HWADDR=XX:XX:XX:XX:XX:XX

     where XX:XX:XX:XX:XX:XX is the MAC address of your network card.

  3. Create and/or edit file
     /etc/sysconfig/network-scripts/ifcfg-xenbr0 to look something like:

     DEVICE=xenbr0
     ONBOOT=yes
     BOOTPROTO=static
     IPADDR=10.0.0.116
     NETMASK=255.255.255.0
     GATEWAY=10.0.0.254
     TYPE=Bridge
     DELAY=0

     Substitute your appropriate IP address, netmask and gateway
     information.
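
Step 1 of the workaround above could be scripted roughly as follows. This 
is only a sketch: the function name disable_xend_bridge_script is 
hypothetical, it assumes GNU sed (for -i), and it assumes the line 
appears in the file exactly as quoted in the FAQ. It keeps a .bak copy of 
the original file.

```shell
# Hypothetical helper: replace xend's network-bridge script with /bin/true.
# Takes the config path as an argument so it can be tried on a copy first.
disable_xend_bridge_script() {
    conf="$1"
    # Keep a backup of the original configuration next to it
    cp -p "$conf" "$conf.bak" || return 1
    # In a basic regular expression the parentheses match literally
    sed -i 's|^(network-script network-bridge)|(network-script /bin/true)|' "$conf"
}
```

It would be called as "disable_xend_bridge_script /etc/xen/xend-config.sxp"; 
xend presumably has to be restarted afterwards for the change to apply.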


Thanks, Paolo
