[Linux-cluster] Xen network config -> Fence problem - More info
Madison Kelly
linux at alteeve.com
Sat Oct 31 05:01:23 UTC 2009
After sending this, I went back to debugging the problem. The
machines had stopped fencing and the DRBD link was down.
So first I stopped and then started 'xend', which got the Xen-style
networking up. I left the machines alone for about ten minutes to see if
they would fence one another; they didn't.
So then I set about fixing DRBD. I got the array re-syncing, and I
thought I might have gotten things working, but about 15 or 30 seconds
after DRBD came back online, one node fenced the other again. It may
have been a coincidence, but the last command I ran before the fence was
'pvdisplay', to check the LVM PVs. That command never returned, and may
have been the trigger; I am not sure.
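
One thing that can make 'pvdisplay' hang in a setup like this is LVM
scanning both the DRBD device and its raw backing disk, so it sees the
same PV signature twice. A minimal lvm.conf filter sketch, assuming the
DRBD device is /dev/drbd0 (adjust the device name to your layout):

```
# lvm.conf sketch: accept only the DRBD device, reject everything else,
# so LVM never scans the raw backing partition underneath DRBD.
filter = [ "a|^/dev/drbd0$|", "r|.*|" ]
```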
So it looks like they fence each other until DRBD breaks. Once the
array is fixed and/or pvdisplay is called, the fence loop starts again.
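
As a stopgap while debugging, the fence daemon's delays can be raised so
the totem ring has a chance to re-form before fencing fires; the fenced
log quoted below shows post_fail_delay is currently 0 seconds. A sketch
of the relevant cluster.conf line (the values are illustrative, not a
recommendation):

```xml
<!-- fence_daemon sits inside <cluster> in /etc/cluster/cluster.conf.
     post_fail_delay: seconds fenced waits after a node is declared
     failed before fencing it. post_join_delay: seconds it waits after
     joining the fence domain before fencing anyone. -->
<fence_daemon post_fail_delay="30" post_join_delay="60"/>
```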
Madi
Madison Kelly wrote:
> Hi all,
>
> I've got CentOS 5.3 installed on two nodes (simple two node cluster).
> On this, I've got a DRBD partition running cluster aware LVM. I use this
> to host VMs under Xen.
>
> I've got a problem where I am trying to use eth0 as a back channel for
> the VMs on either node via a firewall VM. The network setup on each node
> is:
>
> eth0: back channel, IPMI only connected to an internal network.
> eth1: dedicated DRBD link.
> eth2: Internet-facing interface.
>
> I want to get eth0 and eth2 under Xen's networking but the default
> config was to leave eth0 alone. Specifically, the
> convirt-xen-multibridge is set to:
>
> "$dir/network-bridge" "$@" vifnum=0 netdev=peth0 bridge=xenbr0
>
> When I change this to:
>
> "$dir/network-bridge" "$@" vifnum=0 netdev=eth0 bridge=xenbr0
>
> One of the nodes will soon fence the other, and when it comes back up
> it fences the first. Eventually one node stays up and constantly fences
> the other.
>
> The node that gets fenced prints this repeatedly to the log just
> before it is fenced:
>
> Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] FAILED TO RECEIVE
> Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] entering GATHER state from 6.
>
> And the node that stays up prints this:
>
> Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] The token was lost in the
> OPERATIONAL state.
> Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] Receive multicast socket
> recv buffer size (288000 bytes).
> Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] Transmit multicast socket
> send buffer size (262142 bytes).
> Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] entering GATHER state from 2.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering GATHER state from 0.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Creating commit token
> because I am the rep.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Saving state aru 2c high
> seq received 2c
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Storing new sequence id for
> ring 108
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering COMMIT state.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering RECOVERY state.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] position [0] member
> 10.255.135.3:
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] previous ring seq 260 rep
> 10.255.135.2
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] aru 2c high delivered 2c
> received flag 1
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Did not need to originate
> any messages in recovery.
> Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] Sending initial ORF token
> Oct 31 00:35:51 vsh03 openais[3237]: [CLM ] CLM CONFIGURATION CHANGE
> Oct 31 00:35:51 vsh03 openais[3237]: [CLM ] New Configuration:
> Oct 31 00:35:51 vsh03 kernel: dlm: closing connection to node 1
> Oct 31 00:35:51 vsh03 fenced[3256]: vsh02.domain.com not a cluster
> member after 0 sec post_fail_delay
> Oct 31 00:35:51 vsh03 openais[3237]: [CLM ] r(0) ip(10.255.135.3)
> Oct 31 00:35:51 vsh03 fenced[3256]: fencing node "vsh02.domain.com"
>
> If I leave it long enough, the failed node (vsh02 in this case) stops
> getting fenced, but the Xen networking doesn't come up. Specifically, no
> vifX.Y, xenbrX or other devices get created.
>
> Any idea what might be going on? I really need to get eth0 virtualized
> so that I can get routing to work.
>
> Thanks!
>
> Madi
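
A postscript for anyone digging through logs for the same pattern: the
"FAILED TO RECEIVE" / "token was lost" pair quoted above is the tell
that the totem ring broke on the cluster interface. A small Python
helper (the function name and the warning/info classification are my
own, not part of openais) to pull TOTEM lines out of /var/log/messages:

```python
import re

# Match openais TOTEM entries in a syslog line and capture the message.
TOTEM_RE = re.compile(r'openais\[\d+\]: \[TOTEM\s*\] (.*)')

def classify_totem(lines):
    """Return (severity, message) pairs for TOTEM log entries.

    Lines mentioning a lost token or failed receive are the ones that
    precede a fence in the logs quoted above, so flag them as warnings.
    """
    events = []
    for line in lines:
        m = TOTEM_RE.search(line)
        if not m:
            continue
        msg = m.group(1).strip()
        if 'FAILED TO RECEIVE' in msg or 'token was lost' in msg:
            events.append(('warning', msg))
        else:
            events.append(('info', msg))
    return events

# Sample lines taken from the logs quoted in this thread.
sample = [
    'Oct 31 00:27:21 vsh02 openais[3133]: [TOTEM] FAILED TO RECEIVE',
    'Oct 31 00:35:47 vsh03 openais[3237]: [TOTEM] The token was lost '
    'in the OPERATIONAL state.',
    'Oct 31 00:35:51 vsh03 openais[3237]: [TOTEM] entering COMMIT state.',
]

for severity, message in classify_totem(sample):
    print(severity, '|', message)
```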