[Linux-cluster] Fence_xvmd/fence_xvm problem

Agnieszka Kukałowicz qqlka at nask.pl
Mon Feb 11 08:55:34 UTC 2008


Hi,
 
I was trying to configure Xen guests as virtual services under Cluster
Suite. My configuration is simple:
 
Node one "d1" runs xen guest as virtual service "vm_service1", and node
one "d2" runs virtual service "vm_service2". 
 
The /etc/cluster/cluster.conf file is below:
 
<?xml version="1.0"?>
<cluster alias="VM_Data_Cluster" config_version="112" name="VM_Data_Cluster">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="300"/>
        <clusternodes>
                <clusternode name="d1" nodeid="1" votes="1">
                        <multicast addr="225.0.0.1" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="apc_power_switch" port="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="d2" nodeid="2" votes="1">
                        <multicast addr="225.0.0.1" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="apc_power_switch" port="2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="225.0.0.1"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="X.X.X.X" login="apc" name="apc_power_switch" passwd="apc"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="VM_d1_failover" ordered="0" restricted="0">
                                <failoverdomainnode name="d1" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="VM_d2_failover" ordered="0" restricted="0">
                                <failoverdomainnode name="d2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <vm autostart="1" domain="VM_d1_failover" exclusive="0" name="vm_service1"
                    path="/virts/service1" recovery="relocate"/>
                <vm autostart="1" domain="VM_d2_failover" exclusive="0" name="vm_service2"
                    path="/virts/service2" recovery="relocate"/>
        </rm>
        <totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
        <fence_xvmd family="ipv4"/>
</cluster>
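
The fence_xvmd line above relies on the built-in defaults. If I read the documentation correctly, the multicast_address and key_file attributes could also be set there explicitly, something like:

        <fence_xvmd family="ipv4" multicast_address="225.0.0.12"
                    key_file="/etc/cluster/fence_xvm.key"/>

but for now I have left everything at the defaults.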
 
On guests "vm_service1"  and "vm_service2" I have configured the second
cluster. 
 
<cluster alias="SV_Data_Cluster" config_version="29"
name="SV_Data_Cluster">
        <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
        <clusternodes>
                <clusternode name="d11" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="d11"
name="virtual_fence"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="d12" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="d12"
name="virtual_fence"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_xvm" name="virtual_fence"/>
        </fencedevices>
        <rm>
.
       </rm>
</cluster>
 
The problem is that the fence_xvmd/fence_xvm mechanism doesn't work, probably because of a multicast misconfiguration.
 
Physical nodes "d1" and "d2" and xen guests "vm_service1" and
"vm_service2"  have two ethernet interfaces: private- 10.0.200.x (eth0)
and public (eth1). 
 
On the physical nodes, the "fence_xvmd" daemon listens by default on the eth1 interface:
[root at d2 ~]# netstat -g
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      ALL-SYSTEMS.MCAST.NET
eth0            1      225.0.0.1
eth0            1      ALL-SYSTEMS.MCAST.NET
eth1            1      225.0.0.12
eth1            1      ALL-SYSTEMS.MCAST.NET
virbr0          1      ALL-SYSTEMS.MCAST.NET
lo              1      ff02::1
..
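
I assume fence_xvmd's choice of interface simply follows the kernel routing table; which device the kernel would pick for the fence_xvm multicast group can be checked with, for example (output not pasted here):

[root at d2 ~]# ip route get 225.0.0.12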
 
Next, when from the Xen guest "vm_service1" I run a test to fence the guest "vm_service2", I get:
 
[root at d11 cluster]# /sbin/fence_xvm -H d12 -ddddd
Debugging threshold is now 5
-- args @ 0xbf8aea70 --
  args->addr = 225.0.0.12
  args->domain = d12
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 2
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 0
  args->debug = 5
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0xbf8ada1c (4096 max size)
Actual key length = 4096 bytesOpening /dev/urandom
Sending to 225.0.0.12 via 127.0.0.1
Opening /dev/urandom
Sending to 225.0.0.12 via X.X.X.X
Opening /dev/urandom
Sending to 225.0.0.12 via 10.0.200.124
Waiting for connection from XVM host daemon.
..
Waiting for connection from XVM host daemon.
Timed out waiting for response
 
On the node "d2" where "vm_service2" is running I get:
 
[root at d2 ~]# /sbin/fence_xvmd -fddd
Debugging threshold is now 3
-- args @ 0xbfc54e3c --
  args->addr = 225.0.0.12
  args->domain = (null)
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 2
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 1
  args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0xbfc53e3c (4096 max size)
Actual key length = 4096 bytesOpened ckpt vm_states
My Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
vm_service2              2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
Storing vm_service2
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
vm_service2              2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
Storing vm_service2
Request to fence: d12.
Evaluating Domain: d12   Last Owner/State Unknown
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
vm_service2              2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
Storing vm_service2
Request to fence: d12
Evaluating Domain: d12   Last Owner/State Unknown
 
So it looks like fence_xvmd and fence_xvm cannot communicate with each other.
But "fence_xvm" on "vm_service1" sends multicast packets through all interfaces, and node "d2" can receive them. Tcpdump on node "d2" shows that the packets arrive:
 
[root at d2 ~]# tcpdump  -i peth0 -n host 225.0.0.12
listening on peth0, link-type EN10MB (Ethernet), capture size 96 bytes
17:50:47.972477 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
17:50:49.960841 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
17:50:51.977425 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
 
[root at d2 ~]# tcpdump  -i peth1 -n host 225.0.0.12
listening on peth1, link-type EN10MB (Ethernet), capture size 96 bytes
17:51:26.168132 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
17:51:28.184802 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
17:51:30.196875 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
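
As far as I understand, after the multicast request "fence_xvm" waits for the host daemon to connect back to the guest over TCP (hence the "Waiting for connection from XVM host daemon" lines above), so that could also be watched on the guest side with something like (assuming the callback uses the same port 1229 shown in the args dump):

[root at d11 cluster]# tcpdump -i any -n tcp port 1229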
 
But in the captures above I can't see node "d2" sending anything back to the Xen guest "vm_service1", so "fence_xvm" times out.
What am I doing wrong?
 
Cheers
 
Agnieszka Kukałowicz
NASK, Polska.pl