[Linux-cluster] Fence_xvmd/fence_xvm problem
Bernard Chew
bernard.chew at muvee.com
Mon Feb 11 15:01:58 UTC 2008
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Agnieszka
> Kukalowicz
> Sent: Monday, February 11, 2008 4:56 PM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] Fence_xvmd/fence_xvm problem
>
> Hi,
>
> I was trying to configure Xen guests as virtual services under Cluster Suite. My configuration is simple:
>
> Node one "d1" runs a Xen guest as virtual service "vm_service1", and node two "d2" runs virtual service
> "vm_service2".
>
> The /etc/cluster/cluster.conf file is below:
>
> <?xml version="1.0"?>
> <cluster alias="VM_Data_Cluster" config_version="112" name="VM_Data_Cluster">
> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="300"/>
> <clusternodes>
> <clusternode name="d1" nodeid="1" votes="1">
> <multicast addr="225.0.0.1" interface="eth0"/>
> <fence>
> <method name="1">
> <device name="apc_power_switch" port="1"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="d2" nodeid="2" votes="1">
> <multicast addr="225.0.0.1" interface="eth0"/>
> <fence>
> <method name="1">
> <device name="apc_power_switch" port="2"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1">
> <multicast addr="225.0.0.1"/>
> </cman>
> <fencedevices>
> <fencedevice agent="fence_apc" ipaddr="X.X.X.X" login="apc" name="apc_power_switch"
> passwd="apc"/>
> </fencedevices>
> <rm>
> <failoverdomains>
> <failoverdomain name="VM_d1_failover" ordered="0" restricted="0">
> <failoverdomainnode name="d1" priority="1"/>
> </failoverdomain>
> <failoverdomain name="VM_d2_failover" ordered="0" restricted="0">
> <failoverdomainnode name="d2" priority="1"/>
> </failoverdomain>
> <resources/>
> <vm autostart="1" domain="VM_d1_failover" exclusive="0" name="vm_service1"
> path="/virts/service1" recovery="relocate"/>
> <vm autostart="1" domain="VM_d2_failover" exclusive="0" name="vm_service2"
> path="/virts/service2" recovery="relocate"/>
> </rm>
> <totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
> <fence_xvmd family="ipv4"/>
> </cluster>
>
> On the guests "vm_service1" and "vm_service2" I have configured a second cluster:
>
> <cluster alias="SV_Data_Cluster" config_version="29" name="SV_Data_Cluster">
> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> <clusternodes>
> <clusternode name="d11" nodeid="1" votes="1">
> <fence>
> <method name="1">
> <device domain="d11" name="virtual_fence"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="d12" nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device domain="d12" name="virtual_fence"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1"/>
> <fencedevices>
> <fencedevice agent="fence_xvm" name="virtual_fence"/>
> </fencedevices>
> <rm>
> ...
> </rm>
> </cluster>
>
> The problem is that the fence_xvmd/fence_xvm mechanism doesn't work, probably due to a misconfiguration of
> multicast.
>
> Physical nodes "d1" and "d2" and xen guests "vm_service1" and "vm_service2" have two ethernet interfaces:
> private- 10.0.200.x (eth0) and public (eth1).
>
> On the physical nodes, the "fence_xvmd" daemon listens by default on the eth1 interface:
> [root at d2 ~]# netstat -g
> IPv6/IPv4 Group Memberships
> Interface RefCnt Group
> --------------- ------ ---------------------
> lo 1 ALL-SYSTEMS.MCAST.NET
> eth0 1 225.0.0.1
> eth0 1 ALL-SYSTEMS.MCAST.NET
> eth1 1 225.0.0.12
> eth1 1 ALL-SYSTEMS.MCAST.NET
> virbr0 1 ALL-SYSTEMS.MCAST.NET
> lo 1 ff02::1
> ....
>
> Next, when I test fencing guest "vm_service2" from Xen guest "vm_service1", I get:
>
> [root at d11 cluster]# /sbin/fence_xvm -H d12 -ddddd
> Debugging threshold is now 5
> -- args @ 0xbf8aea70 --
> args->addr = 225.0.0.12
> args->domain = d12
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 2
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 0
> args->debug = 5
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0xbf8ada1c (4096 max size)
> Actual key length = 4096 bytes
> Opening /dev/urandom
> Sending to 225.0.0.12 via 127.0.0.1
> Opening /dev/urandom
> Sending to 225.0.0.12 via X.X.X.X
> Opening /dev/urandom
> Sending to 225.0.0.12 via 10.0.200.124
> Waiting for connection from XVM host daemon.
> ....
> Waiting for connection from XVM host daemon.
> Timed out waiting for response
>
> On the node "d2" where "vm_service2" is running I get:
>
> [root at d2 ~]# /sbin/fence_xvmd -fddd
> Debugging threshold is now 3
> -- args @ 0xbfc54e3c --
> args->addr = 225.0.0.12
> args->domain = (null)
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 2
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 1
> args->debug = 3
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0xbfc53e3c (4096 max size)
> Actual key length = 4096 bytes
> Opened ckpt vm_states
> My Node ID = 1
> Domain UUID Owner State
> ------ ---- ----- -----
> Domain-0 00000000-0000-0000-0000-000000000000 00001 00001
> vm_service2 2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
> Storing vm_service2
> Domain UUID Owner State
> ------ ---- ----- -----
> Domain-0 00000000-0000-0000-0000-000000000000 00001 00001
> vm_service2 2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
> Storing vm_service2
> Request to fence: d12.
> Evaluating Domain: d12 Last Owner/State Unknown
> Domain UUID Owner State
> ------ ---- ----- -----
> Domain-0 00000000-0000-0000-0000-000000000000 00001 00001
> vm_service2 2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
> Storing vm_service2
> Request to fence: d12
> Evaluating Domain: d12 Last Owner/State Unknown
>
> So it looks like fence_xvmd and fence_xvm cannot communicate with each other.
> But "fence_xvm" on "vm_service1" sends multicast packets through all interfaces, and node "d2" can receive them.
> Tcpdump on node "d2" shows that it receives the packets:
>
> [root at d2 ~]# tcpdump -i peth0 -n host 225.0.0.12
> listening on peth0, link-type EN10MB (Ethernet), capture size 96 bytes
> 17:50:47.972477 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
> 17:50:49.960841 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
> 17:50:51.977425 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
>
> [root at d2 ~]# tcpdump -i peth1 -n host 225.0.0.12
> listening on peth1, link-type EN10MB (Ethernet), capture size 96 bytes
> 17:51:26.168132 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
> 17:51:28.184802 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
> 17:51:30.196875 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
>
> But I can't see node "d2" sending anything back to Xen guest "vm_service1", so "fence_xvm" times out.
> What am I doing wrong?
>
> Cheers
>
> Agnieszka Kukałowicz
> NASK, Polska.pl
Hi,
Can you show the results of "netstat -nr" as well?
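One more thing stands out in the fence_xvmd debug output: its state table only knows the Xen domains Domain-0 and vm_service2, yet the request asks it to fence "d12", and it answers "Evaluating Domain: d12 Last Owner/State Unknown". fence_xvm matches on the Xen domain name (or UUID), not on the guest's cluster node name, so the fence device entries in the guest cluster.conf probably need to reference the domain names instead — a sketch, assuming "vm_service2" is the Xen domain backing cluster node "d12":

```xml
<clusternode name="d12" nodeid="2" votes="1">
        <fence>
                <method name="1">
                        <!-- domain must be the Xen domain name, not the cluster node name -->
                        <device domain="vm_service2" name="virtual_fence"/>
                </method>
        </fence>
</clusternode>
```

and likewise domain="vm_service1" for node "d11"; you would then test with fence_xvm -H vm_service2 rather than -H d12.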
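Depending on your cman version, fence_xvmd may also let you pin the listening interface directly in cluster.conf; check "fence_xvmd -h" for an interface option first. The attribute name below is an assumption on my part — please verify it against your version's documentation before relying on it:

```xml
<!-- Assumed attribute name; confirm against your fence_xvmd documentation -->
<fence_xvmd family="ipv4" multicast_interface="eth0"/>
```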
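Also, the "netstat -g" output you posted already hints at a mismatch: fence_xvmd has joined the fence group 225.0.0.12 on eth1 (the public interface) only, while the cluster's own group 225.0.0.1 sits on eth0, the private interface the guests use. A quick way to confirm which interface joined which group — a minimal sketch that just filters the sample lines from your own log:

```shell
# Sample lines copied from the netstat -g output earlier in this thread.
sample='eth0 1 225.0.0.1
eth1 1 225.0.0.12'

# Print the interface(s) that have joined the fence_xvm group 225.0.0.12.
echo "$sample" | awk '$3 == "225.0.0.12" { print $1 }'
# -> eth1
```

If the daemon really is listening on the wrong side, a workaround that has come up on this list before is to add an explicit multicast route so the kernel joins (and sends) on the private interface, e.g. "route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0" on the hosts (assuming eth0 is the private side there), then restart fence_xvmd.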
Regards,
Bernard Chew