[Linux-cluster] fence in xen
Joel Heenan
joelh at planetjoel.com
Fri Oct 1 05:09:16 UTC 2010
Are you saying that if you manually destroy the guest and then start it up, it
works?
I don't think your problem is with fencing; I think it's that the two guests
are not joining the cluster correctly. It seems like the fencing part is working.
Do the logs in /var/log/messages show that one node successfully fenced the
other? What is the output of group_tool on both nodes after they have come
up? That should help you debug it.
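For example, something like this on both nodes should show whether the fence
was reported and whether both nodes rejoined the fence domain (standard
cman/groupd tools on RHEL 5; adjust if your setup differs):

  grep -i fenc /var/log/messages
  cman_tool nodes
  group_tool ls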
I don't think it's relevant, but this item from the FAQ may help:
http://sources.redhat.com/cluster/wiki/FAQ/Fencing#fence_stuck
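Also, since you asked earlier how to check whether multicast is OK: one rough
test is to run tcpdump on the bridge on dom0 while issuing fence_xvm from a
guest and see whether the packets arrive. The address and port below are the
defaults shown in your debug output; swap in whichever interface you run
fence_xvmd on:

  tcpdump -i xenbr0 host 225.0.0.12 and udp port 1229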
Joel
On Wed, Sep 22, 2010 at 7:08 PM, Rakovec Jost <Jost.Rakovec at snt.si> wrote:
> Hi
>
> Does anybody have any idea? Please help!
>
>
> Now I can fence the node, but after booting it can't connect to the cluster.
>
> on dom0
>
> fence_xvmd -LX -I xenbr0 -U xen:/// -fdddddddddddddd
>
>
> ipv4_connect: Connecting to client
> ipv4_connect: Success; fd = 12
> Rebooting domain oelcl21...
> [REBOOT] Calling virDomainDestroy(0x99cede0)
> libvir: Xen error : Domain not found: xenUnifiedDomainLookupByName
> [[ XML Domain Info ]]
> <domain type='xen' id='41'>
> <name>oelcl21</name>
> <uuid>07e31b27-1ff1-4754-4f58-221e8d2057d6</uuid>
> <memory>1048576</memory>
> <currentMemory>1048576</currentMemory>
> <vcpu>2</vcpu>
> <bootloader>/usr/bin/pygrub</bootloader>
> <os>
> <type>linux</type>
> </os>
> <clock offset='utc'/>
> <on_poweroff>destroy</on_poweroff>
> <on_reboot>restart</on_reboot>
> <on_crash>restart</on_crash>
> <devices>
> <disk type='block' device='disk'>
> <driver name='phy'/>
> <source dev='/dev/vg_datastore/oelcl21'/>
> <target dev='xvda' bus='xen'/>
> </disk>
> <disk type='block' device='disk'>
> <driver name='phy'/>
> <source dev='/dev/vg_datastore/skupni1'/>
> <target dev='xvdb' bus='xen'/>
> <shareable/>
> </disk>
> <interface type='bridge'>
> <mac address='00:16:3e:7c:60:aa'/>
> <source bridge='xenbr0'/>
> <script path='/etc/xen/scripts/vif-bridge'/>
> <target dev='vif41.0'/>
> </interface>
> <console type='pty' tty='/dev/pts/2'>
> <source path='/dev/pts/2'/>
> <target port='0'/>
> </console>
> </devices>
> </domain>
>
> [[ XML END ]]
> Calling virDomainCreateLinux()..
>
>
> on domU -node1
>
> fence_xvm -H oelcl21 -ddd
>
> clustat on node1:
>
> [root at oelcl11 ~]# clustat
> Cluster Status for cluster2 @ Wed Sep 22 11:04:49 2010
> Member Status: Quorate
>
> Member Name                  ID   Status
> ------ ----                  ---- ------
> oelcl11                         1 Online, Local, rgmanager
> oelcl21                         2 Online, rgmanager
>
> Service Name                 Owner (Last)                   State
> ------- ----                 ----- ------                   -----
> service:web                  oelcl11                        started
> [root at oelcl11 ~]#
>
>
> but node2 waits for 300s and can't connect:
>
> Starting daemons... done
> Starting fencing... Sep 22 10:41:06 oelcl21 kernel: eth0: no IPv6 routers
> present
> done
> [ OK ]
>
> [root at oelcl21 ~]# clustat
> Cluster Status for cluster2 @ Wed Sep 22 11:04:19 2010
> Member Status: Quorate
>
> Member Name ID Status
> ------ ---- ---- ------
> oelcl11 1 Online
> oelcl21 2 Online, Local
>
> [root at oelcl21 ~]#
>
>
>
> br
> jost
>
>
>
>
> ________________________________________
> From: linux-cluster-bounces at redhat.com [linux-cluster-bounces at redhat.com]
> On Behalf Of Rakovec Jost [Jost.Rakovec at snt.si]
> Sent: Monday, September 13, 2010 9:31 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] fence in xen
>
> Hi
>
>
> Q: must fence_xvmd also run in domU?
> Because I notice that if I run this on the host while fence_xvmd is running:
>
> [root at oelcl1 ~]# fence_xvm -H oelcl2 -ddd -o null
> Debugging threshold is now 3
> -- args @ 0x7fffe3f71fb0 --
> args->addr = 225.0.0.12
> args->domain = oelcl2
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 0
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->ifindex = 0
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 0
> args->debug = 3
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0x7fffe3f70f60 (4096 max size)
> Actual key length = 4096 bytes
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Sending to 225.0.0.12 via 10.9.131.83
> Sending to 225.0.0.12 via 192.168.122.1
> Waiting for connection from XVM host daemon.
> Issuing TCP challenge
> Responding to TCP challenge
> TCP Exchange + Authentication done...
> Waiting for return value from XVM host
> Remote: Operation was successful
>
>
> but if I try to fence (reboot) then I get:
>
> [root at oelcl1 ~]# fence_xvm -H oelc2
> Remote: Operation was successful
> [root at oelcl1 ~]#
>
> but host2 does not reboot.
>
>
> If fence_xvmd is not running on the host, then I get a timeout.
>
>
>
> [root at oelcl1 sysconfig]# fence_xvm -H oelcl2 -ddd -o null
> Debugging threshold is now 3
> -- args @ 0x7fff1a6b5580 --
> args->addr = 225.0.0.12
> args->domain = oelcl2
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 0
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->ifindex = 0
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 0
> args->debug = 3
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0x7fff1a6b4530 (4096 max size)
> Actual key length = 4096 bytes
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Waiting for connection from XVM host daemon.
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Waiting for connection from XVM host daemon.
>
>
>
> Q: how can I test whether multicast is OK?
>
> Q: on which network interface must fence_xvmd run on dom0? I notice that on
> the domU hosts there is:
>
> virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
> inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
> inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:0 (0.0 b) TX bytes:7212 (7.0 KiB)
>
>
> also virbr0
>
> and on dom0:
>
> [root at vm5 ~]# fence_xvmd -fdd -I xenbr0
> -- args @ 0xbfd26234 --
> args->addr = 225.0.0.12
> args->domain = (null)
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 2
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->ifindex = 7
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 1
> args->debug = 2
> -- end args --
> Opened ckpt vm_states
> My Node ID = 1
> Domain UUID Owner State
> ------ ---- ----- -----
> Domain-0 00000000-0000-0000-0000-000000000000 00001 00001
> oelcl1 2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
> oelcl2 dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
> oelcman 09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
> Storing oelcl1
> Storing oelcl2
>
>
>
> [root at vm5 ~]# fence_xvmd -fdd -I virbr0
> -- args @ 0xbfd26234 --
> args->addr = 225.0.0.12
> args->domain = (null)
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 2
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->ifindex = 7
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 1
> args->debug = 2
> -- end args --
> Opened ckpt vm_states
> My Node ID = 1
> Domain UUID Owner State
> ------ ---- ----- -----
> Domain-0 00000000-0000-0000-0000-000000000000 00001 00001
> oelcl1 2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
> oelcl2 dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
> oelcman 09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
> Storing oelcl1
> Storing oelcl2
>
>
> No matter which interface I use, the fence is not carried out.
>
>
> thx
>
> br jost
>
>
>
>
>
>
>
>
>
> _____________________________________
> From: linux-cluster-bounces at redhat.com [linux-cluster-bounces at redhat.com]
> On Behalf Of Rakovec Jost [Jost.Rakovec at snt.si]
> Sent: Saturday, September 11, 2010 6:36 PM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] fence in xen
>
> Hi list!
>
>
> I have a question about fence_xvm.
>
> Situation is:
>
> one physical server with Xen --> dom0 with 2 domUs. The cluster works fine
> between the domUs -- reboot, relocate, etc.
>
> I'm using Red Hat 5.5.
>
> The problem is with fencing from dom0 with "fence_xvm -H oelcl2": the domU is
> destroyed, but when it boots back up it can't join the cluster. The domU
> takes a very long time to boot --> FENCED_START_TIMEOUT=300
>
>
> On the console I get, after node2 is up:
>
> node2:
>
> INFO: task clurgmgrd:2127 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> clurgmgrd     D 0000000000000010     0  2127   2126          (NOTLB)
> ffff88006f08dda8 0000000000000286 ffff88007cc0b810 0000000000000000
> 0000000000000003 ffff880072009860 ffff880072f6b0c0 00000000000455ec
> ffff880072009a48 ffffffff802649d7
> Call Trace:
> [<ffffffff802649d7>] _read_lock_irq+0x9/0x19
> [<ffffffff8021420e>] filemap_nopage+0x193/0x360
> [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
> [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
> [<ffffffff88424b64>] :dlm:dlm_new_lockspace+0x2c/0x860
> [<ffffffff80222b08>] __up_read+0x19/0x7f
> [<ffffffff802d0abb>] __kmalloc+0x8f/0x9f
> [<ffffffff8842b6fa>] :dlm:device_write+0x438/0x5e5
> [<ffffffff80217377>] vfs_write+0xce/0x174
> [<ffffffff80217bc4>] sys_write+0x45/0x6e
> [<ffffffff802602f9>] tracesys+0xab/0xb6
>
>
> During boot on node2:
>
> Starting clvmd: dlm: Using TCP for communications
> clvmd startup timed out
> [FAILED]
>
>
>
> node2:
>
> [root at oelcl2 init.d]# clustat
> Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
> Member Status: Quorate
>
> Member Name ID Status
> ------ ---- ---- ------
> oelcl1 1 Online
> oelcl2 2 Online, Local
>
> [root at oelcl2 init.d]#
>
>
> on first node:
>
> [root at oelcl1 ~]# clustat
> Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
> Member Status: Quorate
>
> Member Name                  ID   Status
> ------ ----                  ---- ------
> oelcl1                          1 Online, Local, rgmanager
> oelcl2                          2 Online, rgmanager
>
> Service Name                 Owner (Last)                   State
> ------- ----                 ----- ------                   -----
> service:webby                oelcl1                         started
> [root at oelcl1 ~]#
>
>
> I then have to destroy both domU guests and create them again to get node2
> working.
>
> I have followed the how-tos at https://access.redhat.com/kb/docs/DOC-5937 and
> http://sources.redhat.com/cluster/wiki/VMClusterCookbook
>
>
> cluster config on dom0
>
>
> <?xml version="1.0"?>
> <cluster alias="vmcluster" config_version="1" name="vmcluster">
> <clusternodes>
> <clusternode name="vm5" nodeid="1" votes="1"/>
> </clusternodes>
> <cman/>
> <fencedevices/>
> <rm/>
> <fence_xvmd/>
> </cluster>
>
>
>
> cluster config on domU
>
>
> <?xml version="1.0"?>
> <cluster alias="cluster1" config_version="49" name="cluster1">
> <fence_daemon clean_start="0" post_fail_delay="0"
> post_join_delay="4"/>
> <clusternodes>
> <clusternode name="oelcl1.name.comi" nodeid="1" votes="1">
> <fence>
> <method name="1">
> <device domain="oelcl1"
> name="xenfence1"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="oelcl2.name.com" nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device domain="oelcl2"
> name="xenfence1"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1"/>
> <fencedevices>
> <fencedevice agent="fence_xvm" name="xenfence1"/>
> </fencedevices>
> <rm>
> <failoverdomains>
> <failoverdomain name="prefer_node1" nofailback="0"
> ordered="1" restricted="1">
> <failoverdomainnode name="oelcl1.name.com"
> priority="1"/>
> <failoverdomainnode name="oelcl2.name.com"
> priority="2"/>
> </failoverdomain>
> </failoverdomains>
> <resources>
> <ip address="xx.xx.xx.xx" monitor_link="1"/>
> <fs device="/dev/xvdb1" force_fsck="0"
> force_unmount="0" fsid="8669" fstype="ext3" mountpoint="/var/www/html"
> name="docroot" self_fence="0"/>
> <script file="/etc/init.d/httpd" name="apache_s"/>
> </resources>
> <service autostart="1" domain="prefer_node1" exclusive="0"
> name="webby" recovery="relocate">
> <ip ref="xx.xx.xx.xx"/>
> <fs ref="docroot"/>
> <script ref="apache_s"/>
> </service>
> </rm>
> </cluster>
>
>
>
>
> fence processes on dom0:
>
> [root at vm5 cluster]# ps -ef |grep fenc
> root 18690 1 0 17:40 ? 00:00:00 /sbin/fenced
> root 18720 1 0 17:40 ? 00:00:00 /sbin/fence_xvmd -I xenbr0
> root 22633 14524 0 18:21 pts/3 00:00:00 grep fenc
> [root at vm5 cluster]#
>
>
> and on domU
>
> [root at oelcl1 ~]# ps -ef|grep fen
> root 1523 1 0 17:41 ? 00:00:00 /sbin/fenced
> root 13695 2902 0 18:22 pts/0 00:00:00 grep fen
> [root at oelcl1 ~]#
>
>
>
> Does somebody have any idea why the fence doesn't work?
>
> thx
>
> br
>
> jost
>
>
>
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>