[Linux-cluster] fence in xen

Joel Heenan joelh at planetjoel.com
Fri Oct 1 05:09:16 UTC 2010


Are you saying that if you manually destroy the guest and then start it up, it
works?

I don't think your problem is with fencing; I think it's that the two guests
are not rejoining the cluster correctly. The fencing part itself seems to be working.

Do the logs in /var/log/messages show that one node successfully fenced the
other? What is the output of group_tool on both nodes after they have come
up? That should help you debug it.
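
For example (a minimal sketch; the exact log wording depends on your syslog
setup), something like this on each node should tell you whether the fence
completed and which groups each node has actually joined:

    # did fenced report a successful fence?
    grep -i fenc /var/log/messages

    # overall cluster membership and quorum
    cman_tool status

    # fence/dlm/rgmanager group state; healthy groups normally show state
    # "none", while a node stuck joining usually shows something like
    # JOIN_START_WAIT
    group_tool

If the fence domain on the rebooted node never leaves the joining state, that
points at the join problem rather than at fence_xvm itself.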

I don't think it's relevant, but this item from the FAQ may help:

http://sources.redhat.com/cluster/wiki/FAQ/Fencing#fence_stuck

Joel

On Wed, Sep 22, 2010 at 7:08 PM, Rakovec Jost <Jost.Rakovec at snt.si> wrote:

> Hi
>
> Does anybody have any idea? Please help!
>
>
> Now I can fence the node, but after booting it can't connect to the cluster.
>
> On dom0:
>
>  fence_xvmd -LX -I xenbr0 -U xen:/// -fdddddddddddddd
>
>
> ipv4_connect: Connecting to client
> ipv4_connect: Success; fd = 12
> Rebooting domain oelcl21...
> [REBOOT] Calling virDomainDestroy(0x99cede0)
> libvir: Xen error : Domain not found: xenUnifiedDomainLookupByName
> [[ XML Domain Info ]]
> <domain type='xen' id='41'>
>  <name>oelcl21</name>
>  <uuid>07e31b27-1ff1-4754-4f58-221e8d2057d6</uuid>
>  <memory>1048576</memory>
>  <currentMemory>1048576</currentMemory>
>  <vcpu>2</vcpu>
>  <bootloader>/usr/bin/pygrub</bootloader>
>  <os>
>    <type>linux</type>
>  </os>
>  <clock offset='utc'/>
>  <on_poweroff>destroy</on_poweroff>
>  <on_reboot>restart</on_reboot>
>  <on_crash>restart</on_crash>
>  <devices>
>    <disk type='block' device='disk'>
>      <driver name='phy'/>
>      <source dev='/dev/vg_datastore/oelcl21'/>
>      <target dev='xvda' bus='xen'/>
>    </disk>
>    <disk type='block' device='disk'>
>      <driver name='phy'/>
>      <source dev='/dev/vg_datastore/skupni1'/>
>      <target dev='xvdb' bus='xen'/>
>      <shareable/>
>    </disk>
>    <interface type='bridge'>
>      <mac address='00:16:3e:7c:60:aa'/>
>      <source bridge='xenbr0'/>
>      <script path='/etc/xen/scripts/vif-bridge'/>
>      <target dev='vif41.0'/>
>    </interface>
>    <console type='pty' tty='/dev/pts/2'>
>      <source path='/dev/pts/2'/>
>      <target port='0'/>
>    </console>
>  </devices>
> </domain>
>
> [[ XML END ]]
> Calling virDomainCreateLinux()..
>
>
> On domU node1:
>
> fence_xvm -H oelcl21 -ddd
>
> clustat on node1:
>
> [root at oelcl11 ~]# clustat
> Cluster Status for cluster2 @ Wed Sep 22 11:04:49 2010
> Member Status: Quorate
>
>  Member Name                  ID   Status
>  ------ ----                  ---- ------
>  oelcl11                         1 Online, Local, rgmanager
>  oelcl21                         2 Online, rgmanager
>
>  Service Name                 Owner (Last)                 State
>  ------- ----                 ----- ------                 -----
>  service:web                  oelcl11                      started
> [root at oelcl11 ~]#
>
>
> But node2 waits for 300 seconds and can't connect:
>
>   Starting daemons... done
>   Starting fencing... Sep 22 10:41:06 oelcl21 kernel: eth0: no IPv6 routers present
> done
> [  OK  ]
>
> [root at oelcl21 ~]# clustat
> Cluster Status for cluster2 @ Wed Sep 22 11:04:19 2010
> Member Status: Quorate
>
>  Member Name                             ID   Status
>  ------ ----                             ---- ------
>  oelcl11                                     1 Online
>  oelcl21                                     2 Online, Local
>
> [root at oelcl21 ~]#
>
>
>
> br
> jost
>
>
>
>
> ________________________________________
> From: linux-cluster-bounces at redhat.com [linux-cluster-bounces at redhat.com]
> On Behalf Of Rakovec Jost [Jost.Rakovec at snt.si]
> Sent: Monday, September 13, 2010 9:31 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] fence in xen
>
> Hi
>
>
> Q: Must fence_xvmd also run in domU?
> I ask because I notice that, when fence_xvmd is running on the host, I get this:
>
> [root at oelcl1 ~]# fence_xvm -H oelcl2 -ddd -o null
> Debugging threshold is now 3
> -- args @ 0x7fffe3f71fb0 --
>  args->addr = 225.0.0.12
>  args->domain = oelcl2
>  args->key_file = /etc/cluster/fence_xvm.key
>  args->op = 0
>  args->hash = 2
>  args->auth = 2
>  args->port = 1229
>  args->ifindex = 0
>  args->family = 2
>  args->timeout = 30
>  args->retr_time = 20
>  args->flags = 0
>  args->debug = 3
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0x7fffe3f70f60 (4096 max size)
> Actual key length = 4096 bytes
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Sending to 225.0.0.12 via 10.9.131.83
> Sending to 225.0.0.12 via 192.168.122.1
> Waiting for connection from XVM host daemon.
> Issuing TCP challenge
> Responding to TCP challenge
> TCP Exchange + Authentication done...
> Waiting for return value from XVM host
> Remote: Operation was successful
>
>
> But if I try an actual fence (reboot), then I get:
>
> [root at oelcl1 ~]# fence_xvm -H oelc2
> Remote: Operation was successful
> [root at oelcl1 ~]#
>
> but host2 does not reboot.
>
>
> If fence_xvmd is not running on the host, then I get a timeout:
>
>
>
> [root at oelcl1 sysconfig]# fence_xvm -H oelcl2 -ddd -o null
> Debugging threshold is now 3
> -- args @ 0x7fff1a6b5580 --
>  args->addr = 225.0.0.12
>  args->domain = oelcl2
>  args->key_file = /etc/cluster/fence_xvm.key
>  args->op = 0
>  args->hash = 2
>  args->auth = 2
>  args->port = 1229
>  args->ifindex = 0
>  args->family = 2
>  args->timeout = 30
>  args->retr_time = 20
>  args->flags = 0
>  args->debug = 3
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0x7fff1a6b4530 (4096 max size)
> Actual key length = 4096 bytes
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Waiting for connection from XVM host daemon.
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Waiting for connection from XVM host daemon.
>
>
>
> Q: How can I test whether multicast is working?
>
> Q: On which network interface must fence_xvmd run on dom0? I notice that on
> the domU hosts there is:
>
> virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
>          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
>          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:0 (0.0 b)  TX bytes:7212 (7.0 KiB)
>
>
> so virbr0 is also present there.
>
> And on dom0:
>
> [root at vm5 ~]# fence_xvmd -fdd -I xenbr0
> -- args @ 0xbfd26234 --
>  args->addr = 225.0.0.12
>  args->domain = (null)
>  args->key_file = /etc/cluster/fence_xvm.key
>  args->op = 2
>  args->hash = 2
>  args->auth = 2
>  args->port = 1229
>  args->ifindex = 7
>  args->family = 2
>  args->timeout = 30
>  args->retr_time = 20
>  args->flags = 1
>  args->debug = 2
> -- end args --
> Opened ckpt vm_states
> My Node ID = 1
> Domain                   UUID                                 Owner State
> ------                   ----                                 ----- -----
> Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
> oelcl1                   2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
> oelcl2                   dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
> oelcman                  09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
> Storing oelcl1
> Storing oelcl2
>
>
>
> [root at vm5 ~]# fence_xvmd -fdd -I virbr0
> -- args @ 0xbfd26234 --
>  args->addr = 225.0.0.12
>  args->domain = (null)
>  args->key_file = /etc/cluster/fence_xvm.key
>  args->op = 2
>  args->hash = 2
>  args->auth = 2
>  args->port = 1229
>  args->ifindex = 7
>  args->family = 2
>  args->timeout = 30
>  args->retr_time = 20
>  args->flags = 1
>  args->debug = 2
> -- end args --
> Opened ckpt vm_states
> My Node ID = 1
> Domain                   UUID                                 Owner State
> ------                   ----                                 ----- -----
> Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
> oelcl1                   2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
> oelcl2                   dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
> oelcman                  09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
> Storing oelcl1
> Storing oelcl2
>
>
> No matter which interface I choose, the fence is not carried out.
>
>
> thx
>
> br jost
>
>
>
>
>
>
>
>
>
> _____________________________________
> From: linux-cluster-bounces at redhat.com [linux-cluster-bounces at redhat.com]
> On Behalf Of Rakovec Jost [Jost.Rakovec at snt.si]
> Sent: Saturday, September 11, 2010 6:36 PM
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] fence in xen
>
> Hi list!
>
>
> I have a question about fence_xvm.
>
> Situation is:
>
> One physical server with Xen --> dom0 with 2 domUs. The cluster works fine
> between the domUs (reboot, relocate, ...).
>
> I'm using Red Hat 5.5.
>
> The problem is with fencing from dom0 with "fence_xvm -H oelcl2": the domU is
> destroyed, but when it is booted back up it can't join the cluster. The domU
> takes a very long time to boot --> FENCED_START_TIMEOUT=300.
>
>
> On the console, after node2 is up, I get:
>
> node2:
>
> INFO: task clurgmgrd:2127 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> clurgmgrd     D 0000000000000010     0  2127   2126                    (NOTLB)
>  ffff88006f08dda8  0000000000000286  ffff88007cc0b810  0000000000000000
>  0000000000000003  ffff880072009860  ffff880072f6b0c0  00000000000455ec
>  ffff880072009a48  ffffffff802649d7
> Call Trace:
>  [<ffffffff802649d7>] _read_lock_irq+0x9/0x19
>  [<ffffffff8021420e>] filemap_nopage+0x193/0x360
>  [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
>  [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
>  [<ffffffff88424b64>] :dlm:dlm_new_lockspace+0x2c/0x860
>  [<ffffffff80222b08>] __up_read+0x19/0x7f
>  [<ffffffff802d0abb>] __kmalloc+0x8f/0x9f
>  [<ffffffff8842b6fa>] :dlm:device_write+0x438/0x5e5
>  [<ffffffff80217377>] vfs_write+0xce/0x174
>  [<ffffffff80217bc4>] sys_write+0x45/0x6e
>  [<ffffffff802602f9>] tracesys+0xab/0xb6
>
>
> During boot on node2:
>
> Starting clvmd: dlm: Using TCP for communications
> clvmd startup timed out
> [FAILED]
>
>
>
> node2:
>
> [root at oelcl2 init.d]# clustat
> Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
> Member Status: Quorate
>
>  Member Name                  ID   Status
>  ------ ----                  ---- ------
>  oelcl1                          1 Online
>  oelcl2                          2 Online, Local
>
> [root at oelcl2 init.d]#
>
>
> On the first node:
>
> [root at oelcl1 ~]# clustat
> Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
> Member Status: Quorate
>
>  Member Name                  ID   Status
>  ------ ----                  ---- ------
>  oelcl1                          1 Online, Local, rgmanager
>  oelcl2                          2 Online, rgmanager
>
>  Service Name                 Owner (Last)                 State
>  ------- ----                 ----- ------                 -----
>  service:webby                oelcl1                       started
> [root at oelcl1 ~]#
>
>
> Then I have to destroy both domUs and create them again to get node2 working
> again.
>
> I have followed the how-tos at https://access.redhat.com/kb/docs/DOC-5937 and
> http://sources.redhat.com/cluster/wiki/VMClusterCookbook
>
>
> Cluster config on dom0:
>
>
> <?xml version="1.0"?>
> <cluster alias="vmcluster" config_version="1" name="vmcluster">
>        <clusternodes>
>                <clusternode name="vm5" nodeid="1" votes="1"/>
>        </clusternodes>
>        <cman/>
>        <fencedevices/>
>        <rm/>
>        <fence_xvmd/>
> </cluster>
>
>
>
> Cluster config on domU:
>
>
> <?xml version="1.0"?>
> <cluster alias="cluster1" config_version="49" name="cluster1">
>        <fence_daemon clean_start="0" post_fail_delay="0"
> post_join_delay="4"/>
>        <clusternodes>
>                <clusternode name="oelcl1.name.comi" nodeid="1" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device domain="oelcl1"
> name="xenfence1"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="oelcl2.name.com" nodeid="2" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device domain="oelcl2"
> name="xenfence1"/>
>                                </method>
>                        </fence>
>                </clusternode>
>        </clusternodes>
>        <cman expected_votes="1" two_node="1"/>
>        <fencedevices>
>                <fencedevice agent="fence_xvm" name="xenfence1"/>
>        </fencedevices>
>        <rm>
>                <failoverdomains>
>                        <failoverdomain name="prefer_node1" nofailback="0"
> ordered="1" restricted="1">
>                                <failoverdomainnode name="oelcl1.name.com"
> priority="1"/>
>                                <failoverdomainnode name="oelcl2.name.com"
> priority="2"/>
>                        </failoverdomain>
>                </failoverdomains>
>                <resources>
>                        <ip address="xx.xx.xx.xx" monitor_link="1"/>
>                        <fs device="/dev/xvdb1" force_fsck="0"
> force_unmount="0" fsid="8669" fstype="ext3" mountpoint="/var/www/html"
> name="docroot" self_fence="0"/>
>                        <script file="/etc/init.d/httpd" name="apache_s"/>
>                </resources>
>                <service autostart="1" domain="prefer_node1" exclusive="0"
> name="webby" recovery="relocate">
>                        <ip ref="xx.xx.xx.xx"/>
>                        <fs ref="docroot"/>
>                        <script ref="apache_s"/>
>                </service>
>        </rm>
> </cluster>
>
>
>
>
> Fence processes on dom0:
>
> [root at vm5 cluster]# ps -ef |grep fenc
> root     18690     1  0 17:40 ?        00:00:00 /sbin/fenced
> root     18720     1  0 17:40 ?        00:00:00 /sbin/fence_xvmd -I xenbr0
> root     22633 14524  0 18:21 pts/3    00:00:00 grep fenc
> [root at vm5 cluster]#
>
>
> And on domU:
>
> [root at oelcl1 ~]# ps -ef|grep fen
> root      1523     1  0 17:41 ?        00:00:00 /sbin/fenced
> root     13695  2902  0 18:22 pts/0    00:00:00 grep fen
> [root at oelcl1 ~]#
>
>
>
> Does anybody have any idea why fencing doesn't work?
>
> thx
>
> br
>
> jost
>
>
>
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

