From mgrac at redhat.com Mon Jul 1 13:15:02 2013
From: mgrac at redhat.com (Marek Grac)
Date: Mon, 01 Jul 2013 15:15:02 +0200
Subject: [Linux-cluster] fence-agents-4.0.1 stable release
Message-ID: <51D180D6.6060200@redhat.com>

Welcome to the fence-agents 4.0.1 release. This release includes a few minor bug fixes:

* the fence agent node assassin was temporarily removed
* fix a problem with actions for fence agents without plugs/ports
* fix validation for password, password_script or identity file
* fence_scsi now supports delay like the other fence agents (be aware that the option is -H, which is agent specific)
* support for a new fencing method in fence_dummy - type=fail - all operations should fail
* improve handling of invalid power states
* fix fence_apc after introducing support for firmware 5.x - the problem occurred on devices with more than 25 devices
* the command prompt can now be properly entered from the user's input
* fence_dummy can benefit from a random delay at its start
* the manual page for fence_scsi was extended to provide info about 'unfence'
* a notice was added that the command prompt is expected to be a Python regular expression

The new source tarball can be downloaded here:
https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-4.0.1.tar.xz

To report bugs or issues:
https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other system administrators or power users.

Thanks/congratulations to all the people who contributed to this great milestone.

m,

From gianluca.cecchi at gmail.com Thu Jul 4 15:03:24 2013
From: gianluca.cecchi at gmail.com (Gianluca Cecchi)
Date: Thu, 4 Jul 2013 17:03:24 +0200
Subject: [Linux-cluster] Info on clvmd with halvm on rhel 6.3 based clusters
Message-ID: 

Hello,
I have already read these technotes, and my configuration seems consistent with them:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ap-ha-halvm-CA.html
https://access.redhat.com/site/solutions/409813

Basically I would like to use clvmd with HA-LVM (as recommended) and set up the cluster service with resources like this:

The problem is that if I start both nodes, when clvmd starts it activates all the VGs, because of

action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?

in the clvmd init script, with $LVM_VGS empty.

So when the service starts, it fails in LV activation (because the LV is already active) and the service then goes into a failed state.

My system is registered with rhsm and bound to the 6.3 release.
Current packages:
lvm2-cluster-2.02.95-10.el6_3.3.x86_64
cman-3.0.12.1-32.el6_3.2.x86_64
lvm2-2.02.95-10.el6_3.3.x86_64

I can solve my problem if I change the clvmd init script to match RHEL 5.9, where there is a conditional statement.
The diff between the original 6.3 clvmd init script and mine is now:

$ diff clvmd clvmd.orig
32,34d31
< # Activate & deactivate clustered LVs
< CLVMD_ACTIVATE_VOLUMES=1
<
91,92c88
< if [ -n "$CLVMD_ACTIVATE_VOLUMES" ] ; then
< ${lvm_vgscan} > /dev/null 2>&1
---
> ${lvm_vgscan} > /dev/null 2>&1
94,95c90
< action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?
< fi
---
> action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?

Then I set this in /etc/sysconfig/clvmd:
CLVMD_ACTIVATE_VOLUMES=""

Now all seems OK on start, stop and relocate.
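In other words, with my change the relevant fragment of the clvmd init script ends up looking roughly like this (reconstructed from the diff above, so treat it as a sketch rather than the exact Red Hat script):

# Activate & deactivate clustered LVs
# (defaults to on; the script sources /etc/sysconfig/clvmd, which is
# why the empty value set there below takes effect)
CLVMD_ACTIVATE_VOLUMES=1

...

# in the start path, only scan and activate VGs when the knob is set
if [ -n "$CLVMD_ACTIVATE_VOLUMES" ] ; then
        ${lvm_vgscan} > /dev/null 2>&1
        action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?
fi

and on my nodes /etc/sysconfig/clvmd simply contains:

CLVMD_ACTIVATE_VOLUMES=""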
Between technotes of 6.4 I only see this BZ #729812 Prior to this update, occasional service failures occurred when starting the clvmd variant of the HA-LVM service on multiple nodes in a cluster at the same time. The start of an HA-LVM resource coincided with another node initializing that same HA-LVM resource. With this update, a patch has been introduced to synchronize the initialization of both resources. As a result, services no longer fail due to the simultaneous initialization. but I'm not sure if it is related with my problem as it is private. Can anyone give his/her opinion? I'm going to open a case with redhat, but I would like to understand if it's me missing something trivial.... as I think I would not be the only one with this kind of configuration.... Thanks in advance, Gianluca From rmitchel at redhat.com Fri Jul 5 00:42:47 2013 From: rmitchel at redhat.com (Ryan Mitchell) Date: Fri, 05 Jul 2013 10:42:47 +1000 Subject: [Linux-cluster] Info on clvmd with halvm on rhel 6.3 based clusters In-Reply-To: References: Message-ID: <51D61687.2080204@redhat.com> Hi, On 07/05/2013 01:03 AM, Gianluca Cecchi wrote: > Hello, > I already read these technotes so that it seems my configuration is > coherent with them: > > https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ap-ha-halvm-CA.html > https://access.redhat.com/site/solutions/409813 > > basically I would like to use clvmd with ha-lvm (as recommended) and > set up the cluster service with resources like this: > > > vg_name="VG_PROVA"/> > force_fsck="0" force_unmount="1" fsid="50013" fstype="ext3 > " mountpoint="/PROVA" name="PROVA" options="" self_fence="1"/> > > > > > > > > The problem is that if I starts both nodes, when clvmd starts it > activates all the VGs, because of > > action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $? > > in init script for clvmd and $LVM_VGS empty > > So when the service starts, it fails in lv activation (because already > active) and then the service goes in failed state. You aren't starting rgmanager with the -N option are you? It is not the default. # man clurgmgrd -N Do not perform stop-before-start. Combined with the -Z flag to clusvcadm, this can be used to allow rgmanager to be upgraded without stopping a given user service or set of services. What is supposed to happen is: - clvmd is started at boot time, and all clustered logical volumes are activated (including CLVM HA-LVM volumes) - rgmanager starts after clvmd, and it initializes all resources to ensure they are in a known state. For example: Jul 4 20:06:26 r6ha1 rgmanager[2478]: I am node #1 Jul 4 20:06:27 r6ha1 rgmanager[2478]: Resource Group Manager Starting Jul 4 20:06:27 r6ha1 rgmanager[2478]: Loading Service Data Jul 4 20:06:33 r6ha1 rgmanager[2478]: Initializing Services <---- Jul 4 20:06:33 r6ha1 rgmanager[3316]: [fs] stop: Could not match /dev/vgdata/lvmirror with a real device Jul 4 20:06:33 r6ha1 rgmanager[2478]: stop on fs "fsdata" returned 2 (invalid argument(s)) Jul 4 20:06:35 r6ha1 rgmanager[2478]: Services Initialized Jul 4 20:06:35 r6ha1 rgmanager[2478]: State change: Local UP Jul 4 20:06:35 r6ha1 rgmanager[2478]: State change: r6ha2.cluster.net UP - So when rgmanager starts, it stops the CLVM HA-LVM logical volumes again prior to starting the service, unless you disabled the "stop-before-start" option. I did a quick test and I got the same results as you. Can you show your resource/service definitions and the logs of when rgmanager starts up? 
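By the way, a quick way to check whether rgmanager was started with -N on your nodes (just a sketch; any equivalent ps invocation works):

# ps -o pid,args -C rgmanager

If '-N' shows up in the arguments, stop-before-start has been disabled.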
> My system is registered with rhsm and bound to 6.3 release. > Current packages > lvm2-cluster-2.02.95-10.el6_3.3.x86_64 > cman-3.0.12.1-32.el6_3.2.x86_64 > lvm2-2.02.95-10.el6_3.3.x86_64 > > I can solve my problem if I set the clvmd init scripts as in rhel 5.9 > where there is a conditional statement. > Diff between original 6.3 clvmd init script and mine is now: > > $ diff clvmd clvmd.orig > 32,34d31 > < # Activate & deactivate clustered LVs > < CLVMD_ACTIVATE_VOLUMES=1 > < > 91,92c88 > < if [ -n "$CLVMD_ACTIVATE_VOLUMES" ] ; then > < ${lvm_vgscan} > /dev/null 2>&1 > --- >> ${lvm_vgscan} > /dev/null 2>&1 > 94,95c90 > < action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $? > < fi > --- >> action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $? > > Then I set this in /etc/sysconfig/clvmd > CLVMD_ACTIVATE_VOLUMES="" > > Now all seems ok in start, stop and relocate. This is another option, but it shouldn't be required if rgmanager is allowed to stop the resources prior to starting the service. We could raise an RFE to add this functionality to RHEL6 if a case is opened. > > Between technotes of 6.4 I only see this > > BZ #729812 > Prior to this update, occasional service failures occurred when > starting the clvmd variant of the > HA-LVM service on multiple nodes in a cluster at the same time. The > start of an HA-LVM > resource coincided with another node initializing that same HA-LVM > resource. With this update, > a patch has been introduced to synchronize the initialization of both > resources. As a result, > services no longer fail due to the simultaneous initialization. > > but I'm not sure if it is related with my problem as it is private. This is only related to starting the HA-LVM resources simultaneously on multiple nodes, and it synchronizes them correctly so it can only start on node node. > Can anyone give his/her opinion? > I'm going to open a case with redhat, but I would like to understand > if it's me missing something trivial.... as I think I would not be the > only one with this kind of configuration.... If you open a case with Red Hat, it may find its way to me and we can troubleshoot further. > > Thanks in advance, > > Gianluca > Regards, Ryan Mitchell Red Hat Global Support Services From gianluca.cecchi at gmail.com Fri Jul 5 15:35:18 2013 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Fri, 5 Jul 2013 17:35:18 +0200 Subject: [Linux-cluster] Info on clvmd with halvm on rhel 6.3 based clusters In-Reply-To: <51D61687.2080204@redhat.com> References: <51D61687.2080204@redhat.com> Message-ID: On Fri, Jul 5, 2013 at 2:42 AM, Ryan Mitchell wrote: > You aren't starting rgmanager with the -N option are you? It is not the > default. > # man clurgmgrd > -N Do not perform stop-before-start. Combined with the -Z > flag to clusvcadm, this can be used to allow rgmanager to be upgraded > without stopping a given user service or set of services. > > What is supposed to happen is: > - clvmd is started at boot time, and all clustered logical volumes are > activated (including CLVM HA-LVM volumes) > - rgmanager starts after clvmd, and it initializes all resources to ensure > they are in a known state. 
For example: > Jul 4 20:06:26 r6ha1 rgmanager[2478]: I am node #1 > Jul 4 20:06:27 r6ha1 rgmanager[2478]: Resource Group Manager Starting > Jul 4 20:06:27 r6ha1 rgmanager[2478]: Loading Service Data > Jul 4 20:06:33 r6ha1 rgmanager[2478]: Initializing Services > <---- > Jul 4 20:06:33 r6ha1 rgmanager[3316]: [fs] stop: Could not match > /dev/vgdata/lvmirror with a real device > Jul 4 20:06:33 r6ha1 rgmanager[2478]: stop on fs "fsdata" returned 2 > (invalid argument(s)) > Jul 4 20:06:35 r6ha1 rgmanager[2478]: Services Initialized > Jul 4 20:06:35 r6ha1 rgmanager[2478]: State change: Local UP > Jul 4 20:06:35 r6ha1 rgmanager[2478]: State change: r6ha2.cluster.net UP > - So when rgmanager starts, it stops the CLVM HA-LVM logical volumes again > prior to starting the service, unless you disabled the "stop-before-start" > option. > > I did a quick test and I got the same results as you. Can you show your > resource/service definitions and the logs of when rgmanager starts up? > > > If you open a case with Red Hat, it may find its way to me and we can > troubleshoot further. Thanks for the answer Ryan. I opened the case 00900301 as suggested. I think the problem is with the clvmd already activating lvs. My service is composed by ip resource and some and resources When the nodes start up, on the node chosen by priority definition of failover domain I get this: Jul 4 14:27:46 oraugov4 rgmanager[6469]: Services Initialized Jul 4 14:27:46 oraugov4 rgmanager[6469]: State change: Local UP Jul 4 14:27:46 oraugov4 rgmanager[6469]: Starting stopped service service:MYSERVICE Jul 4 14:27:48 oraugov4 rgmanager[9436]: [lvm] Failed to activate logical volume, VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP Jul 4 14:27:48 oraugov4 rgmanager[9458]: [lvm] Attempting cleanup of VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP Jul 4 14:27:49 oraugov4 rgmanager[9484]: [lvm] Failed second attempt to activate VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP Jul 4 14:27:49 oraugov4 rgmanager[6469]: start on lvm "LV_UGDMPRO_TEMP" returned 1 (generic error) Jul 4 14:27:49 oraugov4 rgmanager[6469]: #68: Failed to start service:MYSERVICE; return value: 1 Jul 4 14:27:49 oraugov4 rgmanager[6469]: Stopping service service:MYSERVICE Jul 4 14:27:49 oraugov4 rgmanager[9557]: [fs] stop: Could not match /dev/VG_PROVA/lv_prova with a real device Jul 4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "PROVA" returned 2 (invalid argument(s)) Jul 4 14:27:49 oraugov4 rgmanager[9594]: [fs] stop: Could not match /dev/VG_UGDMPRE_RDOF/LV_UGDMPRE_RDOF with a real device Jul 4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_RDOF" returned 2 (invalid argument(s)) Jul 4 14:27:49 oraugov4 rgmanager[9631]: [fs] stop: Could not match /dev/VG_UGDMPRE_REDO/LV_UGDMPRE_REDO with a real device Jul 4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_REDO" returned 2 (invalid argument(s)) Jul 4 14:27:49 oraugov4 rgmanager[9669]: [fs] stop: Could not match /dev/VG_UGDMPRE_DATA/LV_UGDMPRE_DATA with a real device Jul 4 14:27:49 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_DATA" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[9706]: [fs] stop: Could not match /dev/VG_UGDMPRE_SAVE/LV_UGDMPRE_SAVE with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_SAVE" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[9743]: [fs] stop: Could not match /dev/VG_UGDMPRE_CTRL/LV_UGDMPRE_CTRL with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_CTRL" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[9780]: [fs] 
stop: Could not match /dev/VG_UGDMPRE_TEMP/LV_UGDMPRE_TEMP with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRE_TEMP" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[9817]: [fs] stop: Could not match /dev/VG_UGDMPRO_RDOF/LV_UGDMPRO_RDOF with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_RDOF" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[9854]: [fs] stop: Could not match /dev/VG_UGDMPRO_REDO/LV_UGDMPRO_REDO with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_REDO" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[9891]: [fs] stop: Could not match /dev/VG_UGDMPRO_DATA/LV_UGDMPRO_DATA with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_DATA" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[9928]: [fs] stop: Could not match /dev/VG_UGDMPRO_SAVE/LV_UGDMPRO_SAVE with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_SAVE" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[9965]: [fs] stop: Could not match /dev/VG_UGDMPRO_CTRL/LV_UGDMPRO_CTRL with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_CTRL" returned 2 (invalid argument(s)) Jul 4 14:27:50 oraugov4 rgmanager[10002]: [fs] stop: Could not match /dev/VG_UGDMPRO_TEMP/LV_UGDMPRO_TEMP with a real device Jul 4 14:27:50 oraugov4 rgmanager[6469]: stop on fs "UGDMPRO_TEMP" returned 2 (invalid argument(s)) Jul 4 14:27:53 oraugov4 rgmanager[6469]: State change: icloraugov3 UP Jul 4 14:28:11 oraugov4 rgmanager[6469]: #12: RG service:MYSERVICE failed to stop; intervention required So I think I have double problem: 1) lv fails to activate because already active 2) then to solve the problem it tries to stop resources but fs.sh fails because it seems there is no related lv under it I think during the stop it should reverse order, so it should stop fs first (and it should get a result of already stopped) and only after it should deactivate the related lv... or not? Gianluca From mgrac at redhat.com Tue Jul 9 21:48:33 2013 From: mgrac at redhat.com (Marek Grac) Date: Tue, 09 Jul 2013 23:48:33 +0200 Subject: [Linux-cluster] fence_ovh - Fence agent for OVH (Proxmox 3) In-Reply-To: <498927.2707.1372269310026.JavaMail.adrian@adrianworktop> References: <498927.2707.1372269310026.JavaMail.adrian@adrianworktop> Message-ID: <51DC8531.6080404@redhat.com> Hi Adrian, On 06/26/2013 07:55 PM, Adrian Gibanel wrote: > I've improved my former fence_ovh script so that it works in Proxmox 3 and so that it uses suds library as I was suggested in the linux-cluster mailing list. > > 1) What is fence_ovh > > fence_ovh is a fence agent based on python for the big French datacentre provider OVH. You can get information about OVH on: http://www.ovh.co.uk/ . I also wanted to make clear that I'm not part of official OVH staff. > Thanks, you have done a great job in that rewrite. I have modified it a little to better fit into our existing infrastructure (--verbose, --plug). The only real change that I have added is that SOAP is not disconnected after every operation. Please take a look at it and (very likely) fix minor errors which I have introduced as I was not able to test it. m, -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0001-fence_ovh-New-fence-agent-for-OVH.patch
Type: text/x-patch
Size: 6371 bytes
Desc: not available
URL: 

From adrian.gibanel at btactic.com Thu Jul 11 17:48:55 2013
From: adrian.gibanel at btactic.com (Adrian Gibanel)
Date: Thu, 11 Jul 2013 19:48:55 +0200 (CEST)
Subject: [Linux-cluster] fence_ovh - Fence agent for OVH (Proxmox 3)
In-Reply-To: 
References: 
Message-ID: <16433523.1103.1373564934098.JavaMail.adrian@adrianworktop>

----- Original message -----
> Hi Adrian,
> On 06/26/2013 07:55 PM, Adrian Gibanel wrote:
> > I've improved my former fence_ovh script so that it works in
> > Proxmox 3 and so that it uses suds library as I was suggested in
> > the linux-cluster mailing list.
> >
> > 1) What is fence_ovh
> >
> > fence_ovh is a fence agent based on python for the big French
> > datacentre provider OVH. You can get information about OVH on:
> > http://www.ovh.co.uk/ . I also wanted to make clear that I'm not
> > part of official OVH staff.
>
> Thanks, you have done a great job in that rewrite. I have modified it a
> little to better fit into our existing infrastructure (--verbose,
> --plug). The only real change that I have added is that SOAP is not
> disconnected after every operation. Please take a look at it and (very
> likely) fix minor errors which I have introduced as I was not able to
> test it.

Thank you Marek!

About the SOAP disconnection: it's OK not to log out each time, but I think the SOAP login should be attempted just before calling the reboot_time function. The reason is that I'm afraid that 150 or 240 seconds are long enough for the session to time out. Maybe they are not, but I prefer to be on the safe side.

I have not tested it yet either, but I've seen some changes that can be made to it:

elif options["--action"] in ['on', 'off' ]:

should be:

elif options["--action"] in ['on', 'reboot' ]:

And at:

session = soap.service.login(options["--username"], options["--password"], 'es', 0)

you should use 'en' instead of 'es' so that default errors are printed in English by default.

Again, thank you!

--
--
Adrián Gibanel
I.T. Manager
+34 675 683 301
www.btactic.com

You can follow us at: / Before printing this message, think about the environment. The environment is everyone's responsibility.

NOTICE: The content of this message and its attachments is confidential. If you are not the intended recipient, be advised that using, disclosing and/or copying it without the corresponding authorisation is prohibited. If you have received this message in error, we would be grateful if you would notify the sender immediately and destroy the message.

From linuxtovishesh at gmail.com Sun Jul 14 12:28:37 2013
From: linuxtovishesh at gmail.com (Vishesh kumar)
Date: Sun, 14 Jul 2013 17:58:37 +0530
Subject: [Linux-cluster] Getting error in luci
Message-ID: 

Hi Members,

I am getting the below error in the luci web interface.
Can you please let me know the reasons for same ++++++++++++++++++++++ An error occurred during the creation of cluster "vk" while updating the luci database: An operation previously failed, with traceback: File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap self.__bootstrap_inner() File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner self.run() File "/usr/lib/python2.6/threading.py", line 484, in run self.__target(*self.__args, **self.__kwargs) File "/usr/lib/python2.6/site-packages/paste/httpserver.py", line 878, in worker_thread_callback runnable() File "/usr/lib/python2.6/site-packages/paste/httpserver.py", line 1052, in +++++++++++++++++++++++++++++++++++++ -- Thanks Vishesh Kumar -------------- next part -------------- An HTML attachment was scrubbed... URL: From hsiddiqi at gmail.com Mon Jul 15 08:54:40 2013 From: hsiddiqi at gmail.com (Hammad Siddiqi) Date: Mon, 15 Jul 2013 13:54:40 +0500 Subject: [Linux-cluster] :BUG: soft lockup - CPU#0 stuck for 67s! [vm.sh:29764] Message-ID: Geniuses, I have a Redhat cluster setup for VMs running on KVM. during the live migration I have come across a kernel bug related to soft lockup of CPU # 0. Please see the back trace from abrt tool below. The host specs are: Supermicro Server with AMD Opteron processor (48 cores) RAM ECC 512 GB 6.4 x86_64 Disk images stored on Netapp volumes shared via NFS on 10Gbps network The issue may not be related to Clustering Suite (looks like kernel related) but any help in pointing to the right direction will highly be appreciated. Please let me know if you require additional information/logs/output Thank you Hammad Siddiqi abrt_version: 2.0.8 cmdline: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=161M at 0M rd_LVM_LV=VolGroup/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet comment: During live migration of KVM VMs (13 VMs at a time) kernel: 2.6.32-358.6.2.el6.x86_64 logfile: time: Mon 15 Jul 2013 12:55:20 AM PDT sosreport.tar.xz: Binary file, 3153956 bytes backtrace: :BUG: soft lockup - CPU#0 stuck for 67s! 
[vm.sh:29764] :Modules linked in: act_police cls_u32 sch_ingress cls_fw sch_htb ip6table_filter ip6_tables ebtable_nat ebtables bridge nfs lockd fscache auth_rpcgss nfs_acl dlm configfs sunrpc iptable_filter ip_tables openvswitch xsvhba(U) scsi_transport_fc scsi_tgt xve(U) xsvnic(U) bonding ipv6 8021q garp stp llc xscore(U) ib_cm mlx4_ib ib_sa ib_mad ib_core vhost_net macvtap macvlan tun kvm_amd kvm igb dca ptp pps_core mlx4_core sg serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mpt2sas scsi_transport_sas raid_class ata_generic pata_acpi pata_atiixp ahci usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] :CPU 0 :Modules linked in: act_police cls_u32 sch_ingress cls_fw sch_htb ip6table_filter ip6_tables ebtable_nat ebtables bridge nfs lockd fscache auth_rpcgss nfs_acl dlm configfs sunrpc iptable_filter ip_tables openvswitch xsvhba(U) scsi_transport_fc scsi_tgt xve(U) xsvnic(U) bonding ipv6 8021q garp stp llc xscore(U) ib_cm mlx4_ib ib_sa ib_mad ib_core vhost_net macvtap macvlan tun kvm_amd kvm igb dca ptp pps_core mlx4_core sg serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mpt2sas scsi_transport_sas raid_class ata_generic pata_acpi pata_atiixp ahci usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] :Pid: 29764, comm: vm.sh Not tainted 2.6.32-358.6.2.el6.x86_64 #1 Supermicro H8QG6/H8QG6 :RIP: 0010:[] [] wait_for_rqlock+0x2c/0x40 :RSP: 0018:ffff887a9febbeb8 EFLAGS: 00000202 :RAX: 0000000003d503b2 RBX: ffff887a9febbeb8 RCX: ffff880028216700 :RDX: 00000000000003d5 RSI: 0000000000000056 RDI: 0000000000000000 :RBP: ffffffff8100bb8e R08: ffff887bd174b500 R09: 0000000000000000 :R10: 0000000000000001 R11: 00000000000004fd R12: ffffffff00000000 :R13: 0000000000007444 R14: ffff887b00040001 R15: 0000000000000011 :FS: 00007f3bf82ec700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 :CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 :CR2: 00007f3bf79250a0 CR3: 0000000001a85000 CR4: 00000000000007f0 :DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 :DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 :Process vm.sh (pid: 29764, threadinfo ffff887a9feba000, task ffff887bd174b500) :Stack: :ffff887a9febbf38 ffffffff8107382b ffff888007203668 ffff887a9febbef8 : 00007fff8bf63cdc ffff887bd174b9c8 ffff887bd174b9c8 0000000000000000 : ffff887a9febbef8 ffff887a9febbef8 0000000001395020 0000000000000000 :Call Trace: :[] ? do_exit+0x5ab/0x870 :[] ? do_group_exit+0x58/0xd0 :[] ? sys_exit_group+0x17/0x20 :[] ? system_call_fastpath+0x16/0x1b :Code: 48 89 e5 0f 1f 44 00 00 48 c7 c0 00 67 01 00 65 48 8b 0c 25 b0 e0 00 00 0f ae f0 48 01 c1 eb 09 0f 1f 80 00 00 00 00 f3 90 8b 01 <89> c2 c1 fa 10 66 39 c2 75 f2 c9 c3 0f 1f 84 00 00 00 00 00 55 END: -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpokorny at redhat.com Mon Jul 15 17:08:17 2013 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Mon, 15 Jul 2013 19:08:17 +0200 Subject: [Linux-cluster] Getting error in luci In-Reply-To: References: Message-ID: <20130715170817.GA14067@redhat.com> Hello Vishesh, On 14/07/13 17:58 +0530, Vishesh kumar wrote: > I am getting below error in luci web interface. 
Can you please let > me know the reasons for same [following slightly modified in-place] > +++++++++++++++++++++++++++++++++++++ > An error occurred during the creation of cluster "vk" while updating the > luci database: An operation previously failed, with traceback: > > File "/usr/lib/python2.6/threading.py", line 504, > in __bootstrap > self.__bootstrap_inner() > File "/usr/lib/python2.6/threading.py", line 532, > in __bootstrap_inner > self.run() > File "/usr/lib/python2.6/threading.py", line 484, > in run > self.__target(*self.__args, **self.__kwargs) > File "/usr/lib/python2.6/site-packages/paste/httpserver.py", line 878, > in worker_thread_callback > runnable() > File "/usr/lib/python2.6/site-packages/paste/httpserver.py", line 1052, > in [reconstructed: process_request] > [reconstructed: > self.process_request_in_thread(request, client_address)] > [from now on, cannot be reconstructed reliably but expecting up to > tens of subsequent frames] > +++++++++++++++++++++++++++++++++++++ If I understand it correctly, this is what you get directly in the luci interface. As you can see, the traceback is not complete, but it may get trimmed somewhere on the way from source to the web browser (side question: is the snippet you provided really complete, i.e., not followed by the rest of expected traceback?). Anyway, authoritative source to check for details about the problems in luci is its log file located at /var/log/luci/luci.log by default. Could you please watch this file while reproducing the issue (best by issuing "tail -f /var/log/luci/luci.log" in a separete terminal) and provide the respective part of the log? This might help a lot. If you are only managing a single cluster or so in luci, perhaps I would recommend you to drop existing luci-internal database and start all over: service luci stop rm -i /var/lib/luci/data/luci.db service luci start Hope this helps. -- Jan From mgrac at redhat.com Wed Jul 17 15:47:12 2013 From: mgrac at redhat.com (Marek Grac) Date: Wed, 17 Jul 2013 17:47:12 +0200 Subject: [Linux-cluster] fence_ovh - Fence agent for OVH (Proxmox 3) In-Reply-To: <16433523.1103.1373564934098.JavaMail.adrian@adrianworktop> References: <16433523.1103.1373564934098.JavaMail.adrian@adrianworktop> Message-ID: <51E6BC80.9030805@redhat.com> Hi, On 07/11/2013 07:48 PM, Adrian Gibanel wrote: > About the SOAP disconnection it's ok you not logging out at each time but I think the soap login should be tried just before calling the reboot_time function. The reason is that I'm afraid that 150 or 240 seconds are long enough for session to timeout. Maybe they are not but I prefer to be in the safe side. Ok, we will start with original version when login/logout is done several times it should not impact fencing a lot. Later we can test if it works correctly with single login or not. > I have not tested it yet too but I've seen some changes that can made to it: > > elif options["--action"] in ['on', 'off' ]: > should be: > elif options["--action"] in ['on', 'reboot' ]: > > And at: > > session = soap.service.login(options["--username"], options["--password"], 'es', 0) > > You should use 'en' instead of 'es' so that default errors are printed in English by default. > fixed Fence agent is now upstream git - it will be part of next release 4.0.2 at the end of the month. 
m,

From linuxtovishesh at gmail.com Thu Jul 18 11:48:38 2013
From: linuxtovishesh at gmail.com (Vishesh kumar)
Date: Thu, 18 Jul 2013 17:18:38 +0530
Subject: [Linux-cluster] Getting error in luci
In-Reply-To: <20130715170817.GA14067@redhat.com>
References: <20130715170817.GA14067@redhat.com>
Message-ID: 

On Mon, Jul 15, 2013 at 10:38 PM, Jan Pokorn? wrote:
> rm -i /var/lib/luci/data/luci.db

Thanks Jan. It worked after removing /var/lib/luci/data/luci.db

Thanks
--
http://linuxmantra.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From linuxtovishesh at gmail.com Thu Jul 18 11:54:34 2013
From: linuxtovishesh at gmail.com (Vishesh kumar)
Date: Thu, 18 Jul 2013 17:24:34 +0530
Subject: [Linux-cluster] fence_xvm nopt working
Message-ID: 

Hi All,

I am trying to implement fence_xvm using the libvirt backend. Everything is set up fine and fence_virt.conf has the following configuration:
++++++++++++++++++++++++++++++++++
backends {
	libvirt {
		uri = "qemu:///system";
	}
}

listeners {
	multicast {
		port = "1229";
		family = "ipv4";
		address = "225.0.0.12";
		key_file = "/etc/cluster/fence_xvm.key";
	}
}

fence_virtd {
	module_path = "/usr/lib64/fence-virt";
	backend = "libvirt";
	listener = "multicast";
}
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

But this does not work, as the fence_virtd daemon stops immediately after starting. I am unable to find any logs either.

Changing the backend to checkpoint resolves the issue of fence_virtd stopping, but I have no idea how to implement the checkpoint backend.

--
http://linuxmantra.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adel.benzarrouk at gmail.com Thu Jul 18 12:05:48 2013
From: adel.benzarrouk at gmail.com (Adel Ben Zarrouk)
Date: Thu, 18 Jul 2013 13:05:48 +0100
Subject: [Linux-cluster] Unable to connect to hp blade system with fence_hpblade (RHEL 6.4)
Message-ID: 

Hello,

I am trying to connect to Onboard administration of HP blade system using fence_hpblade tool , but I am getting the message:

unable/connect to fence device.

I was able to connect using ssh.

Please any advice or recommendation.

Regards

--Adel

On Thu, Jul 18, 2013 at 12:48 PM, Vishesh kumar wrote:
>
> On Mon, Jul 15, 2013 at 10:38 PM, Jan Pokorn? wrote:
>> rm -i /var/lib/luci/data/luci.db
>
> Thanks Jan . It worked after removing /var/lib/luci/data/luci.db
>
> Thanks
> --
> http://linuxmantra.com
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lists at alteeve.ca Thu Jul 18 13:56:12 2013
From: lists at alteeve.ca (Digimer)
Date: Thu, 18 Jul 2013 09:56:12 -0400
Subject: [Linux-cluster] Unable to connect to hp blade system with fence_hpblade (RHEL 6.4)
In-Reply-To: 
References: 
Message-ID: <51E7F3FC.7060606@alteeve.ca>

Can you share the exact 'fence_hpblade ' line you're sending as well as any output you get back? I am not familiar with that agent, but most have a verbose or debug mode that will return more output. Some agents need to have the command prompt string defined or extra args like 'lanplus' for iLO defined. The man page should provide some insight.

digimer

On 18/07/13 08:05, Adel Ben Zarrouk wrote:
> Hello,
>
> I am trying to connect to Onboard administration of HP blade system
> using fence_hpblade tool , but I am getting the message:
>
> unable/connect to fence device.
>
> I was able to connect using ssh.
>
> Please any advice or recommendation.
> > Regards > > --Adel > > > On Thu, Jul 18, 2013 at 12:48 PM, Vishesh kumar > > wrote: > > > On Mon, Jul 15, 2013 at 10:38 PM, Jan Pokorn? > wrote: > > rm -i /var/lib/luci/data/luci.db > > > > Thanks Jan . It worked after removing /var/lib/luci/data/luci.db > > Thanks > -- > http://linuxmantra.com > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Thu Jul 18 13:57:47 2013 From: lists at alteeve.ca (Digimer) Date: Thu, 18 Jul 2013 09:57:47 -0400 Subject: [Linux-cluster] fence_xvm nopt working In-Reply-To: References: Message-ID: <51E7F45B.7040602@alteeve.ca> On 18/07/13 07:54, Vishesh kumar wrote: > Hi All, > > I am trying to implement fence_xvm using backend libvirt. Everything is > setup fine and fence_virt.conf have following configuration > ++++++++++++++++++++++++++++++++++ > > backends { > libvirt { > uri = "qemu:///system"; > } > > } > > listeners { > multicast { > port = "1229"; > family = "ipv4"; > address = "225.0.0.12"; > key_file = "/etc/cluster/fence_xvm.key"; > } > > } > > fence_virtd { > module_path = "/usr/lib64/fence-virt"; > backend = "libvirt"; > listener = "multicast"; > } > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > But this does not work as daeom fence_virtd immediately after starting. > I am unable to find any log as well. > > Changing backend to checkpoint resolve the issue of fence_virtd > stoppage, but i have no idea to implement checkpoint backend. This is a not-quite-finished tutorial I have been working on to cover fencing with fence_xvm / fence_virtd. Perhaps it would help? https://alteeve.ca/w/Fencing_KVM_Virtual_Servers -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From linuxtovishesh at gmail.com Thu Jul 18 14:17:55 2013 From: linuxtovishesh at gmail.com (Vishesh kumar) Date: Thu, 18 Jul 2013 19:47:55 +0530 Subject: [Linux-cluster] fence_xvm nopt working In-Reply-To: <51E7F45B.7040602@alteeve.ca> References: <51E7F45B.7040602@alteeve.ca> Message-ID: Thanks for reply, I have to check value of /sys/class/net/virbr0/bridge/multicast_querier for centos6.4. Do this value only belong to bridged interface? Thanks On Thu, Jul 18, 2013 at 7:27 PM, Digimer wrote: > On 18/07/13 07:54, Vishesh kumar wrote: > >> Hi All, >> >> I am trying to implement fence_xvm using backend libvirt. Everything is >> setup fine and fence_virt.conf have following configuration >> ++++++++++++++++++++++++++++++**++++ >> >> backends { >> libvirt { >> uri = "qemu:///system"; >> } >> >> } >> >> listeners { >> multicast { >> port = "1229"; >> family = "ipv4"; >> address = "225.0.0.12"; >> key_file = "/etc/cluster/fence_xvm.key"; >> } >> >> } >> >> fence_virtd { >> module_path = "/usr/lib64/fence-virt"; >> backend = "libvirt"; >> listener = "multicast"; >> } >> ++++++++++++++++++++++++++++++**++++++++++++++++++++++++++++++**++++ >> >> But this does not work as daeom fence_virtd immediately after starting. >> I am unable to find any log as well. >> >> Changing backend to checkpoint resolve the issue of fence_virtd >> stoppage, but i have no idea to implement checkpoint backend. >> > > This is a not-quite-finished tutorial I have been working on to cover > fencing with fence_xvm / fence_virtd. Perhaps it would help? 
> > https://alteeve.ca/w/Fencing_**KVM_Virtual_Servers > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > -- http://linuxmantra.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Jul 18 15:21:10 2013 From: lists at alteeve.ca (Digimer) Date: Thu, 18 Jul 2013 11:21:10 -0400 Subject: [Linux-cluster] fence_xvm nopt working In-Reply-To: References: <51E7F45B.7040602@alteeve.ca> Message-ID: <51E807E6.5020706@alteeve.ca> If your bridge is 'virbr0', then yes. If you use traditional bridging, probably not. Do you see the VMs from the how when you run 'fence_xvm -o list'? On 18/07/13 10:17, Vishesh kumar wrote: > Thanks for reply, > > I have to check value of /sys/class/net/virbr0/bridge/multicast_querier > for centos6.4. Do this value only belong to bridged interface? > > Thanks > > > On Thu, Jul 18, 2013 at 7:27 PM, Digimer > wrote: > > On 18/07/13 07:54, Vishesh kumar wrote: > > Hi All, > > I am trying to implement fence_xvm using backend libvirt. > Everything is > setup fine and fence_virt.conf have following configuration > ++++++++++++++++++++++++++++++__++++ > > backends { > libvirt { > uri = "qemu:///system"; > } > > } > > listeners { > multicast { > port = "1229"; > family = "ipv4"; > address = "225.0.0.12"; > key_file = "/etc/cluster/fence_xvm.key"; > } > > } > > fence_virtd { > module_path = "/usr/lib64/fence-virt"; > backend = "libvirt"; > listener = "multicast"; > } > ++++++++++++++++++++++++++++++__++++++++++++++++++++++++++++++__++++ > > But this does not work as daeom fence_virtd immediately after > starting. > I am unable to find any log as well. > > Changing backend to checkpoint resolve the issue of fence_virtd > stoppage, but i have no idea to implement checkpoint backend. > > > This is a not-quite-finished tutorial I have been working on to > cover fencing with fence_xvm / fence_virtd. Perhaps it would help? > > https://alteeve.ca/w/Fencing___KVM_Virtual_Servers > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person > without access to education? > > > > > -- > http://linuxmantra.com -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From linuxtovishesh at gmail.com Thu Jul 18 16:40:43 2013 From: linuxtovishesh at gmail.com (Vishesh kumar) Date: Thu, 18 Jul 2013 22:10:43 +0530 Subject: [Linux-cluster] fence_xvm nopt working In-Reply-To: <51E807E6.5020706@alteeve.ca> References: <51E7F45B.7040602@alteeve.ca> <51E807E6.5020706@alteeve.ca> Message-ID: On Thu, Jul 18, 2013 at 8:51 PM, Digimer wrote: > . Do you see the VMs from the how when you run 'fence_xvm -o list'? Thanks for reply. 'fence_xvm -o list' command resulting in timeout. Thanks -- http://linuxmantra.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Jul 18 16:49:39 2013 From: lists at alteeve.ca (Digimer) Date: Thu, 18 Jul 2013 12:49:39 -0400 Subject: [Linux-cluster] fence_xvm nopt working In-Reply-To: References: <51E7F45B.7040602@alteeve.ca> <51E807E6.5020706@alteeve.ca> Message-ID: <51E81CA3.6040806@alteeve.ca> On 18/07/13 12:40, Vishesh kumar wrote: > > On Thu, Jul 18, 2013 at 8:51 PM, Digimer > wrote: > > . Do you see the VMs from the how when you run 'fence_xvm -o list'? > > > Thanks for reply. 
> > 'fence_xvm -o list' command resulting in timeout. > > Thanks So the deamon is not running, it would seem. Try running 'fence_virtd -d99 -F' (show debug and do not fork into the background). -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From linuxtovishesh at gmail.com Fri Jul 19 13:50:31 2013 From: linuxtovishesh at gmail.com (Vishesh kumar) Date: Fri, 19 Jul 2013 06:50:31 -0700 Subject: [Linux-cluster] fence_xvm nopt working In-Reply-To: <51E81CA3.6040806@alteeve.ca> References: <51E7F45B.7040602@alteeve.ca> <51E807E6.5020706@alteeve.ca> <51E81CA3.6040806@alteeve.ca> Message-ID: Thanks. It worked now. I debugged by option -d99 -F and found issue with multicast. Thanks On Thu, Jul 18, 2013 at 9:49 AM, Digimer wrote: > On 18/07/13 12:40, Vishesh kumar wrote: > >> >> On Thu, Jul 18, 2013 at 8:51 PM, Digimer > > wrote: >> >> . Do you see the VMs from the how when you run 'fence_xvm -o list'? >> >> >> Thanks for reply. >> >> 'fence_xvm -o list' command resulting in timeout. >> >> Thanks >> > > So the deamon is not running, it would seem. Try running 'fence_virtd -d99 > -F' (show debug and do not fork into the background). > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > -- http://linuxmantra.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Fri Jul 19 14:23:35 2013 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 19 Jul 2013 16:23:35 +0200 Subject: [Linux-cluster] fence_xvm nopt working In-Reply-To: References: <51E7F45B.7040602@alteeve.ca> <51E807E6.5020706@alteeve.ca> <51E81CA3.6040806@alteeve.ca> Message-ID: Hello can you tell us how you resolved the problem, maybe it can be util for others people Thanks 2013/7/19 Vishesh kumar > Thanks. > > It worked now. I debugged by option -d99 -F and found issue with multicast. > > > Thanks > > On Thu, Jul 18, 2013 at 9:49 AM, Digimer wrote: > >> On 18/07/13 12:40, Vishesh kumar wrote: >> >>> >>> On Thu, Jul 18, 2013 at 8:51 PM, Digimer >> > wrote: >>> >>> . Do you see the VMs from the how when you run 'fence_xvm -o list'? >>> >>> >>> Thanks for reply. >>> >>> 'fence_xvm -o list' command resulting in timeout. >>> >>> Thanks >>> >> >> So the deamon is not running, it would seem. Try running 'fence_virtd >> -d99 -F' (show debug and do not fork into the background). >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? >> > > > > -- > http://linuxmantra.com > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From linuxtovishesh at gmail.com Fri Jul 19 14:58:31 2013 From: linuxtovishesh at gmail.com (Vishesh kumar) Date: Fri, 19 Jul 2013 07:58:31 -0700 Subject: [Linux-cluster] fence_xvm nopt working In-Reply-To: References: <51E7F45B.7040602@alteeve.ca> <51E807E6.5020706@alteeve.ca> <51E81CA3.6040806@alteeve.ca> Message-ID: Sure. I edited fence_virt.conf file and set interface=virbr0. 
Conf file that worked for me is as below backends { libvirt { uri = "qemu:///system"; } } listeners { multicast { interface=virbr0; port = "1229"; family = "ipv4"; address = "225.0.0.12"; key_file = "/etc/cluster/fence_xvm.key"; } } fence_virtd { module_path = "/usr/lib64/fence-virt"; backend = "libvirt"; listener = "multicast";} Thanks On Fri, Jul 19, 2013 at 7:23 AM, emmanuel segura wrote: > Hello > > can you tell us how you resolved the problem, maybe it can be util for > others people > > Thanks > > > 2013/7/19 Vishesh kumar > >> Thanks. >> >> It worked now. I debugged by option -d99 -F and found issue with >> multicast. >> >> >> Thanks >> >> On Thu, Jul 18, 2013 at 9:49 AM, Digimer wrote: >> >>> On 18/07/13 12:40, Vishesh kumar wrote: >>> >>>> >>>> On Thu, Jul 18, 2013 at 8:51 PM, Digimer >>> > wrote: >>>> >>>> . Do you see the VMs from the how when you run 'fence_xvm -o list'? >>>> >>>> >>>> Thanks for reply. >>>> >>>> 'fence_xvm -o list' command resulting in timeout. >>>> >>>> Thanks >>>> >>> >>> So the deamon is not running, it would seem. Try running 'fence_virtd >>> -d99 -F' (show debug and do not fork into the background). >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ >>> What if the cure for cancer is trapped in the mind of a person without >>> access to education? >>> >> >> >> >> -- >> http://linuxmantra.com >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > > -- > esta es mi vida e me la vivo hasta que dios quiera > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- http://linuxmantra.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From magawake at gmail.com Sun Jul 21 06:28:42 2013 From: magawake at gmail.com (Mag Gam) Date: Sun, 21 Jul 2013 02:28:42 -0400 Subject: [Linux-cluster] Mag Gam Message-ID: http://pureau.be/ixulknth/geg.lwald Mag Gam 7/21/2013 7:28:37 AM From anprice at redhat.com Tue Jul 23 16:47:29 2013 From: anprice at redhat.com (Andrew Price) Date: Tue, 23 Jul 2013 17:47:29 +0100 Subject: [Linux-cluster] gfs2-utils 3.1.6 Released Message-ID: <51EEB3A1.9060507@redhat.com> Hi, gfs2-utils 3.1.6 has been released. Notable changes include: - A large number of improvements and bug fixes in fsck.gfs2, bringing the ability to fix a wider range of issues. - mkfs.gfs2 now aligns resource groups to RAID stripes, automatically if it can, or by using new options (see the man page). It also now uses far fewer resources to create larger file systems. - There is a new test suite, which can be run with 'make check'. The suite is quite small at the moment but we will be adding more tests in due course. - gfs_controld has been retired, as it hasn't been required since Linux 3.3. - Documentation has been improved and a doc/README.contributing file has been added to aid anybody interested in contributing to gfs2-utils. See below for a full list of changes. The source tarball is available from: https://fedorahosted.org/released/gfs2-utils/gfs2-utils-3.1.6.tar.gz Please test, and do make sure to report bugs, whether they're crashers or typos. 
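If you would like to try the new test suite against the release tarball, the usual autotools steps should be all that is needed (a sketch, assuming the tarball ships a generated configure script and you have the build dependencies installed):

  $ tar xzf gfs2-utils-3.1.6.tar.gz
  $ cd gfs2-utils-3.1.6
  $ ./configure
  $ make
  $ make check      # runs the new test suite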
Please file them against the gfs2-utils component of Fedora (rawhide): https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=gfs2-utils&version=rawhide Regards, Andy Price Red Hat File Systems Changes since 3.1.5: Andrew Price (55): libgfs2: Fix build with bison 2.6 gfs2-utils: Update translation files mkfs.gfs2: Improve strings for translation gfs2-utils: Add the beginnings of a test suite gfs2-utils tests: Add a script to exercise the utils gfs2-utils: Rename lockcapture directory to scripts gfs2-utils: Add a doc on contributing mkfs.gfs2: Add translator doc comments tunegfs2: Update man page tunegfs2: i18n improvements mkfs.gfs2: i18n improvements gfs2-utils: Update translations and .gitignore libgfs2: Rework blk_alloc_i libgfs2: Make gfs2_rgrp_out accept char buffers mkfs.gfs2: Reduce memory usage gfs2-utils: Make the tool tests script more useful mkfs.gfs2: Separate user options from file system params libgfs2: Move lgfs2_field_print into gfs2l and make it static fsck.gfs2: Trivial typo fix gfs2-utils build: Enable silent rules by default libgfs2: Remove gfs2_next_rg_meta gfs2-utils: Build system fixes libgfs2: Don't release rgrp buffers which are still in use gfs2_edit: Fix divide by zero bug mkfs.gfs2: Add options for stripe size and width libgfs2: Remove 'writes' field from gfs2_sbd mkfs.gfs2: Link to libblkid mkfs.gfs2: Use libblkid for checking contents mkfs.gfs2: Add a struct to store device info libgfs2: Clarify gfs2_compute_bitstructs's parameters gfs2-utils build: Fix reporting lack of check gfs2l: Improve usage message and opt handling gfs2l: Enable setting the type of a block gfs2l: Add hash comments gfs2l: Add options to print block types and fields gfs2l: Read from stdin by default gfs2l: Improve grammar layout and path parsing gfs2-utils: Remove some unused build files gfs2-utils: Retire gfs_controld build: Put back AC_CONFIG_SRCDIR gfs2-utils: Fix some uninitialized variable warnings libgfs2: Remove dinode_alloc mkfs.gfs2: Set sunit and swidth from probed io limits mkfs.gfs2: Align resource groups to RAID stripes mkfs.gfs2: Create new resource groups on-demand mkfs.gfs2: Add align option and update docs mkfs.gfs2: Move the new rgrp creation code into libgfs2 gfs2-utils: Update translations init.d/gfs2: Work around nested mount points umount bug fsck.gfs2: Don't call gettext a second time in fsck_query() fsck.gfs2: Don't rely on cluster.conf when rebuilding sb gfs2-utils: Add some missing gettext calls gfs2-utils: Update translation template gfs2-utils: Update docs gfs2-utils: Update .gitignore and doc/Makefile.am Bob Peterson (66): gfs2_convert: mark rgrp bitmaps dirty when converting gfs2_convert: mark buffer dirty when switching dirs from meta to data gfs2_convert: remember number of blocks when converting quotas gfs2_convert: Use proper header size when reordering meta pointers gfs2_convert: calculate height 1 for small files that were once big gfs2_convert: clear out old di_mode before setting it gfs2_convert: mask out proper bits when identifying symlinks fsck.gfs2: Detect and fix mismatch in GFS1 formal inode number gfs2_grow: report bad return codes on error libgfs2: externalize dir_split_leaf libgfs2: allow dir_split_leaf to receive a leaf buffer libgfs2: let dir_split_leaf receive a "broken" lindex fsck.gfs2: Move function find_free_blk to util.c fsck.gfs2: Split out function to make sure lost+found exists fsck.gfs2: Check for formal inode mismatch when adding to lost+found fsck.gfs2: shorten some debug messages in lost+found fsck.gfs2: Move basic 
directory entry checks to separate function fsck.gfs2: Add formal inode check to basic dirent checks fsck.gfs2: Add new function to check dir hash tables fsck.gfs2: Special case '..' when processing bad formal inode number fsck.gfs2: Move function to read directory hash table to util.c fsck.gfs2: Misc cleanups fsck.gfs2: Verify dirent hash values correspond to proper leaf block fsck.gfs2: re-read hash table if directory height or depth changes fsck.gfs2: fix leaf blocks, don't try to patch the hash table fsck.gfs2: check leaf depth when validating leaf blocks fsck.gfs2: small cleanups fsck.gfs2: reprocess inodes when blocks are added fsck.gfs2: Remove redundant leaf depth check fsck.gfs2: link dinodes that only have extended attribute problems fsck.gfs2: Add clarifying message to duplicate processing fsck.gfs2: separate function to calculate metadata block header size fsck.gfs2: Rework the "undo" functions fsck.gfs2: Check for interrupt when resolving duplicates fsck.gfs2: Consistent naming of struct duptree variables fsck.gfs2: Keep proper counts when duplicates are found fsck.gfs2: print metadata block reference on data errors fsck.gfs2: print block count values when fixing them fsck.gfs2: Do not invalidate metablocks of dinodes with invalid mode fsck.gfs2: Log when unrecoverable data block errors are encountered fsck.gfs2: don't remove buffers from the list when errors are found fsck.gfs2: Don't flag GFS1 non-dinode blocks as duplicates fsck.gfs2: externalize check_leaf fsck.gfs2: pass2: check leaf blocks when fixing hash table fsck.gfs2: standardize check_metatree return codes fsck.gfs2: don't invalidate files with duplicate data block refs fsck.gfs2: check for duplicate first references fsck.gfs2: When flagging a duplicate reference, show valid or invalid fsck.gfs2: major duplicate reference reform fsck.gfs2: Remove all bad eattr blocks fsck.gfs2: Remove unused variable fsck.gfs2: double-check transitions from dinode to data fsck.gfs2: Stop "undo" process when error data block is reached fsck.gfs2: Don't allocate leaf blocks in pass1 fsck.gfs2: take hash table start boundaries into account fsck.gfs2: delete all duplicates from unrecoverable damaged dinodes gfs2_edit: print formal inode numbers and hash value on dir display fsck.gfs2: fix some log messages fsck.gfs2: Fix directory link on relocated directory dirents fsck.gfs2: Fix infinite loop in pass1b caused by duplicates in hash table fsck.gfs2: don't check newly created lost+found in pass2 fsck.gfs2: avoid negative number in leaf depth fsck.gfs2: Detect and fix duplicate references in hash tables gfs2_edit: Add new option to print all bitmaps for an rgrp gfs2_edit: display pointer offsets for directory dinodes gfs2_edit: fix a segfault with file names > 255 bytes Callum Massey (1): gfs2-utils: Fix build warnings in Fedora 18 David Teigland (1): gfs2: add native setup to man page Paul Evans (1): libgfs2: Fix resource leak, variable "result" going out of scope Shane Bradley (5): gfs2-lockcapture: Modified some of the data gathered gfs2_trace: Added a script called gfs2_trace for kernel tracing debugging. gfs2_lockcapture: The script now returns a correct exit code when the script exits. gfs2_lockcapture: Capture the status of the cluster nodes and find the clusternode name and id. gfs2_lockcapture: Various script and man page updates Sitsofe Wheeler (1): Fix clang --analyze warning. 
Steven Whitehouse (3): libgfs2: Add readahead for rgrp headers fsck: Speed up reading of dir leaf blocks fsck: Clean up pass1 inode iteration code From dc12078 at gmail.com Wed Jul 24 13:51:13 2013 From: dc12078 at gmail.com (D C) Date: Wed, 24 Jul 2013 09:51:13 -0400 Subject: [Linux-cluster] rgmanager hangs when shutting down service. Message-ID: I setup a basic cluster for testing, with a virtual ip (on a bonded interface), and apache. I've verified that services work on both nodes, but I have an issue one of them during shutdown. CentOS 6.3 rpm -q rgmanager ricci modcluster resource-agents rgmanager-3.0.12.1-12.el6.x86_64 ricci-0.16.2-55.el6.x86_64 modcluster-0.16.2-18.el6.x86_64 resource-agents-3.9.2-12.el6_3.2.x86_64 [root at lust-02 cluster]# clusvcadm -d apache-service Local machine disabling service:apache-service... Nothing shows up in the logs, and I was able to verify that apache is still running, and the ip address is still active. I ran the command again with strace, but it seems to also just hang. Below is the entire output of the strace. [root at clust-02 cluster]# strace clusvcadm -d apache-service execve("/usr/sbin/clusvcadm", ["clusvcadm", "-d", "apache-service"], [/* 22 vars */]) = 0 brk(0) = 0x1f12000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa81ce8c000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=32069, ...}) = 0 mmap(NULL, 32069, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa81ce84000 close(3) = 0 open("/usr/lib64/libcman.so.3", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\23`N4\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=21272, ...}) = 0 mmap(0x344e600000, 2114200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x344e600000 mprotect(0x344e604000, 2097152, PROT_NONE) = 0 mmap(0x344e804000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x344e804000 close(3) = 0 open("/lib64/libpthread.so.0", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\\\240\3668\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=145720, ...}) = 0 mmap(0x38f6a00000, 2212768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x38f6a00000 mprotect(0x38f6a17000, 2097152, PROT_NONE) = 0 mmap(0x38f6c17000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x38f6c17000 mmap(0x38f6c19000, 13216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x38f6c19000 close(3) = 0 open("/usr/lib64/liblogthread.so.3", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16\340N4\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=11592, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa81ce83000 mmap(0x344ee00000, 2112968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x344ee00000 mprotect(0x344ee02000, 2093056, PROT_NONE) = 0 mmap(0x344f001000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x344f001000 mmap(0x344f002000, 7624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x344f002000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355a\3668\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1918016, ...}) = 0 mmap(0x38f6600000, 3741864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x38f6600000 
mprotect(0x38f6789000, 2093056, PROT_NONE) = 0
mmap(0x38f6988000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x188000) = 0x38f6988000
mmap(0x38f698d000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x38f698d000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa81ce82000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa81ce81000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa81ce80000
arch_prctl(ARCH_SET_FS, 0x7fa81ce81700) = 0
mprotect(0x38f6c17000, 4096, PROT_READ) = 0
mprotect(0x38f6988000, 16384, PROT_READ) = 0
mprotect(0x38f601f000, 4096, PROT_READ) = 0
munmap(0x7fa81ce84000, 32069) = 0
set_tid_address(0x7fa81ce819d0) = 1095
set_robust_list(0x7fa81ce819e0, 0x18) = 0
futex(0x7fff3157e0ac, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7fff3157e0ac, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 7fa81ce81700) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigaction(SIGRTMIN, {0x38f6a05ae0, [], SA_RESTORER|SA_SIGINFO, 0x38f6a0f500}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x38f6a05b70, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x38f6a0f500}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x38f6632920}, {SIG_DFL, [], 0}, 8) = 0
brk(0) = 0x1f12000
brk(0x1f33000) = 0x1f33000
socket(PF_FILE, SOCK_STREAM, 0) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
connect(3, {sa_family=AF_FILE, path="/var/run/cman_client"}, 110) = 0
open("/dev/zero", O_RDONLY) = 4
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
writev(3, [{"NAMC\3\0\0\20\24\0\0\0\7\0\0\0\0\0\0\0", 20}], 1) = 20
recvfrom(3, "NAMCk&\233?\210\3\0\0\7\0\0@\0\0\0\0", 20, 0, NULL, NULL) = 20
read(3, "\2\0\0\0\270\1\0\0\1\0\0\0\0\0\0\0\0\0\0\0\234\0\0\0\2\0\0\0e-cl"..., 884) = 884
writev(3, [{"NAMC\3\0\0\20\24\0\0\0\7\0\0\0\0\0\0\0", 20}], 1) = 20
recvfrom(3, "NAMCk&\233?\210\3\0\0\7\0\0@\0\0\0\0", 20, 0, NULL, NULL) = 20
read(3, "\2\0\0\0\270\1\0\0\1\0\0\0\0\0\0\0\0\0\0\0\234\0\0\0\2\0\0\0e-cl"..., 884) = 884
writev(3, [{"NAMC\3\0\0\20\314\1\0\0\220\0\0\0\0\0\0\0", 20}, {" \313\350\34\0\0\0\0\0\0\340\263\257b\376\377\0\0\366\302\301\353q\0\0\0\0\0\0\0\0\0"..., 440}], 2) = 460
recvfrom(3, "NAMCk&\233?\320\1\0\0\220\0\0@\0\0\0\0", 20, 0, NULL, NULL) = 20
read(3, "\0\0\0\0\270\1\0\0\2\0\0\0\1\0\0\0\0\0\0\0\4\0\0\0\2\0\0\0e-cl"..., 444) = 444
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa81ce8b000
write(1, "Local machine disabling service:"..., 49Local machine disabling service:apache-service...) = 49
socket(PF_FILE, SOCK_STREAM, 0) = 5
connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = 0
select(6, NULL, [5], [5], NULL) = 1 (out [5])
write(5, "h\0\0\0\4\261\227\36\22:\274\0\0\0\0h\0\23\205\202\0\0\0\0\0\0\0\0\0\0\0\0"..., 112) = 112
select(6, [5], NULL, [5], NULL

Thanks,
Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From churnd at gmail.com Fri Jul 26 02:29:58 2013
From: churnd at gmail.com (ch urnd)
Date: Thu, 25 Jul 2013 22:29:58 -0400
Subject: [Linux-cluster] fence_drac5 timeouts
Message-ID: 

I'm trying to get fence_drac5 working on a cluster of two Dell R410s that I'm setting up. The primary issue I'm seeing is timeouts.
The fence does seem to work, as the other node does get shut down, but the script always exits 1. Here's the output:

# fence_drac5 -a 192.168.1.100 --power-timeout 30 -x -l root -p calvin -c 'admin1->' -o reboot
Connection timed out

# fence_drac5 -a 192.168.1.100 --power-timeout 30 -v -x -l root -p calvin -c 'admin1->' -o reboot
root at 192.168.1.100's password:
/admin1-> racadm serveraction powerstatus
Server power status: ON
/admin1->
/admin1-> racadm serveraction powerdown
Server power operation successful
/admin1->Traceback (most recent call last):
  File "/usr/sbin/fence_drac5", line 154, in 
    main()
  File "/usr/sbin/fence_drac5", line 137, in main
    result = fence_action(conn, options, set_power_status, get_power_status, get_list_devices)
  File "/usr/share/fence/fencing.py", line 838, in fence_action
    if wait_power_status(tn, options, get_power_fn) == 0:
  File "/usr/share/fence/fencing.py", line 744, in wait_power_status
    if get_power_fn(tn, options) != options["-o"]:
  File "/usr/sbin/fence_drac5", line 38, in get_power_status
    status = re.compile("(^|: )(ON|OFF|Powering ON|Powering OFF)\s*$", re.IGNORECASE | re.MULTILINE).search(conn.before).group(2)
AttributeError: 'NoneType' object has no attribute 'group'

Even though I pass "-o reboot", it still powers off. It does the same even if I don't pass that option. I added --power-timeout 30 in the latest test to see if that would help, but no dice; it doesn't work without it either.

I have tried fence_ipmilan and it works great, but the iDRAC interfaces are somewhat exposed and need to use SSH for security reasons, which limits me to fence_drac5.

Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lists at alteeve.ca Sat Jul 27 00:40:26 2013
From: lists at alteeve.ca (Digimer)
Date: Fri, 26 Jul 2013 20:40:26 -0400
Subject: [Linux-cluster] Problem deleting running VM from rgmanager
Message-ID: <51F316FA.8010108@alteeve.ca>

Hi all,

I've got a problem where I deleted a running VM from the cluster using;

ccs -h localhost --activate --sync --password "secret" --rmvm vm01-win7

This kind of worked, in that the VM was removed from cluster.conf, but 'clustat' still shows it.
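For reference, what the --rmvm call strips out of cluster.conf is the <vm> element for that guest inside the <rm> section. A minimal sketch of such an entry (the attribute values here are illustrative only; the original definition is not shown in this thread) looks like:

    <rm>
        <vm name="vm01-win7" path="/shared/definitions/" autostart="1" recovery="restart"/>
    </rm>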
The logs from the call are:

=====
Jul 26 20:19:01 an-c05n01 ricci[18020]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:01 an-c05n01 ricci[18066]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:01 an-c05n01 ricci[18069]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:01 an-c05n01 ricci[18071]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1428781577'
Jul 26 20:19:01 an-c05n01 ricci[18075]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:01 an-c05n01 ricci[18077]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:01 an-c05n01 ricci[18080]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:01 an-c05n01 ricci[18082]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/881799278'
Jul 26 20:19:03 an-c05n01 ricci[18088]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:03 an-c05n01 ricci[18090]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:03 an-c05n01 ricci[18093]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:03 an-c05n01 ricci[18095]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/439919971'
Jul 26 20:19:03 an-c05n01 modcluster: Updating cluster.conf
Jul 26 20:19:03 an-c05n01 corosync[3479]: [QUORUM] Members[2]: 1 2
Jul 26 20:19:03 an-c05n01 ricci[18140]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:03 an-c05n01 ricci[18170]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:03 an-c05n01 rgmanager[3710]: Reconfiguring
Jul 26 20:19:03 an-c05n01 ricci[18194]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:03 an-c05n01 ricci[18234]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/446527166'
Jul 26 20:19:04 an-c05n01 ricci[18457]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:04 an-c05n01 ricci[18496]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:04 an-c05n01 ricci[18528]: Executing '/usr/bin/virsh nodeinfo'
Jul 26 20:19:04 an-c05n01 ricci[18560]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1207456461'
Jul 26 20:19:04 an-c05n01 modcluster: Updating cluster.conf
Jul 26 20:19:04 an-c05n01 corosync[3479]: [QUORUM] Members[2]: 1 2
Jul 26 20:19:05 an-c05n01 kernel: vbr2: port 4(vnet2) entering disabled state
Jul 26 20:19:05 an-c05n01 kernel: device vnet2 left promiscuous mode
Jul 26 20:19:05 an-c05n01 kernel: vbr2: port 4(vnet2) entering disabled state
Jul 26 20:19:06 an-c05n01 rgmanager[3710]: vm:vm01-win7 removed from the config, but I am not stopping it.
Jul 26 20:19:06 an-c05n01 rgmanager[3710]: Reconfiguring
Jul 26 20:19:07 an-c05n01 ntpd[2794]: Deleting interface #16 vnet2, fe80::fc54:ff:fea5:37ea#123, interface stats: received=0, sent=0, dropped=0, active_time=135 secs
=====

However, clustat still shows;

=====
Cluster Status for an-cluster-05 @ Fri Jul 26 20:37:17 2013
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 an-c05n01.alteeve.ca                  1 Online, rgmanager
 an-c05n02.alteeve.ca                  2 Online, Local, rgmanager

 Service Name                Owner (Last)                State
 ------- ----                ----- ------                -----
 service:storage_n01         an-c05n01.alteeve.ca        started
 service:storage_n02         an-c05n02.alteeve.ca        started
 vm:vm01-win7                an-c05n02.alteeve.ca        started
 vm:vm02-rhel6               an-c05n02.alteeve.ca        started
 vm:vm03-debian7             an-c05n01.alteeve.ca        started
 vm:vm04-solaris11           an-c05n02.alteeve.ca        started
 vm:vm05-win2008r2           an-c05n02.alteeve.ca        started
 vm:vm06-win8                an-c05n01.alteeve.ca        started
 vm:vm07-win2012             an-c05n02.alteeve.ca        started
 vm:vm08-freebsd9            an-c05n01.alteeve.ca        started
 vm:vm09-suse11              an-c05n01.alteeve.ca        started
=====

Trying to stop it produces;

=====
an-c05n02:~# clusvcadm -d vm:vm01-win7
Local machine disabling vm:vm01-win7...Failure
=====

CentOS 6.4, fully up to date; rgmanager-3.0.12.1-17.el6.x86_64

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

From lists at alteeve.ca Sat Jul 27 00:58:52 2013
From: lists at alteeve.ca (Digimer)
Date: Fri, 26 Jul 2013 20:58:52 -0400
Subject: [Linux-cluster] Problem deleting running VM from rgmanager
In-Reply-To: <51F316FA.8010108@alteeve.ca>
References: <51F316FA.8010108@alteeve.ca>
Message-ID: <51F31B4C.8030903@alteeve.ca>

I rebuilt the VM and deleted it a second time and it worked properly... I hate bugs like that.

digimer

On 26/07/13 20:40, Digimer wrote:
> Hi all,
>
> I've got a problem where I deleted a running VM from the cluster using;
>
> ccs -h localhost --activate --sync --password "secret" --rmvm vm01-win7
>
> This kind of worked, in that the VM was removed from cluster.conf,
> but 'clustat' still shows it. The logs from the call are:
>
> =====
> Jul 26 20:19:01 an-c05n01 ricci[18020]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:01 an-c05n01 ricci[18066]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:01 an-c05n01 ricci[18069]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:01 an-c05n01 ricci[18071]: Executing
> '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1428781577'
> Jul 26 20:19:01 an-c05n01 ricci[18075]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:01 an-c05n01 ricci[18077]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:01 an-c05n01 ricci[18080]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:01 an-c05n01 ricci[18082]: Executing
> '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/881799278'
> Jul 26 20:19:03 an-c05n01 ricci[18088]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:03 an-c05n01 ricci[18090]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:03 an-c05n01 ricci[18093]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:03 an-c05n01 ricci[18095]: Executing
> '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/439919971'
> Jul 26 20:19:03 an-c05n01 modcluster: Updating cluster.conf
> Jul 26 20:19:03 an-c05n01 corosync[3479]: [QUORUM] Members[2]: 1 2
> Jul 26 20:19:03 an-c05n01 ricci[18140]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:03 an-c05n01 ricci[18170]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:03 an-c05n01 rgmanager[3710]: Reconfiguring
> Jul 26 20:19:03 an-c05n01 ricci[18194]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:03 an-c05n01 ricci[18234]: Executing
> '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/446527166'
> Jul 26 20:19:04 an-c05n01 ricci[18457]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:04 an-c05n01 ricci[18496]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:04 an-c05n01 ricci[18528]: Executing '/usr/bin/virsh nodeinfo'
> Jul 26 20:19:04 an-c05n01 ricci[18560]: Executing
> '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1207456461'
> Jul 26 20:19:04 an-c05n01 modcluster: Updating cluster.conf
> Jul 26 20:19:04 an-c05n01 corosync[3479]: [QUORUM] Members[2]: 1 2
> Jul 26 20:19:05 an-c05n01 kernel: vbr2: port 4(vnet2) entering disabled
> state
> Jul 26 20:19:05 an-c05n01 kernel: device vnet2 left promiscuous mode
> Jul 26 20:19:05 an-c05n01 kernel: vbr2: port 4(vnet2) entering disabled
> state
> Jul 26 20:19:06 an-c05n01 rgmanager[3710]: vm:vm01-win7 removed from the
> config, but I am not stopping it.
> Jul 26 20:19:06 an-c05n01 rgmanager[3710]: Reconfiguring
> Jul 26 20:19:07 an-c05n01 ntpd[2794]: Deleting interface #16 vnet2,
> fe80::fc54:ff:fea5:37ea#123, interface stats: received=0, sent=0,
> dropped=0, active_time=135 secs
> =====
>
> However, clustat still shows;
>
> =====
> Cluster Status for an-cluster-05 @ Fri Jul 26 20:37:17 2013
> Member Status: Quorate
>
>  Member Name                        ID   Status
>  ------ ----                        ---- ------
>  an-c05n01.alteeve.ca                  1 Online, rgmanager
>  an-c05n02.alteeve.ca                  2 Online, Local, rgmanager
>
>  Service Name                Owner (Last)                State
>  ------- ----                ----- ------                -----
>  service:storage_n01         an-c05n01.alteeve.ca        started
>  service:storage_n02         an-c05n02.alteeve.ca        started
>  vm:vm01-win7                an-c05n02.alteeve.ca        started
>  vm:vm02-rhel6               an-c05n02.alteeve.ca        started
>  vm:vm03-debian7             an-c05n01.alteeve.ca        started
>  vm:vm04-solaris11           an-c05n02.alteeve.ca        started
>  vm:vm05-win2008r2           an-c05n02.alteeve.ca        started
>  vm:vm06-win8                an-c05n01.alteeve.ca        started
>  vm:vm07-win2012             an-c05n02.alteeve.ca        started
>  vm:vm08-freebsd9            an-c05n01.alteeve.ca        started
>  vm:vm09-suse11              an-c05n01.alteeve.ca        started
> =====
>
> Trying to stop it produces;
>
> =====
> an-c05n02:~# clusvcadm -d vm:vm01-win7
> Local machine disabling vm:vm01-win7...Failure
> =====
>
> CentOS 6.4, fully up to date; rgmanager-3.0.12.1-17.el6.x86_64
>

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

From mgrac at redhat.com Tue Jul 30 11:00:49 2013
From: mgrac at redhat.com (Marek Grac)
Date: Tue, 30 Jul 2013 13:00:49 +0200
Subject: [Linux-cluster] fence-agents-4.0.2 stable release
Message-ID: <51F79CE1.3040707@redhat.com>

Welcome to the fence-agents 4.0.2 release.

This release includes a minor bug fix: invalid names in fence_eps, fence_rhevm and fence_xenapi.

In this release you can also find a new fence agent for OVH (http://www.ovh.com) and a symbolic link, fence_ilo4, which runs fence_ipmilan with the required arguments.

For the 4.0.x series, I plan to release a new version on at least a monthly basis.

The new source tarball can be downloaded here:

https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-4.0.1.tar.xz

To report bugs or issues:

https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other system administrators or power users.

Thanks/congratulations to all people that contributed to achieve this great milestone.

m,

From lists at alteeve.ca Tue Jul 30 13:55:42 2013
From: lists at alteeve.ca (Digimer)
Date: Tue, 30 Jul 2013 09:55:42 -0400
Subject: [Linux-cluster] [Cluster-devel] fence-agents-4.0.2 stable release
In-Reply-To: <51F79CE1.3040707@redhat.com>
References: <51F79CE1.3040707@redhat.com>
Message-ID: <51F7C5DE.7040509@alteeve.ca>

On 30/07/13 07:00, Marek Grac wrote:
> Welcome to the fence-agents 4.0.2 release.
>
> This release includes a minor bug fix: invalid names in fence_eps,
> fence_rhevm and fence_xenapi.
>
> In this release you can also find a new fence agent for OVH
> (http://www.ovh.com) and a symbolic link, fence_ilo4, which runs
> fence_ipmilan with the required arguments.
>
> For the 4.0.x series, I plan to release a new version on at least a
> monthly basis.
>
> The new source tarball can be downloaded here:
>
> https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-4.0.1.tar.xz
>
>
> To report bugs or issues:
>
> https://bugzilla.redhat.com/
>
> Would you like to meet the cluster team or members of its community?
>
> Join us on IRC (irc.freenode.net #linux-cluster) and share your
> experience with other system administrators or power users.
>
> Thanks/congratulations to all people that contributed to achieve this
> great milestone.
>
> m,

Yet another release goes by and I didn't get around to asking for a new agent to be added. :)

I've got a fence agent for TrippLite switched PDUs;

https://github.com/digimer/fence_tripplite_snmp

They're hardly ideal as fence devices because they are slow, but they do work reliably, and their cost makes them very common in DCs.

Also, \o/ new release!

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

From russell at jonesmail.me Wed Jul 31 02:14:54 2013
From: russell at jonesmail.me (Russell Jones)
Date: Tue, 30 Jul 2013 21:14:54 -0500
Subject: [Linux-cluster] corosync and token, token_retransmit, token_retransmit_before_loss_const confusion
Message-ID: <51F8731E.10008@jonesmail.me>

Hi all,

I am trying to understand how the corosync token, token_retransmit, and token_retransmits_before_loss_const variables all tie in together.

I have a standard RHCS v3 cluster set up and running. The token timeout is set to 10000. When testing, it seems to detect failed members pretty consistently within 10 seconds.

What I am not understanding is *when* a node is declared dead and a fence call is actually made. The man pages show that the cluster is reconfigured when the "token" time is reached, and also when token_retransmits_before_loss_const is reached. This is confusing :-) Which one is it that will reform the cluster? Both? When does one take precedence over the other?

Thanks!

From Maeulen at awp-shop.de Wed Jul 31 13:57:33 2013
From: Maeulen at awp-shop.de (Johannes Mäulen)
Date: Wed, 31 Jul 2013 13:57:33 +0000
Subject: [Linux-cluster] fence_ipmilan
Message-ID: <9A757AF2CA7F204A8F2444FFC5C27C30485F536C@Exchange2010.Skynet.local>

Hi there,

I'm trying to set up a cluster and had issues with 'fence_ipmilan' from the fence-agents package. I'm running Debian 7.1 with a 3.2.0-4-amd64 kernel. 'fence_ipmilan -V' gives 'fence_ipmilan 3.1.5'.

My cluster nodes run on Supermicro motherboards with on-board IPMI. (To be exact: http://www.supermicro.nl/products/motherboard/Xeon/C202_C204/X9SCA-F.cfm )

I've experienced the following behavior:

fence_ipmilan -a xxx.xxx.xxx.xxx -l USER -p PASS -v -o off; echo $?
Powering off machine @ IPMI:xxx.xxx.xxx.xxx...Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
ipmilan: Power still on
Failed
1

More or less at the same moment I got this message, the machine went down. So all the commands were working, just not within the expected time. (Using Supermicro mainboards with on-board IPMI: http://www.supermicro.nl/products/motherboard/Xeon/C202_C204/X9SCA-F.cfm )

I've played around with the available parameters and wasn't able to fix this behavior. So I went into the source code (fence-agents-3.1.5/fence/agents/ipmilan/ipmilan.c) and had a look at the ipmi_off function. There was a fixed value of 2 seconds to sleep. I modified this to use the same parameter as ipmi_on, ipmi->i_power_wait, instead of 2, so that I can change this value and test whether it has an effect on my problem. Now when I use the modified version of fence_ipmilan, the output looks like:

./fence_ipmilan -a xxx.xxx.xxx.xxx -l USER -p PASS -T 10 -v -o off ; echo $?
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Done
0

So I think this fixed my problem, and I think it might help other users experiencing the same issues.

Kind regards,
Johannes
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 6310 bytes
Desc: not available
URL: 
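For anyone wanting to reproduce the change Johannes describes, a rough sketch of the edit against fence-agents-3.1.5's fence/agents/ipmilan/ipmilan.c is below. The exact line numbers and surrounding retry loop are not shown in the report above, so treat this as an illustration of the one-line change rather than an upstream patch:

--- a/fence/agents/ipmilan/ipmilan.c
+++ b/fence/agents/ipmilan/ipmilan.c
(inside ipmi_off(), in the power-off/status retry loop; hunk position approximate)
-	sleep(2);
+	sleep(ipmi->i_power_wait);	/* honour -T / power_wait, as ipmi_on already does */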