From carlopmart at gmail.com Fri Mar 2 07:55:10 2012 From: carlopmart at gmail.com (C. L. Martinez) Date: Fri, 2 Mar 2012 08:55:10 +0100 Subject: [Linux-cluster] Questionas about fence_vmware_soap Message-ID: Hi all, I have some doubts about using fence_vmware_soap under vSphere Infrastructure (4.1). a) Can I use this fence device under an ESXi 4.1 standalone server without using a vCenter server? b) To use fence_vmware_soap with a vCenter server, what privileges does the vCenter user need to fence cluster nodes? Are start, stop and restart enough, or do I need to configure more? Thanks. From christian.masopust at siemens.com Fri Mar 2 09:09:02 2012 From: christian.masopust at siemens.com (Masopust, Christian) Date: Fri, 2 Mar 2012 10:09:02 +0100 Subject: [Linux-cluster] Questionas about fence_vmware_soap In-Reply-To: References: Message-ID: Hi, I don't know fence_vmware_soap, but for ESXi 4.1 I've written a "fence_esxi" (based on fence_apc) which connects by ssh and simply powers the VM on/off by means of "vim-cmd". Please send me a private mail if you would like to have it :) br, christian > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On behalf of C. > L. Martinez > Sent: Friday, 02 March 2012 08:55 > To: linux-cluster at redhat.com > Subject: [Linux-cluster] Questionas about fence_vmware_soap > > Hi all, > > I have some doubts about using fence_vmware_soap under vSphere > Infrastructure (4.1). > > a) Can I use this fence device under an ESXi 4.1 standalone server > without using a vCenter server? > > b) To use fence_vmware_soap with a vCenter server, what privileges does > the vCenter user need to fence cluster nodes? Are start, stop and restart > enough, or do I need to configure more? > > Thanks. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From scooter at cgl.ucsf.edu Fri Mar 2 14:05:35 2012 From: scooter at cgl.ucsf.edu (Scooter Morris) Date: Fri, 02 Mar 2012 06:05:35 -0800 Subject: [Linux-cluster] GFS2 bug in appending to a file? Message-ID: <4F50D3AF.4090706@cgl.ucsf.edu> Hi all, We're seeing a problem with file append using cat: "cat >> file" on a 4-node cluster with gfs2 where the file's mtime doesn't get updated. This looks exactly the same as in Bug 496716, except that bug was supposed to have been fixed in RHEL 5.5 and we're running RHEL 6.2. Did the same problem creep back in, or is this a new manifestation of the problem? We are mounting noatime, noquota, nodiratime, noacl if that helps. Thanks in advance! -- scooter -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Fri Mar 2 14:16:50 2012 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 02 Mar 2012 14:16:50 +0000 Subject: [Linux-cluster] GFS2 bug in appending to a file? In-Reply-To: <4F50D3AF.4090706@cgl.ucsf.edu> References: <4F50D3AF.4090706@cgl.ucsf.edu> Message-ID: <1330697810.2745.3.camel@menhir> Hi, On Fri, 2012-03-02 at 06:05 -0800, Scooter Morris wrote: > Hi all, > We're seeing a problem with file append using cat: "cat >> file" > on a 4-node cluster with gfs2 where the file's mtime doesn't get > updated. This looks exactly the same as in Bug 496716, except that > bug was supposed to have been fixed in RHEL 5.5 and we're running RHEL > 6.2. Did the same problem creep back in, or is this a new > manifestation of the problem? We are mounting noatime, noquota, > nodiratime, noacl if that helps. > > Thanks in advance! > > -- scooter Well, the patch in question is definitely in RHEL6, so this is probably something different. Can you open a bug for it? Steve.
From thevision at pobox.com Fri Mar 2 16:27:37 2012 From: thevision at pobox.com (Greg Mortensen) Date: Fri, 2 Mar 2012 11:27:37 -0500 Subject: [Linux-cluster] Throughput drops with VMware GFS2 cluster when using fence_scsi In-Reply-To: References: <1330017228.2710.51.camel@menhir> Message-ID: This looks like it was caused by the device mapping defaulting to a round-robin path selection policy. While I couldn't find any mention of it in the Red Hat cluster documentation, I did see some MSCS postings[1] that said: Round Robin can interfere with applications that use SCSI reservations for sharing LUNs among VMs and thus is not supported with the use of LUNs with MSCS. So I changed the policy to "Most Recently Used" and was able to get sustained writes of 40MB/s and sustained reads of 100MB/s down one NIC. Regards, Greg [1] http://en.community.dell.com/techcenter/storage/w/wiki/2671.aspx From orquidea.peramor at gmail.com Fri Mar 2 19:22:32 2012 From: orquidea.peramor at gmail.com (Orquidea Salt mas) Date: Fri, 2 Mar 2012 20:22:32 +0100 Subject: [Linux-cluster] Rebélate by self-management, first project of free software by which we bet all / Rebélate por la autogestión, primer proyecto de software libre por el que apostamos todas Message-ID: English: Many already we have contributed to the first project of free software dedicated to self-management in this campaign of collective financing, it collaborates and it spreads!/ Beginning campaign collective financing http://www.goteo.org/project/rebelaos-publicacion-por-la-autogestion?lang=en Login to enter with user of social networks and for would register in Goteo : http://www.goteo.org/user/login?lang=en Rebelaos! Publication by self-management A massive publication that floods the public transport, the work centers, the parks, the consumption centers, by means of distribution of 500,000 gratuitous units, acting simultaneously in all sides and nowhere. We announce the main tool of a vestibule Web for the management of self-sustaining resources by means of Drupal, in addition in the publication there will be an article dedicated to free software, hardware, It is being prepared in English, the machinery You can see more details in the index of the publication https://n-1.cc/pg/file/read/1151902/indexresumen-de-los-contenidos-pdf . A computer system that allows us to share resources in all the scopes of our life so that we do not have to generate means different for each subject nor for each territory. A point of contact digitalis to generate projects of life outside Capitalism and to margin of the State. A tool to spread and to impel the social transformation through the resources that will set out in their contents around self-management, the autoorganización, the disobedience and the collective action. In which the capitalist system goes to the collapse, in a while immersed in a deep systemic crisis (ecological, political and economic, but mainly of values), where individual and collective of people they are being lacking of his fundamental rights, is necessary to develop a horizontal collective process where all the human beings we pruned to interact in equality of conditions and freedom.
To interact means to relate to us (as much human as economically), to communicate to us, to cover our basic needs, to generate and to protect communal properties, to know and to provide collective solutions us problematic that our lives interfere. We want abrir a breach within normality in the monotonous life state-capitalist, a day anyone, that finally will not be any day. By means of this publication we try: - To drive a horizontal collective process where all and all we pruned to interact in equality of conditions and freedom. - To create communications network between the people it jeopardize with the change and arranged to act. - To find collective solutions to problematic that our lives interfere - To facilitate the access to resources that make possible self-management. - To participate in the construction of networks of mutual support, generated horizontals, asamblearias and from the base. - To publish all this information in an attractive format stops to facilitate the access to all the society. There are 15 days remaining for the upcoming March 15, the day that will come Rebelaos!, Magazine for the selfmanagement Today, we issue the cover of Rebelaos! (Castilian version) that can be displayed on the following link: https://n-1.cc/pg/file/read/1200503/portada-15-de-marzo-rebelaos The contents of the store owners to us by 15 March. Do you? Do you keep on 15 March? In addition, we have over 200 distribution nodes, distributed throughout the Spanish state. Check the map: https://afinidadrebelde.crowdmap.com/ On the other hand, the funding campaign continues to move and still have 12 days to collect the remaining 6,000 euros. We can all make a bit for all the grains of sand become a great beach on March 15. You can access the co-financing campaign: http://www.goteo.org/project/rebelaos-publicacion-por-la-autogestion Rebel Affinity group www.rebelaos.net ------------------------------------------------------------------------------- Castellano: Muchos ya hemos aportado al primer proyecto de software libre dedicado a la la financiaci?n colectiva, colabora y diffunde !!!!! Inicio campa?a financiaci?n colectiva goteo.org www.goteo.org/project/rebelaos-publicacion-por-la-autogestion Link para registrarse en Goteo y acceder a redes sociales para colaborar en la difus?n http://www.goteo.org/user/login ?Rebelaos! Publicaci?n por la autogesti?n Una publicaci?n masiva que inunde el transporte p?blico, los centros de trabajo, los parques, los centros de consumo, mediante la distribuci?n de 500.000 ejemplares gratuitos, actuando simult?neamente en todos lados y en ninguna parte. Anunciamos la herramienta principal de un portal web para la gesti?n de recursos autogestionados mediante Drupal, adem?s en la publicaci?n habr? un art?culo dedicado al software libre, el hardware, la maquinaria... Puedes ver m?s detalles en el ?ndice de la publicaci?n https://n-1.cc/pg/file/read/1151902/indexresumen-de-los-contenidos-pdf Un sistema inf?rmatico que nos permita compartir recursos en todos los ?mbitos de nuestra vida de forma que no tengamos que generar un medio distinto para cada tema ni para cada territorio. Un punto de encuentro digital para generar proyectos de vida fuera del capitalismo y al margen del Estado. Una herramienta para difundir e impulsar la transformaci?n social a trav?s de los recursos que se propondr?n en sus contenidos en torno a la autogesti?n, la autoorganizaci?n, la desobediencia y la acci?n colectiva. 
En un momento en que el sistema capitalista se dirige al colapso, inmerso en una profunda crisis sist?mica (ecol?gica, pol?tica y econ?mica, pero principalmente de valores), donde individuos y colectivos de personas est?n siendo desprovistos de sus derechos fundamentales, es necesario desarrollar un proceso colectivo horizontal donde todos los seres humanos podamos interactuar en igualdad de condiciones y en libertad. Interactuar significa relacionarnos (tanto humana como econ?micamente), comunicarnos, cubrir nuestras necesidades b?sicas, generar y proteger bienes comunes, conocernos y dar soluciones colectivas a problem?ticas que interfieren nuestras vidas. Queremos abrir una brecha dentro de la normalidad en la mon?tona vida estatal-capitalista, un d?a cualquiera, que finalmente no ser? cualquier d?a. Mediante esta publicaci?n pretendemos: - Impulsar un proceso colectivo horizontal donde todos y todas podamos interactuar en igualdad de condiciones y en libertad. - Crear red de comunicaciones entre las personas comprometidas con el cambio y dispuestas a actuar. - Encontrar soluciones colectivas a problem?ticas que interfieren nuestras vidas. - Facilitar el acceso a recursos que posibiliten la autogesti?n. - Participar en la construcci?n de redes de apoyo mutuo, horizontales, asamblearias y generadas desde la base. - Publicar toda esta informaci?n en un formato atractivo para facilitar el acceso a toda la sociedad. Son 15 los d?as que restan para el pr?ximo 15 de marzo, d?a en el que ver? la luz ?Rebelaos!, publicaci?n por la autogesti?n. Hoy, hacemos p?blica la portada de ?Rebelaos! (versi?n en castellano) que pod?is visualizar en el siguiente enlace: https://n-1.cc/pg/file/read/1200503/portada-15-de-marzo-rebelaos El contenido de los titulares nos los guardamos para el 15 de marzo. ?Y t?? ?Te guardas el 15 de marzo? Adem?s, ya hemos superado los 200 nodos de distribuci?n, repartidos por todo el estado espa?ol. Ver el mapa: https://afinidadrebelde.crowdmap.com/ Por otro lado, la campa?a de financiaci?n contin?a avanzando y todav?a quedan 12 d?as para reunir los 6.000 euros que restan. Todas podemos aportar un poco para que todos los granitos de arena se conviertan en una gran playa el 15 de marzo. Pod?is acceder a la campa?a de cofinanciaci?n en: http://www.goteo.org/project/rebelaos-publicacion-por-la-autogestion Colectivo Afinidad Rebelde www.rebelaos.net From carlopmart at gmail.com Fri Mar 2 22:21:53 2012 From: carlopmart at gmail.com (carlopmart) Date: Fri, 02 Mar 2012 23:21:53 +0100 Subject: [Linux-cluster] Questionas about fence_vmware_soap In-Reply-To: References: Message-ID: <4F514801.4090002@gmail.com> On 03/02/2012 10:09 AM, Masopust, Christian wrote: > > Hi, > > I don't know fence_vmware_soap but for ESXi 4.1 I've written > a "fence_esxi" (based on fence_apc) which connects by ssh and > simply powers on/off the VM by means of "vim-cmd". > > Please send me a private mail if you like to have it :) > > br, > christian > Ok, I have tried to configure it (fence_vmware_soap): ...... ... and it seems it works, but .... [root at firstnode cluster]# clustat Cluster Status for VMwareCluster @ Fri Mar 2 22:16:04 2012 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ firstnode.domain.com 1 Online, Local secondnode.domain.com 2 Offline ... 
and when I try to start clvmd: [root at firstnode log]# service clvmd start Starting clvmd: clvmd startup timed out and this is the group_tool output:
[root at firstnode cluster]# group_tool
fence domain
member count  1
victim count  1
victim now    2
master nodeid 1
wait state    fencing
members
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000015 need_plock,kern_stop,join
change        member 0 joined 0 remove 0 failed 0 seq 0,0
members
new change    member 1 joined 1 remove 0 failed 0 seq 1,1
new status    wait_messages 0 wait_condition 1 fencing
new members   1
..
and I can't configure shared storage with the lvm tools because they don't work ... Where is the problem? Do I need to set up a quorum device? I still have not configured the second node ... -- CL Martinez carlopmart {at} gmail {d0t} com From grimme at atix.de Thu Mar 8 15:04:18 2012 From: grimme at atix.de (Marc Grimme) Date: Thu, 8 Mar 2012 16:04:18 +0100 (CET) Subject: [Linux-cluster] GFS2 not able to remove a file In-Reply-To: Message-ID: <43bc931e-e364-4e55-892b-4ef743a1b2f7@mobilix-20> Hello, I'm having a strange behavior of a GFS2 file system. I have a file I can write to and read from. But I cannot delete the file or move it. I've already done an fsck but with no effect. see below ---------------X8------------- [root at server run.old]# cat messagebus.pid 4000 [root at server run.old]# echo 4001 > messagebus.pid [root at server run.old]# cat messagebus.pid 4001 [root at server run.old]# rm messagebus.pid rm: remove regular file `messagebus.pid'? y rm: cannot remove `messagebus.pid': No such file or directory ---------------X8------------- Information about the system:
---------------X8------------------------
# cat /etc/redhat-release
CentOS release 6.2 (Final)
# modinfo gfs2
filename:    /lib/modules/2.6.32-220.4.2.el6.x86_64/kernel/fs/gfs2/gfs2.ko
license:     GPL
author:      Red Hat, Inc.
description: Global File System
srcversion:  C664A0EEE2337E08DEF7648
depends:     dlm
vermagic:    2.6.32-220.4.2.el6.x86_64 SMP mod_unload modversions
---------------X8------------------------
If you need any more information let me know. Has anybody an idea where this comes from or how I can solve it? Thanks Marc. ______________________________________________________________________________ Marc Grimme ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | 85716 Unterschleissheim | www.atix.de Enterprise Linux einfach online kaufen: www.linux-subscriptions.com Registergericht: Amtsgericht München, Registernummer: HRB 168930, USt.-Id.: DE209485962 | Vorstand: Thomas Merz (Vors.), Marc Grimme, Mark Hlawatschek, Jan R. Bergrath | Vorsitzender des Aufsichtsrats: Dr. Martin Buss From raju.rajsand at gmail.com Thu Mar 8 17:40:57 2012 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Thu, 8 Mar 2012 23:10:57 +0530 Subject: [Linux-cluster] GFS2 not able to remove a file In-Reply-To: <43bc931e-e364-4e55-892b-4ef743a1b2f7@mobilix-20> References: <43bc931e-e364-4e55-892b-4ef743a1b2f7@mobilix-20> Message-ID: Greetings, On Thu, Mar 8, 2012 at 8:34 PM, Marc Grimme wrote: > Hello, > I'm having a strange behavior of a GFS2 file system. > > I have a file I can write to and read from. But I cannot delete the file or move it. > I've already done an fsck but with no effect. > > see below > ---------------X8------------- > [root at server run.old]# cat messagebus.pid > 4000 > [root at server run.old]# echo 4001 > messagebus.pid > [root at server run.old]# cat messagebus.pid > 4001 > [root at server run.old]# rm messagebus.pid > rm: remove regular file `messagebus.pid'?
y > rm: cannot remove `messagebus.pid': No such file or directory > ---------------X8------------- > What does lsof says? -- Regards, Rajagopal From criley at erad.com Thu Mar 8 18:59:05 2012 From: criley at erad.com (Charles Riley) Date: Thu, 8 Mar 2012 13:59:05 -0500 Subject: [Linux-cluster] Clustered LVM for storage Message-ID: Greetings, I have an aoe device with a lot of storage in it that I would like to share among four rhel 4 servers. Each of the servers will mount it's own storage, no data is shared between them. e.g the servers won't be mounting the same volumes. I could create four different raid groups on the aoe device and present a different one to each server, but that would waste space. What I'd rather do is create one big raid group and use clustered lvm to divvy the space between servers. Is it possible? Would it be enough to run just the clustered lvm daemon, or would I need to install all of the cluster suite? Are there other/better options? Thanks! Charles -------------- next part -------------- An HTML attachment was scrubbed... URL: From grimme at atix.de Thu Mar 8 20:33:11 2012 From: grimme at atix.de (Marc Grimme) Date: Thu, 8 Mar 2012 21:33:11 +0100 (CET) Subject: [Linux-cluster] GFS2 not able to remove a file In-Reply-To: Message-ID: <018a9cfe-24d4-4d8a-9765-b1be99ee416b@mobilix-20> Hello, Nothing. # lsof -b +M -n -l 2>/dev/null | grep messagebus # The server/cluster was rebooted a few times. I'm pretty sure that no application is using this file. Regards Marc. ----- Original Message ----- From: "Rajagopal Swaminathan" To: "linux clustering" Sent: Donnerstag, 8. M?rz 2012 18:40:57 Subject: Re: [Linux-cluster] GFS2 not able to remove a file Greetings, On Thu, Mar 8, 2012 at 8:34 PM, Marc Grimme wrote: > Hello, > I'm having a strange behavior of a GFS2 file system. > > I have a file I can write to and read from. But I cannot delete the file or move it. > I've already done an fsck but with no effect. > > see below > ---------------X8------------- > [root at server run.old]# cat messagebus.pid > 4000 > [root at server run.old]# echo 4001 > messagebus.pid > [root at server run.old]# cat messagebus.pid > 4001 > [root at server run.old]# rm messagebus.pid > rm: remove regular file `messagebus.pid'? y > rm: cannot remove `messagebus.pid': No such file or directory > ---------------X8------------- > What does lsof says? -- Regards, Rajagopal -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From linux at alteeve.com Thu Mar 8 20:45:43 2012 From: linux at alteeve.com (Digimer) Date: Thu, 08 Mar 2012 15:45:43 -0500 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: <4F591A77.7000303@alteeve.com> On 03/08/2012 01:59 PM, Charles Riley wrote: > Greetings, > > I have an aoe device with a lot of storage in it that I would like to > share among four rhel 4 servers. Each of the servers will mount it's > own storage, no data is shared between them. e.g the servers won't be > mounting the same volumes. > > I could create four different raid groups on the aoe device and present > a different one to each server, but that would waste space. > What I'd rather do is create one big raid group and use clustered lvm to > divvy the space between servers. > > Is it possible? Would it be enough to run just the clustered lvm > daemon, or would I need to install all of the cluster suite? > Are there other/better options? > > Thanks! 
> > Charles In theory, it should be fine, but it will be up to you to ensure a given LV is in fact only mounted in one place at a time. Clustered LVM does require cman on RHEL/CentOS 6. As such, you need the full stack, *including* fencing. -- Digimer E-Mail: digimer at alteeve.com Papers and Projects: https://alteeve.com From jeff.sturm at eprize.com Thu Mar 8 22:59:00 2012 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Thu, 8 Mar 2012 22:59:00 +0000 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: With aoe you can use old-fashioned disk partitioning. Just run "parted" (or whatever partitioning tool you choose) and allocate storage for partitions as you see fit. The benefits of doing this are: Easier/simpler to setup than cluster suite, and you can still use all the spindles from your aoe target (for example by creating a large RAID-10 array across all disks). The downside of partitions is they aren't easy to change. You can add them safely while the storage array is in use, but each host needs to reload the partition table when you're done with changes before the new storage can be used, and that may not happen until you rmmod/modprobe the aoe driver, which you can't do while any partitions are in use, e.g. on mounted file systems. And resizing partitions is tricky because they are allocated on consecutive sectors. So if you want the flexibility of adding/removing/modifying volumes at any time, it may be worth the trouble to get Cluster Suite running so you can use CLVM. If you just want to carve it up once and forget about it, partitioning the array will be the fastest. -Jeff From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Charles Riley Sent: Thursday, March 08, 2012 1:59 PM To: linux cluster Subject: [Linux-cluster] Clustered LVM for storage Greetings, I have an aoe device with a lot of storage in it that I would like to share among four rhel 4 servers. Each of the servers will mount it's own storage, no data is shared between them. e.g the servers won't be mounting the same volumes. I could create four different raid groups on the aoe device and present a different one to each server, but that would waste space. What I'd rather do is create one big raid group and use clustered lvm to divvy the space between servers. Is it possible? Would it be enough to run just the clustered lvm daemon, or would I need to install all of the cluster suite? Are there other/better options? Thanks! Charles -------------- next part -------------- An HTML attachment was scrubbed... URL: From criley at erad.com Fri Mar 9 13:55:45 2012 From: criley at erad.com (Charles Riley) Date: Fri, 9 Mar 2012 08:55:45 -0500 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: We're talking 36TB of storage here.. I've never had much success partitioning an array of that size (and actually being able to use all of the space) without lvm. But I might give that a try. Continuing along this line of thought: If I stop access to the array from all of the servers before I make any changes, I could probably even make use of lvm without clustering. Charles On Thu, Mar 8, 2012 at 5:59 PM, Jeff Sturm wrote: > With aoe you can use old-fashioned disk partitioning. Just run "parted" > (or whatever partitioning tool you choose) and allocate storage for > partitions as you see fit. > > The benefits of doing this are: Easier/simpler to setup than cluster > suite, and you can still use all the spindles from your aoe target (for > example by creating a large RAID-10 array across all disks). > > The downside of partitions is they aren't easy to change. You can add > them safely while the storage array is in use, but each host needs to > reload the partition table when you're done with changes before the new > storage can be used, and that may not happen until you rmmod/modprobe the > aoe driver, which you can't do while any partitions are in use, e.g. on > mounted file systems. And resizing partitions is tricky because they are > allocated on consecutive sectors. > > So if you want the flexibility of adding/removing/modifying volumes at any > time, it may be worth the trouble to get Cluster Suite running so you can > use CLVM. If you just want to carve it up once and forget about it, > partitioning the array will be the fastest. > > -Jeff > > From: linux-cluster-bounces at redhat.com [mailto: > linux-cluster-bounces at redhat.com] On Behalf Of Charles Riley > Sent: Thursday, March 08, 2012 1:59 PM > To: linux cluster > Subject: [Linux-cluster] Clustered LVM for storage > > Greetings, > > I have an aoe device with a lot of storage in it that I would like to > share among four rhel 4 servers. Each of the servers will mount it's own > storage, no data is shared between them. e.g the servers won't be mounting > the same volumes. > > I could create four different raid groups on the aoe device and present a > different one to each server, but that would waste space. > What I'd rather do is create one big raid group and use clustered lvm to > divvy the space between servers. > > Is it possible? Would it be enough to run just the clustered lvm daemon, > or would I need to install all of the cluster suite? > Are there other/better options? > > Thanks! > > Charles > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Charles Riley | eRAD | Director of Technical Solutions | O: 864.640.8648 C: 864.881.1331 -------------- next part -------------- An HTML attachment was scrubbed... URL:
From gianluca.cecchi at gmail.com Fri Mar 9 14:14:17 2012 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Fri, 9 Mar 2012 15:14:17 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd Message-ID: Hello, I have a cluster in RH EL 5.7 with a quorum disk and a heuristic. Current versions of the main cluster packages are: rgmanager-2.0.52-21.el5_7.1 cman-2.0.115-85.el5_7.3 This is the loaded heuristic: Heuristic: 'ping -c1 -w1 10.4.5.250' score=1 interval=2 tko=200 Line in cluster.conf: <heuristic program="ping -c1 -w1 10.4.5.250" score="1" interval="2" tko="200"/> where 10.4.5.250 is the gateway of the production LAN. From the ping man page: -c count Stop after sending count ECHO_REQUEST packets. With deadline (-w) option, ping waits for count ECHO_REPLY packets, until the timeout expires. -w deadline Specify a timeout, in seconds, before ping exits regardless of how many packets have been sent or received.
In this case ping does not stop after count packet are sent, it waits either for deadline expire or until count probes are answered or for some error notification from network. So I would expect that the single ping command, executed as a sanity check, should exit with a return code after at most 1 second, regardless of whether an echo reply has been received or not. And in fact I had no particular problems for many months. As a test, using an IP on an unreachable LAN (say 10.4.6.5):
date
n=0
while [ $n -lt 20 ]
do
  ping -c1 -w1 10.4.6.5
  sleep 2
  n=$(expr $n + 1)
done
date
Output is:
Fri Mar 9 11:59:02 CET 2012
PING 10.4.6.5 (10.4.6.5) 56(84) bytes of data.
--- 10.4.6.5 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms
...
--- 10.4.6.5 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
Fri Mar 9 12:00:02 CET 2012
so 60 seconds.... In case of gateway reachability problems (also tested with an iptables rule that drops outgoing ICMP echo requests) I would then have: qdiskd[2780]: Heuristic: 'ping -c1 -w1 10.4.5.250' missed (1/200) The strange thing I got last night was this single line: qdiskd[22145]: Heuristic: 'ping -c1 -w1 10.4.5.250' DOWN - Exceeded timeout of 75 seconds and the node self-fenced, causing relocation of some services. So for some reason the ping command was not able to exit at all, I presume... despite the -c and -w options.... I suppose some condition caused an internal timeout defined for the monitor operation itself (defaulting to 75 seconds?), something like a pacemaker directive op monitor interval="20" timeout="40", and at this point the cluster considered the heuristic as failed altogether and self-fenced.... Is this right? My default quorumd directive is this one, btw: <quorumd ... log_facility="local4" log_level="7" tko="16" votes="1"> And in fact when for some reason I have temporary problems with my SAN, I get something like: qdiskd[1339]: qdisk cycle took more than 5 seconds to complete (34.540000) and on the other node: qdiskd[6025]: Node 1 missed an update (2/200) qdiskd[6025]: Node 1 missed an update (3/200) ... Can anyone give any insight into the message I got yesterday that I never saw before: qdiskd[22145]: Heuristic: 'ping -c1 -w1 10.4.5.250' DOWN - Exceeded timeout of 75 seconds ? Do I have to suppose a bug in the ping command? Thanks in advance, Gianluca From emi2fast at gmail.com Fri Mar 9 14:39:43 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 9 Mar 2012 15:39:43 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: Hello Gianluca, do you have a cluster private network? If your answer is yes, I recommend not using the heuristic, because if your cluster public network goes down your cluster gets into a fencing loop. Or you can do something better: use pacemaker+corosync. On 9 March 2012 15:14, Gianluca Cecchi wrote: > Hello, > I have a cluster in RH EL 5.7 with a quorum disk and a heuristic. > Current versions of the main cluster packages are: > rgmanager-2.0.52-21.el5_7.1 > cman-2.0.115-85.el5_7.3 > > This is the loaded heuristic: > > Heuristic: 'ping -c1 -w1 10.4.5.250' score=1 interval=2 tko=200 > > Line in cluster.conf: > <heuristic program="ping -c1 -w1 10.4.5.250" score="1" interval="2" tko="200"/> > > where 10.4.5.250 is the gateway of the production LAN. > From the ping man page: > -c count > Stop after sending count ECHO_REQUEST packets. With deadline (-w) > option, ping waits for count ECHO_REPLY packets, until the timeout > expires. > -w deadline > Specify a timeout, in seconds, before ping exits regardless of how many > packets have been sent or received.
In this case ping does not stop > after count packet are sent, it waits either for deadline expire or > until count probes are answered or for some error notification from > network. > > So I would expect that the single ping command, executed as a sanity > check, at most after 1 second > should exit with a code, regardless an echo reply has been received or not > And in fact I had no particular problem for many months > > As a test, putting an ip on an unreachable lan (say 10.4.6.5): > date > n=0 > while [ $n -lt 20 ] > do > ping -c1 -w1 10.4.6.5 > sleep 2 > n=$(expr $n + 1) > done > date > > Output is > Fri Mar 9 11:59:02 CET 2012 > PING 10.4.6.5 (10.4.6.5) 56(84) bytes of data. > > --- 10.4.6.5 ping statistics --- > 2 packets transmitted, 0 received, 100% packet loss, time 1000ms > > ... > > --- 10.4.6.5 ping statistics --- > 2 packets transmitted, 0 received, 100% packet loss, time 999ms > > Fri Mar 9 12:00:02 CET 2012 > > so 60 seconds.... > > In case of gateway reachability problems (also tested with an iptables > rule that drops icmp output request) I would then have: > > qdiskd[2780]: Heuristic: 'ping -c1 -w1 10.4.5.250' missed > (1/200) > > Strange thing I got yesterday night was this only line: > > qdiskd[22145]: Heuristic: 'ping -c1 -w1 10.4.5.250' DOWN - > Exceeded timeout of 75 seconds > > and the node self-fencing causing relocation of some services > So for some reason the ping command was not able to exit at all, I > presume... > despite the -c and -w options.... > > I suppose a condition that causes an internal timeout defined for the > monitor operation itself (default to 75 seconds?) > something like a pacemaker directive > op monitor interval="20" timeout="40" > > And the cluster at this point considering as heuristic failed at all > and self-fencing.... > Is this right? > > My default quorumd directive is this one, btw: > > log_facility="local4" log_level="7" tko="16" votes="1"> > > And in fact when for some reason I have temporary problems with my > SAN, I get something like: > > qdiskd[1339]: qdisk cycle took more than 5 seconds to complete > (34.540000) > > and on the other node > qdiskd[6025]: Node 1 missed an update (2/200) > qdiskd[6025]: Node 1 missed an update (3/200) > ... > > Can anyone give any insight for the message I got yesterday that I > never saw before: > qdiskd[22145]: Heuristic: 'ping -c1 -w1 10.4.5.250' DOWN - > Exceeded timeout of 75 seconds > > ? > Do I have to suppose a bug in the ping command? > > Thanks in advance, > Gianluca > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianluca.cecchi at gmail.com Fri Mar 9 15:44:03 2012 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Fri, 9 Mar 2012 16:44:03 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: On Fri, 9 Mar 2012 15:39:43 +0100, emmanuel segura wrote: > Hello Gianluca > Do you have a cluster private network? > if your answer it's yes i recommend don't use heuristic because if your cluster public network goes down > your cluster take a fencing loop > > Or you can do something better, use pacemaker+corosync My cluster is RH EL 5.7 based. Pacemaker is not an option here... And also, if I remember correctly, pacemaker in 6.2 is not officialy supported yet. Probably in 6.3? 
I do have a private network that is in place. Here we are talking about heuristic to manage fencing decisions, based on both quorum disk reachability and network serviceability (ping to gateway for the production network must be ok) see also (even if not so recent): http://magazine.redhat.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ But in the mean time I also found this thread: http://osdir.com/ml/linux-cluster/2010-05/msg00081.html and in fact during last weeks we got a nightly job consisting of a big ftp to an external site and it could be related to my problem... I have to evaluate if ping -c3 -t3 -W1 could be a better option in our new situation during night From jeff.sturm at eprize.com Fri Mar 9 15:55:36 2012 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Fri, 9 Mar 2012 15:55:36 +0000 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: Sure. As long as you don't try to make use of any LVM features that require metadata consistency (mirroring, snapshots, online resizing, etc.) you can get by using LVM without clustering. -Jeff From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Charles Riley Sent: Friday, March 09, 2012 8:56 AM To: linux clustering Subject: Re: [Linux-cluster] Clustered LVM for storage We're talking 36TB of storage here.. I've never had much success partitioning an array of that size (and actually being able to use all of the space) without lvm. But I might give that a try. Continuing along this line of thought: If I stop access to the array from all of the servers before I make any changes, I could probably even make use of lvm without clustering. Charles On Thu, Mar 8, 2012 at 5:59 PM, Jeff Sturm > wrote: With aoe you can use old-fashioned disk partitioning. Just run "parted" (or whatever partitioning tool you choose) and allocate storage for partitions as you see fit. The benefits of doing this are: Easier/simpler to setup than cluster suite, and you can still use all the spindles from your aoe target (for example by creating a large RAID-10 array across all disks). The downside of partitions is they aren't easy to change. You can add them safely while the storage array is in use, but each host needs to reload the partition table when you're done with changes before the new storage can be used, and that may not happen until you rmmod/modprobe the aoe driver, which you can't do while any partitions are in use, e.g. on mounted file systems. And resizing partitions is tricky because they are allocated on consecutive sectors. So if you want the flexibility of adding/removing/modifying volumes at any time, it may be worth the trouble to get Cluster Suite running so you can use CLVM. If you just want to carve it up once and forget about it, partitioning the array will be the fastest. -Jeff From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Charles Riley Sent: Thursday, March 08, 2012 1:59 PM To: linux cluster Subject: [Linux-cluster] Clustered LVM for storage Greetings, I have an aoe device with a lot of storage in it that I would like to share among four rhel 4 servers. Each of the servers will mount it's own storage, no data is shared between them. e.g the servers won't be mounting the same volumes. I could create four different raid groups on the aoe device and present a different one to each server, but that would waste space. 
What I'd rather do is create one big raid group and use clustered lvm to divvy the space between servers. Is it possible? Would it be enough to run just the clustered lvm daemon, or would I need to install all of the cluster suite? Are there other/better options? Thanks! Charles -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Charles Riley | eRAD | Director of Technical Solutions | O: 864.640.8648 C: 864.881.1331 -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Fri Mar 9 16:29:06 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 9 Mar 2012 17:29:06 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: I'll try to be more clear. I have worked on Red Hat Cluster for 2 years and I have seen this topic so many times; this is the problem with Red Hat Cluster ping+quorum. If I have a two-node cluster with public_net+private_net, I think it's normal that my services switch if the public network is down on the node where the resource group was running. But, but, but: with ping as a heuristic you get a node fence instead. And remember, when you use a quorum disk on Red Hat Cluster, the fence decision is based on the ping/heuristic. Sorry, I'll tell you something in Italian: using ping on the qdisk only adds problems to the cluster. Nodes in a cluster should be fenced only if they lose access to the qdisk or the private network stops working; if my private network is OK and the node can still access the disks but the public network is down, the resources should simply switch over. Tell me if you need more info about the ping. On 9 March 2012 16:44, Gianluca Cecchi wrote: > On Fri, 9 Mar 2012 15:39:43 +0100, emmanuel segura wrote: > > > Hello Gianluca > > Do you have a cluster private network? > > if your answer it's yes i recommend don't use heuristic because if your > cluster public network goes down > > your cluster take a fencing loop > > > > Or you can do something better, use pacemaker+corosync > > My cluster is RH EL 5.7 based. Pacemaker is not an option here... > And also, if I remember correctly, pacemaker in 6.2 is not officialy > supported yet. Probably in 6.3? > I do have a private network that is in place. > Here we are talking about heuristic to manage fencing decisions, based > on both quorum disk reachability and network serviceability (ping to > gateway for the production network must be ok) > > see also (even if not so recent): > http://magazine.redhat.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ > > But in the mean time I also found this thread: > http://osdir.com/ml/linux-cluster/2010-05/msg00081.html > > and in fact during last weeks we got a nightly job consisting of a big > ftp to an external site and it could be related to my problem... > > I have to evaluate if > ping -c3 -t3 -W1 > could be a better option in our new situation during night > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ajb2 at mssl.ucl.ac.uk Fri Mar 9 14:45:34 2012 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Fri, 09 Mar 2012 14:45:34 +0000 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: <4F5A178E.6070509@mssl.ucl.ac.uk> On 08/03/12 22:59, Jeff Sturm wrote: > The downside of partitions is they aren't easy to change. You can add them safely while the storage array is in use, but each host needs to reload the partition table when you're done with changes before the new storage can be used, and that may not happen until you rmmod/modprobe the aoe driver Partprobe fails? From rohit2525 at gmail.com Sat Mar 10 22:36:32 2012 From: rohit2525 at gmail.com (Rohit tripathi) Date: Sun, 11 Mar 2012 04:06:32 +0530 Subject: [Linux-cluster] (no subject) Message-ID: Hi Team, I need to know the step-by-step procedure to create a GFS2 file system for my cluster. Regards, Rohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianluca.cecchi at gmail.com Mon Mar 12 11:26:20 2012 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Mon, 12 Mar 2012 12:26:20 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: On Fri, 9 Mar 2012 17:29:06 +0100 emmanuel segura wrote: > i'll try to be more clear > i work on redhat cluster from 2 years and i seen this topic so much times Sorry, I didn't want to offend anyone. I have been working on rhcs (and other companions from other OSes) for many years too... > I think it's normal my services switch if have the public network down on the node where > the resource group was running,But But But with ping as heuristic you get a node fence AFAIK rhcs is not able to switch a service if the server loses its connectivity. Better: the /usr/share/cluster/ip.sh resource definition contains the parameter monitor_link, but it only covers a dead link on the NIC. And I have to manage rhcs... So in my opinion, if you want to test gateway reachability (that is, the production LAN where you deliver a cluster service), you are at the moment forced to use a heuristic, or to write your own resource to add to the ones composing the service, so causing a service switch in case of problems with this custom resource... but I could be wrong in my assumption... Cheers, Gianluca From emi2fast at gmail.com Mon Mar 12 11:56:56 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Mon, 12 Mar 2012 12:56:56 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: I know the cluster agent /usr/share/cluster/ip.sh cannot check the gateway. I resolved this problem with a script in my service group, so when the script fails the resources switch over. ========================================================