From Ralf.Aumueller at informatik.uni-stuttgart.de  Thu May  3 10:36:24 2012
From: Ralf.Aumueller at informatik.uni-stuttgart.de (Ralf Aumueller)
Date: Thu, 03 May 2012 12:36:24 +0200
Subject: [Linux-cluster] RHEL6 Cluster: Update corosync RPMs
Message-ID: <4FA25FA8.6060603@informatik.uni-stuttgart.de>

Hello,

recently there was an update of corosync and corosynclib rpms. Is it save to
just install these updates on a running two node cluster or do I have to use a
special procedure (e.g. Stop cluster services on node2; apply updates and reboot
node2; move services to node2 and update node1).

Regards,
Ralf



From fdinitto at redhat.com  Thu May  3 11:06:25 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 03 May 2012 13:06:25 +0200
Subject: [Linux-cluster] RHEL6 Cluster: Update corosync RPMs
In-Reply-To: <4FA25FA8.6060603@informatik.uni-stuttgart.de>
References: <4FA25FA8.6060603@informatik.uni-stuttgart.de>
Message-ID: <4FA266B1.4050403@redhat.com>

On 5/3/2012 12:36 PM, Ralf Aumueller wrote:
> Hello,
> 
> recently there was an update of corosync and corosynclib rpms. Is it save to
> just install these updates on a running two node cluster or do I have to use a
> special procedure (e.g. Stop cluster services on node2; apply updates and reboot
> node2; move services to node2 and update node1).

We don?t support updating packages on a running cluster.

stop cluster on nodeX.. update.. (reboot if necessary).. start cluster..

repeat for all cluster nodes.

Fabio



From yamato at redhat.com  Mon May  7 09:08:35 2012
From: yamato at redhat.com (Masatake YAMATO)
Date: Mon, 07 May 2012 18:08:35 +0900 (JST)
Subject: [Linux-cluster] [PATCH] typo in fence_kdump_send.8
Message-ID: <20120507.180835.6721616621535321.yamato@redhat.com>

Signed-off-by: Masatake YAMATO<yamato at redhat.com>
diff --git a/fence/agents/kdump/fence_kdump_send.8 b/fence/agents/kdump/fence_kdump_send.8
index 4cec124..ab95836 100644
--- a/fence/agents/kdump/fence_kdump_send.8
+++ b/fence/agents/kdump/fence_kdump_send.8
@@ -16,7 +16,7 @@ kdump kernel after a cluster node has encountered a kernel panic. Once
 the cluster node has entered the kdump crash recovery service,
 \fIfence_kdump_send\fP will periodically send messages to all cluster
 nodes. When the \fIfence_kdump\fP agent receives a valid message from
-the failed not, fencing is complete.
+the failed node, fencing is complete.
 .SH OPTIONS
 .TP
 .B -p, --ipport=\fIPORT\fP



From rohara at redhat.com  Mon May  7 14:33:33 2012
From: rohara at redhat.com (Ryan O'Hara)
Date: Mon, 07 May 2012 09:33:33 -0500
Subject: [Linux-cluster] [PATCH] typo in fence_kdump_send.8
In-Reply-To: <20120507.180835.6721616621535321.yamato@redhat.com>
References: <20120507.180835.6721616621535321.yamato@redhat.com>
Message-ID: <4FA7DD3D.3090203@redhat.com>

Thanks. I applied this patch to the upstream git repo this morning.

Ryan

On 05/07/2012 04:08 AM, Masatake YAMATO wrote:
> Signed-off-by: Masatake YAMATO<yamato at redhat.com>
>
> diff --git a/fence/agents/kdump/fence_kdump_send.8 b/fence/agents/kdump/fence_kdump_send.8
> index 4cec124..ab95836 100644
> --- a/fence/agents/kdump/fence_kdump_send.8
> +++ b/fence/agents/kdump/fence_kdump_send.8
> @@ -16,7 +16,7 @@ kdump kernel after a cluster node has encountered a kernel panic. Once
>   the cluster node has entered the kdump crash recovery service,
>   \fIfence_kdump_send\fP will periodically send messages to all cluster
>   nodes. When the \fIfence_kdump\fP agent receives a valid message from
> -the failed not, fencing is complete.
> +the failed node, fencing is complete.
>   .SH OPTIONS
>   .TP
>   .B -p, --ipport=\fIPORT\fP
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From karlhat at gmail.com  Wed May  9 20:18:14 2012
From: karlhat at gmail.com (Carlos Alberto Ramirez Rendon)
Date: Wed, 9 May 2012 15:18:14 -0500
Subject: [Linux-cluster] RHCS RHEL 6 GEO CLUSTER
Message-ID: <CADTd8a0C0J3GT_FrKJrevUNBjc=5Jj8HjTWxyKDhoWYh-8Sguw@mail.gmail.com>

Hi ,

I  want  know,  is RHCS  RHEL 6 GEO CLUSTER  supported ?,  What kind of
cases ?

Thanks for    your   reponse.
*
*

-- 
*Carlos Alberto Ram?rez Rend?n - RHCE -RHCDS- RHCVA - RHCI
Arquitecto de soluciones RedHat
Cel.: +57-1+310-879898
 gpg: 1024R/F9220C2E  6742 3474 CF17 A82C 1888 5D4C F460 15DC F922 0C2E*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120509/6adbce3f/attachment.htm>

From emi2fast at gmail.com  Wed May  9 20:40:51 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Wed, 9 May 2012 22:40:51 +0200
Subject: [Linux-cluster] RHCS RHEL 6 GEO CLUSTER
In-Reply-To: <CADTd8a0C0J3GT_FrKJrevUNBjc=5Jj8HjTWxyKDhoWYh-8Sguw@mail.gmail.com>
References: <CADTd8a0C0J3GT_FrKJrevUNBjc=5Jj8HjTWxyKDhoWYh-8Sguw@mail.gmail.com>
Message-ID: <CAE7pJ3CMDtfJ7HZaC6LLuy4gjsFcmO5yJ5XJ-2j-TTk68d0n5g@mail.gmail.com>

No geo cluster for redhat :-)



2012/5/9 Carlos Alberto Ramirez Rendon <karlhat at gmail.com>

> Hi ,
>
> I  want  know,  is RHCS  RHEL 6 GEO CLUSTER  supported ?,  What kind of
> cases ?
>
> Thanks for    your   reponse.
> *
> *
>
> --
> *Carlos Alberto Ram?rez Rend?n - RHCE -RHCDS- RHCVA - RHCI
> Arquitecto de soluciones RedHat
> Cel.: +57-1+310-879898
>  gpg: 1024R/F9220C2E  6742 3474 CF17 A82C 1888 5D4C F460 15DC F922 0C2E*
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120509/8dbe81b9/attachment.htm>

From zagar at arlut.utexas.edu  Mon May 14 18:42:33 2012
From: zagar at arlut.utexas.edu (Randy Zagar)
Date: Mon, 14 May 2012 13:42:33 -0500
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <mailman.31.1336665605.30503.linux-cluster@redhat.com>
References: <mailman.31.1336665605.30503.linux-cluster@redhat.com>
Message-ID: <4FB15219.8010804@arlut.utexas.edu>

I have an existing CentOS-5 cluster I've configured for 
High-Availability NFS (v3).  Everything is working fine.  I've included 
a simplified cluster.conf file below.

I originally started with 3 file servers that were not clustered.  I 
converted to a clustered configuration where my NFS Clients never get 
"stale nfs" error messages.  When a node failed, all NFS exports (and 
their associated IP address) would  move to another system faster than 
my clients could time out.

I understand that changes to the portmapper in EL6 and NFSv4 make it 
much more difficult to configure HA-NFS and, so far, I have not seen any 
good documentation on how to configure a HA-NFS configuration in EL6.

Does anyone have any suggestions, or links to documentation that you can 
send me?

-RZ

p.s.  Simplified cluster.conf file for EL5...

    <?xml version="1.0"?>
    <cluster alias="ha-nfs-el5" config_version="357" name="ha-nfs-el5">
    	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    	<clusternodes>
    		<clusternode name="node01.arlut.utexas.edu" nodeid="1" votes="1">
    			<fence>
    				<method name="1">
    					<device name="node01-ilo"/>
    				</method>
    				<method name="2">
    					<device name="sanbox01" port="0"/>
    				</method>
    			</fence>
    		</clusternode>
    		<clusternode name="node02.arlut.utexas.edu" nodeid="2" votes="1">
    			<fence>
    				<method name="1">
    					<device name="node02-ilo"/>
    				</method>
    				<method name="2">
    					<device name="sanbox02" port="0"/>
    				</method>
    			</fence>
    		</clusternode>
    		<clusternode name="node03.arlut.utexas.edu" nodeid="3" votes="1">
    			<fence>
    				<method name="1">
    					<device name="node03-ilo"/>
    				</method>
    				<method name="2">
    					<device name="sanbox03" port="0"/>
    				</method>
    			</fence>
    		</clusternode>
    	</clusternodes>
    	<cman/>
    	<fencedevices>
    		<fencedevice agent="fence_sanbox2" ipaddr="sanbox01.arlut.utexas.edu" login="admin" name="sanbox01" passwd="password"/>
    		<fencedevice agent="fence_sanbox2" ipaddr="sanbox02.arlut.utexas.edu" login="admin" name="sanbox02" passwd="password"/>
    		<fencedevice agent="fence_sanbox2" ipaddr="sanbox03.arlut.utexas.edu" login="admin" name="sanbox03" passwd="password"/>
    		<fencedevice agent="fence_ilo" hostname="node01-ilo" login="Administrator" name="node01-ilo" passwd="DUMMY"/>
    		<fencedevice agent="fence_ilo" hostname="node02-ilo" login="Administrator" name="node02-ilo" passwd="DUMMY"/>
    		<fencedevice agent="fence_ilo" hostname="node03-ilo" login="Administrator" name="node03-ilo" passwd="DUMMY"/>
    	</fencedevices>
    	<rm>
    		<failoverdomains>
    			<failoverdomain name="nfs1-domain" nofailback="1" ordered="1" restricted="1">
    				<failoverdomainnode name="node01.arlut.utexas.edu" priority="1"/>
    				<failoverdomainnode name="node02.arlut.utexas.edu" priority="2"/>
    				<failoverdomainnode name="node03.arlut.utexas.edu" priority="3"/>
    			</failoverdomain>
    			<failoverdomain name="nfs2-domain" nofailback="1" ordered="1" restricted="1">
    				<failoverdomainnode name="node01.arlut.utexas.edu" priority="3"/>
    				<failoverdomainnode name="node02.arlut.utexas.edu" priority="1"/>
    				<failoverdomainnode name="node03.arlut.utexas.edu" priority="2"/>
    			</failoverdomain>
    			<failoverdomain name="nfs3-domain" nofailback="1" ordered="1" restricted="1">
    				<failoverdomainnode name="node01.arlut.utexas.edu" priority="2"/>
    				<failoverdomainnode name="node02.arlut.utexas.edu" priority="3"/>
    				<failoverdomainnode name="node03.arlut.utexas.edu" priority="1"/>
    			</failoverdomain>
    		</failoverdomains>
    		<resources>
    			<ip address="192.168.1.1" monitor_link="1"/>
    			<ip address="192.168.1.2" monitor_link="1"/>
    			<ip address="192.168.1.3" monitor_link="1"/>
    			<fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
    			<fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
    			<fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
    			<nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
    		</resources>
    		<service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
    			<ip ref="192.168.1.1">
    				<fs __independent_subtree="1" ref="volume01">
    					<nfsexport name="nfs-cvg00-brazos02">
    						<nfsclient name=" " ref="local-subnet"/>
    					</nfsexport>
    				</fs>
    			</ip>
    		</service>
    		<service autostart="1" domain="nfs2-domain" exclusive="0" name="nfs2" nfslock="1" recovery="relocate">
    			<ip ref="192.168.1.2">
    				<fs __independent_subtree="1" ref="volume02">
    					<nfsexport name="nfs-sdd01-data02">
    						<nfsclient name=" " ref="local-subnet"/>
    					</nfsexport>
    				</fs>
    			</ip>
    		</service>
    		<service autostart="1" domain="nfs3-domain" exclusive="0" name="nfs3" nfslock="1" recovery="relocate">
    			<ip ref="192.168.1.3">
    				<fs __independent_subtree="1" ref="volume03">
    						<nfsclient name=" " ref="local-subnet"/>
    					</nfsexport>
    				</fs>
    			</ip>
    		</service>
    	</rm>
    </cluster>

-- 
Randy Zagar                               Sr. Unix Systems Administrator
E-mail: zagar at arlut.utexas.edu            Applied Research Laboratories
Phone: 512 835-3131                       Univ. of Texas at Austin

  

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120514/e652213d/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 9116 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120514/e652213d/attachment.p7s>

From emi2fast at gmail.com  Tue May 15 08:21:29 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Tue, 15 May 2012 10:21:29 +0200
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <4FB15219.8010804@arlut.utexas.edu>
References: <mailman.31.1336665605.30503.linux-cluster@redhat.com>
	<4FB15219.8010804@arlut.utexas.edu>
Message-ID: <CAE7pJ3BSQ2Y0s-xDkF_FLzCmqcgFkFy7rrH2aXzo77TKZSkr7A@mail.gmail.com>

If look well, you are missing nfsexport in the second services

2012/5/14 Randy Zagar <zagar at arlut.utexas.edu>

>  I have an existing CentOS-5 cluster I've configured for High-Availability
> NFS (v3).  Everything is working fine.  I've included a simplified
> cluster.conf file below.
>
> I originally started with 3 file servers that were not clustered.  I
> converted to a clustered configuration where my NFS Clients never get
> "stale nfs" error messages.  When a node failed, all NFS exports (and their
> associated IP address) would  move to another system faster than my clients
> could time out.
>
> I understand that changes to the portmapper in EL6 and NFSv4 make it much
> more difficult to configure HA-NFS and, so far, I have not seen any good
> documentation on how to configure a HA-NFS configuration in EL6.
>
> Does anyone have any suggestions, or links to documentation that you can
> send me?
>
> -RZ
>
> p.s.  Simplified cluster.conf file for EL5...
>
> <?xml version="1.0"?>
> <cluster alias="ha-nfs-el5" config_version="357" name="ha-nfs-el5">
> 	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> 	<clusternodes>
> 		<clusternode name="node01.arlut.utexas.edu" nodeid="1" votes="1">
> 			<fence>
> 				<method name="1">
> 					<device name="node01-ilo"/>
> 				</method>
> 				<method name="2">
> 					<device name="sanbox01" port="0"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 		<clusternode name="node02.arlut.utexas.edu" nodeid="2" votes="1">
> 			<fence>
> 				<method name="1">
> 					<device name="node02-ilo"/>
> 				</method>
> 				<method name="2">
> 					<device name="sanbox02" port="0"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 		<clusternode name="node03.arlut.utexas.edu" nodeid="3" votes="1">
> 			<fence>
> 				<method name="1">
> 					<device name="node03-ilo"/>
> 				</method>
> 				<method name="2">
> 					<device name="sanbox03" port="0"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 	</clusternodes>
> 	<cman/>
> 	<fencedevices>
> 		<fencedevice agent="fence_sanbox2" ipaddr="sanbox01.arlut.utexas.edu" login="admin" name="sanbox01" passwd="password"/>
> 		<fencedevice agent="fence_sanbox2" ipaddr="sanbox02.arlut.utexas.edu" login="admin" name="sanbox02" passwd="password"/>
> 		<fencedevice agent="fence_sanbox2" ipaddr="sanbox03.arlut.utexas.edu" login="admin" name="sanbox03" passwd="password"/>
> 		<fencedevice agent="fence_ilo" hostname="node01-ilo" login="Administrator" name="node01-ilo" passwd="DUMMY"/>
> 		<fencedevice agent="fence_ilo" hostname="node02-ilo" login="Administrator" name="node02-ilo" passwd="DUMMY"/>
> 		<fencedevice agent="fence_ilo" hostname="node03-ilo" login="Administrator" name="node03-ilo" passwd="DUMMY"/>
> 	</fencedevices>
> 	<rm>
> 		<failoverdomains>
> 			<failoverdomain name="nfs1-domain" nofailback="1" ordered="1" restricted="1">
> 				<failoverdomainnode name="node01.arlut.utexas.edu" priority="1"/>
> 				<failoverdomainnode name="node02.arlut.utexas.edu" priority="2"/>
> 				<failoverdomainnode name="node03.arlut.utexas.edu" priority="3"/>
> 			</failoverdomain>
> 			<failoverdomain name="nfs2-domain" nofailback="1" ordered="1" restricted="1">
> 				<failoverdomainnode name="node01.arlut.utexas.edu" priority="3"/>
> 				<failoverdomainnode name="node02.arlut.utexas.edu" priority="1"/>
> 				<failoverdomainnode name="node03.arlut.utexas.edu" priority="2"/>
> 			</failoverdomain>
> 			<failoverdomain name="nfs3-domain" nofailback="1" ordered="1" restricted="1">
> 				<failoverdomainnode name="node01.arlut.utexas.edu" priority="2"/>
> 				<failoverdomainnode name="node02.arlut.utexas.edu" priority="3"/>
> 				<failoverdomainnode name="node03.arlut.utexas.edu" priority="1"/>
> 			</failoverdomain>
> 		</failoverdomains>
> 		<resources>
> 			<ip address="192.168.1.1" monitor_link="1"/>
> 			<ip address="192.168.1.2" monitor_link="1"/>
> 			<ip address="192.168.1.3" monitor_link="1"/>
> 			<fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> 			<fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> 			<fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
> 			<nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
> 		</resources>
> 		<service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
> 			<ip ref="192.168.1.1">
> 				<fs __independent_subtree="1" ref="volume01">
> 					<nfsexport name="nfs-cvg00-brazos02">
> 						<nfsclient name=" " ref="local-subnet"/>
> 					</nfsexport>
> 				</fs>
> 			</ip>
> 		</service>
> 		<service autostart="1" domain="nfs2-domain" exclusive="0" name="nfs2" nfslock="1" recovery="relocate">
> 			<ip ref="192.168.1.2">
> 				<fs __independent_subtree="1" ref="volume02">
> 					<nfsexport name="nfs-sdd01-data02">
> 						<nfsclient name=" " ref="local-subnet"/>
> 					</nfsexport>
> 				</fs>
> 			</ip>
> 		</service>
> 		<service autostart="1" domain="nfs3-domain" exclusive="0" name="nfs3" nfslock="1" recovery="relocate">
> 			<ip ref="192.168.1.3">
> 				<fs __independent_subtree="1" ref="volume03">
> 						<nfsclient name=" " ref="local-subnet"/>
> 					</nfsexport>
> 				</fs>
> 			</ip>
> 		</service>
> 	</rm>
> </cluster>
>
>
>  --
> Randy Zagar                               Sr. Unix Systems Administrator
> E-mail: zagar at arlut.utexas.edu            Applied Research Laboratories
> Phone: 512 835-3131                       Univ. of Texas at Austin
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120515/9c06ce33/attachment.htm>

From zagar at arlut.utexas.edu  Tue May 15 17:33:40 2012
From: zagar at arlut.utexas.edu (Randy Zagar)
Date: Tue, 15 May 2012 12:33:40 -0500
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
Message-ID: <4FB29374.5000600@arlut.utexas.edu>

To All,

Looks like I got nicked by Occam's Razor when I "simplified" my cluster 
config file... :-)   A "less simplified" version is below.

My question still stands, however.  What does "cluster.conf" look like 
if you're trying to deploy a "highly available" NFS configuration.  And, 
again, by "highly available" I mean that NFS Clients never get the 
dreaded "stale nfs file handle" message unless the entire cluster has 
failed.

-RZ

p.s.  A better, but still simplified, cluster.conf for EL5.

    <?xml version="1.0"?>
    <cluster alias="ha-nfs-el5" config_version="357" name="ha-nfs-el5">
         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
         <clusternodes>
         	<clusternode name="node01.arlut.utexas.edu" nodeid="1" votes="1">
         		<fence>
         			<method name="1">
         				<device name="node01-ilo"/>
         			</method>
         			<method name="2">
         				<device name="sanbox01" port="0"/>
         			</method>
         		</fence>
         	</clusternode>
         	<clusternode name="node02.arlut.utexas.edu" nodeid="2" votes="1">
         		<fence>
         			<method name="1">
         				<device name="node02-ilo"/>
         			</method>
         			<method name="2">
         				<device name="sanbox02" port="0"/>
         			</method>
         		</fence>
         	</clusternode>
         	<clusternode name="node03.arlut.utexas.edu" nodeid="3" votes="1">
         		<fence>
         			<method name="1">
         				<device name="node03-ilo"/>
         			</method>
         			<method name="2">
         				<device name="sanbox03" port="0"/>
         			</method>
         		</fence>
         	</clusternode>
         </clusternodes>
         <cman/>
         <fencedevices>
         	<fencedevice agent="fence_sanbox2" ipaddr="sanbox01.arlut.utexas.edu" login="admin" name="sanbox01" passwd="password"/>
         	<fencedevice agent="fence_sanbox2" ipaddr="sanbox02.arlut.utexas.edu" login="admin" name="sanbox02" passwd="password"/>
         	<fencedevice agent="fence_sanbox2" ipaddr="sanbox03.arlut.utexas.edu" login="admin" name="sanbox03" passwd="password"/>
         	<fencedevice agent="fence_ilo" hostname="node01-ilo" login="Administrator" name="node01-ilo" passwd="DUMMY"/>
         	<fencedevice agent="fence_ilo" hostname="node02-ilo" login="Administrator" name="node02-ilo" passwd="DUMMY"/>
         	<fencedevice agent="fence_ilo" hostname="node03-ilo" login="Administrator" name="node03-ilo" passwd="DUMMY"/>
         </fencedevices>
         <rm>
         	<failoverdomains>
         		<failoverdomain name="nfs1-domain" nofailback="1" ordered="1" restricted="1">
         			<failoverdomainnode name="node01.arlut.utexas.edu" priority="1"/>
         			<failoverdomainnode name="node02.arlut.utexas.edu" priority="2"/>
         			<failoverdomainnode name="node03.arlut.utexas.edu" priority="3"/>
         		</failoverdomain>
         		<failoverdomain name="nfs2-domain" nofailback="1" ordered="1" restricted="1">
         			<failoverdomainnode name="node01.arlut.utexas.edu" priority="3"/>
         			<failoverdomainnode name="node02.arlut.utexas.edu" priority="1"/>
         			<failoverdomainnode name="node03.arlut.utexas.edu" priority="2"/>
         		</failoverdomain>
         		<failoverdomain name="nfs3-domain" nofailback="1" ordered="1" restricted="1">
         			<failoverdomainnode name="node01.arlut.utexas.edu" priority="2"/>
         			<failoverdomainnode name="node02.arlut.utexas.edu" priority="3"/>
         			<failoverdomainnode name="node03.arlut.utexas.edu" priority="1"/>
         		</failoverdomain>
         	</failoverdomains>
         	<resources>
         		<ip address="192.168.1.1" monitor_link="1"/>
         		<ip address="192.168.1.2" monitor_link="1"/>
         		<ip address="192.168.1.3" monitor_link="1"/>
         		<fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
         		<fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
         		<fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
         		<nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
         	</resources>
         	<service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
         		<ip ref="192.168.1.1">
         			<fs __independent_subtree="1" ref="volume01">
         				<nfsexport name="nfs-volume01">
         					<nfsclient name=" " ref="local-subnet"/>
         				</nfsexport>
         			</fs>
         		</ip>
         	</service>
         	<service autostart="1" domain="nfs2-domain" exclusive="0" name="nfs2" nfslock="1" recovery="relocate">
         		<ip ref="192.168.1.2">
         			<fs __independent_subtree="1" ref="volume02">
         				<nfsexport name="nfs-volume02">
         					<nfsclient name=" " ref="local-subnet"/>
         				</nfsexport>
         			</fs>
         		</ip>
         	</service>
         	<service autostart="1" domain="nfs3-domain" exclusive="0" name="nfs3" nfslock="1" recovery="relocate">
         		<ip ref="192.168.1.3">
         			<fs __independent_subtree="1" ref="volume03">
         				<nfsexport name="nfs-volume03">
         					<nfsclient name=" " ref="local-subnet"/>
         				</nfsexport>
         			</fs>
         		</ip>
         	</service>
         </rm>
    </cluster>

-- 
Randy Zagar                               Sr. Unix Systems Administrator
E-mail: zagar at arlut.utexas.edu            Applied Research Laboratories
Phone: 512 835-3131                       Univ. of Texas at Austin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120515/bb414354/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 9116 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120515/bb414354/attachment.p7s>

From fdinitto at redhat.com  Tue May 15 18:21:28 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 15 May 2012 20:21:28 +0200
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <4FB29374.5000600@arlut.utexas.edu>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu>
Message-ID: <4FB29EA8.5020208@redhat.com>

On 05/15/2012 07:33 PM, Randy Zagar wrote:

>         	    <resources>
>         		    <ip address="192.168.1.1" monitor_link="1"/>
>         		    <ip address="192.168.1.2" monitor_link="1"/>
>         		    <ip address="192.168.1.3" monitor_link="1"/>
>         		    <fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>         		    <fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>         		    <fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>         		    <nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
>         	    </resources>

For the <fs resources you want nfslock="1" option too.

>         	    <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
>         		    <ip ref="192.168.1.1">
>         			    <fs __independent_subtree="1" ref="volume01">
>         				    <nfsexport name="nfs-volume01">
>         					    <nfsclient name=" " ref="local-subnet"/>
>         				    </nfsexport>
>         			    </fs>
>         		    </ip>

For all services you need to change the order.

<fs..
 <nfsexport..
  <nfsclient..
   <ip..
  </nfsclient..
 </nfsexport..
</fs

This solves different issues at startup, relocation and recovery

Also note that there is known limitation in nfsd (both rhel5/6) that
could cause some problems in some conditions in your current
configuration. A permanent fix is being worked on atm.

Without extreme details, you might have 2 of those services running on
the same node and attempting to relocate one of them can fail because
the fs cannot be unmounted. This is due to nfsd holding a lock (at
kernel level) to the FS. Changing config to the suggested one, mask the
problem pretty well, but more testing for a real fix is in progress.

Fabio



From tuckerd at lyle.smu.edu  Wed May 16 14:36:47 2012
From: tuckerd at lyle.smu.edu (Doug Tucker)
Date: Wed, 16 May 2012 09:36:47 -0500
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <4FB29EA8.5020208@redhat.com>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
Message-ID: <4FB3BB7F.5000400@lyle.smu.edu>


> For the<fs resources you want nfslock="1" option too.
>
>

I have an open ticket with support because our cluster fails everytime 
on failover but caught this thread.  I set up my fs resources according 
to the docs as such:

<clusterfs device="/dev/vg1/zcb" fsid="7275" fstype="gfs2" 
mountpoint="/mnt/research/zcb" name="gfs22_zcb" 
options="defaults,noatime,localflocks"/>

Can you please verify my options look ok?  I can't find anything in the 
official documentation on setting up NFS on the cluster services with 
the nfslock="1" option.  Thanks!



From zagar at arlut.utexas.edu  Wed May 16 17:02:19 2012
From: zagar at arlut.utexas.edu (Randy Zagar)
Date: Wed, 16 May 2012 12:02:19 -0500
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
Message-ID: <4FB3DD9B.10609@arlut.utexas.edu>

Are you sure that nfslock="1" is a valid option for "<fs ...>"?

There doesn't appear to be a way to add that through LUCI, which means 
I'll have to make and propagate those changes manually.  I used to do 
this in EL5

    /sbin/ccs_tool update /etc/cluster/cluster.conf

but it looks like it's handled differently now.

How?

-RZ

On 05/16/2012 11:00 AM, fdinitto at redhat.com wrote:
> On 05/15/2012 07:33 PM, Randy Zagar wrote:
>> >           	<resources>
>> >           		<ip address="192.168.1.1" monitor_link="1"/>
>> >           		<ip address="192.168.1.2" monitor_link="1"/>
>> >           		<ip address="192.168.1.3" monitor_link="1"/>
>> >           		<fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>> >           		<fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>> >           		<fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>> >           		<nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
>> >           	</resources>
> For the<fs resources you want nfslock="1" option too.
>
>> >           	<service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
>> >           		<ip ref="192.168.1.1">
>> >           			<fs __independent_subtree="1" ref="volume01">
>> >           				<nfsexport name="nfs-volume01">
>> >           					<nfsclient name=" " ref="local-subnet"/>
>> >           				</nfsexport>
>> >           			</fs>
>> >           		</ip>
> For all services you need to change the order.
>
> <fs..
>   <nfsexport..
>    <nfsclient..
>     <ip..
>    </nfsclient..
>   </nfsexport..
> </fs
>
> This solves different issues at startup, relocation and recovery
>
> Also note that there is known limitation in nfsd (both rhel5/6) that
> could cause some problems in some conditions in your current
> configuration. A permanent fix is being worked on atm.
>
> Without extreme details, you might have 2 of those services running on
> the same node and attempting to relocate one of them can fail because
> the fs cannot be unmounted. This is due to nfsd holding a lock (at
> kernel level) to the FS. Changing config to the suggested one, mask the
> problem pretty well, but more testing for a real fix is in progress.
>
> Fabio

-- 
Randy Zagar                               Sr. Unix Systems Administrator
E-mail: zagar at arlut.utexas.edu            Applied Research Laboratories
Phone: 512 835-3131                       Univ. of Texas at Austin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120516/b3377991/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 9116 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120516/b3377991/attachment.p7s>

From zagar at arlut.utexas.edu  Wed May 16 17:18:11 2012
From: zagar at arlut.utexas.edu (Randy Zagar)
Date: Wed, 16 May 2012 12:18:11 -0500
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
Message-ID: <4FB3E153.3030400@arlut.utexas.edu>

Also, it looks like the resource manager tries to disable the IP address 
when it's a child of the nfsclient resource.  Is that going to be a 
problem when I have 16 NFS exports hosted on a single IP?

-RZ

On 05/16/2012 11:00 AM, fdinitto at redhat.com wrote:
> On 05/15/2012 07:33 PM, Randy Zagar wrote:
>> >           	<resources>
>> >           		<ip address="192.168.1.1" monitor_link="1"/>
>> >           		<ip address="192.168.1.2" monitor_link="1"/>
>> >           		<ip address="192.168.1.3" monitor_link="1"/>
>> >           		<fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>> >           		<fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>> >           		<fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>> >           		<nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
>> >           	</resources>
> For the<fs resources you want nfslock="1" option too.
>
>> >           	<service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
>> >           		<ip ref="192.168.1.1">
>> >           			<fs __independent_subtree="1" ref="volume01">
>> >           				<nfsexport name="nfs-volume01">
>> >           					<nfsclient name=" " ref="local-subnet"/>
>> >           				</nfsexport>
>> >           			</fs>
>> >           		</ip>
> For all services you need to change the order.
>
> <fs..
>   <nfsexport..
>    <nfsclient..
>     <ip..
>    </nfsclient..
>   </nfsexport..
> </fs
>
> This solves different issues at startup, relocation and recovery
>
> Also note that there is known limitation in nfsd (both rhel5/6) that
> could cause some problems in some conditions in your current
> configuration. A permanent fix is being worked on atm.
>
> Without extreme details, you might have 2 of those services running on
> the same node and attempting to relocate one of them can fail because
> the fs cannot be unmounted. This is due to nfsd holding a lock (at
> kernel level) to the FS. Changing config to the suggested one, mask the
> problem pretty well, but more testing for a real fix is in progress.
>
> Fabio

-- 
Randy Zagar                               Sr. Unix Systems Administrator
E-mail: zagar at arlut.utexas.edu            Applied Research Laboratories
Phone: 512 835-3131                       Univ. of Texas at Austin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120516/b739c9d5/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 9116 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120516/b739c9d5/attachment.p7s>

From fdinitto at redhat.com  Wed May 16 17:25:12 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 16 May 2012 19:25:12 +0200
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <4FB3DD9B.10609@arlut.utexas.edu>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
	<4FB3DD9B.10609@arlut.utexas.edu>
Message-ID: <4FB3E2F8.6080905@redhat.com>

On 5/16/2012 7:02 PM, Randy Zagar wrote:
> Are you sure that nfslock="1" is a valid option for "<fs ...>"?

Yes.

> 
> There doesn't appear to be a way to add that through LUCI, which means
> I'll have to make and propagate those changes manually.  I used to do
> this in EL5
> 
>     /sbin/ccs_tool update /etc/cluster/cluster.conf
> 
> but it looks like it's handled differently now.

I don?t know how to do that in Luci. All my work is done via manual editing.

Fabio

> 
> How?
> 
> -RZ
> 
> On 05/16/2012 11:00 AM, fdinitto at redhat.com wrote:
>> On 05/15/2012 07:33 PM, Randy Zagar wrote:
>>> >         	    <resources>
>>> >         		    <ip address="192.168.1.1" monitor_link="1"/>
>>> >         		    <ip address="192.168.1.2" monitor_link="1"/>
>>> >         		    <ip address="192.168.1.3" monitor_link="1"/>
>>> >         		    <fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>>> >         		    <fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>>> >         		    <fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>
>>> >         		    <nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>
>>> >         	    </resources>
>> For the <fs resources you want nfslock="1" option too.
>>
>>> >         	    <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">
>>> >         		    <ip ref="192.168.1.1">
>>> >         			    <fs __independent_subtree="1" ref="volume01">
>>> >         				    <nfsexport name="nfs-volume01">
>>> >         					    <nfsclient name=" " ref="local-subnet"/>
>>> >         				    </nfsexport>
>>> >         			    </fs>
>>> >         		    </ip>
>> For all services you need to change the order.
>>
>> <fs..
>>  <nfsexport..
>>   <nfsclient..
>>    <ip..
>>   </nfsclient..
>>  </nfsexport..
>> </fs
>>
>> This solves different issues at startup, relocation and recovery
>>
>> Also note that there is known limitation in nfsd (both rhel5/6) that
>> could cause some problems in some conditions in your current
>> configuration. A permanent fix is being worked on atm.
>>
>> Without extreme details, you might have 2 of those services running on
>> the same node and attempting to relocate one of them can fail because
>> the fs cannot be unmounted. This is due to nfsd holding a lock (at
>> kernel level) to the FS. Changing config to the suggested one, mask the
>> problem pretty well, but more testing for a real fix is in progress.
>>
>> Fabio
> 
> -- 
> Randy Zagar                               Sr. Unix Systems Administrator
> E-mail: zagar at arlut.utexas.edu            Applied Research Laboratories
> Phone: 512 835-3131                       Univ. of Texas at Austin
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From rmccabe at redhat.com  Wed May 16 18:16:13 2012
From: rmccabe at redhat.com (Ryan McCabe)
Date: Wed, 16 May 2012 14:16:13 -0400
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <4FB3DD9B.10609@arlut.utexas.edu>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
	<4FB3DD9B.10609@arlut.utexas.edu>
Message-ID: <20120516181612.GA122261@redhat.com>

On Wed, May 16, 2012 at 12:02:19PM -0500, Randy Zagar wrote:
> Are you sure that nfslock="1" is a valid option for "<fs ...>"?
> 
> There doesn't appear to be a way to add that through LUCI, which
> means I'll have to make and propagate those changes manually.  I
> used to do this in EL5

Click the "Preferences" right at the top right corner, and tick the
'Enable "expert" mode' checkbox there. If you have that enabled, it'll
show you the nfslock option for fs resources.

> 
>    /sbin/ccs_tool update /etc/cluster/cluster.conf
> 
> but it looks like it's handled differently now.
> 
> How?

Increment the config_version in cluster.conf and run cman_tool version -r


Ryan



From Colin.Simpson at iongeo.com  Wed May 16 18:19:04 2012
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Wed, 16 May 2012 18:19:04 +0000
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <4FB29EA8.5020208@redhat.com>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
Message-ID: <1337192345.12150.57.camel@bhac.iouk.ioroot.tld>

This is interesting.

We very often see the filesystems fail to umount on busy clustered NFS
servers.

What is the nature of the "real fix"?

I like the idea of NFSD fully being in user space, so killing it would
definitely free the fs.

Alan Brown (who's on this list) recently posted to a RH BZ that he was
one of the people who moved it into kernel space for performance reasons
in the past (that are no longer relevant):

https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9

, but I doubt this is the fix you have in mind.

Colin

On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
> This solves different issues at startup, relocation and recovery
>
> Also note that there is known limitation in nfsd (both rhel5/6) that
> could cause some problems in some conditions in your current
> configuration. A permanent fix is being worked on atm.
>
> Without extreme details, you might have 2 of those services running on
> the same node and attempting to relocate one of them can fail because
> the fs cannot be unmounted. This is due to nfsd holding a lock (at
> kernel level) to the FS. Changing config to the suggested one, mask the
> problem pretty well, but more testing for a real fix is in progress.
>
> Fabio
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


________________________________


This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.




From emi2fast at gmail.com  Wed May 16 18:20:08 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Wed, 16 May 2012 20:20:08 +0200
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <4FB3E153.3030400@arlut.utexas.edu>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
	<4FB3E153.3030400@arlut.utexas.edu>
Message-ID: <CAE7pJ3BP_-J6DjoCd-hGeTSoG9s1171G4V6TCWXZ8Bvzsifstw@mail.gmail.com>

Yes Randy

I rember in my jobs i found this problem in a cluster

This it's wrong
==============================================
<service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1"
nfslock="1" recovery="relocate">
                            <ip ref="192.168.1.1">
                                    <fs __independent_subtree="1"
ref="volume01">
                                            <nfsexport name="nfs-volume01">
                                                    <nfsclient name=" "
ref="local-subnet"/>
                                            </nfsexport>
                                    </fs>
                            </ip>
================================================
it's must be
================================================
<service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1"
nfslock="1" recovery="relocate">
                            <ip ref="192.168.1.1"/>
                                    <fs __independent_subtree="1"
ref="volume01">
                                            <nfsexport name="nfs-volume01">
                                                    <nfsclient name=" "
ref="local-subnet"/>
                                            </nfsexport>
                                    </fs>
================================================

A give a little explaination, Redhat have a internar order and knows i
which sequense start the resource
For more information read the script /usr/share/cluster/service.sh under
the metadata session



2012/5/16 Randy Zagar <zagar at arlut.utexas.edu>

>  Also, it looks like the resource manager tries to disable the IP address
> when it's a child of the nfsclient resource.  Is that going to be a problem
> when I have 16 NFS exports hosted on a single IP?
>
>
> -RZ
>
> On 05/16/2012 11:00 AM, fdinitto at redhat.com wrote:
>
>  On 05/15/2012 07:33 PM, Randy Zagar wrote:
>
>
>  >         	    <resources>>         		    <ip address="192.168.1.1" monitor_link="1"/>>         		    <ip address="192.168.1.2" monitor_link="1"/>>         		    <ip address="192.168.1.3" monitor_link="1"/>>         		    <fs device="/dev/cvg00/volume01" force_fsck="0" force_unmount="1" fsid="49388" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>>         		    <fs device="/dev/cvg00/volume02" force_fsck="0" force_unmount="1" fsid="58665" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>>         		    <fs device="/dev/cvg00/volume03" force_fsck="0" force_unmount="1" fsid="61028" fstype="ext3" mountpoint="/lvm/volume01" name="volume01" self_fence="0"/>>         		    <nfsclient allow_recover="1" name="local-subnet" options="rw,insecure" target="192.168.1.0/24"/>>         	    </resources>
>
>  For the <fs resources you want nfslock="1" option too.
>
>
>  >         	    <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1" nfslock="1" recovery="relocate">>         		    <ip ref="192.168.1.1">>         			    <fs __independent_subtree="1" ref="volume01">>         				    <nfsexport name="nfs-volume01">>         					    <nfsclient name=" " ref="local-subnet"/>>         				    </nfsexport>>         			    </fs>>         		    </ip>
>
>  For all services you need to change the order.
>
> <fs..
>  <nfsexport..
>   <nfsclient..
>    <ip..
>   </nfsclient..
>  </nfsexport..
> </fs
>
> This solves different issues at startup, relocation and recovery
>
> Also note that there is known limitation in nfsd (both rhel5/6) that
> could cause some problems in some conditions in your current
> configuration. A permanent fix is being worked on atm.
>
> Without extreme details, you might have 2 of those services running on
> the same node and attempting to relocate one of them can fail because
> the fs cannot be unmounted. This is due to nfsd holding a lock (at
> kernel level) to the FS. Changing config to the suggested one, mask the
> problem pretty well, but more testing for a real fix is in progress.
>
> Fabio
>
>
> --
> Randy Zagar                               Sr. Unix Systems Administrator
> E-mail: zagar at arlut.utexas.edu            Applied Research Laboratories
> Phone: 512 835-3131                       Univ. of Texas at Austin
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120516/b03dd46f/attachment.htm>

From fdinitto at redhat.com  Wed May 16 20:00:03 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 16 May 2012 22:00:03 +0200
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
	<1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
Message-ID: <4FB40743.7000508@redhat.com>

quick reply, I?ll send you the full details monday. I am heading off for
a couple of days of vacation + weekend.. i am not ignoring the question :)

Fabio

On 5/16/2012 8:19 PM, Colin Simpson wrote:
> This is interesting.
> 
> We very often see the filesystems fail to umount on busy clustered NFS
> servers.
> 
> What is the nature of the "real fix"?
> 
> I like the idea of NFSD fully being in user space, so killing it would
> definitely free the fs.
> 
> Alan Brown (who's on this list) recently posted to a RH BZ that he was
> one of the people who moved it into kernel space for performance reasons
> in the past (that are no longer relevant):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
> 
> , but I doubt this is the fix you have in mind.
> 
> Colin
> 
> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>> This solves different issues at startup, relocation and recovery
>>
>> Also note that there is known limitation in nfsd (both rhel5/6) that
>> could cause some problems in some conditions in your current
>> configuration. A permanent fix is being worked on atm.
>>
>> Without extreme details, you might have 2 of those services running on
>> the same node and attempting to relocate one of them can fail because
>> the fs cannot be unmounted. This is due to nfsd holding a lock (at
>> kernel level) to the FS. Changing config to the suggested one, mask the
>> problem pretty well, but more testing for a real fix is in progress.
>>
>> Fabio
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> ________________________________
> 
> 
> This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From emi2fast at gmail.com  Wed May 16 22:45:16 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Thu, 17 May 2012 00:45:16 +0200
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
	<1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
Message-ID: <CAE7pJ3AYFBs3j9=uYJxp7m3WJu5ebV6n4A2vcLPMiLA=9o9Z7A@mail.gmail.com>

Colin

Use force_unmount options

2012/5/16 Colin Simpson <Colin.Simpson at iongeo.com>

> This is interesting.
>
> We very often see the filesystems fail to umount on busy clustered NFS
> servers.
>
> What is the nature of the "real fix"?
>
> I like the idea of NFSD fully being in user space, so killing it would
> definitely free the fs.
>
> Alan Brown (who's on this list) recently posted to a RH BZ that he was
> one of the people who moved it into kernel space for performance reasons
> in the past (that are no longer relevant):
>
> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
>
> , but I doubt this is the fix you have in mind.
>
> Colin
>
> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
> > This solves different issues at startup, relocation and recovery
> >
> > Also note that there is known limitation in nfsd (both rhel5/6) that
> > could cause some problems in some conditions in your current
> > configuration. A permanent fix is being worked on atm.
> >
> > Without extreme details, you might have 2 of those services running on
> > the same node and attempting to relocate one of them can fail because
> > the fs cannot be unmounted. This is due to nfsd holding a lock (at
> > kernel level) to the FS. Changing config to the suggested one, mask the
> > problem pretty well, but more testing for a real fix is in progress.
> >
> > Fabio
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> ________________________________
>
>
> This email and any files transmitted with it are confidential and are
> intended solely for the use of the individual or entity to whom they are
> addressed. If you are not the original recipient or the person responsible
> for delivering the email to the intended recipient, be advised that you
> have received this email in error, and that any use, dissemination,
> forwarding, printing, or copying of this email is strictly prohibited. If
> you received this email in error, please immediately notify the sender and
> delete the original.
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120517/cb31b9cb/attachment.htm>

From fdinitto at redhat.com  Thu May 17 08:02:54 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 17 May 2012 10:02:54 +0200
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <CAE7pJ3BP_-J6DjoCd-hGeTSoG9s1171G4V6TCWXZ8Bvzsifstw@mail.gmail.com>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
	<4FB3E153.3030400@arlut.utexas.edu>
	<CAE7pJ3BP_-J6DjoCd-hGeTSoG9s1171G4V6TCWXZ8Bvzsifstw@mail.gmail.com>
Message-ID: <4FB4B0AE.2050208@redhat.com>

On 05/16/2012 08:20 PM, emmanuel segura wrote:

> ================================================
> it's must be
> ================================================
> <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1"
> nfslock="1" recovery="relocate">
>                             <ip ref="192.168.1.1"/>
>                                     <fs __independent_subtree="1"
> ref="volume01">
>                                             <nfsexport name="nfs-volume01">
>                                                     <nfsclient name=" "
> ref="local-subnet"/>
>                                             </nfsexport>
>                                     </fs>
> ================================================

That is also wrong.

<ip has to be the last one to start or you will hit race conditions at
service startup.

Fabio



From fdinitto at redhat.com  Thu May 17 08:26:46 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 17 May 2012 10:26:46 +0200
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
	<1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
Message-ID: <4FB4B646.4030306@redhat.com>

On 05/16/2012 08:19 PM, Colin Simpson wrote:
> This is interesting.
> 
> We very often see the filesystems fail to umount on busy clustered NFS
> servers.

Yes, I am aware the issue since I have been investigating it in details
for the past couple of weeks.

> 
> What is the nature of the "real fix"?

First, the bz you mention below is unrelated to the unmount problem we
are discussing. clustered nfsd locks are a slightly different story.

There are two issues here:

1) cluster users expectations
2) nfsd internal design

(and note I am not blaming either cluster or nfsd here)

Generally cluster users expect to be able to do things like (fake meta
config):

<service1..
 <fs1..
  <nfsexport1..
   <nfsclient1..
    <ip1..
....
<service2
 <fs2..
  <nfsexport2..
   <nfsclient2..
    <ip2..

and be able to move services around cluster nodes without problem. Note
that it is irrelevant of the fs used. It can be clustered or not.

This setup does unfortunately clash with nfsd design.

When shutdown of a service happens (due to stop or relocation is
indifferent):

ip is removed
exportfs -u .....
(and that's where we hit the nfsd design limitation)
umount fs..

By design (tho I can't say exactly why it is done this way without
speculating), nfsd will continue to serve open sessions via rpc.
exportfs -u will only stop new incoming requests.

If nfsd is serving a client, it will continue to hold a lock on the
filesystem (in kernel) that would prevent the fs to be unmounted.

The only way to effectively close the sessions are:

- drop the VIP and wait for connections timeout (nfsd would effectively
  also drop the lock on the fs) but it is slow and not always consistent
  on how long it would take

- restart nfsd.


The "real fix" here would be to wait for nfsd containers that do support
exactly this scenario. Allowing unexport of single fs and lock drops
etc. etc. This work is still in very early stages upstream, that doesn't
make it suitable yet for production.

The patch I am working on, is basically a way to handle the clash in the
best way as possible.

A new nfsrestart="" option will be added to both fs and clusterfs, that,
if the filesystem cannot be unmounted, if force_unmount is set, it will
perform an extremely fast restart of nfslock and nfsd.

We can argue that it is not the final solution, i think we can agree
that it is more of a workaround, but:

1) it will allow service migration instead of service failure
2) it will match cluster users expectations (allowing different exports
and live peacefully together).

The only negative impact that we have been able to evaluate so far (the
patch is still under heavy testing phase), beside having to add a config
option to enable it, is that there will be a small window in which all
clients connect to a certain node for all nfs services, will not be
served because nfsd is restarting.

So if you are migrating export1 and there are clients using export2,
export2 will also be affected for those few ms required to restart nfsd.
(assuming export1 and 2 are running on the same node of course).

Placing things in perspective for a cluster, I think that it is a lot
better to be able to unmount a fs and relocate services as necessary vs
a service failing completely and maybe node being fenced.




> 
> I like the idea of NFSD fully being in user space, so killing it would
> definitely free the fs.
> 
> Alan Brown (who's on this list) recently posted to a RH BZ that he was
> one of the people who moved it into kernel space for performance reasons
> in the past (that are no longer relevant):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
> 
> , but I doubt this is the fix you have in mind.

No that's a totally different issue.

> 
> Colin
> 
> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>> This solves different issues at startup, relocation and recovery
>>
>> Also note that there is known limitation in nfsd (both rhel5/6) that
>> could cause some problems in some conditions in your current
>> configuration. A permanent fix is being worked on atm.
>>
>> Without extreme details, you might have 2 of those services running on
>> the same node and attempting to relocate one of them can fail because
>> the fs cannot be unmounted. This is due to nfsd holding a lock (at
>> kernel level) to the FS. Changing config to the suggested one, mask the
>> problem pretty well, but more testing for a real fix is in progress.
>>
>> Fabio
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> ________________________________
> 
> 
> This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From emi2fast at gmail.com  Thu May 17 08:37:24 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Thu, 17 May 2012 10:37:24 +0200
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <4FB4B0AE.2050208@redhat.com>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
	<4FB3E153.3030400@arlut.utexas.edu>
	<CAE7pJ3BP_-J6DjoCd-hGeTSoG9s1171G4V6TCWXZ8Bvzsifstw@mail.gmail.com>
	<4FB4B0AE.2050208@redhat.com>
Message-ID: <CAE7pJ3B0W0=Bs_R9JNSP5WaVKP4yF3BN2ZOOXTnDy+EX8KbZaw@mail.gmail.com>

Fabio

The Ip it's the last to start, as sayed before look vim
/usr/share/cluster/service.sh

a have a cluster configured like that and i can tell i never found the
problem
===========================================================

Look this, the order start of resources isn't based on the order in the xml
/etc/cluster.conf, i see you work in Redhat and i don't know this Mama Mia
===========================================================
 <special tag="rgmanager">
        <attributes root="1" maxinstances="1"/>
        <child type="lvm" start="1" stop="9"/>
        <child type="fs" start="2" stop="8"/>
        <child type="clusterfs" start="3" stop="7"/>
        <child type="netfs" start="4" stop="6"/>
        <child type="nfsexport" start="5" stop="5"/>

        <child type="nfsclient" start="6" stop="4"/>

        <child type="ip" start="7" stop="2"/>
        <child type="smb" start="8" stop="3"/>
        <child type="script" start="9" stop="1"/>
    </special>
==========================================================

2012/5/17 Fabio M. Di Nitto <fdinitto at redhat.com>

> On 05/16/2012 08:20 PM, emmanuel segura wrote:
>
> > ================================================
> > it's must be
> > ================================================
> > <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1"
> > nfslock="1" recovery="relocate">
> >                             <ip ref="192.168.1.1"/>
> >                                     <fs __independent_subtree="1"
> > ref="volume01">
> >                                             <nfsexport
> name="nfs-volume01">
> >                                                     <nfsclient name=" "
> > ref="local-subnet"/>
> >                                             </nfsexport>
> >                                     </fs>
> > ================================================
>
> That is also wrong.
>
> <ip has to be the last one to start or you will hit race conditions at
> service startup.
>
> Fabio
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120517/5e1d0a10/attachment.htm>

From emi2fast at gmail.com  Thu May 17 08:38:21 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Thu, 17 May 2012 10:38:21 +0200
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <4FB4B0AE.2050208@redhat.com>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
	<4FB3E153.3030400@arlut.utexas.edu>
	<CAE7pJ3BP_-J6DjoCd-hGeTSoG9s1171G4V6TCWXZ8Bvzsifstw@mail.gmail.com>
	<4FB4B0AE.2050208@redhat.com>
Message-ID: <CAE7pJ3ApjyCG841_-_mWV4vC6_OCJc_HJ5NDy2s1Jjye5LqAAQ@mail.gmail.com>

Fabio

The Ip it's the last to start, as sayed before look vim
/usr/share/cluster/service.sh

a have a cluster configured like that and i can tell i never found the
problem
==============================
=============================

Look this, the order start of resources isn't based on the order in the xml
/etc/cluster.conf, i see you work in Redhat and you don't know this Mama Mia
===========================================================
 <special tag="rgmanager">
        <attributes root="1" maxinstances="1"/>
        <child type="lvm" start="1" stop="9"/>
        <child type="fs" start="2" stop="8"/>
        <child type="clusterfs" start="3" stop="7"/>
        <child type="netfs" start="4" stop="6"/>
        <child type="nfsexport" start="5" stop="5"/>

        <child type="nfsclient" start="6" stop="4"/>

        <child type="ip" start="7" stop="2"/>
        <child type="smb" start="8" stop="3"/>
        <child type="script" start="9" stop="1"/>
    </special>
==========================================================


2012/5/17 Fabio M. Di Nitto <fdinitto at redhat.com>

> On 05/16/2012 08:20 PM, emmanuel segura wrote:
>
> > ================================================
> > it's must be
> > ================================================
> > <service autostart="1" domain="nfs1-domain" exclusive="0" name="nfs1"
> > nfslock="1" recovery="relocate">
> >                             <ip ref="192.168.1.1"/>
> >                                     <fs __independent_subtree="1"
> > ref="volume01">
> >                                             <nfsexport
> name="nfs-volume01">
> >                                                     <nfsclient name=" "
> > ref="local-subnet"/>
> >                                             </nfsexport>
> >                                     </fs>
> > ================================================
>
> That is also wrong.
>
> <ip has to be the last one to start or you will hit race conditions at
> service startup.
>
> Fabio
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120517/187b507b/attachment.htm>

From fdinitto at redhat.com  Thu May 17 09:38:04 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 17 May 2012 11:38:04 +0200
Subject: [Linux-cluster] Linux-cluster Digest, Vol 97, Issue 5
In-Reply-To: <CAE7pJ3ApjyCG841_-_mWV4vC6_OCJc_HJ5NDy2s1Jjye5LqAAQ@mail.gmail.com>
References: <mailman.33.1337184005.14823.linux-cluster@redhat.com>
	<4FB3E153.3030400@arlut.utexas.edu>
	<CAE7pJ3BP_-J6DjoCd-hGeTSoG9s1171G4V6TCWXZ8Bvzsifstw@mail.gmail.com>
	<4FB4B0AE.2050208@redhat.com>
	<CAE7pJ3ApjyCG841_-_mWV4vC6_OCJc_HJ5NDy2s1Jjye5LqAAQ@mail.gmail.com>
Message-ID: <4FB4C6FC.3080403@redhat.com>

Emmanuel,

On 5/17/2012 10:38 AM, emmanuel segura wrote:
> Fabio
> 
> The Ip it's the last to start, as sayed before look vim
> /usr/share/cluster/service.sh
> 
> a have a cluster configured like that and i can tell i never found the
> problem
> ==============================
> =============================
> 
> Look this, the order start of resources isn't based on the order in the
> xml /etc/cluster.conf,

Wow. that?s good planning of your.. so instead of rely on an explicit
order in cluster.conf, you add an extra layer of obfuscation by assuming
service.sh will always be the same.

Explicit config is more readable and make the service clear for the user
vs expecting a user to dig into some file where priorities are described
and that can change.

> i see you work in Redhat and you don't know this
> Mama Mia

I seriously hope this is sarcasm.

Fabio



From Colin.Simpson at iongeo.com  Thu May 17 09:47:00 2012
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Thu, 17 May 2012 09:47:00 +0000
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <CAE7pJ3AYFBs3j9=uYJxp7m3WJu5ebV6n4A2vcLPMiLA=9o9Z7A@mail.gmail.com>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
	<1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
	<CAE7pJ3AYFBs3j9=uYJxp7m3WJu5ebV6n4A2vcLPMiLA=9o9Z7A@mail.gmail.com>
Message-ID: <1337248020.13755.10.camel@bhac.iouk.ioroot.tld>

Sadly that doesn't work (usually). With nfsd the filesystem will still
refuse to umount (despite the force) as it's locked in kernel space
(where nfsd lives). And the service will just go to failed.

The best you can so is the probably self_fence="1" which is pretty
brutal and halts the node with the stuck fs mount, I believe.

Colin

On Thu, 2012-05-17 at 00:45 +0200, emmanuel segura wrote:
> Colin
>
> Use force_unmount options
>
> 2012/5/16 Colin Simpson <Colin.Simpson at iongeo.com>
>         This is interesting.
>
>         We very often see the filesystems fail to umount on busy
>         clustered NFS
>         servers.
>
>         What is the nature of the "real fix"?
>
>         I like the idea of NFSD fully being in user space, so killing
>         it would
>         definitely free the fs.
>
>         Alan Brown (who's on this list) recently posted to a RH BZ
>         that he was
>         one of the people who moved it into kernel space for
>         performance reasons
>         in the past (that are no longer relevant):
>
>         https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
>
>         , but I doubt this is the fix you have in mind.
>
>         Colin
>
>         On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>         > This solves different issues at startup, relocation and
>         recovery
>         >
>         > Also note that there is known limitation in nfsd (both
>         rhel5/6) that
>         > could cause some problems in some conditions in your current
>         > configuration. A permanent fix is being worked on atm.
>         >
>         > Without extreme details, you might have 2 of those services
>         running on
>         > the same node and attempting to relocate one of them can
>         fail because
>         > the fs cannot be unmounted. This is due to nfsd holding a
>         lock (at
>         > kernel level) to the FS. Changing config to the suggested
>         one, mask the
>         > problem pretty well, but more testing for a real fix is in
>         progress.
>         >
>         > Fabio
>         >
>         > --
>         > Linux-cluster mailing list
>         > Linux-cluster at redhat.com
>         > https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>         ________________________________
>
>
>         This email and any files transmitted with it are confidential
>         and are intended solely for the use of the individual or
>         entity to whom they are addressed. If you are not the original
>         recipient or the person responsible for delivering the email
>         to the intended recipient, be advised that you have received
>         this email in error, and that any use, dissemination,
>         forwarding, printing, or copying of this email is strictly
>         prohibited. If you received this email in error, please
>         immediately notify the sender and delete the original.
>
>
>         --
>         Linux-cluster mailing list
>         Linux-cluster at redhat.com
>         https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
> --
> esta es mi vida e me la vivo hasta que dios quiera


________________________________


This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.




From Colin.Simpson at iongeo.com  Thu May 17 09:47:29 2012
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Thu, 17 May 2012 09:47:29 +0000
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <4FB4B646.4030306@redhat.com>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
	<1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
	<4FB4B646.4030306@redhat.com>
Message-ID: <1337248049.13755.11.camel@bhac.iouk.ioroot.tld>

Thanks for all the useful information on this.

I realise the bz is not for this issue, I just included it as it has the
suggestion that nfsd should actually live in user space (which seems
sensible).

Out of interest is there a bz # for this issue?

Colin


On Thu, 2012-05-17 at 10:26 +0200, Fabio M. Di Nitto wrote:
> On 05/16/2012 08:19 PM, Colin Simpson wrote:
> > This is interesting.
> >
> > We very often see the filesystems fail to umount on busy clustered NFS
> > servers.
>
> Yes, I am aware the issue since I have been investigating it in details
> for the past couple of weeks.
>
> >
> > What is the nature of the "real fix"?
>
> First, the bz you mention below is unrelated to the unmount problem we
> are discussing. clustered nfsd locks are a slightly different story.
>
> There are two issues here:
>
> 1) cluster users expectations
> 2) nfsd internal design
>
> (and note I am not blaming either cluster or nfsd here)
>
> Generally cluster users expect to be able to do things like (fake meta
> config):
>
> <service1..
>  <fs1..
>   <nfsexport1..
>    <nfsclient1..
>     <ip1..
> ....
> <service2
>  <fs2..
>   <nfsexport2..
>    <nfsclient2..
>     <ip2..
>
> and be able to move services around cluster nodes without problem. Note
> that it is irrelevant of the fs used. It can be clustered or not.
>
> This setup does unfortunately clash with nfsd design.
>
> When shutdown of a service happens (due to stop or relocation is
> indifferent):
>
> ip is removed
> exportfs -u .....
> (and that's where we hit the nfsd design limitation)
> umount fs..
>
> By design (tho I can't say exactly why it is done this way without
> speculating), nfsd will continue to serve open sessions via rpc.
> exportfs -u will only stop new incoming requests.
>
> If nfsd is serving a client, it will continue to hold a lock on the
> filesystem (in kernel) that would prevent the fs to be unmounted.
>
> The only way to effectively close the sessions are:
>
> - drop the VIP and wait for connections timeout (nfsd would effectively
>   also drop the lock on the fs) but it is slow and not always consistent
>   on how long it would take
>
> - restart nfsd.
>
>
> The "real fix" here would be to wait for nfsd containers that do support
> exactly this scenario. Allowing unexport of single fs and lock drops
> etc. etc. This work is still in very early stages upstream, that doesn't
> make it suitable yet for production.
>
> The patch I am working on, is basically a way to handle the clash in the
> best way as possible.
>
> A new nfsrestart="" option will be added to both fs and clusterfs, that,
> if the filesystem cannot be unmounted, if force_unmount is set, it will
> perform an extremely fast restart of nfslock and nfsd.
>
> We can argue that it is not the final solution, i think we can agree
> that it is more of a workaround, but:
>
> 1) it will allow service migration instead of service failure
> 2) it will match cluster users expectations (allowing different exports
> and live peacefully together).
>
> The only negative impact that we have been able to evaluate so far (the
> patch is still under heavy testing phase), beside having to add a config
> option to enable it, is that there will be a small window in which all
> clients connect to a certain node for all nfs services, will not be
> served because nfsd is restarting.
>
> So if you are migrating export1 and there are clients using export2,
> export2 will also be affected for those few ms required to restart nfsd.
> (assuming export1 and 2 are running on the same node of course).
>
> Placing things in perspective for a cluster, I think that it is a lot
> better to be able to unmount a fs and relocate services as necessary vs
> a service failing completely and maybe node being fenced.
>
>
>
>
> >
> > I like the idea of NFSD fully being in user space, so killing it would
> > definitely free the fs.
> >
> > Alan Brown (who's on this list) recently posted to a RH BZ that he was
> > one of the people who moved it into kernel space for performance reasons
> > in the past (that are no longer relevant):
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
> >
> > , but I doubt this is the fix you have in mind.
>
> No that's a totally different issue.
>
> >
> > Colin
> >
> > On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
> >> This solves different issues at startup, relocation and recovery
> >>
> >> Also note that there is known limitation in nfsd (both rhel5/6) that
> >> could cause some problems in some conditions in your current
> >> configuration. A permanent fix is being worked on atm.
> >>
> >> Without extreme details, you might have 2 of those services running on
> >> the same node and attempting to relocate one of them can fail because
> >> the fs cannot be unmounted. This is due to nfsd holding a lock (at
> >> kernel level) to the FS. Changing config to the suggested one, mask the
> >> problem pretty well, but more testing for a real fix is in progress.
> >>
> >> Fabio
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> > ________________________________
> >
> >
> > This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>


________________________________


This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.




From fdinitto at redhat.com  Thu May 17 09:57:22 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 17 May 2012 11:57:22 +0200
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <1337248049.13755.11.camel@bhac.iouk.ioroot.tld>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
	<1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
	<4FB4B646.4030306@redhat.com>
	<1337248049.13755.11.camel@bhac.iouk.ioroot.tld>
Message-ID: <4FB4CB82.2010109@redhat.com>

Hi Colin,

On 5/17/2012 11:47 AM, Colin Simpson wrote:
> Thanks for all the useful information on this.
> 
> I realise the bz is not for this issue, I just included it as it has the
> suggestion that nfsd should actually live in user space (which seems
> sensible).

Understood. I can?t really say if userland or kernel would make any
difference in this specific unmount issue, but for "safety reasons" I
need to assume their design is the same and behave the same way. when/if
there will be a switch, we will need to look more deeply into it. With
current kernel implementation we (cluster guys) need to use this approach.

> 
> Out of interest is there a bz # for this issue?

Yes one for rhel5 and one for rhel6, but they are both private at the
moment because they have customer data in it.

I expect that the workaround/fix (whatever you want to label it) will be
available via RHN in 2/3 weeks.

Fabio

> 
> Colin
> 
> 
> On Thu, 2012-05-17 at 10:26 +0200, Fabio M. Di Nitto wrote:
>> On 05/16/2012 08:19 PM, Colin Simpson wrote:
>>> This is interesting.
>>>
>>> We very often see the filesystems fail to umount on busy clustered NFS
>>> servers.
>>
>> Yes, I am aware the issue since I have been investigating it in details
>> for the past couple of weeks.
>>
>>>
>>> What is the nature of the "real fix"?
>>
>> First, the bz you mention below is unrelated to the unmount problem we
>> are discussing. clustered nfsd locks are a slightly different story.
>>
>> There are two issues here:
>>
>> 1) cluster users expectations
>> 2) nfsd internal design
>>
>> (and note I am not blaming either cluster or nfsd here)
>>
>> Generally cluster users expect to be able to do things like (fake meta
>> config):
>>
>> <service1..
>>  <fs1..
>>   <nfsexport1..
>>    <nfsclient1..
>>     <ip1..
>> ....
>> <service2
>>  <fs2..
>>   <nfsexport2..
>>    <nfsclient2..
>>     <ip2..
>>
>> and be able to move services around cluster nodes without problem. Note
>> that it is irrelevant of the fs used. It can be clustered or not.
>>
>> This setup does unfortunately clash with nfsd design.
>>
>> When shutdown of a service happens (due to stop or relocation is
>> indifferent):
>>
>> ip is removed
>> exportfs -u .....
>> (and that's where we hit the nfsd design limitation)
>> umount fs..
>>
>> By design (tho I can't say exactly why it is done this way without
>> speculating), nfsd will continue to serve open sessions via rpc.
>> exportfs -u will only stop new incoming requests.
>>
>> If nfsd is serving a client, it will continue to hold a lock on the
>> filesystem (in kernel) that would prevent the fs to be unmounted.
>>
>> The only way to effectively close the sessions are:
>>
>> - drop the VIP and wait for connections timeout (nfsd would effectively
>>   also drop the lock on the fs) but it is slow and not always consistent
>>   on how long it would take
>>
>> - restart nfsd.
>>
>>
>> The "real fix" here would be to wait for nfsd containers that do support
>> exactly this scenario. Allowing unexport of single fs and lock drops
>> etc. etc. This work is still in very early stages upstream, that doesn't
>> make it suitable yet for production.
>>
>> The patch I am working on, is basically a way to handle the clash in the
>> best way as possible.
>>
>> A new nfsrestart="" option will be added to both fs and clusterfs, that,
>> if the filesystem cannot be unmounted, if force_unmount is set, it will
>> perform an extremely fast restart of nfslock and nfsd.
>>
>> We can argue that it is not the final solution, i think we can agree
>> that it is more of a workaround, but:
>>
>> 1) it will allow service migration instead of service failure
>> 2) it will match cluster users expectations (allowing different exports
>> and live peacefully together).
>>
>> The only negative impact that we have been able to evaluate so far (the
>> patch is still under heavy testing phase), beside having to add a config
>> option to enable it, is that there will be a small window in which all
>> clients connect to a certain node for all nfs services, will not be
>> served because nfsd is restarting.
>>
>> So if you are migrating export1 and there are clients using export2,
>> export2 will also be affected for those few ms required to restart nfsd.
>> (assuming export1 and 2 are running on the same node of course).
>>
>> Placing things in perspective for a cluster, I think that it is a lot
>> better to be able to unmount a fs and relocate services as necessary vs
>> a service failing completely and maybe node being fenced.
>>
>>
>>
>>
>>>
>>> I like the idea of NFSD fully being in user space, so killing it would
>>> definitely free the fs.
>>>
>>> Alan Brown (who's on this list) recently posted to a RH BZ that he was
>>> one of the people who moved it into kernel space for performance reasons
>>> in the past (that are no longer relevant):
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
>>>
>>> , but I doubt this is the fix you have in mind.
>>
>> No that's a totally different issue.
>>
>>>
>>> Colin
>>>
>>> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>>>> This solves different issues at startup, relocation and recovery
>>>>
>>>> Also note that there is known limitation in nfsd (both rhel5/6) that
>>>> could cause some problems in some conditions in your current
>>>> configuration. A permanent fix is being worked on atm.
>>>>
>>>> Without extreme details, you might have 2 of those services running on
>>>> the same node and attempting to relocate one of them can fail because
>>>> the fs cannot be unmounted. This is due to nfsd holding a lock (at
>>>> kernel level) to the FS. Changing config to the suggested one, mask the
>>>> problem pretty well, but more testing for a real fix is in progress.
>>>>
>>>> Fabio
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>>
>>> ________________________________
>>>
>>>
>>> This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.
>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
> 
> 
> ________________________________
> 
> 
> This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.
> 



From bcodding at uvm.edu  Tue May 22 13:05:32 2012
From: bcodding at uvm.edu (Benjamin Coddington)
Date: Tue, 22 May 2012 09:05:32 -0400
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
	<1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
Message-ID: <CE766CF2-4ED9-4122-B221-4D2C27B2163C@uvm.edu>

Those running HA NFS should be aware of the following two NFSD open leaks.

The first is the nfs4_open_downgrade leak:
http://marc.info/?l=linux-nfs&m=131077202109185&w=2
https://bugzilla.redhat.com/show_bug.cgi?id=714153

Redhat supposedly fixed this, but I never saw the errata go by.. while we
waited for them to fix it, we went to an upstream kernel and got bit
by this one:

http://marc.info/?l=linux-nfs&m=131077202109185&w=2

NFSD open leaks will cause your filesystems to fail to umount, even after
waiting through your lease time.  You'll see the device's open count
will be non-zero (dmsetup info <device>), even though the filesystem
is unexported, and kernel nfsds are stopped.

We've been running our NFS4 HA cluster for a few months now on
a 3.2.5 kernel, and failover/recovery works well.

Ben

On May 16, 2012, at 2:19 PM, Colin Simpson wrote:

> This is interesting.
> 
> We very often see the filesystems fail to umount on busy clustered NFS
> servers.
> 
> What is the nature of the "real fix"?
> 
> I like the idea of NFSD fully being in user space, so killing it would
> definitely free the fs.
> 
> Alan Brown (who's on this list) recently posted to a RH BZ that he was
> one of the people who moved it into kernel space for performance reasons
> in the past (that are no longer relevant):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
> 
> , but I doubt this is the fix you have in mind.
> 
> Colin
> 
> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>> This solves different issues at startup, relocation and recovery
>> 
>> Also note that there is known limitation in nfsd (both rhel5/6) that
>> could cause some problems in some conditions in your current
>> configuration. A permanent fix is being worked on atm.
>> 
>> Without extreme details, you might have 2 of those services running on
>> the same node and attempting to relocate one of them can fail because
>> the fs cannot be unmounted. This is due to nfsd holding a lock (at
>> kernel level) to the FS. Changing config to the suggested one, mask the
>> problem pretty well, but more testing for a real fix is in progress.
>> 
>> Fabio
>> 
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> ________________________________
> 
> 
> This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From bcodding at uvm.edu  Tue May 22 13:26:21 2012
From: bcodding at uvm.edu (Benjamin Coddington)
Date: Tue, 22 May 2012 09:26:21 -0400
Subject: [Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question
In-Reply-To: <CE766CF2-4ED9-4122-B221-4D2C27B2163C@uvm.edu>
References: <mailman.49.1337097607.8433.linux-cluster@redhat.com>
	<4FB29374.5000600@arlut.utexas.edu> <4FB29EA8.5020208@redhat.com>
	<1337192345.12150.57.camel@bhac.iouk.ioroot.tld>
	<CE766CF2-4ED9-4122-B221-4D2C27B2163C@uvm.edu>
Message-ID: <7361BAA5-C6B9-4604-BEB9-2783FC923989@uvm.edu>

On May 22, 2012, at 9:05 AM, Benjamin Coddington wrote:
> Those running HA NFS should be aware of the following two NFSD open leaks.
> 
> The first is the nfs4_open_downgrade leak:
> http://marc.info/?l=linux-nfs&m=131077202109185&w=2
> https://bugzilla.redhat.com/show_bug.cgi?id=714153
> 
> Redhat supposedly fixed this, but I never saw the errata go by.. while we
> waited for them to fix it, we went to an upstream kernel and got bit
> by this one:
> 
> http://marc.info/?l=linux-nfs&m=131077202109185&w=2

My apologies, this second location is the same as the first, and should be
http://marc.info/?l=linux-nfs&m=131914563520472&w=2

Ben



From xwhuang123 at gmail.com  Thu May 24 04:22:20 2012
From: xwhuang123 at gmail.com (=?Big5?B?tsC+5bC2?=)
Date: Thu, 24 May 2012 12:22:20 +0800
Subject: [Linux-cluster] Is it possible to use quorum for CTDB to prevent
 split-brain and removing lockfile in the cluster file system
Message-ID: <CAG++7iN8N7eiXL80-zJhDG4L7sAjjcc0RdQa3SE2nNqHBMP9yA@mail.gmail.com>

Hello list,

We know that CTDB uses lockfile in the cluster file system to prevent
split-brain.
It is a really good design when all nodes in the cluster can mount the
cluster file system (e.g. GPFS/GFS/GlusterFS) and CTDB can work happily in
this assumption.
However, when split-brain happens, the disconnected private network
violates this assumption usually.
For example, we have four nodes (A, B, C, D) in the cluster and GlusterFS
is the beckend.
GlusterFS and CTDB on all nodes communicate to each other via private
network and CTDB manages the public network.
If node A is disconnected in the private network, there will be group (A)
and group (B,C,D) in our cluster.
The election of recovery master will be triggered after the disconnected
determination of CTDB, i.e. the CTDB elects a new recovery master for each
group after 26 (KeepaliveInterval*KeepaliveLimits+1 by default) seconds.
Then node A will be the recovery master of group (A) and some node (e.g. B)
will be the recovery master of group (B,C,D).
Now, A and B will try to lock the lockfile but GlusterFS also communicates
to each other via private network.
A big problem arises since the lockfile can be locked or not depends on the
lock implementation and disconnected determination of GlusterFS (or other
cluster file system). In my knowledge, GlusterFS will determine some node
is disconnected after 42 seconds and release its lock. In this
configuration, node A and B will ban themselves and the newly elected
recovery master will ban itslef. It's a really bad thing and we can not
treat the cluster file system as a blackbox using the lockfile design.

Hence, I have an idea about the opportunity to build CTDB with split-brain
prevention without lockfile.
Using quorum concepts to ban a node might be an option and I do a little
modification of the CTDB source code.
The modification checks whether there are more than (nodemap->num)/2
connected nodes in main_loop of server/ctdb_recoverd.c.
If not, ban the node itslef and logs an error "Node %u in the group without
quorum".

In server/ctdb_recoverd.c:
static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
TALLOC_CTX *mem_ctx)
...
        /* count how many active nodes there are */
        rec->num_active    = 0;
        rec->num_connected = 0;
        for (i=0; i<nodemap->num; i++) {
                if (!(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {
                        rec->num_active++;
                }
                if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {
                        rec->num_connected++;
                }
        }

+       if (rec->num_connected < ((nodemap->num)/2+1)){
+               DEBUG(DEBUG_ERR, ("Node %u in the group without quorum\n",
pnn));
+               ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);
+       }

This modification seems to provide a split-brain prevention without
lockfile in my tests(more than 3 nodes).
Does this modification cause any side-effect or is that a stupid design?
Please kindly answer me and I appreciate to receive new inputs from smart
people like you guys.

Thanks,
Az
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120524/098904d4/attachment.htm>

From rossnick-lists at cybercat.ca  Fri May 25 16:20:43 2012
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Fri, 25 May 2012 12:20:43 -0400
Subject: [Linux-cluster] rgmanager is jamed
Message-ID: <4FBFB15B.3070707@cybercat.ca>

I am in the process of upgrading one of our cluster from RHEL 6.1 to 
6.2. It's an 8-node cluster.

I started with one node. Stop all cluster resources, cman, rgmanager et 
al. yum update, reboot, move to next. The first one did ok.

On the second one, rgmanager started, but doesn't seem to connect to 
other nodes. I found this in dmesg :

INFO: task rgmanager:2901 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager     D 0000000000000000     0  2901   2900 0x00000080
  ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
  ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
  ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
Call Trace:
  [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
  [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
  [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
  [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
  [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
  [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
  [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
  [<ffffffff81176918>] vfs_write+0xb8/0x1a0
  [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
  [<ffffffff81177321>] sys_write+0x51/0x90
  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

Tried rebooting, but the shutdown staled on stoping rgmanager. Fenced 
the node, same outcome.

Any hints ?



From ming-ming.chen at hp.com  Fri May 25 16:21:51 2012
From: ming-ming.chen at hp.com (Chen, Ming Ming)
Date: Fri, 25 May 2012 16:21:51 +0000
Subject: [Linux-cluster] Where can I get clvm
In-Reply-To: <4FB15219.8010804@arlut.utexas.edu>
References: <mailman.31.1336665605.30503.linux-cluster@redhat.com>
	<4FB15219.8010804@arlut.utexas.edu>
Message-ID: <1D241511770E2F4BA89AFD224EDD52712A9ED60D@G9W0737.americas.hpqcorp.net>

I'm going to install and configure a CentOS 6.2 cluster. I need  CLVM  Is CLVM  included in  CentOS 6.2? If so, which package should I pick to install? If not, where can I get it?

Thanks

Ming






-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120525/9300c6b0/attachment.htm>

From ming-ming.chen at hp.com  Fri May 25 16:27:53 2012
From: ming-ming.chen at hp.com (Chen, Ming Ming)
Date: Fri, 25 May 2012 16:27:53 +0000
Subject: [Linux-cluster]   Where to get CLVM
In-Reply-To: <1D241511770E2F4BA89AFD224EDD527117B904A3@G9W0737.americas.hpqcorp.net>
References: <CAKrd530bTYqYSxZBf-aVjEjVHhWu-_KgP_9VjmTy2e913eD8hQ@mail.gmail.com>
	<1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD527117B90213@G9W0737.americas.hpqcorp.net>
	<CAE7pJ3C_V4Qkrs44CMtjXMZLoVrwGOHQh9NmHu6NjwZgxs7Gaw@mail.gmail.com>
	<4F7FAF45.8070104@alteeve.ca>
	<1D241511770E2F4BA89AFD224EDD527117B904A3@G9W0737.americas.hpqcorp.net>
Message-ID: <1D241511770E2F4BA89AFD224EDD52712A9ED63F@G9W0737.americas.hpqcorp.net>

I'm going to install and configure a CentOS 6.2 cluster. I need  CLVM  Is CLVM  included in  CentOS 6.2? If so, which package should I pick to install? If not, where can I get it?
Thanks
Ming



From emi2fast at gmail.com  Fri May 25 16:31:48 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Fri, 25 May 2012 18:31:48 +0200
Subject: [Linux-cluster] Where can I get clvm
In-Reply-To: <1D241511770E2F4BA89AFD224EDD52712A9ED60D@G9W0737.americas.hpqcorp.net>
References: <mailman.31.1336665605.30503.linux-cluster@redhat.com>
	<4FB15219.8010804@arlut.utexas.edu>
	<1D241511770E2F4BA89AFD224EDD52712A9ED60D@G9W0737.americas.hpqcorp.net>
Message-ID: <CAE7pJ3DdVUxD5_c3vkV+A9DWTq2exx_2StxyvAMZVoGoNEmN1w@mail.gmail.com>

lvm2-cluster

2012/5/25 Chen, Ming Ming <ming-ming.chen at hp.com>

>  I?m going to install and configure a CentOS 6.2 cluster. I need  CLVM
>  Is CLVM  included in  CentOS 6.2? If so, which package should I pick to
> install? If not, where can I get it?****
>
> Thanks ****
>
> Ming****
>
> ** **
>
> ** **
>
>   ****
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120525/0c013777/attachment.htm>

From ming-ming.chen at hp.com  Fri May 25 17:53:31 2012
From: ming-ming.chen at hp.com (Chen, Ming Ming)
Date: Fri, 25 May 2012 17:53:31 +0000
Subject: [Linux-cluster] Where can I get clvm
In-Reply-To: <CAE7pJ3DdVUxD5_c3vkV+A9DWTq2exx_2StxyvAMZVoGoNEmN1w@mail.gmail.com>
References: <mailman.31.1336665605.30503.linux-cluster@redhat.com>
	<4FB15219.8010804@arlut.utexas.edu>
	<1D241511770E2F4BA89AFD224EDD52712A9ED60D@G9W0737.americas.hpqcorp.net>
	<CAE7pJ3DdVUxD5_c3vkV+A9DWTq2exx_2StxyvAMZVoGoNEmN1w@mail.gmail.com>
Message-ID: <1D241511770E2F4BA89AFD224EDD52712A9ED6B6@G9W0737.americas.hpqcorp.net>

Thanks.

Ming





From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of emmanuel segura
Sent: Friday, May 25, 2012 9:32 AM
To: linux clustering
Subject: Re: [Linux-cluster] Where can I get clvm



lvm2-cluster

2012/5/25 Chen, Ming Ming <ming-ming.chen at hp.com<mailto:ming-ming.chen at hp.com>>

I'm going to install and configure a CentOS 6.2 cluster. I need  CLVM  Is CLVM  included in  CentOS 6.2? If so, which package should I pick to install? If not, where can I get it?

Thanks

Ming








--
Linux-cluster mailing list
Linux-cluster at redhat.com<mailto:Linux-cluster at redhat.com>
https://www.redhat.com/mailman/listinfo/linux-cluster




--
esta es mi vida e me la vivo hasta que dios quiera

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120525/d2b606f6/attachment.htm>

From rossnick-lists at cybercat.ca  Fri May 25 18:16:11 2012
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Fri, 25 May 2012 14:16:11 -0400
Subject: [Linux-cluster] rgmanager is jamed
In-Reply-To: <4FBFB15B.3070707@cybercat.ca>
References: <4FBFB15B.3070707@cybercat.ca>
Message-ID: <4FBFCC6B.5090102@cybercat.ca>

Nicolas Ross a ?crit :
> I am in the process of upgrading one of our cluster from RHEL 6.1 to 
> 6.2. It's an 8-node cluster.
>
> I started with one node. Stop all cluster resources, cman, rgmanager 
> et al. yum update, reboot, move to next. The first one did ok.
>
> On the second one, rgmanager started, but doesn't seem to connect to 
> other nodes. I found this in dmesg :
>
> INFO: task rgmanager:2901 blocked for more than 120 seconds.
>
(...)
>
> Tried rebooting, but the shutdown staled on stoping rgmanager. Fenced 
> the node, same outcome.

I disabled cluster services with chkconfig, rebooter the whole 8 
servers, updated all, rebooted again, started cman and others, and then 
chkconfig'd back the cluster services.

All is working fine now.



From ming-ming.chen at hp.com  Fri May 25 22:05:44 2012
From: ming-ming.chen at hp.com (Chen, Ming Ming)
Date: Fri, 25 May 2012 22:05:44 +0000
Subject: [Linux-cluster] Where can I get clvm
In-Reply-To: <CAE7pJ3DdVUxD5_c3vkV+A9DWTq2exx_2StxyvAMZVoGoNEmN1w@mail.gmail.com>
References: <mailman.31.1336665605.30503.linux-cluster@redhat.com>
	<4FB15219.8010804@arlut.utexas.edu>
	<1D241511770E2F4BA89AFD224EDD52712A9ED60D@G9W0737.americas.hpqcorp.net>
	<CAE7pJ3DdVUxD5_c3vkV+A9DWTq2exx_2StxyvAMZVoGoNEmN1w@mail.gmail.com>
Message-ID: <1D241511770E2F4BA89AFD224EDD52712A9ED7AF@G9W0737.americas.hpqcorp.net>

When I install the CentOS6.2, I picked the High Availability add-ons, and I cannot find the lvm2-cluster package? Please advice.

Thanks

Ming



From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of emmanuel segura
Sent: Friday, May 25, 2012 9:32 AM
To: linux clustering
Subject: Re: [Linux-cluster] Where can I get clvm



lvm2-cluster

2012/5/25 Chen, Ming Ming <ming-ming.chen at hp.com<mailto:ming-ming.chen at hp.com>>

I'm going to install and configure a CentOS 6.2 cluster. I need  CLVM  Is CLVM  included in  CentOS 6.2? If so, which package should I pick to install? If not, where can I get it?

Thanks

Ming








--
Linux-cluster mailing list
Linux-cluster at redhat.com<mailto:Linux-cluster at redhat.com>
https://www.redhat.com/mailman/listinfo/linux-cluster




--
esta es mi vida e me la vivo hasta que dios quiera

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120525/9067ab49/attachment.htm>

From raju.rajsand at gmail.com  Sat May 26 03:41:29 2012
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Sat, 26 May 2012 09:11:29 +0530
Subject: [Linux-cluster] Where to get CLVM
In-Reply-To: <1D241511770E2F4BA89AFD224EDD52712A9ED63F@G9W0737.americas.hpqcorp.net>
References: <CAKrd530bTYqYSxZBf-aVjEjVHhWu-_KgP_9VjmTy2e913eD8hQ@mail.gmail.com>
	<1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD527117B90213@G9W0737.americas.hpqcorp.net>
	<CAE7pJ3C_V4Qkrs44CMtjXMZLoVrwGOHQh9NmHu6NjwZgxs7Gaw@mail.gmail.com>
	<4F7FAF45.8070104@alteeve.ca>
	<1D241511770E2F4BA89AFD224EDD527117B904A3@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD52712A9ED63F@G9W0737.americas.hpqcorp.net>
Message-ID: <CA+Ydgar7L_o3Et3_aT_4vERt8oPQ038kmMW8i_41ydv4XOzvEg@mail.gmail.com>

Greetings,

On Fri, May 25, 2012 at 9:57 PM, Chen, Ming Ming <ming-ming.chen at hp.com> wrote:
> I'm going to install and configure a CentOS 6.2 cluster. I need ?CLVM ?Is CLVM ?included in ?CentOS 6.2? If so, which package should I pick to install? If not, where can I get it?
> Thanks
> Ming

The cluster package should include CLVM.


-- 
Regards,

Rajagopal



From lists at alteeve.ca  Sat May 26 03:53:05 2012
From: lists at alteeve.ca (Digimer)
Date: Fri, 25 May 2012 23:53:05 -0400
Subject: [Linux-cluster] Where to get CLVM
In-Reply-To: <1D241511770E2F4BA89AFD224EDD52712A9ED63F@G9W0737.americas.hpqcorp.net>
References: <CAKrd530bTYqYSxZBf-aVjEjVHhWu-_KgP_9VjmTy2e913eD8hQ@mail.gmail.com>
	<1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD527117B90213@G9W0737.americas.hpqcorp.net>
	<CAE7pJ3C_V4Qkrs44CMtjXMZLoVrwGOHQh9NmHu6NjwZgxs7Gaw@mail.gmail.com>
	<4F7FAF45.8070104@alteeve.ca>
	<1D241511770E2F4BA89AFD224EDD527117B904A3@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD52712A9ED63F@G9W0737.americas.hpqcorp.net>
Message-ID: <4FC053A1.8070407@alteeve.ca>

On 05/25/2012 12:27 PM, Chen, Ming Ming wrote:
> I'm going to install and configure a CentOS 6.2 cluster. I need  CLVM  Is CLVM  included in  CentOS 6.2? If so, which package should I pick to install? If not, where can I get it?
> Thanks
> Ming

CentOS is a binary compatible, community released version of Red Hat. So
whatever is available in Red Hat is available in CentOS, including clvmd.

This tutorial covers, among other things, how to install and configure
Clustered LVM. Be sure to configure fencing as without it, clvmd will
hang (by design) the first time a node fails and can't be fenced.

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial

-- 
Digimer
Papers and Projects: https://alteeve.com



From fdinitto at redhat.com  Sat May 26 07:05:37 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Sat, 26 May 2012 09:05:37 +0200
Subject: [Linux-cluster] rgmanager is jamed
In-Reply-To: <4FBFB15B.3070707@cybercat.ca>
References: <4FBFB15B.3070707@cybercat.ca>
Message-ID: <4FC080C1.8000107@redhat.com>

On 05/25/2012 06:20 PM, Nicolas Ross wrote:
> I am in the process of upgrading one of our cluster from RHEL 6.1 to
> 6.2. It's an 8-node cluster.
> 
> I started with one node. Stop all cluster resources, cman, rgmanager et
> al. yum update, reboot, move to next. The first one did ok.
> 
> On the second one, rgmanager started, but doesn't seem to connect to
> other nodes. I found this in dmesg :
> 
> INFO: task rgmanager:2901 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> rgmanager     D 0000000000000000     0  2901   2900 0x00000080
>  ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
>  ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
>  ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
> Call Trace:
>  [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
>  [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
>  [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
>  [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
>  [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
>  [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
>  [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
>  [<ffffffff81176918>] vfs_write+0xb8/0x1a0
>  [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
>  [<ffffffff81177321>] sys_write+0x51/0x90
>  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
> 
> Tried rebooting, but the shutdown staled on stoping rgmanager. Fenced
> the node, same outcome.
> 
> Any hints ?

This looks like a kernel dlm problem. I can see you found a workaround,
but that should not be necessary since upgrades between releases should
work.

can you please file a ticket with GSS and escalate it? Might be a good
idea to grab sosreports before those logs are flushed away in rotate.

Thanks
Fabio



From expertalert at gmail.com  Sat May 26 18:12:34 2012
From: expertalert at gmail.com (fosiul alam)
Date: Sat, 26 May 2012 19:12:34 +0100
Subject: [Linux-cluster] Connection Reset when trying to brorwse luci web
	interface
Message-ID: <CAOf__1PkzsPbuELb8nw0OfUO3aQ3Fs2GhBpwhxLEan3vCx8+qw@mail.gmail.com>

Hi
I am trying cluster in my lab and I have 3 nodes.

in 1st node, i have installed luci as

yum install luci
then
luci_admin init

then service luci restart

Now when i am trying to browse the web interface
https://clstr1:8084/

from mozilla or internet explorer ,
its ask for Certificate but after that , its say: Connection reset  .
So the luci web page is not comming
Can any one tell me why ?

iptables is turned off and selinux is turned off.

Thanks for your help.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120526/c45bb610/attachment.htm>

From lists at alteeve.ca  Sun May 27 22:02:46 2012
From: lists at alteeve.ca (Digimer)
Date: Sun, 27 May 2012 18:02:46 -0400
Subject: [Linux-cluster] Ideas on merging #linux-ha and #linux-cluster on
	freenode
Message-ID: <4FC2A486.7020105@alteeve.ca>

I'm not sure if this has come up before, but I thought it might be worth
discussing.

With the cluster stacks merging, it strikes me that having two separate
channels for effectively the same topic splits up folks. I know that
#linux-ha technically still supports Heartbeat, but other than that, I
see little difference between the two channels.

I suppose a similar argument could me made for the myriad of mailing
lists, too. I don't know if any of the lists really have significant
enough load to cause a problem if the lists were merged. Could
Linux-Cluster, Corosync and Pacemaker be merged?

Thoughts?

Digimer, hoping a hornets nest wasn't just opened. :)

-- 
Digimer
Papers and Projects: https://alteeve.com



From andrew at beekhof.net  Mon May 28 00:51:09 2012
From: andrew at beekhof.net (Andrew Beekhof)
Date: Mon, 28 May 2012 10:51:09 +1000
Subject: [Linux-cluster] [corosync] Ideas on merging #linux-ha and
	#linux-cluster on freenode
In-Reply-To: <4FC2A486.7020105@alteeve.ca>
References: <4FC2A486.7020105@alteeve.ca>
Message-ID: <CAEDLWG1g_TfFxTW9woNk6fMJ9c3H9RPW9yVbLBLGXVEERJru=Q@mail.gmail.com>

On Mon, May 28, 2012 at 8:02 AM, Digimer <lists at alteeve.ca> wrote:
> I'm not sure if this has come up before, but I thought it might be worth
> discussing.
>
> With the cluster stacks merging, it strikes me that having two separate
> channels for effectively the same topic splits up folks. I know that
> #linux-ha technically still supports Heartbeat, but other than that, I
> see little difference between the two channels.
>
> I suppose a similar argument could me made for the myriad of mailing
> lists, too. I don't know if any of the lists really have significant
> enough load to cause a problem if the lists were merged. Could
> Linux-Cluster, Corosync and Pacemaker be merged?
>
> Thoughts?
>
> Digimer, hoping a hornets nest wasn't just opened. :)
>

I think the only thing you missed was proposing a meta-project to rule
them all :-)



From lists at alteeve.ca  Mon May 28 01:11:43 2012
From: lists at alteeve.ca (Digimer)
Date: Sun, 27 May 2012 21:11:43 -0400
Subject: [Linux-cluster] [corosync] Ideas on merging #linux-ha and
	#linux-cluster on freenode
In-Reply-To: <CAEDLWG1g_TfFxTW9woNk6fMJ9c3H9RPW9yVbLBLGXVEERJru=Q@mail.gmail.com>
References: <4FC2A486.7020105@alteeve.ca>
	<CAEDLWG1g_TfFxTW9woNk6fMJ9c3H9RPW9yVbLBLGXVEERJru=Q@mail.gmail.com>
Message-ID: <4FC2D0CF.4010504@alteeve.ca>

On 05/27/2012 08:51 PM, Andrew Beekhof wrote:
> On Mon, May 28, 2012 at 8:02 AM, Digimer <lists at alteeve.ca> wrote:
>> I'm not sure if this has come up before, but I thought it might be worth
>> discussing.
>>
>> With the cluster stacks merging, it strikes me that having two separate
>> channels for effectively the same topic splits up folks. I know that
>> #linux-ha technically still supports Heartbeat, but other than that, I
>> see little difference between the two channels.
>>
>> I suppose a similar argument could me made for the myriad of mailing
>> lists, too. I don't know if any of the lists really have significant
>> enough load to cause a problem if the lists were merged. Could
>> Linux-Cluster, Corosync and Pacemaker be merged?
>>
>> Thoughts?
>>
>> Digimer, hoping a hornets nest wasn't just opened. :)
>>
> 
> I think the only thing you missed was proposing a meta-project to rule
> them all :-)

Let me dig around for that ring, I know it's somewhere...

Joking aside though; All the different lists and channels made sense
when there were different stacks and independent components. This is not
really the case anymore though, and will become all the more less so in
the future.

I often worry when I suggest someone go somewhere for help that the
right person who *could* have helped them is not in the given channel or
list. I think it would benefit the community to have one channel and one
list.

-- 
Digimer
Papers and Projects: https://alteeve.com



From andrew at beekhof.net  Mon May 28 01:24:04 2012
From: andrew at beekhof.net (Andrew Beekhof)
Date: Mon, 28 May 2012 11:24:04 +1000
Subject: [Linux-cluster] [corosync] Ideas on merging #linux-ha and
	#linux-cluster on freenode
In-Reply-To: <4FC2D0CF.4010504@alteeve.ca>
References: <4FC2A486.7020105@alteeve.ca>
	<CAEDLWG1g_TfFxTW9woNk6fMJ9c3H9RPW9yVbLBLGXVEERJru=Q@mail.gmail.com>
	<4FC2D0CF.4010504@alteeve.ca>
Message-ID: <CAEDLWG1yTWcKrjAim6K5LDvUG4dMrzX6PterxfYe5AXrf_hFuQ@mail.gmail.com>

On Mon, May 28, 2012 at 11:11 AM, Digimer <lists at alteeve.ca> wrote:
> On 05/27/2012 08:51 PM, Andrew Beekhof wrote:
>> On Mon, May 28, 2012 at 8:02 AM, Digimer <lists at alteeve.ca> wrote:
>>> I'm not sure if this has come up before, but I thought it might be worth
>>> discussing.
>>>
>>> With the cluster stacks merging, it strikes me that having two separate
>>> channels for effectively the same topic splits up folks. I know that
>>> #linux-ha technically still supports Heartbeat, but other than that, I
>>> see little difference between the two channels.
>>>
>>> I suppose a similar argument could me made for the myriad of mailing
>>> lists, too. I don't know if any of the lists really have significant
>>> enough load to cause a problem if the lists were merged. Could
>>> Linux-Cluster, Corosync and Pacemaker be merged?
>>>
>>> Thoughts?
>>>
>>> Digimer, hoping a hornets nest wasn't just opened. :)
>>>
>>
>> I think the only thing you missed was proposing a meta-project to rule
>> them all :-)
>
> Let me dig around for that ring, I know it's somewhere...
>
> Joking aside though; All the different lists and channels made sense
> when there were different stacks and independent components. This is not
> really the case anymore though, and will become all the more less so in
> the future.

For now.  There is no reason to think that no-one will ever write a
better X than Y.

>
> I often worry when I suggest someone go somewhere for help that the
> right person who *could* have helped them is not in the given channel or
> list. I think it would benefit the community to have one channel and one
> list.

I don't care about the distinctions as much as I used to.
But I think we're generally pretty good at suggesting alternate
lists/rooms if the known topic expert hangs out somewhere else.



From fdinitto at redhat.com  Mon May 28 06:42:00 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 28 May 2012 08:42:00 +0200
Subject: [Linux-cluster] Ideas on merging #linux-ha and #linux-cluster
 on freenode
In-Reply-To: <4FC2A486.7020105@alteeve.ca>
References: <4FC2A486.7020105@alteeve.ca>
Message-ID: <4FC31E38.1010005@redhat.com>

On 05/28/2012 12:02 AM, Digimer wrote:
> I'm not sure if this has come up before, but I thought it might be worth
> discussing.
> 
> With the cluster stacks merging, it strikes me that having two separate
> channels for effectively the same topic splits up folks. I know that
> #linux-ha technically still supports Heartbeat, but other than that, I
> see little difference between the two channels.
> 
> I suppose a similar argument could me made for the myriad of mailing
> lists, too. I don't know if any of the lists really have significant
> enough load to cause a problem if the lists were merged. Could
> Linux-Cluster, Corosync and Pacemaker be merged?
> 
> Thoughts?
> 
> Digimer, hoping a hornets nest wasn't just opened. :)
> 

So we already have the ha-wg and ha-wg-techincal mailing lists around on
the linux fundation servers that should serve as coordination between
projects (tho it appears we rarely use them).

We could use an IRC equivalent on freenode.. #ha-wg ? the channel is
free at moment.

I don't see single projects mailing lists or IRC channels disappearing
any time soon and it doesn't make sense to kill them all either.
Some lists will disappear in time as the projects will slowly become
obsoleted. The issue here is that we can't really force it. It has to be
a natural process. Look at cman for example. True we obsoleted it in the
new world, but effectively cman will not die till RHEL6 support ends in
several years from now.

Fabio



From tserong at suse.com  Mon May 28 11:22:43 2012
From: tserong at suse.com (Tim Serong)
Date: Mon, 28 May 2012 21:22:43 +1000
Subject: [Linux-cluster] [corosync] Ideas on merging #linux-ha and
	#linux-cluster on freenode
In-Reply-To: <CAEDLWG1g_TfFxTW9woNk6fMJ9c3H9RPW9yVbLBLGXVEERJru=Q@mail.gmail.com>
References: <4FC2A486.7020105@alteeve.ca>
	<CAEDLWG1g_TfFxTW9woNk6fMJ9c3H9RPW9yVbLBLGXVEERJru=Q@mail.gmail.com>
Message-ID: <4FC36003.30501@suse.com>

On 05/28/2012 10:51 AM, Andrew Beekhof wrote:
> On Mon, May 28, 2012 at 8:02 AM, Digimer <lists at alteeve.ca> wrote:
>> I'm not sure if this has come up before, but I thought it might be worth
>> discussing.
>>
>> With the cluster stacks merging, it strikes me that having two separate
>> channels for effectively the same topic splits up folks. I know that
>> #linux-ha technically still supports Heartbeat, but other than that, I
>> see little difference between the two channels.
>>
>> I suppose a similar argument could me made for the myriad of mailing
>> lists, too. I don't know if any of the lists really have significant
>> enough load to cause a problem if the lists were merged. Could
>> Linux-Cluster, Corosync and Pacemaker be merged?
>>
>> Thoughts?
>>
>> Digimer, hoping a hornets nest wasn't just opened. :)
>>
> 
> I think the only thing you missed was proposing a meta-project to rule
> them all :-)

  ...One Totem Ring to rule them all, one Totem Ring to find them...

If only Sauron had implemented RRP during the Second Age, things might
have turned out differently for Middle Earth.

SCNR,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
tserong at suse.com



From lists at alteeve.ca  Mon May 28 14:55:41 2012
From: lists at alteeve.ca (Digimer)
Date: Mon, 28 May 2012 10:55:41 -0400
Subject: [Linux-cluster] Ideas on merging #linux-ha and #linux-cluster
 on freenode
In-Reply-To: <4FC31E38.1010005@redhat.com>
References: <4FC2A486.7020105@alteeve.ca> <4FC31E38.1010005@redhat.com>
Message-ID: <4FC391ED.7070306@alteeve.ca>

On 05/28/2012 02:42 AM, Fabio M. Di Nitto wrote:
> On 05/28/2012 12:02 AM, Digimer wrote:
>> I'm not sure if this has come up before, but I thought it might be worth
>> discussing.
>>
>> With the cluster stacks merging, it strikes me that having two separate
>> channels for effectively the same topic splits up folks. I know that
>> #linux-ha technically still supports Heartbeat, but other than that, I
>> see little difference between the two channels.
>>
>> I suppose a similar argument could me made for the myriad of mailing
>> lists, too. I don't know if any of the lists really have significant
>> enough load to cause a problem if the lists were merged. Could
>> Linux-Cluster, Corosync and Pacemaker be merged?
>>
>> Thoughts?
>>
>> Digimer, hoping a hornets nest wasn't just opened. :)
>>
> 
> So we already have the ha-wg and ha-wg-techincal mailing lists around on
> the linux fundation servers that should serve as coordination between
> projects (tho it appears we rarely use them).
> 
> We could use an IRC equivalent on freenode.. #ha-wg ? the channel is
> free at moment.
> 
> I don't see single projects mailing lists or IRC channels disappearing
> any time soon and it doesn't make sense to kill them all either.
> Some lists will disappear in time as the projects will slowly become
> obsoleted. The issue here is that we can't really force it. It has to be
> a natural process. Look at cman for example. True we obsoleted it in the
> new world, but effectively cman will not die till RHEL6 support ends in
> several years from now.
> 
> Fabio

My worry about a new list would be that it'd be just like a standard;

http://xkcd.com/927/

If there was to be a merger, I would think that choosing an existing one
would be best to help avoid this. "Linux-cluster" is pretty generic and
might fit.

I understand that devs working on project like having a dedicated list
for their project of interest. For this reason, I decided not to press
this any more.

My focus was from a user's perspective... A common place to send users
who are looking for help with any part of open-source clustering where
potential helpers can be found. Given the interconnected nature of the
cluster components, it's hard for users to know which component is
troubling them at first.

-- 
Digimer
Papers and Projects: https://alteeve.com



From fdinitto at redhat.com  Mon May 28 19:41:31 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 28 May 2012 21:41:31 +0200
Subject: [Linux-cluster] Ideas on merging #linux-ha and #linux-cluster
 on freenode
In-Reply-To: <4FC391ED.7070306@alteeve.ca>
References: <4FC2A486.7020105@alteeve.ca> <4FC31E38.1010005@redhat.com>
	<4FC391ED.7070306@alteeve.ca>
Message-ID: <4FC3D4EB.2000105@redhat.com>

On 05/28/2012 04:55 PM, Digimer wrote:
> On 05/28/2012 02:42 AM, Fabio M. Di Nitto wrote:
>> On 05/28/2012 12:02 AM, Digimer wrote:
>>> I'm not sure if this has come up before, but I thought it might be worth
>>> discussing.
>>>
>>> With the cluster stacks merging, it strikes me that having two separate
>>> channels for effectively the same topic splits up folks. I know that
>>> #linux-ha technically still supports Heartbeat, but other than that, I
>>> see little difference between the two channels.
>>>
>>> I suppose a similar argument could me made for the myriad of mailing
>>> lists, too. I don't know if any of the lists really have significant
>>> enough load to cause a problem if the lists were merged. Could
>>> Linux-Cluster, Corosync and Pacemaker be merged?
>>>
>>> Thoughts?
>>>
>>> Digimer, hoping a hornets nest wasn't just opened. :)
>>>
>>
>> So we already have the ha-wg and ha-wg-techincal mailing lists around on
>> the linux fundation servers that should serve as coordination between
>> projects (tho it appears we rarely use them).
>>
>> We could use an IRC equivalent on freenode.. #ha-wg ? the channel is
>> free at moment.
>>
>> I don't see single projects mailing lists or IRC channels disappearing
>> any time soon and it doesn't make sense to kill them all either.
>> Some lists will disappear in time as the projects will slowly become
>> obsoleted. The issue here is that we can't really force it. It has to be
>> a natural process. Look at cman for example. True we obsoleted it in the
>> new world, but effectively cman will not die till RHEL6 support ends in
>> several years from now.
>>
>> Fabio
> 
> My worry about a new list would be that it'd be just like a standard

we already have those lists in place. we just don't use them a lot.

;
> 
> http://xkcd.com/927/
> 
> If there was to be a merger, I would think that choosing an existing one
> would be best to help avoid this. "Linux-cluster" is pretty generic and
> might fit.

I generally don't like to go into "politics" but that would be the first
point of friction. linuc-cluster, while i agree it sounds neutral, it is
associated with RHCS and other people are more religious about naming
that others.

> 
> I understand that devs working on project like having a dedicated list
> for their project of interest. For this reason, I decided not to press
> this any more.

The idea is not bad, don't get me wrong, I am not turning it down. Let's
find a neutral namespace (like ha-wg) and start directing all users of
the new stack there.

Per project mailing list needs to exist for legacy and they will slowly
fade away naturally. Some project will keep them alive for patch posting
others will do what they want.

> 
> My focus was from a user's perspective... A common place to send users
> who are looking for help with any part of open-source clustering where
> potential helpers can be found. Given the interconnected nature of the
> cluster components, it's hard for users to know which component is
> troubling them at first.
> 

Yup.. so far, the major players have always been crosslooking at
different mailing lists, so the problem is not that bad as it sounds,
but i still agree (as it was discussed before IIRC) a common "umbrella"
would help the final users.

Fabio



From lists at alteeve.ca  Mon May 28 20:25:21 2012
From: lists at alteeve.ca (Digimer)
Date: Mon, 28 May 2012 16:25:21 -0400
Subject: [Linux-cluster] Ideas on merging #linux-ha and #linux-cluster
 on freenode
In-Reply-To: <4FC3D4EB.2000105@redhat.com>
References: <4FC2A486.7020105@alteeve.ca> <4FC31E38.1010005@redhat.com>
	<4FC391ED.7070306@alteeve.ca> <4FC3D4EB.2000105@redhat.com>
Message-ID: <4FC3DF31.7010503@alteeve.ca>

On 05/28/2012 03:41 PM, Fabio M. Di Nitto wrote:
> On 05/28/2012 04:55 PM, Digimer wrote:
>> On 05/28/2012 02:42 AM, Fabio M. Di Nitto wrote:
>>> On 05/28/2012 12:02 AM, Digimer wrote:
>>>> I'm not sure if this has come up before, but I thought it might be worth
>>>> discussing.
>>>>
>>>> With the cluster stacks merging, it strikes me that having two separate
>>>> channels for effectively the same topic splits up folks. I know that
>>>> #linux-ha technically still supports Heartbeat, but other than that, I
>>>> see little difference between the two channels.
>>>>
>>>> I suppose a similar argument could me made for the myriad of mailing
>>>> lists, too. I don't know if any of the lists really have significant
>>>> enough load to cause a problem if the lists were merged. Could
>>>> Linux-Cluster, Corosync and Pacemaker be merged?
>>>>
>>>> Thoughts?
>>>>
>>>> Digimer, hoping a hornets nest wasn't just opened. :)
>>>>
>>>
>>> So we already have the ha-wg and ha-wg-techincal mailing lists around on
>>> the linux fundation servers that should serve as coordination between
>>> projects (tho it appears we rarely use them).
>>>
>>> We could use an IRC equivalent on freenode.. #ha-wg ? the channel is
>>> free at moment.
>>>
>>> I don't see single projects mailing lists or IRC channels disappearing
>>> any time soon and it doesn't make sense to kill them all either.
>>> Some lists will disappear in time as the projects will slowly become
>>> obsoleted. The issue here is that we can't really force it. It has to be
>>> a natural process. Look at cman for example. True we obsoleted it in the
>>> new world, but effectively cman will not die till RHEL6 support ends in
>>> several years from now.
>>>
>>> Fabio
>>
>> My worry about a new list would be that it'd be just like a standard
> 
> we already have those lists in place. we just don't use them a lot.
> 
> ;
>>
>> http://xkcd.com/927/
>>
>> If there was to be a merger, I would think that choosing an existing one
>> would be best to help avoid this. "Linux-cluster" is pretty generic and
>> might fit.
> 
> I generally don't like to go into "politics" but that would be the first
> point of friction. linuc-cluster, while i agree it sounds neutral, it is
> associated with RHCS and other people are more religious about naming
> that others.
> 
>>
>> I understand that devs working on project like having a dedicated list
>> for their project of interest. For this reason, I decided not to press
>> this any more.
> 
> The idea is not bad, don't get me wrong, I am not turning it down. Let's
> find a neutral namespace (like ha-wg) and start directing all users of
> the new stack there.
> 
> Per project mailing list needs to exist for legacy and they will slowly
> fade away naturally. Some project will keep them alive for patch posting
> others will do what they want.
> 
>>
>> My focus was from a user's perspective... A common place to send users
>> who are looking for help with any part of open-source clustering where
>> potential helpers can be found. Given the interconnected nature of the
>> cluster components, it's hard for users to know which component is
>> troubling them at first.
>>
> 
> Yup.. so far, the major players have always been crosslooking at
> different mailing lists, so the problem is not that bad as it sounds,
> but i still agree (as it was discussed before IIRC) a common "umbrella"
> would help the final users.
> 
> Fabio

Well then, I will un-abandon my position to not proceed.

I understand the name 'ha-wg', but I think it's not enough related to
clustering for people to easily connect it. I like that it focuses on
HA, rather than "clustering" which is an umbrella for both HA and HPC. I
like the word "cluster", as it's one of the primary terms users would
use to search, I would think.

If we must create a new, general purpose name (though I still argue for
"linux-cluster", politics aside), then we should take the opportunity to
choose a name that is user-friendly, easy to connect to open source
cluster.

This should also be encouraged to be a user-focused list, to help keep
the snr low for devs using their per-project lists, I would suggest.

How would something like:

* Open Clustering
* Open HA Cluster
* Other?

-- 
Digimer
Papers and Projects: https://alteeve.com



From lists at alteeve.ca  Mon May 28 20:27:40 2012
From: lists at alteeve.ca (Digimer)
Date: Mon, 28 May 2012 16:27:40 -0400
Subject: [Linux-cluster] Ideas on merging #linux-ha and #linux-cluster
 on freenode
In-Reply-To: <4FC3DF31.7010503@alteeve.ca>
References: <4FC2A486.7020105@alteeve.ca> <4FC31E38.1010005@redhat.com>
	<4FC391ED.7070306@alteeve.ca> <4FC3D4EB.2000105@redhat.com>
	<4FC3DF31.7010503@alteeve.ca>
Message-ID: <4FC3DFBC.1020406@alteeve.ca>

On 05/28/2012 04:25 PM, Digimer wrote:
> I understand the name 'ha-wg', but I think it's not enough related to
> clustering for people to easily connect it. I like that it focuses on
> HA, rather than "clustering" which is an umbrella for both HA and HPC. I
> like the word "cluster", as it's one of the primary terms users would
> use to search, I would think.
> 
> If we must create a new, general purpose name (though I still argue for
> "linux-cluster", politics aside), then we should take the opportunity to
> choose a name that is user-friendly, easy to connect to open source
> cluster.
> 
> This should also be encouraged to be a user-focused list, to help keep
> the snr low for devs using their per-project lists, I would suggest.
> 
> How would something like:
> 
> * Open Clustering
> * Open HA Cluster
> * Other?

As an aside;

#open-cluster fits nicely on IRC and it's free on freenode.

-- 
Digimer
Papers and Projects: https://alteeve.com



From akinoztopuz at yahoo.com  Tue May 29 06:54:26 2012
From: akinoztopuz at yahoo.com (=?iso-8859-1?Q?AKIN_=FFffffffffffd6ZTOPUZ?=)
Date: Mon, 28 May 2012 23:54:26 -0700 (PDT)
Subject: [Linux-cluster] help
Message-ID: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com>

????Hello
?
I need configuration steps?about 2 nodes? rhel(5) cluster?with quorum disk .
?
?
my config?? like that? :
?
each node has 1 vote
quorum disk has 1 vote
cman expected vote is 3
quorum will be protected in 2 votes? so?when one node is down cluster will be up
?
?
whats other things should?I care about it? in cluster config??
?
?
?
thanks in advance 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120528/7604bebc/attachment.htm>

From rossnick-lists at cybercat.ca  Tue May 29 13:08:22 2012
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Tue, 29 May 2012 09:08:22 -0400
Subject: [Linux-cluster] rgmanager is jamed
In-Reply-To: <4FC080C1.8000107@redhat.com>
References: <4FBFB15B.3070707@cybercat.ca> <4FC080C1.8000107@redhat.com>
Message-ID: <4FC4CA46.60108@cybercat.ca>

Fabio M. Di Nitto a ?crit :
> This looks like a kernel dlm problem. I can see you found a 
> workaround, but that should not be necessary since upgrades between 
> releases should work. can you please file a ticket with GSS and 
> escalate it? Might be a good idea to grab sosreports before those logs 
> are flushed away in rotate. Thanks Fabio 
Thanks, will do.



From lists at alteeve.ca  Tue May 29 19:46:41 2012
From: lists at alteeve.ca (Digimer)
Date: Tue, 29 May 2012 15:46:41 -0400
Subject: [Linux-cluster] help
In-Reply-To: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com>
References: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com>
Message-ID: <4FC527A1.1020207@alteeve.ca>

On 05/29/2012 02:54 AM, AKIN ?ffffffffffd6ZTOPUZ wrote:
>     Hello
>  
> I need configuration steps about 2 nodes  rhel(5) cluster with quorum disk .
>  
> my config   like that  :
>  
> each node has 1 vote
> quorum disk has 1 vote
> cman expected vote is 3
> quorum will be protected in 2 votes  so when one node is down cluster
> will be up
>  
> whats other things should I care about it  in cluster config? 
>  
> thanks in advance

First off, I *strongly* recommend using RHEL6, not 5. There have been
many improvements, plus, EL6 will be supported longer.

What are you trying to do with your cluster? I suspect that a quorum
disk is not necessary, though if you have a SAN anyway, I wouldn't argue
against using it. Unless you need the heuristics though, it's probably
not needed. You can safely build a 2-node cluster, you just need to make
sure your fence devices work (which is needed, regardless).

I have a tutorial that shows how to implement a cluster using cman,
clvmd and other components. It also discusses the cluster components
from a high-level, which may help.

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial

-- 
Digimer
Papers and Projects: https://alteeve.com



From akinoztopuz at yahoo.com  Wed May 30 07:45:25 2012
From: akinoztopuz at yahoo.com (=?utf-8?B?QUtJTiDDv2ZmZmZmZmZmZmZkNlpUT1BVWg==?=)
Date: Wed, 30 May 2012 00:45:25 -0700 (PDT)
Subject: [Linux-cluster] help
In-Reply-To: <4FC527A1.1020207@alteeve.ca>
References: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com>
	<4FC527A1.1020207@alteeve.ca>
Message-ID: <1338363925.53944.YahooMailNeo@web125805.mail.ne1.yahoo.com>

Hello? Digimer
?
I am using qdisk with <quorumd interval="4" label="quorum" min_score="1" tko="15" votes="1" master_wins="1"/>
not heursitic.qdisk?? is suggested for avoidin split brain
?
in this configuration?my problem is?:? node1 is master and continuosly node2 killed by node1
its connection is over iscsi.
? th?nk it is timing problem 
doyou have any idea?
?
?
var/log/messages like that:
?
sapclsn2 clurgmgrd[8761]: <notice> Service service:sap is stopped 
openais[6755]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application?

openais[6755]: [SERV ] AIS Executive exiting (reason: CMAN kill requested, exiting).?
fenced[6788]: cluster is down, exiting

 

________________________________
 From: Digimer <lists at alteeve.ca>
To: AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <linux-cluster at redhat.com> 
Sent: Tuesday, May 29, 2012 10:46 PM
Subject: Re: [Linux-cluster] help
  
On 05/29/2012 02:54 AM, AKIN ?ffffffffffd6ZTOPUZ wrote:
>? ?  Hello
>? 
> I need configuration steps about 2 nodes? rhel(5) cluster with quorum disk .
>? 
> my config?  like that? :
>? 
> each node has 1 vote
> quorum disk has 1 vote
> cman expected vote is 3
> quorum will be protected in 2 votes? so when one node is down cluster
> will be up
>? 
> whats other things should I care about it? in cluster config? 
>? 
> thanks in advance

First off, I *strongly* recommend using RHEL6, not 5. There have been
many improvements, plus, EL6 will be supported longer.

What are you trying to do with your cluster? I suspect that a quorum
disk is not necessary, though if you have a SAN anyway, I wouldn't argue
against using it. Unless you need the heuristics though, it's probably
not needed. You can safely build a 2-node cluster, you just need to make
sure your fence devices work (which is needed, regardless).

I have a tutorial that shows how to implement a cluster using cman,
clvmd and other components. It also discusses the cluster components
from a high-level, which may help.

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial

-- 
Digimer
Papers and Projects: https://alteeve.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120530/a5777c1c/attachment.htm>

From emi2fast at gmail.com  Wed May 30 08:00:53 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Wed, 30 May 2012 10:00:53 +0200
Subject: [Linux-cluster] help
In-Reply-To: <1338363925.53944.YahooMailNeo@web125805.mail.ne1.yahoo.com>
References: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com>
	<4FC527A1.1020207@alteeve.ca>
	<1338363925.53944.YahooMailNeo@web125805.mail.ne1.yahoo.com>
Message-ID: <CAE7pJ3BGMBfm6=fWzHq1WLz5LCNfUnqRKym0P9YxzF=TSUO4WA@mail.gmail.com>

Hello AKIN

Are you sure it's not a connection problem?

2012/5/30 AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>

> Hello  Digimer
>
> I am using qdisk with <quorumd interval="4" label="quorum" min_score="1"
> tko="15" votes="1" master_wins="1"/>
> not heursitic.qdisk   is suggested for avoidin split brain
>
> in this configuration my problem is :  node1 is master and continuosly
> node2 killed by node1
> its connection is over iscsi.
> ? th?nk it is timing problem
> doyou have any idea?
>
>
> var/log/messages like that:
>
> sapclsn2 clurgmgrd[8761]: <notice> Service service:sap is stopped
> openais[6755]: [CMAN ] cman killed by node 1 because we were killed by
> cman_tool or other application
>
> openais[6755]: [SERV ] AIS Executive exiting (reason: CMAN kill requested,
> exiting).
> fenced[6788]: cluster is down, exiting
>
>    *From:* Digimer <lists at alteeve.ca>
> *To:* AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <
> linux-cluster at redhat.com>
> *Sent:* Tuesday, May 29, 2012 10:46 PM
> *Subject:* Re: [Linux-cluster] help
>
> On 05/29/2012 02:54 AM, AKIN ?ffffffffffd6ZTOPUZ wrote:
> >    Hello
> >
> > I need configuration steps about 2 nodes  rhel(5) cluster with quorum
> disk .
> >
> > my config  like that  :
> >
> > each node has 1 vote
> > quorum disk has 1 vote
> > cman expected vote is 3
> > quorum will be protected in 2 votes  so when one node is down cluster
> > will be up
> >
> > whats other things should I care about it  in cluster config?
> >
> > thanks in advance
>
> First off, I *strongly* recommend using RHEL6, not 5. There have been
> many improvements, plus, EL6 will be supported longer.
>
> What are you trying to do with your cluster? I suspect that a quorum
> disk is not necessary, though if you have a SAN anyway, I wouldn't argue
> against using it. Unless you need the heuristics though, it's probably
> not needed. You can safely build a 2-node cluster, you just need to make
> sure your fence devices work (which is needed, regardless).
>
> I have a tutorial that shows how to implement a cluster using cman,
> clvmd and other components. It also discusses the cluster components
> from a high-level, which may help.
>
> https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial
>
> --
> Digimer
> Papers and Projects: https://alteeve.com
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120530/1f5cd729/attachment.htm>

From akinoztopuz at yahoo.com  Wed May 30 08:21:21 2012
From: akinoztopuz at yahoo.com (=?utf-8?B?QUtJTiDDv2ZmZmZmZmZmZmZkNlpUT1BVWg==?=)
Date: Wed, 30 May 2012 01:21:21 -0700 (PDT)
Subject: [Linux-cluster] help
In-Reply-To: <CAE7pJ3BGMBfm6=fWzHq1WLz5LCNfUnqRKym0P9YxzF=TSUO4WA@mail.gmail.com>
References: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com>
	<4FC527A1.1020207@alteeve.ca>
	<1338363925.53944.YahooMailNeo@web125805.mail.ne1.yahoo.com>
	<CAE7pJ3BGMBfm6=fWzHq1WLz5LCNfUnqRKym0P9YxzF=TSUO4WA@mail.gmail.com>
Message-ID: <1338366081.62673.YahooMailNeo@web125803.mail.ne1.yahoo.com>

Hello Emmanuel
?
nodes network is below:
?
-two nodes interconnect via cross cable
-1 interface for iscsi (logical volumes)
-1 interface for??public network
?
? checked again physicall cables?and not seen any problem?
 

________________________________
 From: emmanuel segura <emi2fast at gmail.com>
To: AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <linux-cluster at redhat.com> 
Sent: Wednesday, May 30, 2012 11:00 AM
Subject: Re: [Linux-cluster] help
  

Hello AKIN

Are you sure it's not a connection problem?


2012/5/30 AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>

Hello? Digimer 
>?
>I am using qdisk with <quorumd interval="4" label="quorum" min_score="1" tko="15" votes="1" master_wins="1"/> 
>not heursitic.qdisk?? is suggested for avoidin split brain
>?
>in this configuration?my problem is?:? node1 is master and continuosly node2 killed by node1 
>its connection is over iscsi.
>? th?nk it is timing problem 
>doyou have any idea?
>?
>?
>var/log/messages like that: 
>?
>sapclsn2 clurgmgrd[8761]: <notice> Service service:sap is stopped 
>openais[6755]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application?
>
>openais[6755]: [SERV ] AIS Executive exiting (reason: CMAN kill requested, exiting).?
>fenced[6788]: cluster is down, exiting
>
>
> 
> From: Digimer <lists at alteeve.ca>
>To: AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <linux-cluster at redhat.com> 
>Sent: Tuesday, May 29, 2012 10:46 PM
>Subject: Re: [Linux-cluster] help
>  
>On 05/29/2012 02:54 AM, AKIN ?ffffffffffd6ZTOPUZ wrote:
>>? ?  Hello
>>? 
>> I need configuration steps about 2 nodes? rhel(5) cluster with quorum disk .
>>? 
>> my config?  like that? :
>>? 
>> each node has 1 vote
>> quorum disk has 1 vote
>> cman expected vote is 3
>> quorum will be protected in 2 votes? so when one node is down cluster
>> will be up
>>? 
>> whats other things should I care about it? in cluster config? 
>>? 
>> thanks in advance
>
>First off, I *strongly* recommend using RHEL6, not 5. There have been
>many improvements, plus, EL6 will be supported longer.
>
>What are you trying to do with your cluster? I suspect that a quorum
>disk is not necessary, though if you have a SAN anyway, I wouldn't argue
>against using it. Unless you need the heuristics though, it's probably
>not needed. You
 can safely build a 2-node cluster, you just need to make
>sure your fence devices work (which is needed, regardless).
>
>I have a tutorial that shows how to implement a cluster using cman,
>clvmd and other components. It also discusses the cluster components
>from a high-level, which may help.
>
>https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial
>
>-- 
>Digimer
>Papers and Projects: https://alteeve.com
>
>
>  
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120530/3653fbef/attachment.htm>

From emi2fast at gmail.com  Wed May 30 08:28:53 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Wed, 30 May 2012 10:28:53 +0200
Subject: [Linux-cluster] help
In-Reply-To: <1338363925.53944.YahooMailNeo@web125805.mail.ne1.yahoo.com>
References: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com>
	<4FC527A1.1020207@alteeve.ca>
	<1338363925.53944.YahooMailNeo@web125805.mail.ne1.yahoo.com>
Message-ID: <CAE7pJ3DJ2_+XVx2Rc_pqqoPLXT3mAbH2sRef45LXm7ij7i6T2Q@mail.gmail.com>

Hello AKIN

can you show me your full cluster config and your cluster log?

If you think it's a problem with quorum disk, you can monitor the operation
speed with this command

mkqdisk -d -L

2012/5/30 AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>

> Hello  Digimer
>
> I am using qdisk with <quorumd interval="4" label="quorum" min_score="1"
> tko="15" votes="1" master_wins="1"/>
> not heursitic.qdisk   is suggested for avoidin split brain
>
> in this configuration my problem is :  node1 is master and continuosly
> node2 killed by node1
> its connection is over iscsi.
> ? th?nk it is timing problem
> doyou have any idea?
>
>
> var/log/messages like that:
>
> sapclsn2 clurgmgrd[8761]: <notice> Service service:sap is stopped
> openais[6755]: [CMAN ] cman killed by node 1 because we were killed by
> cman_tool or other application
>
> openais[6755]: [SERV ] AIS Executive exiting (reason: CMAN kill requested,
> exiting).
> fenced[6788]: cluster is down, exiting
>
>    *From:* Digimer <lists at alteeve.ca>
> *To:* AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <
> linux-cluster at redhat.com>
> *Sent:* Tuesday, May 29, 2012 10:46 PM
> *Subject:* Re: [Linux-cluster] help
>
> On 05/29/2012 02:54 AM, AKIN ?ffffffffffd6ZTOPUZ wrote:
> >    Hello
> >
> > I need configuration steps about 2 nodes  rhel(5) cluster with quorum
> disk .
> >
> > my config  like that  :
> >
> > each node has 1 vote
> > quorum disk has 1 vote
> > cman expected vote is 3
> > quorum will be protected in 2 votes  so when one node is down cluster
> > will be up
> >
> > whats other things should I care about it  in cluster config?
> >
> > thanks in advance
>
> First off, I *strongly* recommend using RHEL6, not 5. There have been
> many improvements, plus, EL6 will be supported longer.
>
> What are you trying to do with your cluster? I suspect that a quorum
> disk is not necessary, though if you have a SAN anyway, I wouldn't argue
> against using it. Unless you need the heuristics though, it's probably
> not needed. You can safely build a 2-node cluster, you just need to make
> sure your fence devices work (which is needed, regardless).
>
> I have a tutorial that shows how to implement a cluster using cman,
> clvmd and other components. It also discusses the cluster components
> from a high-level, which may help.
>
> https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial
>
> --
> Digimer
> Papers and Projects: https://alteeve.com
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120530/884c6b4d/attachment.htm>

From jbzornoza at sia.es  Wed May 30 09:01:43 2012
From: jbzornoza at sia.es (Jose Blas Zornoza)
Date: Wed, 30 May 2012 11:01:43 +0200
Subject: [Linux-cluster] help
In-Reply-To: <1338366081.62673.YahooMailNeo@web125803.mail.ne1.yahoo.com>
References: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com><4FC527A1.1020207@alteeve.ca><1338363925.53944.YahooMailNeo@web125805.mail.ne1.yahoo.com><CAE7pJ3BGMBfm6=fWzHq1WLz5LCNfUnqRKym0P9YxzF=TSUO4WA@mail.gmail.com>
	<1338366081.62673.YahooMailNeo@web125803.mail.ne1.yahoo.com>
Message-ID: <2742FF9DED1DFD46AC1D104D545E2745155915AE@CORREO.sia.es>

Hi

 

Is the cman process alive? ( service cman status)

 

try to get more information with admin cluster commands:

-          cman_tool nodes

-          cman_tool status

-          clustat

-          clustat ?s sap

-           

Regards

 

De: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] En nombre de AKIN ?ffffffffffd6ZTOPUZ
Enviado el: mi?rcoles, 30 de mayo de 2012 10:21
Para: emmanuel segura; linux clustering
Asunto: Re: [Linux-cluster] help

 

Hello Emmanuel

 

nodes network is below:

 

-two nodes interconnect via cross cable

-1 interface for iscsi (logical volumes)

-1 interface for  public network

 

? checked again physicall cables and not seen any problem 

 

From: emmanuel segura <emi2fast at gmail.com>
To: AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <linux-cluster at redhat.com> 
Sent: Wednesday, May 30, 2012 11:00 AM
Subject: Re: [Linux-cluster] help

 

Hello AKIN

Are you sure it's not a connection problem?

2012/5/30 AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>

Hello  Digimer

 

I am using qdisk with <quorumd interval="4" label="quorum" min_score="1" tko="15" votes="1" master_wins="1"/>

not heursitic.qdisk   is suggested for avoidin split brain

 

in this configuration my problem is :  node1 is master and continuosly node2 killed by node1

its connection is over iscsi.

? th?nk it is timing problem 

doyou have any idea?

 

 

var/log/messages like that:

 

sapclsn2 clurgmgrd[8761]: <notice> Service service:sap is stopped 
openais[6755]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application 

openais[6755]: [SERV ] AIS Executive exiting (reason: CMAN kill requested, exiting). 
fenced[6788]: cluster is down, exiting

 

From: Digimer <lists at alteeve.ca>
To: AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <linux-cluster at redhat.com> 
Sent: Tuesday, May 29, 2012 10:46 PM
Subject: Re: [Linux-cluster] help


On 05/29/2012 02:54 AM, AKIN ?ffffffffffd6ZTOPUZ wrote:
>    Hello
>  
> I need configuration steps about 2 nodes  rhel(5) cluster with quorum disk .
>  
> my config  like that  :
>  
> each node has 1 vote
> quorum disk has 1 vote
> cman expected vote is 3
> quorum will be protected in 2 votes  so when one node is down cluster
> will be up
>  
> whats other things should I care about it  in cluster config? 
>  
> thanks in advance

First off, I *strongly* recommend using RHEL6, not 5. There have been
many improvements, plus, EL6 will be supported longer.

What are you trying to do with your cluster? I suspect that a quorum
disk is not necessary, though if you have a SAN anyway, I wouldn't argue
against using it. Unless you need the heuristics though, it's probably
not needed. You can safely build a 2-node cluster, you just need to make
sure your fence devices work (which is needed, regardless).

I have a tutorial that shows how to implement a cluster using cman,
clvmd and other components. It also discusses the cluster components
from a high-level, which may help.

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial

-- 
Digimer
Papers and Projects: https://alteeve.com




--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster




-- 
esta es mi vida e me la vivo hasta que dios quiera

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120530/11fecfc6/attachment.htm>

From akinoztopuz at yahoo.com  Wed May 30 11:07:51 2012
From: akinoztopuz at yahoo.com (=?utf-8?B?QUtJTiDDv2ZmZmZmZmZmZmZkNlpUT1BVWg==?=)
Date: Wed, 30 May 2012 04:07:51 -0700 (PDT)
Subject: [Linux-cluster] help
In-Reply-To: <CAE7pJ3DJ2_+XVx2Rc_pqqoPLXT3mAbH2sRef45LXm7ij7i6T2Q@mail.gmail.com>
References: <1338274466.1454.YahooMailNeo@web125801.mail.ne1.yahoo.com>
	<4FC527A1.1020207@alteeve.ca>
	<1338363925.53944.YahooMailNeo@web125805.mail.ne1.yahoo.com>
	<CAE7pJ3DJ2_+XVx2Rc_pqqoPLXT3mAbH2sRef45LXm7ij7i6T2Q@mail.gmail.com>
Message-ID: <1338376071.67200.YahooMailNeo@web125806.mail.ne1.yahoo.com>

Hello Emmanuel
?
config file here:
I copied only some parts of ?it
?
<?xml version="1.0"?>
<cluster alias="testsapcluster" config_version="169" name="testsapcluster">
??????? <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
??????? <clusternodes>
??????????????? <clusternode name="sapclsn1.sedas.com" nodeid="1" votes="1">
??????????????????????? <fence>
??????????????????????????????? <method name="1">
??????????????????????????????????????? <device name="fence_node1"/>
??????????????????????????????? </method>
??????????????????????? </fence>
??????????????? </clusternode>
??????????????? <clusternode name="sapclsn2.sedas.com" nodeid="2" votes="1">
??????????????????????? <fence>
??????????????????????????????? <method name="1">
??????????????????????????????????????? <device name="fence_node2"/>
??????????????????????????????? </method>
??????????????????????? </fence>
??????????????? </clusternode>
??????? </clusternodes>
??????? <cman expected_votes="3" quorum_dev_poll="70000" two_node="0"/>
??????? <totem token="70000"/>
????????????????????? <quorumd interval="4" label="quorum" min_score="1" tko="15" votes="1" master_wins="1"/>
</cluster>
 

________________________________
 From: emmanuel segura <emi2fast at gmail.com>
To: AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <linux-cluster at redhat.com> 
Sent: Wednesday, May 30, 2012 11:28 AM
Subject: Re: [Linux-cluster] help
  

Hello AKIN

can you show me your full cluster config and your cluster log?

If you think it's a problem with quorum disk, you can monitor the operation speed with this command

mkqdisk -d -L


2012/5/30 AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>

Hello? Digimer
>?
>I am using qdisk with <quorumd interval="4" label="quorum" min_score="1" tko="15" votes="1" master_wins="1"/> 
>not heursitic.qdisk?? is suggested for avoidin split brain
>?
>in this configuration?my problem is?:? node1 is master and continuosly node2 killed by node1 
>its connection is over iscsi.
>? th?nk it is timing problem 
>doyou have any idea?
>?
>?
>var/log/messages like that: 
>?
>sapclsn2 clurgmgrd[8761]: <notice> Service service:sap is stopped 
>openais[6755]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application?
>
>openais[6755]: [SERV ] AIS Executive exiting (reason: CMAN kill requested, exiting).?
>fenced[6788]: cluster is down, exiting
>
>
> 
> From: Digimer <lists at alteeve.ca>
>To: AKIN ?ffffffffffd6ZTOPUZ <akinoztopuz at yahoo.com>; linux clustering <linux-cluster at redhat.com> 
>Sent: Tuesday, May 29, 2012 10:46 PM
>Subject: Re: [Linux-cluster] help
>  
>On 05/29/2012 02:54 AM, AKIN ?ffffffffffd6ZTOPUZ wrote:
>>? ?  Hello
>>? 
>> I need configuration steps about 2 nodes? rhel(5) cluster with quorum disk .
>>? 
>> my config?  like that? :
>>? 
>> each node has 1 vote
>> quorum disk has 1 vote
>> cman expected vote is 3
>> quorum will be protected in 2 votes? so when one node is down cluster
>> will be up
>>? 
>> whats other things should I care about it? in cluster config? 
>>? 
>> thanks in advance
>
>First off, I *strongly* recommend using RHEL6, not 5. There have been
>many improvements, plus, EL6 will be supported longer.
>
>What are you trying to do with your cluster? I suspect that a quorum
>disk is not necessary, though if you have a SAN anyway, I wouldn't argue
>against using it. Unless you need the heuristics though, it's probably
>not needed. You
 can safely build a 2-node cluster, you just need to make
>sure your fence devices work (which is needed, regardless).
>
>I have a tutorial that shows how to implement a cluster using cman,
>clvmd and other components. It also discusses the cluster components
>from a high-level, which may help.
>
>https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial
>
>-- 
>Digimer
>Papers and Projects: https://alteeve.com
>
>
>  
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
esta es mi vida e me la vivo hasta que dios quiera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20120530/b0317951/attachment.htm>

From ming-ming.chen at hp.com  Thu May 31 16:27:40 2012
From: ming-ming.chen at hp.com (Chen, Ming Ming)
Date: Thu, 31 May 2012 16:27:40 +0000
Subject: [Linux-cluster] Help needed
In-Reply-To: <4FC053A1.8070407@alteeve.ca>
References: <CAKrd530bTYqYSxZBf-aVjEjVHhWu-_KgP_9VjmTy2e913eD8hQ@mail.gmail.com>
	<1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD527117B90213@G9W0737.americas.hpqcorp.net>
	<CAE7pJ3C_V4Qkrs44CMtjXMZLoVrwGOHQh9NmHu6NjwZgxs7Gaw@mail.gmail.com>
	<4F7FAF45.8070104@alteeve.ca>
	<1D241511770E2F4BA89AFD224EDD527117B904A3@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD52712A9ED63F@G9W0737.americas.hpqcorp.net>
	<4FC053A1.8070407@alteeve.ca>
Message-ID: <1D241511770E2F4BA89AFD224EDD52712A9EE4F0@G9W0737.americas.hpqcorp.net>

 Hi, I have the following simple cluster config just to try out on SertOS 6.2

<?xml version="1.0"?>
<cluster config_version="2" name="vmcluster">
      <logging debug="on"/>
      <cman expected_votes="1" two_node="1"/>
      <clusternodes>
            <clusternode name="shr289.cup.hp.com" nodeid="1">
                  <fence>
                  </fence>
            </clusternode>
            <clusternode name="shr295.cup.hp.com" nodeid="2">
                  <fence>
                  </fence>
            </clusternode>
      </clusternodes>
      <fencedevices>
      </fencedevices>
      <rm>
      </rm>
</cluster>


And I got the following error message when I did "service cman start" I got the same messages on both nodes.
Any help will be appreciated.

May 31 09:08:04 corosync [TOTEM ] RRP multicast threshold (100 problem count)
May 31 09:08:05 shr295 corosync[3542]:   [MAIN  ] Completed service synchronizat
ion, ready to provide service.
May 31 09:08:05 shr295 corosync[3542]:   [TOTEM ] A processor joined or left the
 membership and a new membership was formed.
May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Unable to load new config in c
orosync: New configuration version has to be newer than current running configur
ation
May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Can't get updated config versi
on 4: New configuration version has to be newer than current running configurati
on#012.
May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Activity suspended on this nod
e
May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Error reloading the configurat
ion, will retry every second
May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Node 1 conflict, remote config
 version id=4, local=2
-- VISUAL BLOCK --r295 corosync[3542]:   [CMAN  ] Unable to load new config in c
orosync: New configuration version has to be newer than current running configur
ation
May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Can't get updated config versi
on 4: New configuration version has to be newer than current running configurati
on#012.
May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Activity suspended on this nod
E




From lists at alteeve.ca  Thu May 31 17:13:25 2012
From: lists at alteeve.ca (Digimer)
Date: Thu, 31 May 2012 13:13:25 -0400
Subject: [Linux-cluster] Help needed
In-Reply-To: <1D241511770E2F4BA89AFD224EDD52712A9EE4F0@G9W0737.americas.hpqcorp.net>
References: <CAKrd530bTYqYSxZBf-aVjEjVHhWu-_KgP_9VjmTy2e913eD8hQ@mail.gmail.com>
	<1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD527117B90213@G9W0737.americas.hpqcorp.net>
	<CAE7pJ3C_V4Qkrs44CMtjXMZLoVrwGOHQh9NmHu6NjwZgxs7Gaw@mail.gmail.com>
	<4F7FAF45.8070104@alteeve.ca>
	<1D241511770E2F4BA89AFD224EDD527117B904A3@G9W0737.americas.hpqcorp.net>
	<1D241511770E2F4BA89AFD224EDD52712A9ED63F@G9W0737.americas.hpqcorp.net>
	<4FC053A1.8070407@alteeve.ca>
	<1D241511770E2F4BA89AFD224EDD52712A9EE4F0@G9W0737.americas.hpqcorp.net>
Message-ID: <4FC7A6B5.30305@alteeve.ca>

On 05/31/2012 12:27 PM, Chen, Ming Ming wrote:
>  Hi, I have the following simple cluster config just to try out on SertOS 6.2
> 
> <?xml version="1.0"?>
> <cluster config_version="2" name="vmcluster">
>       <logging debug="on"/>
>       <cman expected_votes="1" two_node="1"/>
>       <clusternodes>
>             <clusternode name="shr289.cup.hp.com" nodeid="1">
>                   <fence>
>                   </fence>
>             </clusternode>
>             <clusternode name="shr295.cup.hp.com" nodeid="2">
>                   <fence>
>                   </fence>
>             </clusternode>
>       </clusternodes>
>       <fencedevices>
>       </fencedevices>
>       <rm>
>       </rm>
> </cluster>
> 
> 
> And I got the following error message when I did "service cman start" I got the same messages on both nodes.
> Any help will be appreciated.
> 
> May 31 09:08:04 corosync [TOTEM ] RRP multicast threshold (100 problem count)
> May 31 09:08:05 shr295 corosync[3542]:   [MAIN  ] Completed service synchronizat
> ion, ready to provide service.
> May 31 09:08:05 shr295 corosync[3542]:   [TOTEM ] A processor joined or left the
>  membership and a new membership was formed.
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Unable to load new config in c
> orosync: New configuration version has to be newer than current running configur
> ation
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Can't get updated config versi
> on 4: New configuration version has to be newer than current running configurati
> on#012.
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Activity suspended on this nod
> e
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Error reloading the configurat
> ion, will retry every second
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Node 1 conflict, remote config
>  version id=4, local=2
> -- VISUAL BLOCK --r295 corosync[3542]:   [CMAN  ] Unable to load new config in c
> orosync: New configuration version has to be newer than current running configur
> ation
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Can't get updated config versi
> on 4: New configuration version has to be newer than current running configurati
> on#012.
> May 31 09:08:05 shr295 corosync[3542]:   [CMAN  ] Activity suspended on this nod
> E
> 

Run 'cman_tool version' to get the current version of the configuration,
then increase the config_version="x" to be one higher.

Also, configure fencing! If you don't, your cluster will hang the first
time anything goes wrong.

-- 
Digimer
Papers and Projects: https://alteeve.com



From cfeist at redhat.com  Thu May 31 23:40:19 2012
From: cfeist at redhat.com (Chris Feist)
Date: Thu, 31 May 2012 18:40:19 -0500
Subject: [Linux-cluster] Announce: pcs / pcs-gui (Pacemaker/Corosync
	Configuration System)
Message-ID: <4FC80163.4020500@redhat.com>

I'd like to announce the existence of the "Pacemaker/Corosync configuration 
system", PCS.

The emphasis in PCS differs somewhat from the existing shell:
- Configure the complete cluster (corosync plus pacemaker) from scratch
- Emphasis is on modification not display
- Avoid XML round-tripping
- Syntax won't be restricted to concepts from the underlying XML (which
   should make it easier to configure simple clusters)
- Provide the ability to remotely configure corosync, start/stop cluster and
   get status.

In addition, it will do much of the back-end work for a new GUI being developed, 
also by Red Hat (pcs-gui).

PCS will continue the tradition of having a regression test suite and 
discoverable 'ip'-like hierarchical "menu" structure, however unlike the shell 
we may end up not adding interactivity.

Both projects are far from complete, but so far PCS can:
- Create corosync/pacemaker clusters from scratch
- Add simple resources and add constraints
- Create/Remove resource groups
- Set most pacemaker configuration options
- Start/Stop pacemaker/corosync
- Get basic cluster status

I'm currently working on getting PCS fully functional with Fedora 17 (and it 
should work with other distributions based on corosync 2.0, pacemaker 1.1 and 
systemd).

I'm hoping to have a fairly complete version of PCS for the Fedora 17 release 
(or very shortly thereafter) and a functioning version of pcs-gui (which 
includes the ability to remotely start/stop nodes and set corosync config) by 
the Fedora 18 release.

The code for both projects is currently hosted on github 
(https://github.com/feist/pcs & https://github.com/feist/pcs-gui)

You can view a sample pcs session to get a preliminary view of how pcs will work 
  - https://gist.github.com/2697640

Comments and contributions are welcome.

Thanks!
Chris