From fdinitto at redhat.com Sun Nov 2 08:49:00 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Sun, 2 Nov 2008 09:49:00 +0100 (CET) Subject: [Linux-cluster] Fw: Building error in Cluster 2.03.09 In-Reply-To: <0d9d01c93b49$a5a81fc0$a401a8c0@mainoffice.nodex.ru> References: <0d9d01c93b49$a5a81fc0$a401a8c0@mainoffice.nodex.ru> Message-ID: On Fri, 31 Oct 2008, Pavel Kuzin wrote: > Hello! > > I`m trying to build cluster 2.03.09 against linux 2.6.27.4. > > When building have a error: > > upgrade.o: In function `upgrade_device_archive': > /root/newcluster/cluster-2.03.09/ccs/ccs_tool/upgrade.c:226: undefined > reference to `mkostemp' > collect2: ld returned 1 exit status > make[2]: *** [ccs_tool] Error 1 > make[2]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs/ccs_tool' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs' > make: *** [ccs] Error 2 > > node2:~/newcluster/cluster-2.03.09# uname -a > Linux node2 2.6.27.4 #2 SMP Fri Oct 31 13:42:09 MSK 2008 i686 GNU/Linux > > Distro - Debian Etch > Seems mkostemp is available since glibc 2.7. > I have 2.6. > Can "mkostemp" be changed to another similar function? Probably, I'll have a look on monday. Fabio -- I'm going to make him an offer he can't refuse. From fdinitto at redhat.com Mon Nov 3 04:55:45 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 3 Nov 2008 05:55:45 +0100 (CET) Subject: [Linux-cluster] Fw: Building error in Cluster 2.03.09 In-Reply-To: <0d9d01c93b49$a5a81fc0$a401a8c0@mainoffice.nodex.ru> References: <0d9d01c93b49$a5a81fc0$a401a8c0@mainoffice.nodex.ru> Message-ID: On Fri, 31 Oct 2008, Pavel Kuzin wrote: > Hello! > > I`m trying to build cluster 2.03.09 against linux 2.6.27.4. > > When building have a error: > > upgrade.o: In function `upgrade_device_archive': > /root/newcluster/cluster-2.03.09/ccs/ccs_tool/upgrade.c:226: undefined > reference to `mkostemp' > collect2: ld returned 1 exit status > make[2]: *** [ccs_tool] Error 1 > make[2]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs/ccs_tool' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs' > make: *** [ccs] Error 2 > > node2:~/newcluster/cluster-2.03.09# uname -a > Linux node2 2.6.27.4 #2 SMP Fri Oct 31 13:42:09 MSK 2008 i686 GNU/Linux > > Distro - Debian Etch > Seems mkostemp is available since glibc 2.7. > I have 2.6. > Can "mkostemp" be changed to another similar function? You can apply this patch or wait for the next relese: commit 6e8c492f8e8233bc5e295ae12322c40936279178 Author: Fabio M. Di Nitto Date: Mon Nov 3 05:52:39 2008 +0100 ccs: fix build with older glibc mkostemp has been introduced only in glibc 2.7. Switch to mkstemp. Signed-off-by: Fabio M. Di Nitto diff --git a/ccs/ccs_tool/upgrade.c b/ccs/ccs_tool/upgrade.c index b7cecf0..6a0e150 100644 --- a/ccs/ccs_tool/upgrade.c +++ b/ccs/ccs_tool/upgrade.c @@ -223,7 +223,7 @@ static int upgrade_device_archive(char *location){ memset(tmp_file, 0, 128); sprintf(tmp_file, "/tmp/ccs_tool_tmp_XXXXXX"); - tmp_fd = mkostemp(tmp_file, O_RDWR | O_CREAT |O_TRUNC); + tmp_fd = mkstemp(tmp_file); if(tmp_fd < 0){ fprintf(stderr, "Unable to create temporary archive: %s\n", strerror(errno)); error = -errno; -- I'm going to make him an offer he can't refuse. 
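For anyone deciding whether they need the patch above at all, the quickest check is the glibc version on the build host, since mkostemp() only appeared in glibc 2.7. A minimal sketch (paths differ per distribution and architecture):

    # Print the glibc release; mkostemp() needs 2.7 or newer
    getconf GNU_LIBC_VERSION
    # Alternative: run the C library directly (path varies by distro/arch)
    /lib/libc.so.6 2>/dev/null | head -n 1
    # Confirm whether the tree being built still calls mkostemp
    grep -n mkostemp ccs/ccs_tool/upgrade.c

If the reported version is older than 2.7, the mkstemp() patch above (or the next release) is needed.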
From Harri.Paivaniemi at tietoenator.com Mon Nov 3 06:26:30 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Mon, 3 Nov 2008 08:26:30 +0200 Subject: [Linux-cluster] Can clustered RHEL 5 use a SAN with differentaccess rights for different nodes in the cluster? References: <639ce0480806171423u4503665ewd7426080145309ea@mail.gmail.com> Message-ID: <41E8D4F07FCE154CBEBAA60FFC92F67709FE33@apollo.eu.tieto.com> Hi, If you have ability (in your storage system) to export that disk also via NFS, it's the simpliest way... you could restrict access just via mount/export perms and no gfs needed... works perfectly in ftp-usage. -hjp 2008/6/18 Richard Williams - IoDynamix : > Please advise and/or redirect this posting if this is not the correct forum > for my question - thanks. > > A company wants to use clustered rhel5 systems as inside/outside ftp > servers. Users on the inside (LAN) cluster nodes can read and write to the > SAN, while users on the outside (DMZ) cluster can only read. > > Is this application possible without GFS? > > If one node in the cluster fails, can the other node be provisioned to > provide all services until recovery? > > Can a SAN be used as the "single" ftp location for both services (inside > FTP > & outside FTP?) > > Does the customer need more than four systems (i.e. 2 inside - 2 outside) - > is a separate "command" system required? > > > Have Dell's m1000e & 600 series blades been certified for this operating > system? > > Is there any documentation available regarding separate access rights for > multiple nodes in a cluster available? > > Thanks for your constructive reply. > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3799 bytes Desc: not available URL: From tuckerd at engr.smu.edu Mon Nov 3 15:56:27 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Mon, 03 Nov 2008 09:56:27 -0600 Subject: [Linux-cluster] Data Loss / Files and Folders "2-Node_GFS-Cluster" In-Reply-To: <490A01C1.5030003@gmail.com> References: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> <490A01C1.5030003@gmail.com> Message-ID: <1225727787.8639.3.camel@thor.seas.smu.edu> > I don't (or "didn't") have adequate involvements with RHEL5 GFS. I may > not know enough to response. However, users should be aware of ... > > Before RHEL 5.1 and community version 2.6.22 kernels, NFS locks (i.e. > flock, posix lock, etc) is not populated into filesystem layer. It only > reaches Linux VFS layer (local to one particular server). If your file > access needs to get synchronized via either flock or posix locks > *between multiple hosts (i.e. NFS servers)*, data loss could occur. > Newer versions of RHEL and 2.6.22-and-above kernels should have the code > to support this new feature. > > There was an old write-up in section 4.1 of > "http://people.redhat.com/wcheng/Project/nfs.htm" about this issue. > > -- Wendy > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Wendy, To be clear, does this include RHEL 4.7, or is it specific to 5.x? 
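Whichever release the fix lands in, there is a quick way to check whether a given setup is affected. This is only a sketch: it assumes the same GFS filesystem is NFS-exported by two different cluster nodes, that the flock(1) utility from util-linux is installed on the clients, and the mount points and file name are placeholders.

    # On a client that mounts the export from server A: take an exclusive
    # lock and hold it for a minute
    touch /mnt/nfs_a/locktest
    flock -x /mnt/nfs_a/locktest -c 'echo "A holds the lock"; sleep 60' &

    # On a client that mounts the same filesystem via server B, while the
    # first command is still sleeping:
    flock -xn /mnt/nfs_b/locktest -c 'echo "B got the lock too"'

If the second command succeeds instead of failing, lock requests arriving through the two servers are not being pushed down into the shared filesystem, which is the situation Wendy describes above.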
From s.wendy.cheng at gmail.com Mon Nov 3 16:31:30 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Mon, 03 Nov 2008 11:31:30 -0500 Subject: [Linux-cluster] Data Loss / Files and Folders "2-Node_GFS-Cluster" In-Reply-To: <1225727787.8639.3.camel@thor.seas.smu.edu> References: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> <490A01C1.5030003@gmail.com> <1225727787.8639.3.camel@thor.seas.smu.edu> Message-ID: <490F2762.80902@gmail.com> Doug Tucker wrote: > >> I don't (or "didn't") have adequate involvements with RHEL5 GFS. I may >> not know enough to response. However, users should be aware of ... >> >> Before RHEL 5.1 and community version 2.6.22 kernels, NFS locks (i.e. >> flock, posix lock, etc) is not populated into filesystem layer. It only >> reaches Linux VFS layer (local to one particular server). If your file >> access needs to get synchronized via either flock or posix locks >> *between multiple hosts (i.e. NFS servers)*, data loss could occur. >> Newer versions of RHEL and 2.6.22-and-above kernels should have the code >> to support this new feature. >> >> There was an old write-up in section 4.1 of >> "http://people.redhat.com/wcheng/Project/nfs.htm" about this issue. >> >> -- Wendy >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > Wendy, > > To be clear, does this include RHEL 4.7, or is it specific to 5.x? > The changes were made on 2.6.22 kernel. I would think RHEL 4.7 has the same issue - but I'm not sure as I left Red Hat before 4.7 was released. Better to open a service ticket to Red Hat if you need the fix. If applications are directly run on GFS nodes, instead of going thru NFS servers, posix locks and flocks should work *fine* across different nodes. The problem had existed in Linux NFS servers for years - no one seemed to complain about it until clusters started to get deployed more commonly. -- Wendy From a.holstvoogd at nedforce.nl Tue Nov 4 09:45:36 2008 From: a.holstvoogd at nedforce.nl (Arthur Holstvoogd) Date: Tue, 04 Nov 2008 10:45:36 +0100 Subject: [Linux-cluster] gfs2-utils 0.1.17 and kernel-2.6.18-120> Message-ID: <491019C0.4000700@nedforce.nl> Hi, I'm using a beta kernel from http://people.redhat.com/dzickus/el5 because of trouble with dlm which is solved in this version. This has broken some of the gfs2-utils tools, specifically gfs2_quota which still uses the old metafs. I have two questions: - is the utils 0.1.49 version available anywhere as a rpm orso? Or only from source? - will other tools, like fsck, from the gfs2-utils 0.1.17 ( current version with centos 5.2) still work? Regards, Arthur From jeder at invision.net Tue Nov 4 16:56:47 2008 From: jeder at invision.net (Jeremy Eder) Date: Tue, 4 Nov 2008 11:56:47 -0500 Subject: [Linux-cluster] cluster-snmp setup question Message-ID: Hello, I have a newly installed rhel4u7 32bit (2node) cluster. I have cluster-snmp-0.11.1-2.el4 installed, and I have added this to snmpd.conf per instructions found in /usr/share/doc/cluster-snmp-0.11.1/README.snmpd -------- dlmod RedHatCluster /usr/lib/cluster-snmp/libClusterMonitorSnmp.so view systemview included REDHAT-CLUSTER-MIB:RedHatCluster -------- The problem is that the system only responds to snmpwalk on this: root at db1: /etc/snmp # snmpwalk -v1 -c public localhost .1.3.6.1.4.1.2312.8 SNMPv2-SMI::enterprises.2312.8.1.1.0 = INTEGER: 1 Any other OID silently completes with no output... 
root at db1: /etc/snmp # snmpwalk -v1 -c public localhost .1.3.6.1.4.1.2312.8.2.1.0 root at db1: /etc/snmp # The 2nd snmpwalk should output the rhcClusterName as a STRING. Note that REDHAT-MIB and REDHAT-CLUSTER-MIB are installed properly: -rw-r--r-- 1 root root 7957 Apr 14 2008 /usr/share/snmp/mibs/REDHAT-CLUSTER-MIB -rw-r--r-- 1 root root 772 Apr 14 2008 /usr/share/snmp/mibs/REDHAT-MIB Did I miss some step in the setup (are there some services that need to be started ?) There are no hits for "snmp" on the cluster-wiki... -- jer From ffv at tjpr.jus.br Tue Nov 4 19:12:38 2008 From: ffv at tjpr.jus.br (ffv at tjpr.jus.br) Date: Tue, 04 Nov 2008 17:12:38 -0200 Subject: [Linux-cluster] GFS2 poor performance Message-ID: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br> Hi all, I?m getting a very poor performance using GFS2. I have two qmail (mail) servers and one gfs2 filesystem shared by them. In this case, each directory in GFS2 filesystem may have upon to 10000 files (mails) The problem is in performance of some operations like ls, du, rm, etc for example, # time du -sh /dados/teste 40M /dados/teste real 7m14.919s user 0m0.008s sys 0m0.129s this is unacceptable Some attributes i already set using gfs2_tool: gfs2_tool settune /dados demote_secs 100 gfs2_tool setflag jdata /dados gfs2_tool setflag sync /dados gfs2_tool setflag directio /dados but the performance is still very bad Anybody know how to tune the filesystem for a acceptable performance working with directory with 10000 files? thanks for any help From rpeterso at redhat.com Tue Nov 4 19:16:53 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 4 Nov 2008 14:16:53 -0500 (EST) Subject: [Linux-cluster] GFS2 poor performance In-Reply-To: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br> Message-ID: <1819458317.178811225826213442.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- ffv at tjpr.jus.br wrote: | Hi all, | | I?m getting a very poor performance using GFS2. | I have two qmail (mail) servers and one gfs2 filesystem shared by | them. | In this case, each directory in GFS2 filesystem may have upon to 10000 | files (mails) | | The problem is in performance of some operations like ls, du, rm, etc | for example, | | # time du -sh /dados/teste | 40M /dados/teste | | real 7m14.919s | user 0m0.008s | sys 0m0.129s | | this is unacceptable What version of GFS2 and what kernel are you using? Regards, Bob Peterson Red Hat Clustering & GFS From jeff.sturm at eprize.com Tue Nov 4 19:41:56 2008 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Tue, 4 Nov 2008 14:41:56 -0500 Subject: [Linux-cluster] GFS2 poor performance In-Reply-To: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br> References: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br> Message-ID: <64D0546C5EBBD147B75DE133D798665F01806B7C@hugo.eprize.local> What sort of network and storage device are you using? Also, why set demote_secs so low? -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of ffv at tjpr.jus.br Sent: Tuesday, November 04, 2008 2:13 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] GFS2 poor performance Hi all, I?m getting a very poor performance using GFS2. I have two qmail (mail) servers and one gfs2 filesystem shared by them. 
In this case, each directory in GFS2 filesystem may have upon to 10000 files (mails) The problem is in performance of some operations like ls, du, rm, etc for example, # time du -sh /dados/teste 40M /dados/teste real 7m14.919s user 0m0.008s sys 0m0.129s this is unacceptable Some attributes i already set using gfs2_tool: gfs2_tool settune /dados demote_secs 100 gfs2_tool setflag jdata /dados gfs2_tool setflag sync /dados gfs2_tool setflag directio /dados but the performance is still very bad Anybody know how to tune the filesystem for a acceptable performance working with directory with 10000 files? thanks for any help -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From ffv at tjpr.jus.br Tue Nov 4 20:00:42 2008 From: ffv at tjpr.jus.br (Fabiano F. Vitale) Date: Tue, 4 Nov 2008 18:00:42 -0200 Subject: [Linux-cluster] GFS2 poor performance References: <1819458317.178811225826213442.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: <007301c93eb8$04926f60$3e0a10ac@tjpr.net> Hi all, I use CentOS 5.2 and the kernel and gfs2 versions are: [root at smtp01 ~]# uname -r 2.6.18-92.1.13.el5 [root at smtp01 ~]# gfs2_tool version gfs2_tool 0.1.44 (built Jul 6 2008 10:58:08) Copyright (C) Red Hat, Inc. 2004-2006 All rights reserved. thanks for any help > ----- ffv at tjpr.jus.br wrote: > | Hi all, > | > | I?m getting a very poor performance using GFS2. > | I have two qmail (mail) servers and one gfs2 filesystem shared by > | them. > | In this case, each directory in GFS2 filesystem may have upon to 10000 > | files (mails) > | > | The problem is in performance of some operations like ls, du, rm, etc > | for example, > | > | # time du -sh /dados/teste > | 40M /dados/teste > | > | real 7m14.919s > | user 0m0.008s > | sys 0m0.129s > | > | this is unacceptable > What version of GFS2 and what kernel are you using? From ffv at tjpr.jus.br Tue Nov 4 20:18:51 2008 From: ffv at tjpr.jus.br (Fabiano F. Vitale) Date: Tue, 4 Nov 2008 18:18:51 -0200 Subject: [Linux-cluster] GFS2 poor performance References: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br> <64D0546C5EBBD147B75DE133D798665F01806B7C@hugo.eprize.local> Message-ID: <007f01c93eba$8d731b20$3e0a10ac@tjpr.net> Hi, for cluster purpose the two nodes are linked by a patch cord cat6 and the lan interfaces are gigabit. All nodes have a Fibre Channel Emulex Corporation Zephyr-X LightPulse and the Storage is a HP EVA8100 I read the document http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Summit08presentation_GFSBestPractices_Final.pdf which show some parameters to tune and one of them is demote_secs, to adjust to 100sec thanks > What sort of network and storage device are you using? > > Also, why set demote_secs so low? > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of ffv at tjpr.jus.br > Sent: Tuesday, November 04, 2008 2:13 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] GFS2 poor performance > > Hi all, > > I?m getting a very poor performance using GFS2. > I have two qmail (mail) servers and one gfs2 filesystem shared by them. 
> In this case, each directory in GFS2 filesystem may have upon to 10000 > files (mails) > > The problem is in performance of some operations like ls, du, rm, etc for > example, > > # time du -sh /dados/teste > 40M /dados/teste > > real 7m14.919s > user 0m0.008s > sys 0m0.129s > > this is unacceptable > > Some attributes i already set using gfs2_tool: > > gfs2_tool settune /dados demote_secs 100 gfs2_tool setflag jdata /dados > gfs2_tool setflag sync /dados gfs2_tool setflag directio /dados > > but the performance is still very bad > > > Anybody know how to tune the filesystem for a acceptable performance > working with directory with 10000 files? > thanks for any help > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From fdinitto at redhat.com Wed Nov 5 13:36:56 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Wed, 5 Nov 2008 14:36:56 +0100 (CET) Subject: [Linux-cluster] making init scripts distro agnostic Message-ID: One of the goals agreed at the Cluster Summit 2008 was to ship init scripts that did work on all distribution. I just landed a first cut of this work into the initscripts branch in git: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=shortlog;h=refs/heads/initscripts I tested those changes on 3 distributions and they work for me. Unless I receive any major complain, those changes will land in master friday. There is still some work that needs to be done to cleanup the scripts. I am perfectly aware of that, but this is one step forward. Thanks Fabio -- I'm going to make him an offer he can't refuse. From afahounko at gmail.com Wed Nov 5 15:00:30 2008 From: afahounko at gmail.com (AFAHOUNKO Danny) Date: Wed, 05 Nov 2008 15:00:30 +0000 Subject: [Linux-Cluster] RHCS + DRBD Message-ID: <4911B50E.1080809@gmail.com> Hi, i'm deploying a cluster (two nodes without a share storage) with redhat cluster suite. I want to synchronise data between two partitions. I've started with rsync, but it's complicated. I've read that i can do it with DRBD. I've started drbd installation and it work fine. But i don't know how to integrate it to my cluster ressources. I mean, before mounting the device (/dev/drbd0 for example) on one node, the partition must be promote as master and as secondary on the second node. Anyone have expierences in configuring drbd with RHCS ?! thx. -- Cordialement AFAHOUNKO Danny Administrateur R?seaux & Syst?me d'Information - CICA-RE Gsm: +228 914.55.89 Tel: +228 223.62.62 From tuckerd at engr.smu.edu Wed Nov 5 15:51:52 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Wed, 05 Nov 2008 09:51:52 -0600 Subject: [Linux-cluster] Data Loss / Files and Folders "2-Node_GFS-Cluster" In-Reply-To: <490F2762.80902@gmail.com> References: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> <490A01C1.5030003@gmail.com> <1225727787.8639.3.camel@thor.seas.smu.edu> <490F2762.80902@gmail.com> Message-ID: <1225900312.26904.6.camel@thor.seas.smu.edu> > > The changes were made on 2.6.22 kernel. I would think RHEL 4.7 has the > same issue - but I'm not sure as I left Red Hat before 4.7 was released. > Better to open a service ticket to Red Hat if you need the fix. > > If applications are directly run on GFS nodes, instead of going thru NFS > servers, posix locks and flocks should work *fine* across different > nodes. 
The problem had existed in Linux NFS servers for years - no one > seemed to complain about it until clusters started to get deployed more > commonly. > > -- Wendy > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster That's always been tough for me to discern, as they stay with the same base kernel "name" while actually moving the code forward. 4.7 has kernel: 2.6.9-78.0.1.ELsmp . Now how that translates to the "actual" kernel number as 2.6.21, 22, etc, I never can figure out. From chris at cmiware.com Wed Nov 5 16:15:04 2008 From: chris at cmiware.com (Chris Harms) Date: Wed, 05 Nov 2008 10:15:04 -0600 Subject: [Linux-Cluster] RHCS + DRBD In-Reply-To: <4911B50E.1080809@gmail.com> References: <4911B50E.1080809@gmail.com> Message-ID: <4911C688.4060708@cmiware.com> You need a script that does it and you add that to the service in RHCS. I cobbled one together from the linux-ha project with some customizations. AFAHOUNKO Danny wrote: > Hi, > i'm deploying a cluster (two nodes without a share storage) with > redhat cluster suite. > I want to synchronise data between two partitions. > I've started with rsync, but it's complicated. > I've read that i can do it with DRBD. > I've started drbd installation and it work fine. But i don't know how > to integrate it to my cluster ressources. > I mean, before mounting the device (/dev/drbd0 for example) on one > node, the partition must be promote as master and as secondary on the > second node. > Anyone have expierences in configuring drbd with RHCS ?! > thx. > From afahounko at gmail.com Wed Nov 5 17:29:37 2008 From: afahounko at gmail.com (AFAHOUNKO Danny) Date: Wed, 05 Nov 2008 17:29:37 +0000 Subject: [Linux-cluster] Re: RHCS and DRBD In-Reply-To: References: Message-ID: <4911D801.7080101@gmail.com> Hi ! thx ! i'll try it and will let you know ! ;) Dani Filth a ?crit : > Hi- > > I've had good luck with the script shown on this page: > http://www.redhat.com/archives/linux-cluster/2006-July/msg00109.html > > Just replace NUM=0 with the number of your drbd resource. > > In my cluster.conf, I have a resource that calls that script, and > another resource to mount the filesystem. > > Good luck! -- Cordialement AFAHOUNKO Danny Administrateur R?seaux & Syst?me d'Information - CICA-RE Gsm: +228 914.55.89 Tel: +228 223.62.62 From s.wendy.cheng at gmail.com Wed Nov 5 18:06:38 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 05 Nov 2008 13:06:38 -0500 Subject: [Linux-cluster] Data Loss / Files and Folders "2-Node_GFS-Cluster" In-Reply-To: <1225900312.26904.6.camel@thor.seas.smu.edu> References: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> <490A01C1.5030003@gmail.com> <1225727787.8639.3.camel@thor.seas.smu.edu> <490F2762.80902@gmail.com> <1225900312.26904.6.camel@thor.seas.smu.edu> Message-ID: <4911E0AE.20502@gmail.com> Doug Tucker wrote: >> The changes were made on 2.6.22 kernel. I would think RHEL 4.7 has the >> same issue - but I'm not sure as I left Red Hat before 4.7 was released. >> Better to open a service ticket to Red Hat if you need the fix. >> >> If applications are directly run on GFS nodes, instead of going thru NFS >> servers, posix locks and flocks should work *fine* across different >> nodes. The problem had existed in Linux NFS servers for years - no one >> seemed to complain about it until clusters started to get deployed more >> commonly. 
>> >> -- Wendy >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > That's always been tough for me to discern, as they stay with the same > base kernel "name" while actually moving the code forward. 4.7 has > kernel: 2.6.9-78.0.1.ELsmp . Now how that translates to the "actual" > kernel number as 2.6.21, 22, etc, I never can figure out. > You seem to assume, if the service ticket is approved, the fix would have to move the whole kernel from 2.6.9 into 2.6.22 ? That is a (surprising) mis-understanding. As any bug fix with any operating system distribution, it could get done across different kernels, if it passes certain types of risk and resource review process(es). The code change has to be tailored into its own release framework - the actual implementation may look different but it should accomplish similar logic(s) to fix the identical problem. Hopefully I interpret your comment right. -- Wendy From jerlyon at gmail.com Wed Nov 5 21:00:40 2008 From: jerlyon at gmail.com (Jeremy Lyon) Date: Wed, 5 Nov 2008 14:00:40 -0700 Subject: [Linux-cluster] Oracle locking on GFS Message-ID: <779919740811051300j3ae82ccav98e129d3a824b3c8@mail.gmail.com> Hi, I was curious if anyone has run into a similar issue. We have a 2 node cluster with a GFS file system running Oracle 10g (not RAC). The DB crashed and RHCS failed over the service as expected, but Oracle couldn't start correctly because of an exiting lock on one of the files. We had to unmount the GFS from both nodes, then mount. At that point, Oracle start up correctly. I'm assuming that the lock was in place during the crash and DLM was honoring it and that's what cause Oracle to not start correctly. Then by unmounting DLM dropped all locks. Any recommendations on how to avoid this? Is it a good idea to set up the GFS resource with a force unmount to avoid this scenario? TIA -Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.osullivan at auckland.ac.nz Thu Nov 6 04:43:26 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Wed, 05 Nov 2008 21:43:26 -0700 Subject: [Linux-cluster] Distributed RAID Message-ID: <491275EE.8050508@auckland.ac.nz> Hi everyone, I have just read that GFS on mdadm does not work because mdadm is not cluster aware. I really hoped to build a n + 1 RAID of the disks I have presented to the RHCS nodes via iSCSI. I had a look at DDRAID which is old and looks like it only supports 3, 5 and 9 disks in the distributed RAID. I currently only have two (multipathed) devices, but I want them to be active-active. If I put them into a mirrored logical volume in CLVM will this do the trick? Or will I have to install DRDB? Is there any more up-to-date distributed RAID options available for when I want to make a 2 + 1, 3 +1, etc storage array? There are some posts that say this may be available in CLVM soon or that mdadm may be cluster aware soon. Any progress on either of these options? Any help on this would be great. Thanks, Mike From fdinitto at redhat.com Thu Nov 6 07:31:01 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 6 Nov 2008 08:31:01 +0100 (CET) Subject: [Linux-Cluster] RHCS + DRBD In-Reply-To: <4911C688.4060708@cmiware.com> References: <4911B50E.1080809@gmail.com> <4911C688.4060708@cmiware.com> Message-ID: Hi, all those kind of scripts can be really useful to a wider audience and could be easily added to the normal release process. 
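For readers who need something in the meantime, the shape of such a script is fairly simple. The sketch below is loosely modelled on the drbddisk-style wrapper linked earlier in this thread, not on Chris's actual script; the resource name r0 is an assumption and the drbdadm calls use DRBD 8.x syntax.

    #!/bin/bash
    # Promote a DRBD resource before the filesystem resource mounts it,
    # demote it again when the service stops.  Reference it from
    # cluster.conf as a <script> resource ordered before the <fs> resource.
    RES=r0

    case "$1" in
      start)
        drbdadm primary "$RES" || exit 1
        ;;
      stop)
        drbdadm secondary "$RES" || exit 1
        ;;
      status)
        # Healthy only while this node is Primary for the resource
        drbdadm role "$RES" | grep -q '^Primary' || exit 1
        ;;
      *)
        echo "usage: $0 {start|stop|status}"
        exit 2
        ;;
    esac
    exit 0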
Would it be possible for you to share the script with a GPLv2+ licence and proper copyright attached to it? Thanks Fabio On Wed, 5 Nov 2008, Chris Harms wrote: > You need a script that does it and you add that to the service in RHCS. I > cobbled one together from the linux-ha project with some customizations. > > > AFAHOUNKO Danny wrote: >> Hi, >> i'm deploying a cluster (two nodes without a share storage) with redhat >> cluster suite. >> I want to synchronise data between two partitions. >> I've started with rsync, but it's complicated. >> I've read that i can do it with DRBD. >> I've started drbd installation and it work fine. But i don't know how to >> integrate it to my cluster ressources. >> I mean, before mounting the device (/dev/drbd0 for example) on one node, >> the partition must be promote as master and as secondary on the second >> node. >> Anyone have expierences in configuring drbd with RHCS ?! >> thx. >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- I'm going to make him an offer he can't refuse. From gordan at bobich.net Thu Nov 6 09:38:02 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 06 Nov 2008 09:38:02 +0000 Subject: [Linux-cluster] Distributed RAID In-Reply-To: <491275EE.8050508@auckland.ac.nz> References: <491275EE.8050508@auckland.ac.nz> Message-ID: <4912BAFA.2080901@bobich.net> Michael O'Sullivan wrote: > I have just read that GFS on mdadm does not work because mdadm is not > cluster aware. I really hoped to build a n + 1 RAID of the disks I have > presented to the RHCS nodes via iSCSI. I had a look at DDRAID which is > old and looks like it only supports 3, 5 and 9 disks in the distributed > RAID. I currently only have two (multipathed) devices, but I want them > to be active-active. If I put them into a mirrored logical volume in > CLVM will this do the trick? Or will I have to install DRDB? Is there > any more up-to-date distributed RAID options available for when I want > to make a 2 + 1, 3 +1, etc storage array? There are some posts that say > this may be available in CLVM soon or that mdadm may be cluster aware > soon. Any progress on either of these options? You probably saw me asking these very same questions in the archives, without any response. DDRAID is unmaintained, and IIRC the code was removed from the current development tree a while back. So don't count on it ever getting resurrected. I rather doubt md will become cluster aware any time soon. CLVM doesn't yet support even more important features like snapshotting, so I wouldn't count on it supporting anything more advanced. For straight mirroring (which is all you could sensibly do with 2 nodes anyway), I can highly recommend DRBD. It "just works" and works well. I have a number of 2-node clusters deployed with it with shared-root. Of you really want to look into larger scale clustering with n+m redundancy, look into cleversafe.org. There's a thread I started on the forum there looking into exactly this sort of thing. I'll be testing it in the next month or so, when I get the hardware together, but it's looking plausible. It also provides proper n+m redundancy. Another thing to note is that RAID5 is not really usable on today's big disks in arrays of more than 6. Think about the expected read failure rates on modern disks: 10^-14. That's about one uncorrectable error every 10TB. So if you have a 6x1TB disk array, and you lose a disk, you have to read 5TB of data to reconstruct onto a fresh disk. 
Since you get one uncorrectable error every 10TB, that means you have a 50/50 chance of another disk encountering an error and dropping out of the array, and losing all your data. These days higher RAID levels are really a necessity, not an optional extra, and at this rate, considering the read error rates have stayed constant while sizes have exploded, even RAID6 won't last long. Gordan From hicheerup at gmail.com Thu Nov 6 12:46:28 2008 From: hicheerup at gmail.com (lingu) Date: Thu, 6 Nov 2008 18:16:28 +0530 Subject: [Linux-cluster] RHEL3 Cluster Heart Beat Using Cross Over Cable Message-ID: <29e045b80811060446q5b868dfdxd3281608b04dca62@mail.gmail.com> Hi, I am running two node active/passive cluster running RHEL3 update 8 64 bit OS on Hp Box with external hp storage connected via scsi. My cluster was running fine for last 3 years.But all of a sudden cluster service keep on shifting (atleast one time in a day )form one node to another. After analysed the syslog i found that due to some network fluctuation service was getting shifted.Both the nodes has two NIC bonded together and configured with below ip. My network details: 192.168.1.2 --node 1 physical ip with class c subnet (bond0 ) 192.168.1.3 --node 2 physical ip with class c subnet (bond0 ) 192.168.1.4 --- floating ip ( cluster ) Since it is a very critical and busy server may be due to heavy network load some hear beat signal is getting missed resulting in shifting of service from one node to another. So i planned to connect crossover cable for heart beat messages, can any one guide me or provide me the link that best explains how to do the same and the changes i have to made in cluster configuration file after connecting the crossover cable. Regards, Lingu From federico.simoncelli at gmail.com Thu Nov 6 15:40:55 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Thu, 6 Nov 2008 16:40:55 +0100 Subject: [Linux-cluster] Distributed RAID In-Reply-To: <4912BAFA.2080901@bobich.net> References: <491275EE.8050508@auckland.ac.nz> <4912BAFA.2080901@bobich.net> Message-ID: On Thu, Nov 6, 2008 at 10:38 AM, Gordan Bobic wrote: > I rather doubt md will become cluster aware any time soon. CLVM doesn't yet > support even more important features like snapshotting, so I wouldn't count > on it supporting anything more advanced. I worked a little on clvm snapshots: https://www.redhat.com/archives/linux-lvm/2008-October/msg00027.html Review and testing is required. -- Federico. From ricks at nerd.com Thu Nov 6 17:38:11 2008 From: ricks at nerd.com (Rick Stevens) Date: Thu, 06 Nov 2008 09:38:11 -0800 Subject: [Linux-cluster] RHEL3 Cluster Heart Beat Using Cross Over Cable In-Reply-To: <29e045b80811060446q5b868dfdxd3281608b04dca62@mail.gmail.com> References: <29e045b80811060446q5b868dfdxd3281608b04dca62@mail.gmail.com> Message-ID: <49132B83.1010402@nerd.com> lingu wrote: > Hi, > > I am running two node active/passive cluster running RHEL3 update > 8 64 bit OS on Hp Box with external hp storage connected via scsi. My > cluster was running fine for last 3 years.But all of a sudden cluster > service keep on shifting (atleast one time in a day )form one node to > another. > > After analysed the syslog i found that due to some network > fluctuation service was getting shifted.Both the nodes has two NIC > bonded together and configured with below ip. 
> > My network details: > > 192.168.1.2 --node 1 physical ip with class c subnet (bond0 ) > 192.168.1.3 --node 2 physical ip with class c subnet (bond0 ) > 192.168.1.4 --- floating ip ( cluster ) > > Since it is a very critical and busy server may be due to heavy > network load some hear beat signal is getting missed resulting in > shifting of service from one node to another. > > So i planned to connect crossover cable for heart beat messages, can > any one guide me or provide me the link that best explains how to do > the same and the changes i have to made in cluster configuration file > after connecting the crossover cable. The crossover cable is pretty easy to make and a lot of places have ones prebuilt. If you want to make one yourself, you're interested in the orange pair of wires (normally pins 1 and 2) and the green pair of wires (normally pins 3 and 6). The blue and brown pairs don't do anyting in standard TIA-56B cables. The wiring diagram is: End "A" (std) End "B" (crossover) pin 1 Orange/White pin 3 pin 2 Orange pin 6 pin 3 Green/White pin 1 pin 4 Blue pin 4 pin 5 Blue/White pin 5 pin 6 Green pin 2 pin 7 Brown/White pin 7 pin 8 Brown pin 8 Remember that the pins are numbered from the left, looking at the hole the cable goes into with the latch on the bottom. I generally put some sort of rather blatant mark on any such cable such as a big piece of tape or coloring the ends with a red marker so it's obvious that the cable is "special". To use it, just plug one end of the cable into the cluster NIC of the first system and the other end into the cluster NIC of the second system. You should get link lights at both ends. As far as any other changes, the only thing that may go a bit weird is the ARP tables on the systems since you've removed the hub/switch from the signal path and the ARP table may retain the old HW addresses. I don't think that'll be a problem. ---------------------------------------------------------------------- - Rick Stevens, Systems Engineer ricks at nerd.com - - AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 - - - - Tempt not the dragons of fate, since thou art crunchy and taste - - good with ketchup. - ---------------------------------------------------------------------- From jbrassow at redhat.com Thu Nov 6 19:49:45 2008 From: jbrassow at redhat.com (Jonathan Brassow) Date: Thu, 6 Nov 2008 13:49:45 -0600 Subject: [Linux-cluster] Distributed RAID In-Reply-To: References: <491275EE.8050508@auckland.ac.nz> <4912BAFA.2080901@bobich.net> Message-ID: Cluster mirror (RAID1) will be available in rhel5.3 for LVM. brassow On Nov 6, 2008, at 9:40 AM, Federico Simoncelli wrote: > On Thu, Nov 6, 2008 at 10:38 AM, Gordan Bobic > wrote: >> I rather doubt md will become cluster aware any time soon. CLVM >> doesn't yet >> support even more important features like snapshotting, so I >> wouldn't count >> on it supporting anything more advanced. > > I worked a little on clvm snapshots: > https://www.redhat.com/archives/linux-lvm/2008-October/msg00027.html > > Review and testing is required. > -- > Federico. 
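As a rough idea of what that cluster mirror support looks like from the command line once it is available, here is a sketch; the volume group name vg_shared and the sizes are placeholders, clvmd must be running on all nodes, and the cmirror log daemon is assumed to be installed.

    # Mark the shared volume group as clustered
    vgchange -c y vg_shared

    # Create a new 2-way mirrored LV across the shared storage
    lvcreate --mirrors 1 --mirrorlog disk -L 50G -n lv_mirror vg_shared

    # Or convert an existing linear LV in place
    lvconvert -m 1 vg_shared/lv_data

    # Watch the mirror resynchronise
    lvs -a -o name,copy_percent,devices vg_shared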
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jeff.sturm at eprize.com Thu Nov 6 19:53:30 2008 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Thu, 6 Nov 2008 14:53:30 -0500 Subject: [Linux-cluster] GFS2 poor performance In-Reply-To: <007f01c93eba$8d731b20$3e0a10ac@tjpr.net> References: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br><64D0546C5EBBD147B75DE133D798665F01806B7C@hugo.eprize.local> <007f01c93eba$8d731b20$3e0a10ac@tjpr.net> Message-ID: <64D0546C5EBBD147B75DE133D798665F01806C15@hugo.eprize.local> I looked over the summit document you referenced below. The value of demote_secs mentioned is an example setting, and unfortunately no recommendations or rationale accompany this. For some access patterns you can get better performance by actually increasing demote_secs. For example, we have a node that we routinely rsync a file tree onto using a GFS partition. Increasing demote_secs from 300 to 86400 reduced the average rsync time by a factor of about 4. The reason is that this node has little lock contention and needs to lock each file every time we start an rsync process. With demote_secs=300, it was doing much more work to reacquire locks on each run. Whereas demote_secs=86400 allowed the locks to persist up to a day, since the overall number of files in our application is bounded such that they will fit in buffer cache, together with locks. At another extreme, we have an application that creates a lot of files but seldom opens them on the same node. In this case there is no value in holding onto the locks, so we set demote_secs to a small value and glock_purge as high as 70 to ensure locks are quickly released in memory. The best advice I can give in general is to experiment with different settings for demote_secs and glock_purge while watching the output of "gfs_tool counters" to see how they behave. Jeff -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Fabiano F. Vitale Sent: Tuesday, November 04, 2008 3:19 PM To: linux clustering Subject: Re: [Linux-cluster] GFS2 poor performance Hi, for cluster purpose the two nodes are linked by a patch cord cat6 and the lan interfaces are gigabit. All nodes have a Fibre Channel Emulex Corporation Zephyr-X LightPulse and the Storage is a HP EVA8100 I read the document http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Summit08presentation_GFSBestPractices_Final.pdf which show some parameters to tune and one of them is demote_secs, to adjust to 100sec thanks > What sort of network and storage device are you using? > > Also, why set demote_secs so low? > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of ffv at tjpr.jus.br > Sent: Tuesday, November 04, 2008 2:13 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] GFS2 poor performance > > Hi all, > > I?m getting a very poor performance using GFS2. > I have two qmail (mail) servers and one gfs2 filesystem shared by them. 
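To make the tuning loop Jeff describes above concrete, the commands involved look roughly like this (GFS1 gfs_tool syntax as in his reply; the mount point and numbers are placeholders, and GFS2's gfs2_tool exposes a somewhat different set of tunables):

    MNT=/mnt/gfs

    # Record the current tunables and glock statistics first
    gfs_tool gettune $MNT
    gfs_tool counters $MNT

    # Low-contention node that re-reads the same files: hold locks longer
    gfs_tool settune $MNT demote_secs 86400

    # Node that mostly writes new files and rarely re-opens them:
    # drop locks quickly and purge cached glocks aggressively
    gfs_tool settune $MNT demote_secs 100
    gfs_tool settune $MNT glock_purge 50

    # Re-run the workload and compare the counters again
    gfs_tool counters $MNT

These settings do not persist across a remount, so whatever values work need to be reapplied from an init script or similar.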
> In this case, each directory in GFS2 filesystem may have upon to 10000 > files (mails) > > The problem is in performance of some operations like ls, du, rm, etc > for example, > > # time du -sh /dados/teste > 40M /dados/teste > > real 7m14.919s > user 0m0.008s > sys 0m0.129s > > this is unacceptable > > Some attributes i already set using gfs2_tool: > > gfs2_tool settune /dados demote_secs 100 gfs2_tool setflag jdata > /dados gfs2_tool setflag sync /dados gfs2_tool setflag directio /dados > > but the performance is still very bad > > > Anybody know how to tune the filesystem for a acceptable performance > working with directory with 10000 files? > thanks for any help > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From gordan at bobich.net Thu Nov 6 20:40:39 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 06 Nov 2008 20:40:39 +0000 Subject: [Linux-cluster] Distributed RAID In-Reply-To: References: <491275EE.8050508@auckland.ac.nz> <4912BAFA.2080901@bobich.net> Message-ID: <49135647.3020701@bobich.net> What about CLVM based striping (RAID0)? Does that work already or is it planned for the near future? Gordan Jonathan Brassow wrote: > Cluster mirror (RAID1) will be available in rhel5.3 for LVM. > > brassow > > On Nov 6, 2008, at 9:40 AM, Federico Simoncelli wrote: > >> On Thu, Nov 6, 2008 at 10:38 AM, Gordan Bobic wrote: >>> I rather doubt md will become cluster aware any time soon. CLVM >>> doesn't yet >>> support even more important features like snapshotting, so I wouldn't >>> count >>> on it supporting anything more advanced. >> >> I worked a little on clvm snapshots: >> https://www.redhat.com/archives/linux-lvm/2008-October/msg00027.html >> >> Review and testing is required. >> -- >> Federico. >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From quickshiftin at gmail.com Thu Nov 6 21:30:35 2008 From: quickshiftin at gmail.com (Nathan Nobbe) Date: Thu, 6 Nov 2008 14:30:35 -0700 Subject: [Linux-cluster] RHEL3 Cluster Heart Beat Using Cross Over Cable In-Reply-To: <49132B83.1010402@nerd.com> References: <29e045b80811060446q5b868dfdxd3281608b04dca62@mail.gmail.com> <49132B83.1010402@nerd.com> Message-ID: <7dd2dc0b0811061330x693bdd9eifc092bf14ed0f959@mail.gmail.com> On Thu, Nov 6, 2008 at 10:38 AM, Rick Stevens wrote: > lingu wrote: > >> Hi, >> >> I am running two node active/passive cluster running RHEL3 update >> 8 64 bit OS on Hp Box with external hp storage connected via scsi. My >> cluster was running fine for last 3 years.But all of a sudden cluster >> service keep on shifting (atleast one time in a day )form one node to >> another. >> >> After analysed the syslog i found that due to some network >> fluctuation service was getting shifted.Both the nodes has two NIC >> bonded together and configured with below ip. 
>> >> My network details: >> >> 192.168.1.2 --node 1 physical ip with class c subnet (bond0 ) >> 192.168.1.3 --node 2 physical ip with class c subnet (bond0 ) >> 192.168.1.4 --- floating ip ( cluster ) >> >> Since it is a very critical and busy server may be due to heavy >> network load some hear beat signal is getting missed resulting in >> shifting of service from one node to another. >> >> So i planned to connect crossover cable for heart beat messages, can >> any one guide me or provide me the link that best explains how to do >> the same and the changes i have to made in cluster configuration file >> after connecting the crossover cable. >> > > The crossover cable is pretty easy to make and a lot of places have > ones prebuilt. If you want to make one yourself, you're interested in > the orange pair of wires (normally pins 1 and 2) and the green pair of > wires (normally pins 3 and 6). The blue and brown pairs don't do > anyting in standard TIA-56B cables. The wiring diagram is: > > End "A" (std) End "B" (crossover) > pin 1 Orange/White pin 3 > pin 2 Orange pin 6 > pin 3 Green/White pin 1 > pin 4 Blue pin 4 > pin 5 Blue/White pin 5 > pin 6 Green pin 2 > pin 7 Brown/White pin 7 > pin 8 Brown pin 8 > > Remember that the pins are numbered from the left, looking at the hole > the cable goes into with the latch on the bottom. I generally put some > sort of rather blatant mark on any such cable such as a big piece of > tape or coloring the ends with a red marker so it's obvious that the > cable is "special". > > To use it, just plug one end of the cable into the cluster NIC of the > first system and the other end into the cluster NIC of the second > system. You should get link lights at both ends. many modern machines will work w/o a crossover cable. ive got 2 dell 1650s running heartbeat / drbd over a direct connection for heartbeat communication. i dont need to use a crossover on the 1650s for the direct connection to work, and those boxes are pretty old by now. so long story short, probly worth saving a little hassle and just trying a regular cat-5 cable for the direct connection. or if its a requirement for you hardware you can pick up a 3 foot crossover at radio shack, bust buy etc, for less than 10 bucks. -nathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ricks at nerd.com Thu Nov 6 22:03:53 2008 From: ricks at nerd.com (Rick Stevens) Date: Thu, 06 Nov 2008 14:03:53 -0800 Subject: [Linux-cluster] RHEL3 Cluster Heart Beat Using Cross Over Cable In-Reply-To: <7dd2dc0b0811061330x693bdd9eifc092bf14ed0f959@mail.gmail.com> References: <29e045b80811060446q5b868dfdxd3281608b04dca62@mail.gmail.com> <49132B83.1010402@nerd.com> <7dd2dc0b0811061330x693bdd9eifc092bf14ed0f959@mail.gmail.com> Message-ID: <491369C9.5060602@nerd.com> Nathan Nobbe wrote: > On Thu, Nov 6, 2008 at 10:38 AM, Rick Stevens wrote: > >> lingu wrote: >> >>> Hi, >>> >>> I am running two node active/passive cluster running RHEL3 update >>> 8 64 bit OS on Hp Box with external hp storage connected via scsi. My >>> cluster was running fine for last 3 years.But all of a sudden cluster >>> service keep on shifting (atleast one time in a day )form one node to >>> another. >>> >>> After analysed the syslog i found that due to some network >>> fluctuation service was getting shifted.Both the nodes has two NIC >>> bonded together and configured with below ip. 
>>> >>> My network details: >>> >>> 192.168.1.2 --node 1 physical ip with class c subnet (bond0 ) >>> 192.168.1.3 --node 2 physical ip with class c subnet (bond0 ) >>> 192.168.1.4 --- floating ip ( cluster ) >>> >>> Since it is a very critical and busy server may be due to heavy >>> network load some hear beat signal is getting missed resulting in >>> shifting of service from one node to another. >>> >>> So i planned to connect crossover cable for heart beat messages, can >>> any one guide me or provide me the link that best explains how to do >>> the same and the changes i have to made in cluster configuration file >>> after connecting the crossover cable. >>> >> The crossover cable is pretty easy to make and a lot of places have >> ones prebuilt. If you want to make one yourself, you're interested in >> the orange pair of wires (normally pins 1 and 2) and the green pair of >> wires (normally pins 3 and 6). The blue and brown pairs don't do >> anyting in standard TIA-56B cables. The wiring diagram is: >> >> End "A" (std) End "B" (crossover) >> pin 1 Orange/White pin 3 >> pin 2 Orange pin 6 >> pin 3 Green/White pin 1 >> pin 4 Blue pin 4 >> pin 5 Blue/White pin 5 >> pin 6 Green pin 2 >> pin 7 Brown/White pin 7 >> pin 8 Brown pin 8 >> >> Remember that the pins are numbered from the left, looking at the hole >> the cable goes into with the latch on the bottom. I generally put some >> sort of rather blatant mark on any such cable such as a big piece of >> tape or coloring the ends with a red marker so it's obvious that the >> cable is "special". >> >> To use it, just plug one end of the cable into the cluster NIC of the >> first system and the other end into the cluster NIC of the second >> system. You should get link lights at both ends. > > > many modern machines will work w/o a crossover cable. ive got 2 dell 1650s > running heartbeat / drbd over a direct connection for heartbeat > communication. i dont need to use a crossover on the 1650s for the direct > connection to work, and those boxes are pretty old by now. so long story > short, probly worth saving a little hassle and just trying a regular cat-5 > cable for the direct connection. > > or if its a requirement for you hardware you can pick up a 3 foot crossover > at radio shack, bust buy etc, for less than 10 bucks. True. Some NICs have autosense for MDI and MDIX cables (the technical terms for straight and crossover, respectively), but a lot of them don't. Nathan's right, try a regular cable first. If it doesn't work, crossovers are available at lots of places quite cheaply. They often use red cable (the ones I've bought are red), but there are a lot of straight cables that use red as well, so I'd still mark MDIX cables very conspicuously. A big tag that says "I'M A CROSSOVER" can't hurt! My diagram above is valid if you really want to "roll your own". I've made so damned many CAT5/5e/6 cables in my life (MDI and MDIX both), that I can do it almost in my sleep. Ditto with thinnet (10Base-2) and I'm a past master at putting parasite taps on thicknet (10Base-5) cables, sticking on the transceivers and snaking that gawdawful AUI cable down cable stud pockets. I'm an original DECnet geek! ---------------------------------------------------------------------- - Rick Stevens, Systems Engineer ricks at nerd.com - - AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 - - - - On a scale of 1 to 10 I'd say... oh, somewhere in there. 
- ---------------------------------------------------------------------- From jbrassow at redhat.com Thu Nov 6 22:34:49 2008 From: jbrassow at redhat.com (Jonathan Brassow) Date: Thu, 6 Nov 2008 16:34:49 -0600 Subject: [Linux-cluster] Distributed RAID In-Reply-To: <49135647.3020701@bobich.net> References: <491275EE.8050508@auckland.ac.nz> <4912BAFA.2080901@bobich.net> <49135647.3020701@bobich.net> Message-ID: <940E48BD-8C04-4525-86AD-B4ED1998883E@redhat.com> that works already. single machine: linear, stripe, mirror, snapshot cluster-aware: linear, stripe, mirror (5.3) brassow On Nov 6, 2008, at 2:40 PM, Gordan Bobic wrote: > What about CLVM based striping (RAID0)? Does that work already or is > it planned for the near future? > > Gordan > > Jonathan Brassow wrote: >> Cluster mirror (RAID1) will be available in rhel5.3 for LVM. >> brassow >> On Nov 6, 2008, at 9:40 AM, Federico Simoncelli wrote: >>> On Thu, Nov 6, 2008 at 10:38 AM, Gordan Bobic >>> wrote: >>>> I rather doubt md will become cluster aware any time soon. CLVM >>>> doesn't yet >>>> support even more important features like snapshotting, so I >>>> wouldn't count >>>> on it supporting anything more advanced. >>> >>> I worked a little on clvm snapshots: >>> https://www.redhat.com/archives/linux-lvm/2008-October/msg00027.html >>> >>> Review and testing is required. >>> -- >>> Federico. >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From hicheerup at gmail.com Fri Nov 7 10:44:08 2008 From: hicheerup at gmail.com (lingu) Date: Fri, 7 Nov 2008 16:14:08 +0530 Subject: [Linux-cluster] Cluster Broken pipe Error & node Reboot Message-ID: <29e045b80811070244w142f0ceey62ea0a8ec65db714@mail.gmail.com> Hi all, I am running two node RHEL3U8 cluster of below cluster version on HP servers connected via scsi channel to HP Storage (SAN) for oracle database server. Kernel & Cluster Version Kernel-2.4.21-47.EL #1 SMP redhat-config-cluster-1.0.7-1-noarch clumanager-1.2.26.1-1-x86_64 Suddenly my active node got rebooted after analysed the logs it is throwing below errors on syslog.I want to know what might cause this type of error and also after analysed the sar output indicates there was no load on the server at the time system get rebooted as well as on the time i am getting I/O Hang error. Nov 3 14:23:00 cluster1 clulockd[1996]: Denied 20.1.2.162: Broken pipe Nov 3 14:23:00 cluster1 clulockd[1996]: select error: Broken pipe Nov 3 14:23:06 cluster1 clulockd[1996]: Denied 20.1.2.162: Broken pipe Nov 3 14:23:06 cluster1 clulockd[1996]: select error: Broken pipe Nov 3 14:23:13 cluster1 cluquorumd[1921]: Disk-TB: Detected I/O Hang! Nov 3 14:23:15 cluster1 clulockd[1996]: Denied 20.1.2.161: Broken pipe Nov 3 14:23:15 cluster1 clulockd[1996]: select error: Broken pipe Nov 3 14:23:12 cluster1 clusvcmgrd[2011]: Unable to obtain cluster lock: Connection timed out Nov 5 17:18:00 cluster1 cluquorumd[1921]: Disk-TB: Detected I/O Hang! 
Nov 5 17:18:00 cluster1 clulockd[1996]: Denied 20.1.2.162: Broken pipe Nov 5 17:18:00 cluster1 clulockd[1996]: select error: Broken pipe Nov 5 17:18:17 cluster1 clulockd[1996]: Denied 20.1.2.162: Broken pipe Nov 5 17:18:17 cluster1 clulockd[1996]: select error: Broken pipe Nov 5 17:18:17 cluster1 clulockd[1996]: Potential recursive lock #0 grant to member #1, PID1962 I need some one help in guiding how to fix out this error and also the real cause for such above errors . Attached my cluster.xml file. Regards, Lingu From mad at wol.de Fri Nov 7 12:22:36 2008 From: mad at wol.de (Marc - A. Dahlhaus [ Administration | Westermann GmbH ]) Date: Fri, 07 Nov 2008 13:22:36 +0100 Subject: [Linux-cluster] Distributed RAID In-Reply-To: <940E48BD-8C04-4525-86AD-B4ED1998883E@redhat.com> References: <491275EE.8050508@auckland.ac.nz> <4912BAFA.2080901@bobich.net> <49135647.3020701@bobich.net> <940E48BD-8C04-4525-86AD-B4ED1998883E@redhat.com> Message-ID: <1226060556.12833.4.camel@marc> Hello, will the changes to mirroring get merged into stable2 and head after RHEL-5.3 release? Marc Am Donnerstag, den 06.11.2008, 16:34 -0600 schrieb Jonathan Brassow: > that works already. > > single machine: linear, stripe, mirror, snapshot > cluster-aware: linear, stripe, mirror (5.3) > > brassow > > On Nov 6, 2008, at 2:40 PM, Gordan Bobic wrote: > > > What about CLVM based striping (RAID0)? Does that work already or is > > it planned for the near future? > > > > Gordan > > > > Jonathan Brassow wrote: > >> Cluster mirror (RAID1) will be available in rhel5.3 for LVM. > >> brassow > >> On Nov 6, 2008, at 9:40 AM, Federico Simoncelli wrote: > >>> On Thu, Nov 6, 2008 at 10:38 AM, Gordan Bobic > >>> wrote: > >>>> I rather doubt md will become cluster aware any time soon. CLVM > >>>> doesn't yet > >>>> support even more important features like snapshotting, so I > >>>> wouldn't count > >>>> on it supporting anything more advanced. > >>> > >>> I worked a little on clvm snapshots: > >>> https://www.redhat.com/archives/linux-lvm/2008-October/msg00027.html > >>> > >>> Review and testing is required. > >>> -- > >>> Federico. > >>> > >>> -- > >>> Linux-cluster mailing list > >>> Linux-cluster at redhat.com > >>> https://www.redhat.com/mailman/listinfo/linux-cluster > >> -- > >> Linux-cluster mailing list > >> Linux-cluster at redhat.com > >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From pronix.service at gmail.com Fri Nov 7 12:59:09 2008 From: pronix.service at gmail.com (pronix pronix) Date: Fri, 7 Nov 2008 15:59:09 +0300 Subject: [Linux-cluster] Distributed RAID In-Reply-To: <1226060556.12833.4.camel@marc> References: <491275EE.8050508@auckland.ac.nz> <4912BAFA.2080901@bobich.net> <49135647.3020701@bobich.net> <940E48BD-8C04-4525-86AD-B4ED1998883E@redhat.com> <1226060556.12833.4.camel@marc> Message-ID: <639ce0480811070459l18a0071bia889bb56402f6ab9@mail.gmail.com> can i use cluster raid1 if i get development release from sources.redhat.com/cluster ? 2008/11/7 Marc - A. Dahlhaus [ Administration | Westermann GmbH ] < mad at wol.de> > Hello, > > > will the changes to mirroring get merged into stable2 and head after > RHEL-5.3 release? > > > Marc > > Am Donnerstag, den 06.11.2008, 16:34 -0600 schrieb Jonathan Brassow: > > that works already. 
From david.costakos at gmail.com  Sat Nov 8 03:53:23 2008
From: david.costakos at gmail.com (Dave Costakos)
Date: Fri, 7 Nov 2008 19:53:23 -0800
Subject: [Linux-cluster] gfs2 convert hosed all VMs
Message-ID: <6b6836c60811071953m52d068cfpd0a204f8f4d1b99d@mail.gmail.com>

I just converted a shared file-backed Xen VM GFS filesystem to a GFS2
filesystem. The conversion was successful and all my files appear intact.
I followed the GFS instructions by unmounting the filesystem on all
machines, running gfs_fsck, and then gfs2_convert.

Since I converted the filesystem, all my file-backed Xen VMs can no longer
boot. pygrub reports errors that the boot loader isn't returning any data.
If I edit the Xen config to boot off a kernel on the DomU, VMs still can't
start up because LVM cannot identify any volume groups.

If I try to access the VM device files locally via losetup and kpartx, I
get "read errors".

So what's the deal? I know GFS2 is a preview, but I have to assume I've
missed some crucial step here.

-- 
Dave Costakos
mailto:david.costakos at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fs at debian.org  Mon Nov 10 09:27:41 2008
From: fs at debian.org (Frederik Schüler)
Date: Mon, 10 Nov 2008 10:27:41 +0100
Subject: [Linux-cluster] soft lockups on 2.6.27/2.03.09 with gfs1
Message-ID: <200811101027.47158.fs@debian.org>

Hello,

I am experiencing a rather big problem with deadlocks on a 9-node GFS1
cluster, with vanilla 2.6.27 and both rhcs 2.03.09 and the latest git
stable2. Fencing is done via fabric; the node keeps throwing these errors
after it got fenced.
This is a rather busy webserver cluster, with usually some dozens to hundreds of apache processes running concurrently, and 4 gfs1 shares with lots of small writes on the "template cache" volume from all 9 nodes. Lockups look like this: [44955.425003] BUG: soft lockup - CPU#2 stuck for 61s! [apache:12639] [44955.425007] Modules linked in: gfs ac battery ipv6 iptable_filter xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables lock_dlm gfs2 dlm configfs snd_pcm snd_timer snd soundcore snd_page_alloc rtc_cmos rtc_core i2c_nforce2 k8temp shpchp rtc_lib pcspkr pci_hotplug i2c_core button evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom amd74xx sd_mod ide_pci_generic ide_core floppy qla2xxx scsi_transport_fc 3w_9xxx e1000e scsi_tgt ata_generic sata_nv forcedeth libata ehci_hcd scsi_mod dock ohci_hcd thermal processor fan thermal_sys [44955.425007] irq event stamp: 0 [44955.425007] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [44955.425007] hardirqs last disabled at (0): [] copy_process+0x543/0x12b4 [44955.425007] softirqs last enabled at (0): [] copy_process+0x543/0x12b4 [44955.425007] softirqs last disabled at (0): [<0000000000000000>] 0x0 [44955.425007] CPU 2: [44955.425007] Modules linked in: gfs ac battery ipv6 iptable_filter xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables lock_dlm gfs2 dlm configfs snd_pcm snd_timer snd soundcore snd_page_alloc rtc_cmos rtc_core i2c_nforce2 k8temp shpchp rtc_lib pcspkr pci_hotplug i2c_core button evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom amd74xx sd_mod ide_pci_generic ide_core floppy qla2xxx scsi_transport_fc 3w_9xxx e1000e scsi_tgt ata_generic sata_nv forcedeth libata ehci_hcd scsi_mod dock ohci_hcd thermal processor fan thermal_sys [44955.425007] Pid: 12639, comm: apache Not tainted 2.6.27-2-amd64 #1 [44955.425007] RIP: 0010:[] [] native_read_tsc+0x6/0x18 [44955.425007] RSP: 0018:ffff880214af9d80 EFLAGS: 00000202 [44955.425007] RAX: 0000000000000000 RBX: 00000000498fb129 RCX: ffffffff8085d300 [44955.425007] RDX: 000062bb00000000 RSI: 0000000001062560 RDI: 0000000000000001 [44955.425007] RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000000 [44955.425007] R10: 0000000000000000 R11: ffffffff8033dd3e R12: ffff88041f0b0000 [44955.425007] R13: ffff8802abb76000 R14: ffff880214af8000 R15: ffffffff8085a890 [44955.425007] FS: 00007f3e8ea7d6d0(0000) GS:ffff88041f0c9940(0000) knlGS:0000000000000000 [44955.425007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44955.425007] CR2: 00007f3e8e9fc000 CR3: 0000000214adf000 CR4: 00000000000006e0 [44955.425007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44955.425007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [44955.425007] [44955.425007] Call Trace: [44955.425007] [] ? delay_tsc+0x15/0x45 [44955.425007] [] ? _raw_spin_lock+0x98/0x100 [44955.425007] [] ? _spin_lock+0x4e/0x5a [44955.425007] [] ? igrab+0x10/0x36 [44955.425007] [] ? igrab+0x10/0x36 [44955.425007] [] ? gfs_getattr+0x83/0xb7 [gfs] [44955.425007] [] ? vfs_getattr+0x1a/0x5e [44955.425007] [] ? vfs_stat_fd+0x2f/0x43 [44955.425007] [] ? sys_newstat+0x19/0x31 [44955.425007] [] ? system_call_fastpath+0x16/0x1b Best regards Frederik Sch?ler -- ENOSIG -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. 
URL: From swhiteho at redhat.com Mon Nov 10 09:24:13 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Mon, 10 Nov 2008 09:24:13 +0000 Subject: [Linux-cluster] gfs2 convert hosed all VMs In-Reply-To: <6b6836c60811071953m52d068cfpd0a204f8f4d1b99d@mail.gmail.com> References: <6b6836c60811071953m52d068cfpd0a204f8f4d1b99d@mail.gmail.com> Message-ID: <1226309053.25004.196.camel@quoit> Hi, You don't say what kernel version you are using. I'd suspect that maybe its too old. Do you get any messages in syslog at all? Steve. On Fri, 2008-11-07 at 19:53 -0800, Dave Costakos wrote: > > I just converted a shared file-backed Xen VM GFS filesystem to a GFS2 > filesystem. The conversion was successfully and all my files appear > intact. I followed the GFS instructions by unmounting the filesystem > on all machines, running gfs_fsck, and gfs2_covert. > > Since I converted the filesystem, all my file-backed Xen VMs can no > longer boot. pygrub reports errors that the boot loader isn't > returning any data. If I edit the Xen config to boot of a kernel on > the DomU, VMs still can't start up because LVM cannot identify any > volume groups. > > If I try to access the VM device files locally via losetup and kpartx, > I get "read errors". > > So what's the deal I know GFS2 is a preview, but I have to assume > I've missed some crucial step here. > > -- > Dave Costakos > mailto:david.costakos at gmail.com > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From david.costakos at gmail.com Mon Nov 10 17:00:08 2008 From: david.costakos at gmail.com (Dave Costakos) Date: Mon, 10 Nov 2008 09:00:08 -0800 Subject: [Linux-cluster] gfs2 convert hosed all VMs In-Reply-To: <1226309053.25004.196.camel@quoit> References: <6b6836c60811071953m52d068cfpd0a204f8f4d1b99d@mail.gmail.com> <1226309053.25004.196.camel@quoit> Message-ID: <6b6836c60811100900r1779220ft3d18c1ac9576dbf@mail.gmail.com> I'm using the kernel from RHEL 5.2 release. $ uname -rm 2.6.18-92.el5xen x86_64 I got no syslog messages, but when I ran gfs_fsck, it complained quite a bit. I presume it destroyed my files. Sadly, I don't have too much time to reproduce the error since I need to restore 30 virtual machines from backup. Luckily, it's just our lab environment. On Mon, Nov 10, 2008 at 1:24 AM, Steven Whitehouse wrote: > Hi, > > You don't say what kernel version you are using. I'd suspect that maybe > its too old. Do you get any messages in syslog at all? > > Steve. > > On Fri, 2008-11-07 at 19:53 -0800, Dave Costakos wrote: > > > > I just converted a shared file-backed Xen VM GFS filesystem to a GFS2 > > filesystem. The conversion was successfully and all my files appear > > intact. I followed the GFS instructions by unmounting the filesystem > > on all machines, running gfs_fsck, and gfs2_covert. > > > > Since I converted the filesystem, all my file-backed Xen VMs can no > > longer boot. pygrub reports errors that the boot loader isn't > > returning any data. If I edit the Xen config to boot of a kernel on > > the DomU, VMs still can't start up because LVM cannot identify any > > volume groups. > > > > If I try to access the VM device files locally via losetup and kpartx, > > I get "read errors". > > > > So what's the deal I know GFS2 is a preview, but I have to assume > > I've missed some crucial step here. 
> > > > -- > > Dave Costakos > > mailto:david.costakos at gmail.com > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Dave Costakos mailto:david.costakos at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Mon Nov 10 17:05:05 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Mon, 10 Nov 2008 17:05:05 +0000 Subject: [Linux-cluster] gfs2 convert hosed all VMs In-Reply-To: <6b6836c60811100900r1779220ft3d18c1ac9576dbf@mail.gmail.com> References: <6b6836c60811071953m52d068cfpd0a204f8f4d1b99d@mail.gmail.com> <1226309053.25004.196.camel@quoit> <6b6836c60811100900r1779220ft3d18c1ac9576dbf@mail.gmail.com> Message-ID: <1226336705.25004.220.camel@quoit> Hi, On Mon, 2008-11-10 at 09:00 -0800, Dave Costakos wrote: > I'm using the kernel from RHEL 5.2 release. > > $ uname -rm > 2.6.18-92.el5xen x86_64 > Well that is quite old, but even so I'm surprised that you had a problem like that. What kind of messages did fsck spit out? I can't really debug this without any information on what is going wrong, Steve. > I got no syslog messages, but when I ran gfs_fsck, it complained quite > a bit. I presume it destroyed my files. Sadly, I don't have too much > time to reproduce the error since I need to restore 30 virtual > machines from backup. Luckily, it's just our lab environment. > > On Mon, Nov 10, 2008 at 1:24 AM, Steven Whitehouse > wrote: > Hi, > > You don't say what kernel version you are using. I'd suspect > that maybe > its too old. Do you get any messages in syslog at all? > > Steve. > > > On Fri, 2008-11-07 at 19:53 -0800, Dave Costakos wrote: > > > > I just converted a shared file-backed Xen VM GFS filesystem > to a GFS2 > > filesystem. The conversion was successfully and all my > files appear > > intact. I followed the GFS instructions by unmounting the > filesystem > > on all machines, running gfs_fsck, and gfs2_covert. > > > > Since I converted the filesystem, all my file-backed Xen VMs > can no > > longer boot. pygrub reports errors that the boot loader > isn't > > returning any data. If I edit the Xen config to boot of a > kernel on > > the DomU, VMs still can't start up because LVM cannot > identify any > > volume groups. > > > > If I try to access the VM device files locally via losetup > and kpartx, > > I get "read errors". > > > > So what's the deal I know GFS2 is a preview, but I have to > assume > > I've missed some crucial step here. > > > > -- > > Dave Costakos > > mailto:david.costakos at gmail.com > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Dave Costakos > mailto:david.costakos at gmail.com > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ffv at tjpr.jus.br Mon Nov 10 18:03:43 2008 From: ffv at tjpr.jus.br (Fabiano F. 
Vitale) Date: Mon, 10 Nov 2008 16:03:43 -0200 Subject: [Linux-cluster] GFS2 poor performance References: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br><64D0546C5EBBD147B75DE133D798665F01806B7C@hugo.eprize.local><007f01c93eba$8d731b20$3e0a10ac@tjpr.net> <64D0546C5EBBD147B75DE133D798665F01806C15@hugo.eprize.local> Message-ID: <000d01c9435e$abc63410$3e0a10ac@tjpr.net> Setting demote_secs to 30 and glock_purge to 70 in a gfs filesystem increased frightfully performance of commands like ls, df, in a directory that has many files. But the gfs2 filesystem doesn't have the attribute glock_purge to tune. Exists any attribute in gfs2 in place of glock_purge which exists only in gfs1 thanks Fabiano ----- Original Message ----- From: "Jeff Sturm" To: "linux clustering" Sent: Thursday, November 06, 2008 5:53 PM Subject: RE: [Linux-cluster] GFS2 poor performance >I looked over the summit document you referenced below. The value of >demote_secs mentioned is an example setting, and unfortunately no >recommendations or rationale accompany this. > > For some access patterns you can get better performance by actually > increasing demote_secs. For example, we have a node that we routinely > rsync a file tree onto using a GFS partition. Increasing demote_secs from > 300 to 86400 reduced the average rsync time by a factor of about 4. The > reason is that this node has little lock contention and needs to lock each > file every time we start an rsync process. With demote_secs=300, it was > doing much more work to reacquire locks on each run. Whereas > demote_secs=86400 allowed the locks to persist up to a day, since the > overall number of files in our application is bounded such that they will > fit in buffer cache, together with locks. > > At another extreme, we have an application that creates a lot of files but > seldom opens them on the same node. In this case there is no value in > holding onto the locks, so we set demote_secs to a small value and > glock_purge as high as 70 to ensure locks are quickly released in memory. > > The best advice I can give in general is to experiment with different > settings for demote_secs and glock_purge while watching the output of > "gfs_tool counters" to see how they behave. > > Jeff > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Fabiano F. Vitale > Sent: Tuesday, November 04, 2008 3:19 PM > To: linux clustering > Subject: Re: [Linux-cluster] GFS2 poor performance > > Hi, > > for cluster purpose the two nodes are linked by a patch cord cat6 and the > lan interfaces are gigabit. > > All nodes have a Fibre Channel Emulex Corporation Zephyr-X LightPulse and > the Storage is a HP EVA8100 > > I read the document > http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Summit08presentation_GFSBestPractices_Final.pdf > which show some parameters to tune and one of them is demote_secs, to > adjust to 100sec > > thanks > >> What sort of network and storage device are you using? >> >> Also, why set demote_secs so low? >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of ffv at tjpr.jus.br >> Sent: Tuesday, November 04, 2008 2:13 PM >> To: linux-cluster at redhat.com >> Subject: [Linux-cluster] GFS2 poor performance >> >> Hi all, >> >> I?m getting a very poor performance using GFS2. >> I have two qmail (mail) servers and one gfs2 filesystem shared by them. 
>> In this case, each directory in the GFS2 filesystem may have up to 10000
>> files (mails)
>>
>> The problem is in the performance of some operations like ls, du, rm, etc.
>> For example,
>>
>> # time du -sh /dados/teste
>> 40M    /dados/teste
>>
>> real   7m14.919s
>> user   0m0.008s
>> sys    0m0.129s
>>
>> this is unacceptable
>>
>> Some attributes I already set using gfs2_tool:
>>
>> gfs2_tool settune /dados demote_secs 100
>> gfs2_tool setflag jdata /dados
>> gfs2_tool setflag sync /dados
>> gfs2_tool setflag directio /dados
>>
>> but the performance is still very bad
>>
>>
>> Anybody know how to tune the filesystem for an acceptable performance
>> working with directories with 10000 files?
>> thanks for any help
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
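A minimal sketch of the gfs_tool knobs being discussed in this thread, for
readers who want to experiment; the mount point /dados and the values are
only examples taken from the posts above, and glock_purge and the counters
output exist for GFS1 (gfs_tool), not for GFS2:

    # show current tunables and glock counters for a GFS1 mount
    gfs_tool gettune /dados
    gfs_tool counters /dados

    # node that mostly re-reads the same tree (e.g. a nightly rsync):
    # hold cached locks much longer
    gfs_tool settune /dados demote_secs 86400

    # node that creates many files it rarely reopens:
    # demote quickly and purge a large share of unused glocks
    gfs_tool settune /dados demote_secs 100
    gfs_tool settune /dados glock_purge 70

Watching the counters before and after a change, as Jeff suggests, is the
quickest way to see whether a setting actually helps a given workload.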
From fdinitto at redhat.com  Mon Nov 10 19:09:10 2008
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Mon, 10 Nov 2008 20:09:10 +0100
Subject: [Linux-cluster] logging: final call on configuration, output and implementation
Message-ID: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net>

Hi all,

those logging threads have been going on for way too long. It's time to
close them and make a final decision. This is a long email, so please
take time to read it all.

This is a recap of what I believe a user would like to see:

- a consistent, easy and quick way to configure logging.
- a reasonable default if nothing is specified.
- a consistent, easy to read, output.

In order to avoid misinterpretation, "debug" priority means DEBUGLO as
defined here by David:
http://www.redhat.com/archives/cluster-devel/2008-November/msg00002.html

== Configuration ==

XML version:

<logging debug="on|off"
         to_syslog="yes|no"
         syslog_facility="local4"
         syslog_priority="debug|info|notice|etc.."
         to_file="yes|no"
         logfile="/var/log/cluster/foo.log"
         log_priority="debug|info|notice|etc..">
    <logger_subsys subsys="NAME" debug=".."/>
</logging>

corosync/openais equivalent:

logging {
    debug="on|off";             (read note below about debugging)

    to_syslog="yes|no";
    syslog_facility="local4";
    syslog_priority="debug|info|notice|etc..";

    to_file="yes|no";
    logfile="/var/log/cluster/foo.log";
    log_priority="debug|info|notice|etc..";

    logger_subsys {
        subsys="NAME";
        debug=..
    }
}

(the two configurations are equivalent)

The global configuration (<logging> / logging {) contains all the keywords
required to configure logging as discussed so far and affects every daemon
and subsystem. The common user would generally stop here.

The logger_subsys section can be used for more detailed control over
logging. This is where more advanced users or developers work.

Equivalent entries in logger_subsys override configurations in the
global section.

Use of environment variables and settings from the command line can
override any configuration. This is used by some people to enable
debugging via init scripts (for example /etc/sysconfig/cman or the
equivalent for other distros).


== Defaults ==

- debug = off. Switch to "on" to enable execution of debugging code
  (read note below) and set log_priority to debug.

- to_syslog = yes.
- syslog_facility = local4 (default to local4 to respect the old behaviour).
- syslog_priority = info (default to respect the old behaviour; it seems
  to be a recent compromise at this point of development, doesn't flood
  the logs and has enough info).

- to_file = yes
- logfile = LOGDIR/[daemon|subsystem].log. By default each daemon/subsystem
  should log to its own file (mostly valid for daemons, as corosync/openais
  and the plugins all share the same file and logging system).
- log_priority = info (same reason as above)

to_stderr will disappear from the config options. It is off by default as
most daemons fork into the background; we can set it "on" automatically
when not running in the background (the only case where it actually makes
sense).


== Output ==

to_file:

echo $(date "+%b %d %T") subsystem_or_daemon: entry_to_log
Nov 10 19:46:40 subsystem_or_daemon: entry_to_log

It is important to note 2 things. First we need to have a time stamp to
be able to compare logs if daemons are logging to different files, then
we need to know what subsystem is logging if we are logging to the same
file for everything.

Since we don't want to create N combinations that will add/remove
date/daemon_name etc., this is one format that can fit them all and it is
the same that would appear in syslog (go for consistency!).

to_syslog:

openlog("subsystem_or_daemon", options, facility);
syslog(priority, "%s", entry_to_log);

(consistent with to_file).

to_stderr:

fprintf(stderr, "%s\n", entry); <- this is really free format.


== Implementation requirements ==

- (strict requirement) must support the configuration options agreed by
  everybody.

- (high priority) logging should never fail. For example, if we can't
  open a log file we will not fail; we will try to warn by other means,
  but nothing should block our operations if logging is not working.

- (wish list) logging should be non-blocking. Best if we can delegate
  the logging work to something other than ourselves.


== Note about debugging ==

Debugging is an art. Every developer on this planet has a different view
of what debugging is. What we want with the debug="on|off" option is a
quick way to switch all the different log_priority settings to "debug"
and to have a flag that can be used in the code to follow different paths.

I often find myself setting <logging debug="on"/> and being done with
everything. No need to remember fancy keywords or understand the whole
implementation details of what overrides what and how.

At some point in the future we should probably talk about debugging in
general and what it means for all of us, but it's outside the scope of
this email.

Fabio

From teigland at redhat.com  Mon Nov 10 19:27:30 2008
From: teigland at redhat.com (David Teigland)
Date: Mon, 10 Nov 2008 13:27:30 -0600
Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation
In-Reply-To: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net>
References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net>
Message-ID: <20081110192730.GB17894@redhat.com>

On Mon, Nov 10, 2008 at 08:09:10PM +0100, Fabio M. Di Nitto wrote:
> Hi all,
>
> those logging threads have been going on for way too long. It's time to
> close them and make a final decision. This is a long email, so please
> take time to read it all.
>
> This is a recap of what I believe a user would like to see:
>
> - a consistent, easy and quick way to configure logging.
> - a reasonable default if nothing is specified.
> - a consistent, easy to read, output.

I like this. Two minor points regarding the actual terminology; I'd like
to be a little more consistent and identify some "keywords". Right now
the word "log" is combined with other words in a bunch of ways (logging,
logger, logfile, syslog, log_foo). How about:
logging, logging_subsys (common keyword "logging") for the config file section tags - to_syslog, syslog_facility, syslog_priority (common keyword "syslog") for every parameter related to syslog - to_logfile, logfile, logfile_priority (common keyword "logfile") for every parameter related to logfile And then we have some values that are "on/off" and others that are "yes/no"; let's pick one. Dave From fdinitto at redhat.com Mon Nov 10 19:48:05 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 10 Nov 2008 20:48:05 +0100 Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation In-Reply-To: <20081110192730.GB17894@redhat.com> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> <20081110192730.GB17894@redhat.com> Message-ID: <1226346485.2445.68.camel@daitarn-fedora.int.fabbione.net> On Mon, 2008-11-10 at 13:27 -0600, David Teigland wrote: > On Mon, Nov 10, 2008 at 08:09:10PM +0100, Fabio M. Di Nitto wrote: > > Hi all, > > > > those logging threads have been going on for way too long. It's time to > > close them and make a final decision. This is a long email, so please > > take time to read it all. > > > > This is a recap of what I believe a user would like to see: > > > > - a consistent, easy and quick way to configure logging. > > - a reasonable default if nothing is specified. > > - a consistent, easy to read, output. > > I like this. Two minor points regarding the actual terminology; I'd like > to be a little more consistent and identify some "keywords". Right now > the word "log" is combined with other words in a bunch of ways (logging, > logger, logfile, syslog, log_foo). How about: > > . logging, logging_subsys (common keyword "logging") > for the config file section tags > > - to_syslog, syslog_facility, syslog_priority (common keyword "syslog") > for every parameter related to syslog > > - to_logfile, logfile, logfile_priority (common keyword "logfile") > for every parameter related to logfile +1 on all 3 of them. The only reason I did try to avoid keyword changes was to minimize the impact in compat layers since most of those are already in use by us. > > And then we have some values that are "on/off" and others that are > "yes/no"; let's pick one. +1. Given the boolean value, "on/off" sounds more nerdy. Fabio From Joel.Becker at oracle.com Mon Nov 10 20:49:48 2008 From: Joel.Becker at oracle.com (Joel Becker) Date: Mon, 10 Nov 2008 12:49:48 -0800 Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation In-Reply-To: <20081110192730.GB17894@redhat.com> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> <20081110192730.GB17894@redhat.com> Message-ID: <20081110204948.GB12445@mail.oracle.com> On Mon, Nov 10, 2008 at 01:27:30PM -0600, David Teigland wrote: > And then we have some values that are "on/off" and others that are > "yes/no"; let's pick one. I always liked the X resource route: /true|yes|on/i -> "True" /false|no|off/i -> "False" Joel -- "In the beginning, the universe was created. This has made a lot of people very angry, and is generally considered to have been a bad move." 
- Douglas Adams Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 From merhar at arlut.utexas.edu Mon Nov 10 21:04:17 2008 From: merhar at arlut.utexas.edu (David Merhar) Date: Mon, 10 Nov 2008 15:04:17 -0600 Subject: [Linux-cluster] GFS2 poor performance (gfs2_tool counters) In-Reply-To: <64D0546C5EBBD147B75DE133D798665F01806C15@hugo.eprize.local> References: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br><64D0546C5EBBD147B75DE133D798665F01806B7C@hugo.eprize.local> <007f01c93eba$8d731b20$3e0a10ac@tjpr.net> <64D0546C5EBBD147B75DE133D798665F01806C15@hugo.eprize.local> Message-ID: <6F4B5DBB-3217-42DA-8BAA-E73AB2E7C5B9@arlut.utexas.edu> Is "gfs2_tool counters" supported? Doesn't work for us, and I found reference to correcting the man page so it's no longer included. Thanks. djm On Nov 6, 2008, at 1:53 PM, Jeff Sturm wrote: > I looked over the summit document you referenced below. The value > of demote_secs mentioned is an example setting, and unfortunately no > recommendations or rationale accompany this. > > For some access patterns you can get better performance by actually > increasing demote_secs. For example, we have a node that we > routinely rsync a file tree onto using a GFS partition. Increasing > demote_secs from 300 to 86400 reduced the average rsync time by a > factor of about 4. The reason is that this node has little lock > contention and needs to lock each file every time we start an rsync > process. With demote_secs=300, it was doing much more work to > reacquire locks on each run. Whereas demote_secs=86400 allowed the > locks to persist up to a day, since the overall number of files in > our application is bounded such that they will fit in buffer cache, > together with locks. > > At another extreme, we have an application that creates a lot of > files but seldom opens them on the same node. In this case there is > no value in holding onto the locks, so we set demote_secs to a small > value and glock_purge as high as 70 to ensure locks are quickly > released in memory. > > The best advice I can give in general is to experiment with > different settings for demote_secs and glock_purge while watching > the output of "gfs_tool counters" to see how they behave. > > Jeff > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com > ] On Behalf Of Fabiano F. Vitale > Sent: Tuesday, November 04, 2008 3:19 PM > To: linux clustering > Subject: Re: [Linux-cluster] GFS2 poor performance > > Hi, > > for cluster purpose the two nodes are linked by a patch cord cat6 > and the lan interfaces are gigabit. > > All nodes have a Fibre Channel Emulex Corporation Zephyr-X > LightPulse and the Storage is a HP EVA8100 > > I read the document > http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Summit08presentation_GFSBestPractices_Final.pdf > which show some parameters to tune and one of them is demote_secs, > to adjust to 100sec > > thanks > >> What sort of network and storage device are you using? >> >> Also, why set demote_secs so low? >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of >> ffv at tjpr.jus.br >> Sent: Tuesday, November 04, 2008 2:13 PM >> To: linux-cluster at redhat.com >> Subject: [Linux-cluster] GFS2 poor performance >> >> Hi all, >> >> I?m getting a very poor performance using GFS2. >> I have two qmail (mail) servers and one gfs2 filesystem shared by >> them. 
>> In this case, each directory in GFS2 filesystem may have upon to >> 10000 >> files (mails) >> >> The problem is in performance of some operations like ls, du, rm, etc >> for example, >> >> # time du -sh /dados/teste >> 40M /dados/teste >> >> real 7m14.919s >> user 0m0.008s >> sys 0m0.129s >> >> this is unacceptable >> >> Some attributes i already set using gfs2_tool: >> >> gfs2_tool settune /dados demote_secs 100 gfs2_tool setflag jdata >> /dados gfs2_tool setflag sync /dados gfs2_tool setflag directio / >> dados >> >> but the performance is still very bad >> >> >> Anybody know how to tune the filesystem for a acceptable performance >> working with directory with 10000 files? >> thanks for any help >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From sdake at redhat.com Tue Nov 11 00:46:28 2008 From: sdake at redhat.com (Steven Dake) Date: Mon, 10 Nov 2008 17:46:28 -0700 Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation In-Reply-To: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> Message-ID: <1226364388.14398.7.camel@balance> I disagree with a global debug keyword. At one time I thought it was a good idea but that time has long since passed. The idea of turning debug to on and then having all debug output go to syslog is frightening and will result in lost messages. While it appears this proposal includes the selectable log output filtering per output medium as was discussed already, it is unclear how the debug keyword affects this. It would simply make sense to change the file's log priority or the syslog's log priority if that is the behavior desired and then no need for any extra keyword. Regards -steve On Mon, 2008-11-10 at 20:09 +0100, Fabio M. Di Nitto wrote: > Hi all, > > those logging threads have been going on for way too long. It's time to > close them and make a final decision. This is a long email, so please > take time to read it all. > > This is a recap of what I believe a user would like to see: > > - a consistent, easy and quick way to configure logging. > - a reasonable default if nothing is specified. > - a consistent, easy to read, output. > > In order to avoid misinterpretation, "debug" priority means DEBUGLO as > defined here by David: > http://www.redhat.com/archives/cluster-devel/2008-November/msg00002.html > > == Configuration == > > XML version: > > to_syslog="yes|no" > syslog_facility="local4" > syslog_priority="debug|info|notice|etc.." > to_file="yes|no" > logfile="/var/log/cluster/foo.log" > log_priority="debug|info|notice|etc.."> > > > > > corosync/openais equivalent: > > logging { > debug="on|off"; (read note below about debugging) > > to_syslog="yes|no"; > syslog_facility="local4"; > syslog_priority="debug|info|notice|etc.."; > > to_file="yes|no"; > logfile="/var/log/cluster/foo.log"; > log_priority="debug|info|notice|etc.."; > > logger_subsys { > subsys="NAME"; > debug=.. 
> } > } > > (the two configuration are equivalent) > > The global configuration (/logging {) contains all the keywords > required to configure logging as discussed so far and would affects > every daemon or subsystems. The common user would generally stop here. > > The logger_subsys can be used to do more detailed control over logging. > This is where more advance users or developers work. > > Equivalent entries in logger_subsys override configurations in the > global section. > > Use of environment variables and settings from the command line can > override any configuration. This is used by some people to enable > debugging via init scripts (for example /etc/sysconfig/cman or > equivalent for other distros). > > > == Defaults == > > - debug = off. Switch to "on" to enable execution of debugging code > (read note below) and set log_priority to debug. > > - to_syslog = yes. > - syslog_facility = local4 (default to local4 to respect old behaviour). > - syslog_priority = info (default to respect old behaviour and it seems > to be a recent compromise at this point of development, doesn't flood > logs and has enough info). > > - to_file = yes > - logfile = LOGDIR/[daemon|subsystem].log. By default each > daemon/subsystem should log in its own file (mostly valid for daemons as > corosync/openais and plugins all share the same file and logging > system). > - logpriority= info (same reason as above) > > to_stderr will disappear from the config options. Set to off by default > as most daemons will fork in background, we can set it "ON" > automatically when not running in background (it's the only case where > it actually makes sense). > > > == Output == > > to_file: > > echo $(date "+%b %d %T") subsystem_or_daemon: entry_to_log > Nov 10 19:46:40 subsystem_or_daemon: entry_to_log > > It is important to note 2 things. First we need to have a time stamp to > be able to compare logs if daemons are logging to different files, then > we need to know what subsystem is logging if we are logging to the same > file for everything. > > Since we don't want to create N combinations that will add/remove > date/daemon_name etc, this is one format can fit them all and it is the > same that would appear in syslog (go for consistency!). > > to_syslog: > > openlog("subsystem_or_daemon", options, facility); > syslog(priority, "%s", entry_to_log); > > (consistent with to_file). > > to_stderr: > > fprintf(stderr, "%s\n", entry); <- this is really free format. > > > == Implementations requirements == > > - (strict requirement) must support the configuration options agreed by > everybody. > > - (high priority) logging should never fail. For example, if we can't > open a log file we will not fail, we will try to warn by other meanings, > but nothing should block our operations if logging is not working. > > - (wish list) logging should be non-blocking. Best if we can delegate > the logging work to something else than ourselves. > > > == Note about debugging == > > Debugging is an art. Every developer on this planet has a different view > of what debugging is. What we want with the debug="on|off" option is an > absolute quick way to set all the different log_priority="debug" on and > have a flag that can be used in the code to follow different paths. > > I often find myself setting and be done with > everything. No need to remember fancy keywords or understand the whole > implementation details on what overrides what and how. 
> > At some point in future we should probably talk about debugging in > general and what means for all of us but it's outside the scope of this > email. > > Fabio > From fdinitto at redhat.com Tue Nov 11 04:55:12 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 11 Nov 2008 05:55:12 +0100 Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation In-Reply-To: <1226364388.14398.7.camel@balance> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> <1226364388.14398.7.camel@balance> Message-ID: <1226379312.2445.73.camel@daitarn-fedora.int.fabbione.net> On Mon, 2008-11-10 at 17:46 -0700, Steven Dake wrote: > I disagree with a global debug keyword. > At one time I thought it was a > good idea but that time has long since passed. The idea of turning > debug to on and then having all debug output go to syslog is frightening > and will result in lost messages. While it appears this proposal > includes the selectable log output filtering per output medium as was > discussed already, it is unclear how the debug keyword affects this. It > would simply make sense to change the file's log priority or the > syslog's log priority if that is the behavior desired and then no need > for any extra keyword. You have these two situations: print_log(LOG_DEBUG, "doing this and that....\n"); if (debug) { /* gather_some_data_that_is_very_expensive_operation_to_do_all_the_time(); print_log(LOG_DEBUG, "print those extra data\n"); } as it is now, it would basically be an alias to set logpriority to DEBUG but enables people to execute debugging code conditionally and as I wrote it is an easy keyword to remember compared to syslog_priority/logpriority. Fabio From sdake at redhat.com Tue Nov 11 05:47:23 2008 From: sdake at redhat.com (Steven Dake) Date: Mon, 10 Nov 2008 22:47:23 -0700 Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation In-Reply-To: <1226379312.2445.73.camel@daitarn-fedora.int.fabbione.net> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> <1226364388.14398.7.camel@balance> <1226379312.2445.73.camel@daitarn-fedora.int.fabbione.net> Message-ID: <1226382444.14398.18.camel@balance> On Tue, 2008-11-11 at 05:55 +0100, Fabio M. Di Nitto wrote: > On Mon, 2008-11-10 at 17:46 -0700, Steven Dake wrote: > > I disagree with a global debug keyword. > > At one time I thought it was a > > good idea but that time has long since passed. The idea of turning > > debug to on and then having all debug output go to syslog is frightening > > and will result in lost messages. While it appears this proposal > > includes the selectable log output filtering per output medium as was > > discussed already, it is unclear how the debug keyword affects this. It > > would simply make sense to change the file's log priority or the > > syslog's log priority if that is the behavior desired and then no need > > for any extra keyword. > > You have these two situations: > > print_log(LOG_DEBUG, "doing this and that....\n"); > > if (debug) { /* > gather_some_data_that_is_very_expensive_operation_to_do_all_the_time(); > print_log(LOG_DEBUG, "print those extra data\n"); > } > > as it is now, it would basically be an alias to set logpriority to DEBUG > but enables people to execute debugging code conditionally and as I > wrote it is an easy keyword to remember compared to > syslog_priority/logpriority. 
> > Fabio > The second situation doesn't exist in any code I have written and never would. Having any conditional debug output is asking for trouble. Been down that road, done that, and discarded that idea... The "debughi" or high volume debug messages do not go through log_printf nor would they be committed to any persistent log (only memory). The output of the logging message is significantly more expensive then that of gathering logging data. Turning debug on for all of the entire stack to be output to syslog is not satisfactory because messages would be lost in overload conditions. Logging to file is only a slight bit better solution but if you really must have debug output in a persistent store that doesn't occur as a result of a failure, logging to file is the only suitable answer. A global debug option without selecting log output is not a workable solution because of overload of syslog, even overload of the filesystem, or other issues. What makes sense is to have a mechanism to set the priority for each specific log output mechanism and forget about any global debug option nonsense. Regards -steve From fdinitto at redhat.com Tue Nov 11 05:54:00 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 11 Nov 2008 06:54:00 +0100 Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation In-Reply-To: <1226382444.14398.18.camel@balance> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> <1226364388.14398.7.camel@balance> <1226379312.2445.73.camel@daitarn-fedora.int.fabbione.net> <1226382444.14398.18.camel@balance> Message-ID: <1226382840.2445.78.camel@daitarn-fedora.int.fabbione.net> On Mon, 2008-11-10 at 22:47 -0700, Steven Dake wrote: > On Tue, 2008-11-11 at 05:55 +0100, Fabio M. Di Nitto wrote: > > On Mon, 2008-11-10 at 17:46 -0700, Steven Dake wrote: > > > I disagree with a global debug keyword. > > > At one time I thought it was a > > > good idea but that time has long since passed. The idea of turning > > > debug to on and then having all debug output go to syslog is frightening > > > and will result in lost messages. While it appears this proposal > > > includes the selectable log output filtering per output medium as was > > > discussed already, it is unclear how the debug keyword affects this. It > > > would simply make sense to change the file's log priority or the > > > syslog's log priority if that is the behavior desired and then no need > > > for any extra keyword. > > > > You have these two situations: > > > > print_log(LOG_DEBUG, "doing this and that....\n"); > > > > if (debug) { /* > > gather_some_data_that_is_very_expensive_operation_to_do_all_the_time(); > > print_log(LOG_DEBUG, "print those extra data\n"); > > } > > > > as it is now, it would basically be an alias to set logpriority to DEBUG > > but enables people to execute debugging code conditionally and as I > > wrote it is an easy keyword to remember compared to > > syslog_priority/logpriority. > > > > Fabio > > > > The second situation doesn't exist in any code I have written and never > would. Clearly you haven't read what I wrote in the debugging note. > Turning debug on for all of the entire stack to be output to syslog is > not satisfactory because messages would be lost in overload conditions. > Logging to file is only a slight bit better solution but if you really > must have debug output in a persistent store that doesn't occur as a > result of a failure, logging to file is the only suitable answer. 
Please point me to where I wrote that it should go to syslog as I only mentioned logfile_priority so far. Fabio From sdake at redhat.com Tue Nov 11 06:00:58 2008 From: sdake at redhat.com (Steven Dake) Date: Mon, 10 Nov 2008 23:00:58 -0700 Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation In-Reply-To: <1226382840.2445.78.camel@daitarn-fedora.int.fabbione.net> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> <1226364388.14398.7.camel@balance> <1226379312.2445.73.camel@daitarn-fedora.int.fabbione.net> <1226382444.14398.18.camel@balance> <1226382840.2445.78.camel@daitarn-fedora.int.fabbione.net> Message-ID: <1226383258.14398.31.camel@balance> On Tue, 2008-11-11 at 06:54 +0100, Fabio M. Di Nitto wrote: > On Mon, 2008-11-10 at 22:47 -0700, Steven Dake wrote: > > On Tue, 2008-11-11 at 05:55 +0100, Fabio M. Di Nitto wrote: > > > On Mon, 2008-11-10 at 17:46 -0700, Steven Dake wrote: > > > > I disagree with a global debug keyword. > > > > At one time I thought it was a > > > > good idea but that time has long since passed. The idea of turning > > > > debug to on and then having all debug output go to syslog is frightening > > > > and will result in lost messages. While it appears this proposal > > > > includes the selectable log output filtering per output medium as was > > > > discussed already, it is unclear how the debug keyword affects this. It > > > > would simply make sense to change the file's log priority or the > > > > syslog's log priority if that is the behavior desired and then no need > > > > for any extra keyword. > > > > > > You have these two situations: > > > > > > print_log(LOG_DEBUG, "doing this and that....\n"); > > > > > > if (debug) { /* > > > gather_some_data_that_is_very_expensive_operation_to_do_all_the_time(); > > > print_log(LOG_DEBUG, "print those extra data\n"); > > > } > > > > > > as it is now, it would basically be an alias to set logpriority to DEBUG > > > but enables people to execute debugging code conditionally and as I > > > wrote it is an easy keyword to remember compared to > > > syslog_priority/logpriority. > > > > > > Fabio > > > > > > > The second situation doesn't exist in any code I have written and never > > would. > > Clearly you haven't read what I wrote in the debugging note. > I read it but don't agree you can have a discussion about logging and flight recording without discussing how debugging fits into the log system. > > Turning debug on for all of the entire stack to be output to syslog is > > not satisfactory because messages would be lost in overload conditions. > > Logging to file is only a slight bit better solution but if you really > > must have debug output in a persistent store that doesn't occur as a > > result of a failure, logging to file is the only suitable answer. > > Please point me to where I wrote that it should go to syslog as I only > mentioned logfile_priority so far. > If syslog is configured it will go to syslog by default in your scheme. Regards -steve > Fabio > From maurizio.rottin at gmail.com Tue Nov 11 08:26:05 2008 From: maurizio.rottin at gmail.com (Maurizio Rottin) Date: Tue, 11 Nov 2008 09:26:05 +0100 Subject: [Linux-cluster] Fence VirtualIron - i have the script but... Message-ID: Hello everyone, I need to fence VirtualIron VM in order to GFS to work when a node is not responding. Actually, i wrote a simple python script that fences the node, but...i don't understand how to integrate it in the cluster suite! 
What files should i touch in order to have this fence method in luci? -- mr From swhiteho at redhat.com Tue Nov 11 09:23:43 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 11 Nov 2008 09:23:43 +0000 Subject: [Linux-cluster] GFS2 poor performance In-Reply-To: <000d01c9435e$abc63410$3e0a10ac@tjpr.net> References: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br> <64D0546C5EBBD147B75DE133D798665F01806B7C@hugo.eprize.local> <007f01c93eba$8d731b20$3e0a10ac@tjpr.net> <64D0546C5EBBD147B75DE133D798665F01806C15@hugo.eprize.local> <000d01c9435e$abc63410$3e0a10ac@tjpr.net> Message-ID: <1226395423.9046.3.camel@quoit> Hi, On Mon, 2008-11-10 at 16:03 -0200, Fabiano F. Vitale wrote: > Setting demote_secs to 30 and glock_purge to 70 in a gfs filesystem > increased frightfully performance of commands like ls, df, in a directory > that has many files. > But the gfs2 filesystem doesn't have the attribute glock_purge to tune. > Exists any attribute in gfs2 in place of glock_purge which exists only in > gfs1 > > thanks > > Fabiano > > That is entirely deliberate. GFS2 is self-tuning so far as glocks goes, so such settings are not needed. The demote time setting for glocks in GFS2 only applies to non-inode glocks and it might well go away in the future when we have an automatic way to deal with them too. One of the goals of GFS2 is to reduce the need for users to have to change obscure settings in order to get the best performance in any particular situation, Steve. > ----- Original Message ----- > From: "Jeff Sturm" > To: "linux clustering" > Sent: Thursday, November 06, 2008 5:53 PM > Subject: RE: [Linux-cluster] GFS2 poor performance > > > >I looked over the summit document you referenced below. The value of > >demote_secs mentioned is an example setting, and unfortunately no > >recommendations or rationale accompany this. > > > > For some access patterns you can get better performance by actually > > increasing demote_secs. For example, we have a node that we routinely > > rsync a file tree onto using a GFS partition. Increasing demote_secs from > > 300 to 86400 reduced the average rsync time by a factor of about 4. The > > reason is that this node has little lock contention and needs to lock each > > file every time we start an rsync process. With demote_secs=300, it was > > doing much more work to reacquire locks on each run. Whereas > > demote_secs=86400 allowed the locks to persist up to a day, since the > > overall number of files in our application is bounded such that they will > > fit in buffer cache, together with locks. > > > > At another extreme, we have an application that creates a lot of files but > > seldom opens them on the same node. In this case there is no value in > > holding onto the locks, so we set demote_secs to a small value and > > glock_purge as high as 70 to ensure locks are quickly released in memory. > > > > The best advice I can give in general is to experiment with different > > settings for demote_secs and glock_purge while watching the output of > > "gfs_tool counters" to see how they behave. > > > > Jeff > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Fabiano F. Vitale > > Sent: Tuesday, November 04, 2008 3:19 PM > > To: linux clustering > > Subject: Re: [Linux-cluster] GFS2 poor performance > > > > Hi, > > > > for cluster purpose the two nodes are linked by a patch cord cat6 and the > > lan interfaces are gigabit. 
> > > > All nodes have a Fibre Channel Emulex Corporation Zephyr-X LightPulse and > > the Storage is a HP EVA8100 > > > > I read the document > > http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Summit08presentation_GFSBestPractices_Final.pdf > > which show some parameters to tune and one of them is demote_secs, to > > adjust to 100sec > > > > thanks > > > >> What sort of network and storage device are you using? > >> > >> Also, why set demote_secs so low? > >> > >> -----Original Message----- > >> From: linux-cluster-bounces at redhat.com > >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of ffv at tjpr.jus.br > >> Sent: Tuesday, November 04, 2008 2:13 PM > >> To: linux-cluster at redhat.com > >> Subject: [Linux-cluster] GFS2 poor performance > >> > >> Hi all, > >> > >> I?m getting a very poor performance using GFS2. > >> I have two qmail (mail) servers and one gfs2 filesystem shared by them. > >> In this case, each directory in GFS2 filesystem may have upon to 10000 > >> files (mails) > >> > >> The problem is in performance of some operations like ls, du, rm, etc > >> for example, > >> > >> # time du -sh /dados/teste > >> 40M /dados/teste > >> > >> real 7m14.919s > >> user 0m0.008s > >> sys 0m0.129s > >> > >> this is unacceptable > >> > >> Some attributes i already set using gfs2_tool: > >> > >> gfs2_tool settune /dados demote_secs 100 gfs2_tool setflag jdata > >> /dados gfs2_tool setflag sync /dados gfs2_tool setflag directio /dados > >> > >> but the performance is still very bad > >> > >> > >> Anybody know how to tune the filesystem for a acceptable performance > >> working with directory with 10000 files? > >> thanks for any help > >> > >> -- > >> Linux-cluster mailing list > >> Linux-cluster at redhat.com > >> https://www.redhat.com/mailman/listinfo/linux-cluster > >> > >> > >> > >> -- > >> Linux-cluster mailing list > >> Linux-cluster at redhat.com > >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From swhiteho at redhat.com Tue Nov 11 09:27:09 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 11 Nov 2008 09:27:09 +0000 Subject: [Linux-cluster] GFS2 poor performance (gfs2_tool counters) In-Reply-To: <6F4B5DBB-3217-42DA-8BAA-E73AB2E7C5B9@arlut.utexas.edu> References: <49109ea6.+nLiDwujAEL50VEe%ffv@tjpr.jus.br> <64D0546C5EBBD147B75DE133D798665F01806B7C@hugo.eprize.local> <007f01c93eba$8d731b20$3e0a10ac@tjpr.net> <64D0546C5EBBD147B75DE133D798665F01806C15@hugo.eprize.local> <6F4B5DBB-3217-42DA-8BAA-E73AB2E7C5B9@arlut.utexas.edu> Message-ID: <1226395629.9046.7.camel@quoit> Hi, On Mon, 2008-11-10 at 15:04 -0600, David Merhar wrote: > Is "gfs2_tool counters" supported? > > Doesn't work for us, and I found reference to correcting the man page > so it's no longer included. > > Thanks. > > djm > > No, it isn't supported any more. There are plenty of existing methods of tracing the actions of the filesystem, such as strace, blktrace, and more recently FIEMAP so that the counters are no longer needed, Steve. > > On Nov 6, 2008, at 1:53 PM, Jeff Sturm wrote: > > > I looked over the summit document you referenced below. 
The value > > of demote_secs mentioned is an example setting, and unfortunately no > > recommendations or rationale accompany this. > > > > For some access patterns you can get better performance by actually > > increasing demote_secs. For example, we have a node that we > > routinely rsync a file tree onto using a GFS partition. Increasing > > demote_secs from 300 to 86400 reduced the average rsync time by a > > factor of about 4. The reason is that this node has little lock > > contention and needs to lock each file every time we start an rsync > > process. With demote_secs=300, it was doing much more work to > > reacquire locks on each run. Whereas demote_secs=86400 allowed the > > locks to persist up to a day, since the overall number of files in > > our application is bounded such that they will fit in buffer cache, > > together with locks. > > > > At another extreme, we have an application that creates a lot of > > files but seldom opens them on the same node. In this case there is > > no value in holding onto the locks, so we set demote_secs to a small > > value and glock_purge as high as 70 to ensure locks are quickly > > released in memory. > > > > The best advice I can give in general is to experiment with > > different settings for demote_secs and glock_purge while watching > > the output of "gfs_tool counters" to see how they behave. > > > > Jeff > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com > > ] On Behalf Of Fabiano F. Vitale > > Sent: Tuesday, November 04, 2008 3:19 PM > > To: linux clustering > > Subject: Re: [Linux-cluster] GFS2 poor performance > > > > Hi, > > > > for cluster purpose the two nodes are linked by a patch cord cat6 > > and the lan interfaces are gigabit. > > > > All nodes have a Fibre Channel Emulex Corporation Zephyr-X > > LightPulse and the Storage is a HP EVA8100 > > > > I read the document > > http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Summit08presentation_GFSBestPractices_Final.pdf > > which show some parameters to tune and one of them is demote_secs, > > to adjust to 100sec > > > > thanks > > > >> What sort of network and storage device are you using? > >> > >> Also, why set demote_secs so low? > >> > >> -----Original Message----- > >> From: linux-cluster-bounces at redhat.com > >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > >> ffv at tjpr.jus.br > >> Sent: Tuesday, November 04, 2008 2:13 PM > >> To: linux-cluster at redhat.com > >> Subject: [Linux-cluster] GFS2 poor performance > >> > >> Hi all, > >> > >> I?m getting a very poor performance using GFS2. > >> I have two qmail (mail) servers and one gfs2 filesystem shared by > >> them. > >> In this case, each directory in GFS2 filesystem may have upon to > >> 10000 > >> files (mails) > >> > >> The problem is in performance of some operations like ls, du, rm, etc > >> for example, > >> > >> # time du -sh /dados/teste > >> 40M /dados/teste > >> > >> real 7m14.919s > >> user 0m0.008s > >> sys 0m0.129s > >> > >> this is unacceptable > >> > >> Some attributes i already set using gfs2_tool: > >> > >> gfs2_tool settune /dados demote_secs 100 gfs2_tool setflag jdata > >> /dados gfs2_tool setflag sync /dados gfs2_tool setflag directio / > >> dados > >> > >> but the performance is still very bad > >> > >> > >> Anybody know how to tune the filesystem for a acceptable performance > >> working with directory with 10000 files? 
> >> thanks for any help > >> > >> -- > >> Linux-cluster mailing list > >> Linux-cluster at redhat.com > >> https://www.redhat.com/mailman/listinfo/linux-cluster > >> > >> > >> > >> -- > >> Linux-cluster mailing list > >> Linux-cluster at redhat.com > >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From finnzi at finnzi.com Tue Nov 11 11:25:32 2008 From: finnzi at finnzi.com (Finnur =?iso-8859-1?Q?=D6rn_Gu=F0mundsson?=) Date: Tue, 11 Nov 2008 11:25:32 -0000 (GMT) Subject: [Linux-cluster] Multiple oracle databases with RHCS Message-ID: <60659.217.28.182.1.1226402732.squirrel@webmail.finnzi.com> Hi, I'm running 3 RHCS clusters that have one Oracle database. Now i need to configure a RHCS cluster that will be running 4 databases, but from my initial testing i can see i need to have a special user for each database. However, in other cluster software we are used to use only single oracle user (ie: ServiceGuard) and we've had no issues there. So i am wondering, is there any reason why Red Hat has choosen to do this like this or would it be ok if i would just modify the oracledb.sh script so i can have a single user....would it still be supported ? There is hardly any documentation (that i could find) regarding running multiple databases on a RHCS cluster. If someone can point me to a good document regarding this please do! :) Thanks in advanced, Finnur From fdinitto at redhat.com Tue Nov 11 18:11:06 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 11 Nov 2008 19:11:06 +0100 Subject: [Linux-cluster] Re: [Cluster-devel] logging: final call on configuration, output and implementation In-Reply-To: <1226383258.14398.31.camel@balance> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> <1226364388.14398.7.camel@balance> <1226379312.2445.73.camel@daitarn-fedora.int.fabbione.net> <1226382444.14398.18.camel@balance> <1226382840.2445.78.camel@daitarn-fedora.int.fabbione.net> <1226383258.14398.31.camel@balance> Message-ID: <1226427066.2445.82.camel@daitarn-fedora.int.fabbione.net> Just for the record, Steven and I had a chat on IRC. This is the transcript: 07:51 < riley_dt> yes i read your original message 07:52 < riley_dt> i was reswponding to your email not dave's 07:52 < riley_dt> your right i had not read dave's message at all 07:53 < fabbione> riley_dt: well.. nothing I can do about that, but the thread is evolving in a positive direction IMHO. There are only few bits that needs smoothing 07:53 < riley_dt> i see the correction in dave's email 07:53 < fabbione> and I am simply talking from the end of the thread. I can't possible know what have you read or not 07:54 < riley_dt> well you can put debug in the top level config 07:54 < riley_dt> but i dont intend to do anything about it :) 07:55 < fabbione> riley_dt: and that _IS_ fine :) 07:55 < fabbione> riley_dt: allow others to use it if they want 07:55 < riley_dt> also the oring has to go 07:55 < fabbione> oring? 07:55 < riley_dt> DEBUG|WARN 07:55 < riley_dt> should be DEBUG then debug+ are logged 07:55 < fabbione> there is no oring... 
07:55 < fabbione> it's an example to show what values can be there 07:55 < fabbione> you can't ORING priorities 07:56 < riley_dt> you could or select priorities 07:56 < riley_dt> i thought that is what you proposed 07:56 < riley_dt> then we are good to go 07:56 < fabbione> no.. simply a list of options 07:56 < fabbione> i received some comments from other people in my inbox that I need to sort out 07:56 < fabbione> they didn't go to mailing lists 07:56 < riley_dt> well for trace there is oring 07:56 < riley_dt> ie: TRACE1|TRACE2 07:57 < riley_dt> it isn't "trace 1 or trace2" it is an oring 07:57 < fabbione> some are really good points.. so before everybody dives into implementing, let's wait for the "final email" 07:57 < fabbione> gotcha... 07:57 < riley_dt> when I see | I think "or" 07:57 < riley_dt> overloaded term i guess 08:49 < fabbione> riley_dt: just for the record, can you please reply to thread and tell that you understood what we have been talking about? I strongly want to avoid the situation where in 2 months from now we will be fighting again:" I said this, not this, understood that etc." 10:11 < riley_dt> sure On Mon, 2008-11-10 at 23:00 -0700, Steven Dake wrote: > On Tue, 2008-11-11 at 06:54 +0100, Fabio M. Di Nitto wrote: > > On Mon, 2008-11-10 at 22:47 -0700, Steven Dake wrote: > > > On Tue, 2008-11-11 at 05:55 +0100, Fabio M. Di Nitto wrote: > > > > On Mon, 2008-11-10 at 17:46 -0700, Steven Dake wrote: > > > > > I disagree with a global debug keyword. > > > > > At one time I thought it was a > > > > > good idea but that time has long since passed. The idea of turning > > > > > debug to on and then having all debug output go to syslog is frightening > > > > > and will result in lost messages. While it appears this proposal > > > > > includes the selectable log output filtering per output medium as was > > > > > discussed already, it is unclear how the debug keyword affects this. It > > > > > would simply make sense to change the file's log priority or the > > > > > syslog's log priority if that is the behavior desired and then no need > > > > > for any extra keyword. > > > > > > > > You have these two situations: > > > > > > > > print_log(LOG_DEBUG, "doing this and that....\n"); > > > > > > > > if (debug) { /* > > > > gather_some_data_that_is_very_expensive_operation_to_do_all_the_time(); > > > > print_log(LOG_DEBUG, "print those extra data\n"); > > > > } > > > > > > > > as it is now, it would basically be an alias to set logpriority to DEBUG > > > > but enables people to execute debugging code conditionally and as I > > > > wrote it is an easy keyword to remember compared to > > > > syslog_priority/logpriority. > > > > > > > > Fabio > > > > > > > > > > The second situation doesn't exist in any code I have written and never > > > would. > > > > Clearly you haven't read what I wrote in the debugging note. > > > > I read it but don't agree you can have a discussion about logging and > flight recording without discussing how debugging fits into the log > system. > > > > > Turning debug on for all of the entire stack to be output to syslog is > > > not satisfactory because messages would be lost in overload conditions. > > > Logging to file is only a slight bit better solution but if you really > > > must have debug output in a persistent store that doesn't occur as a > > > result of a failure, logging to file is the only suitable answer. > > > > Please point me to where I wrote that it should go to syslog as I only > > mentioned logfile_priority so far. 
> > > > If syslog is configured it will go to syslog by default in your scheme. > > Regards > -steve > > > Fabio > > > From garromo at us.ibm.com Tue Nov 11 18:22:18 2008 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 11 Nov 2008 11:22:18 -0700 Subject: [Linux-cluster] Multiple oracle databases with RHCS In-Reply-To: <60659.217.28.182.1.1226402732.squirrel@webmail.finnzi.com> Message-ID: We run multiple oracle databases on our clusters, and they all use the one oracle account. I'm not clear as to what testing you did tha determined you needed one account per database. Gary Romo Finnur ?rn Gu?mundsson linux-cluster at redhat.com Sent by: cc linux-cluster-bou nces at redhat.com Subject [Linux-cluster] Multiple oracle databases with RHCS 11/11/2008 04:25 AM Please respond to linux clustering Hi, I'm running 3 RHCS clusters that have one Oracle database. Now i need to configure a RHCS cluster that will be running 4 databases, but from my initial testing i can see i need to have a special user for each database. However, in other cluster software we are used to use only single oracle user (ie: ServiceGuard) and we've had no issues there. So i am wondering, is there any reason why Red Hat has choosen to do this like this or would it be ok if i would just modify the oracledb.sh script so i can have a single user....would it still be supported ? There is hardly any documentation (that i could find) regarding running multiple databases on a RHCS cluster. If someone can point me to a good document regarding this please do! :) Thanks in advanced, Finnur -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pic08461.gif Type: image/gif Size: 1255 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From lhh at redhat.com Tue Nov 11 19:21:31 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 11 Nov 2008 14:21:31 -0500 Subject: [Linux-cluster] Multiple oracle databases with RHCS In-Reply-To: <60659.217.28.182.1.1226402732.squirrel@webmail.finnzi.com> References: <60659.217.28.182.1.1226402732.squirrel@webmail.finnzi.com> Message-ID: <1226431291.16686.55.camel@ayanami> On Tue, 2008-11-11 at 11:25 +0000, Finnur ?rn Gu?mundsson wrote: > Hi, > > I'm running 3 RHCS clusters that have one Oracle database. > > Now i need to configure a RHCS cluster that will be running 4 databases, > but from my initial testing i can see i need to have a special user for > each database. > However, in other cluster software we are used to use only single oracle > user (ie: ServiceGuard) and we've had no issues there. So i am wondering, > is there any reason why Red Hat has choosen to do this like this or would > it be ok if i would just modify the oracledb.sh script so i can have a > single user....would it still be supported ? You can edit it, but you need to modify the oracledb.sh script to not kill all processes. There's a bugzilla open about this one: https://bugzilla.redhat.com/show_bug.cgi?id=458481 In the bugzilla is a better approximation of a multiple instance Oracledb.sh with a bug in it (and possible fix). 
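To make the idea concrete, here is a minimal sketch of what SID-scoped cleanup
looks like. This is not the shipped oracledb.sh and not the patch attached to
the bugzilla above; the user name and SID value are assumptions for
illustration only.

#!/bin/sh
# Sketch only: several instances share one OS user ("oracle"), so a
# stop/recover action should match on the instance's ORACLE_SID instead
# of killing every process owned by that user.
ORACLE_USER=oracle
ORACLE_SID=DB1      # hypothetical instance handled by this service

# Oracle background processes are named ora_<proc>_<SID> (ora_pmon_DB1,
# ora_smon_DB1, ...), so anchoring the pattern on the SID leaves the
# other instances owned by the same user untouched.
pkill -u "$ORACLE_USER" -f "ora_[a-z0-9]+_${ORACLE_SID}\$"
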
We don't have a lot of reports on people running this in larger environments, so any feedback you provide will be helpful. -- Lon From lhh at redhat.com Tue Nov 11 22:26:20 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 11 Nov 2008 17:26:20 -0500 Subject: [Linux-cluster] Fence VirtualIron - i have the script but... In-Reply-To: References: Message-ID: <1226442380.7132.65.camel@ayanami> On Tue, 2008-11-11 at 09:26 +0100, Maurizio Rottin wrote: > Hello everyone, > I need to fence VirtualIron VM in order to GFS to work when a node is > not responding. > > Actually, i wrote a simple python script that fences the node, but...i > don't understand how to integrate it in the cluster suite! > > What files should i touch in order to have this fence method in luci? You'd have to write a UI for it; I am not sure how to do it. However... http://sources.redhat.com/cluster/wiki/FenceAgentAPI If your agent follows those guidelines, you can just stick the agent directives and configuration in cluster.conf. -- Lon From maurizio.rottin at gmail.com Tue Nov 11 23:36:35 2008 From: maurizio.rottin at gmail.com (Maurizio Rottin) Date: Wed, 12 Nov 2008 00:36:35 +0100 Subject: [Linux-cluster] Fence VirtualIron - i have the script but... In-Reply-To: <1226442380.7132.65.camel@ayanami> References: <1226442380.7132.65.camel@ayanami> Message-ID: 2008/11/11 Lon Hohberger : > http://sources.redhat.com/cluster/wiki/FenceAgentAPI > > If your agent follows those guidelines, you can just stick the agent > directives and configuration in cluster.conf. > This is enought! thanks you, Lon! -- mr From jbrassow at redhat.com Wed Nov 12 02:00:28 2008 From: jbrassow at redhat.com (Jonathan Brassow) Date: Tue, 11 Nov 2008 20:00:28 -0600 Subject: [Linux-cluster] Distributed RAID In-Reply-To: <1226060556.12833.4.camel@marc> References: <491275EE.8050508@auckland.ac.nz> <4912BAFA.2080901@bobich.net> <49135647.3020701@bobich.net> <940E48BD-8C04-4525-86AD-B4ED1998883E@redhat.com> <1226060556.12833.4.camel@marc> Message-ID: cmirror will progress on the RHEL5 branch, but for upstream placement, it will probably move to the LVM repository. That will be happening soon. brassow On Nov 7, 2008, at 6:22 AM, Marc - A. Dahlhaus [ Administration | Westermann GmbH ] wrote: > Hello, > > > will the changes to mirroring get merged into stable2 and head after > RHEL-5.3 release? > > > Marc > > Am Donnerstag, den 06.11.2008, 16:34 -0600 schrieb Jonathan Brassow: >> that works already. >> >> single machine: linear, stripe, mirror, snapshot >> cluster-aware: linear, stripe, mirror (5.3) >> >> brassow >> >> On Nov 6, 2008, at 2:40 PM, Gordan Bobic wrote: >> >>> What about CLVM based striping (RAID0)? Does that work already or is >>> it planned for the near future? >>> >>> Gordan >>> >>> Jonathan Brassow wrote: >>>> Cluster mirror (RAID1) will be available in rhel5.3 for LVM. >>>> brassow >>>> On Nov 6, 2008, at 9:40 AM, Federico Simoncelli wrote: >>>>> On Thu, Nov 6, 2008 at 10:38 AM, Gordan Bobic >>>>> wrote: >>>>>> I rather doubt md will become cluster aware any time soon. CLVM >>>>>> doesn't yet >>>>>> support even more important features like snapshotting, so I >>>>>> wouldn't count >>>>>> on it supporting anything more advanced. >>>>> >>>>> I worked a little on clvm snapshots: >>>>> https://www.redhat.com/archives/linux-lvm/2008-October/msg00027.html >>>>> >>>>> Review and testing is required. >>>>> -- >>>>> Federico. 
>>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jbrassow at redhat.com Wed Nov 12 02:01:44 2008 From: jbrassow at redhat.com (Jonathan Brassow) Date: Tue, 11 Nov 2008 20:01:44 -0600 Subject: [Linux-cluster] Distributed RAID In-Reply-To: <639ce0480811070459l18a0071bia889bb56402f6ab9@mail.gmail.com> References: <491275EE.8050508@auckland.ac.nz> <4912BAFA.2080901@bobich.net> <49135647.3020701@bobich.net> <940E48BD-8C04-4525-86AD-B4ED1998883E@redhat.com> <1226060556.12833.4.camel@marc> <639ce0480811070459l18a0071bia889bb56402f6ab9@mail.gmail.com> Message-ID: <322B81E0-5500-475F-B00B-4FE81C9D15B7@redhat.com> Sure. In fact, if you have access to the red hat 5.3 beta, it is ready there. brassow On Nov 7, 2008, at 6:59 AM, pronix pronix wrote: > can i use cluster raid1 if i get development release from > sources.redhat.com/cluster ? > > > 2008/11/7 Marc - A. Dahlhaus [ Administration | Westermann GmbH ] > > Hello, > > > will the changes to mirroring get merged into stable2 and head after > RHEL-5.3 release? > > > Marc > > Am Donnerstag, den 06.11.2008, 16:34 -0600 schrieb Jonathan Brassow: > > that works already. > > > > single machine: linear, stripe, mirror, snapshot > > cluster-aware: linear, stripe, mirror (5.3) > > > > brassow > > > > On Nov 6, 2008, at 2:40 PM, Gordan Bobic wrote: > > > > > What about CLVM based striping (RAID0)? Does that work already > or is > > > it planned for the near future? > > > > > > Gordan > > > > > > Jonathan Brassow wrote: > > >> Cluster mirror (RAID1) will be available in rhel5.3 for LVM. > > >> brassow > > >> On Nov 6, 2008, at 9:40 AM, Federico Simoncelli wrote: > > >>> On Thu, Nov 6, 2008 at 10:38 AM, Gordan Bobic > > > >>> wrote: > > >>>> I rather doubt md will become cluster aware any time soon. CLVM > > >>>> doesn't yet > > >>>> support even more important features like snapshotting, so I > > >>>> wouldn't count > > >>>> on it supporting anything more advanced. > > >>> > > >>> I worked a little on clvm snapshots: > > >>> https://www.redhat.com/archives/linux-lvm/2008-October/msg00027.html > > >>> > > >>> Review and testing is required. > > >>> -- > > >>> Federico. 
> > >>> > > >>> -- > > >>> Linux-cluster mailing list > > >>> Linux-cluster at redhat.com > > >>> https://www.redhat.com/mailman/listinfo/linux-cluster > > >> -- > > >> Linux-cluster mailing list > > >> Linux-cluster at redhat.com > > >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From achievement.hk at gmail.com Wed Nov 12 09:44:59 2008 From: achievement.hk at gmail.com (Achievement Chan) Date: Wed, 12 Nov 2008 17:44:59 +0800 Subject: [Linux-cluster] GFS performance of imap service (Maildir) Message-ID: Dear All, I've setup a courier-imap server which store the email data in Maildir format. The mailbox are saved under a LUN in ISCSI SAN. For handling a mailbox with 10000 email, it takes 6-8 seconds for waiting response from first "SELECT" command. The response time is also unstable too, sometimes it takes 10-20 seconds for the same mailbox. Based some online material, i've tried to tune the gfs. But there are seems no improvement. e.g. gfs_tool setflag inherit_jdata /home/domains gfs_tool settune /home/domains recoverd_secs 60 gfs_tool settune /home/domains glock_purge 50 gfs_tool settune /home/domains demote_secs 100 gfs_tool settune /home/domains scand_secs 3 gfs_tool settune /home/domains max_readahead 262144 gfs_tool settune /home/domains statfs_fast 1 I've tested the mailbox under ext3 and gfs2, both under the LUN in the same SAN. The response time can be within 1 second. Has anyone tried to provide imap service in GFS? or I need to go for GFS2? Is GFS2 still unstable for production system? Regards, Achievement Chan From kiss.zoltan at bardiauto.hu Wed Nov 12 10:57:04 2008 From: kiss.zoltan at bardiauto.hu (=?iso-8859-2?B?S2lzcyBab2x04W4=?=) Date: Wed, 12 Nov 2008 11:57:04 +0100 Subject: [Linux-cluster] slow lock performance Message-ID: <91632af9d61397469c647a809455d908@mail.bardiauto.hu> Hello, I have a little problem with my GFS2 installation. My Company have a little bit old business software (written in clipper), but our programmers can compile itt o linux platform with the xHarbour compiler. The sw uses DBF databases to store data. Here is the problem: We have 100-150 client computers, that mean we must run 100-150 (minimum) application on our servers. The problem is the GFS or GFS2 locking. If i mount the storage GFS/GFS2 partition with the localflocks option, then the application is running very fast (slower like ext3, but fast). Without the localflocks option the applications is slow down. If i enable cluster wide flocks, then my apps running approx. 100-140X slower. Can anybody help me? Any useful mount option, gfs tune option, etc.etc.. Thank you in anticipation! Best regards, Zoltan Kiss system administrator B?rdi Aut? Zrt. zoltan.kiss at bardiauto.hu +36204300386 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From veliogluh at itu.edu.tr Wed Nov 12 11:17:00 2008 From: veliogluh at itu.edu.tr (Hakan VELIOGLU) Date: Wed, 12 Nov 2008 13:17:00 +0200 Subject: [Linux-cluster] Cman doesn't realize the failed node In-Reply-To: <91632af9d61397469c647a809455d908@mail.bardiauto.hu> References: <91632af9d61397469c647a809455d908@mail.bardiauto.hu> Message-ID: <20081112131700.16801d3do17enme8@webmail.beta.itu.edu.tr> Hi, I am testing and trying to understand the cluster environment. I ve built a two node cluster system without any service (Red Hat EL 5.2 x64). I run the cman and rgmanager services succesfully and then poweroff one node suddenly. After thsi I excpect that the other node realize this failure and take up all the resources however running node doesn't realize this failure. I use "cman_tool nodes" and "clustat" commands and they say the failed node is active and online. What am i missing? Why cman doesn't realize the failure? [root at cl1 ~]# cat /etc/cluster/cluster.conf [root at cl1 ~]# When the node gows down, the TOTEM repeastedly logs messages like this. Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] The consensus timeout expired. Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] entering GATHER state from 3. Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] The consensus timeout expired. Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] entering GATHER state from 3. Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] The consensus timeout expired. Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] entering GATHER state from 3. Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] The consensus timeout expired. Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] entering GATHER state from 3. Hakan From hicheerup at gmail.com Wed Nov 12 13:44:08 2008 From: hicheerup at gmail.com (lingu) Date: Wed, 12 Nov 2008 19:14:08 +0530 Subject: [Linux-cluster] RHEL3 Cluster Broken Pipe error and Heartbeat configuration Message-ID: <29e045b80811120544j1a85eeay237b72daf8de3e16@mail.gmail.com> Hi, I am running two node active/passive cluster on RHEL3U8-64 bit operating system for my oracle database,both the nodes are connected to HP MSA-500 storage(scsi not Fibre channel) . Below are my hardware and clumanager version details. It was running fine and stable for last two years but all of a sudden for the past one month i am getting below errors on syslog and cluster restarting locally. Server Hardware: HP ProLiant DL580 G4 OS: RHEL3U8-64BIT INTEL EMT Kernel : 2.4.21-47.EL Storage : HP MSA-500 storage (scsci channel) Cluster Version: clumanager-1.2.26.1-1 redhat-config-cluster-1.0.7-1 NODE1 ip: 20.2.135.161 (network bonding configured) NODE2 ip: 20.2.135.162 (network bonding configured) VIP : 20.2.135.35 Syslog errors cluquorumd[1921]: Disk-TB: Detected I/O Hang! clulockd[1996]: Potential recursive lock #0 grant to member #1, PID1962 clulockd[1996]: Denied 20.1.135.162: Broken pipe clulockd[1996]: select error: Broken pipe clulockd[1996]: Denied 20.1.135.162: Broken pipe clulockd[1996]: select error: Broken pipe cluquorumd[1921]: Disk-TB: Detected I/O Hang! clulockd[1996]: Denied 20.1.135.161: Broken pipe clulockd[1996]: select error: Broken pipe clusvcmgrd[2011]: Unable to obtain cluster lock: Connection timed out cluquorumd[2100]: VF: Abort: Invalid header in reply from member #0 cluquorumd[1934]: __msg_send: Incomplete write to 13. 
Error: Connection reset by peer Can any one guide me what is this above error indicates and how to troubleshoot.After a long google search i found the below link from redhat that is matching my scenario.Can i follow the same because it is my very critical production server. https://bugzilla.redhat.com/show_bug.cgi?id=185484 Also anyone help me to configure a dedicated LAN (for example eth3) as heartbeat(private point to point cross over cable network for cluster communications),I don't wish heartbeat over public LAN , because of heavy Network saturation. Fot the above heartbeat configuration i didnot found any suitable document for rhel. Can any one provide me the suitable link or guide me what are all the changes i have to made in my existing cluster.xml file for this private heartbeat configuration to work. Waiting for some one reply its urgent for me Regards, Lingu From treed at ultraviolet.org Wed Nov 12 18:43:52 2008 From: treed at ultraviolet.org (Tracy Reed) Date: Wed, 12 Nov 2008 10:43:52 -0800 Subject: [Linux-cluster] HA cluster cache coherency Message-ID: <20081112184351.GC21564@tracyreed.org> Hello all, I have been wondering what most people do about cache coherency issues when doing high availability failover between two or more Linux servers (RHEL, if it matters) with shared storage? Consider a typical master-slave arrangement managed by heartbeat and a fibrechannel HBA in each node each connected to a switch with a bunch of storage also connected to the switch. The master and the slave are sharing this disk to other clients on the network via NFS. When the master fails during heavy writes with a gig of data in its cache all of that data will be lost. What do most people do about this? Is there any way to tell the kernel to only do write-through and no caching? This might not be infeasible if one has a lot of cache in the disk storage connected to the fibrechannel switch which is the case for me. I have read http://www.westnet.com/~gsmith/content/linux-pdflush.htm which seems to be an excellent treatment of pdflush related issues. However, it does not seem to address this specific issue. It mentions four tunables in /proc/sys/vm which when set to zero seem like they might accomplish what I'm looking for: dirty_background_ratio dirty_ratio dirty_expire_centisecs dirty_writeback_centisecs but I set them all to zero on a test system and the Dirty field of /proc/meminfo still routinely shows dirty pages. Your comments are appreciated. -- Tracy Reed http://tracyreed.org From kadlec at sunserv.kfki.hu Wed Nov 12 18:51:40 2008 From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef) Date: Wed, 12 Nov 2008 19:51:40 +0100 (CET) Subject: [Linux-cluster] GFS performance of imap service (Maildir) In-Reply-To: References: Message-ID: Hello, On Wed, 12 Nov 2008, Achievement Chan wrote: > For handling a mailbox with 10000 email, it takes 6-8 seconds for > waiting response from first "SELECT" command. > The response time is also unstable too, sometimes it takes 10-20 > seconds for the same mailbox. > > Based some online material, i've tried to tune the gfs. But there are > seems no improvement. > e.g. > gfs_tool setflag inherit_jdata /home/domains > gfs_tool settune /home/domains recoverd_secs 60 > gfs_tool settune /home/domains glock_purge 50 > gfs_tool settune /home/domains demote_secs 100 > gfs_tool settune /home/domains scand_secs 3 > gfs_tool settune /home/domains max_readahead 262144 > gfs_tool settune /home/domains statfs_fast 1 > > Has anyone tried to provide imap service in GFS? 
We have had exactly the same problems with maildir over GFS. There was no tuning whatsoever which helped: the fighting for the locks for every file in the maildir costs so much that you cannot expect better performance. The best is to avoid maildir and use simple mailbox format instead. We went (back) to mailbox and since then our users have not complained about performance. Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From fdinitto at redhat.com Thu Nov 13 05:51:29 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 13 Nov 2008 06:51:29 +0100 Subject: [Linux-cluster] Re: logging: more input In-Reply-To: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> References: <1226344150.2445.61.camel@daitarn-fedora.int.fabbione.net> Message-ID: <1226555489.4022.15.camel@daitarn-fedora.int.fabbione.net> Hi guys, I have received a bunch of comments and suggestions in my personal inbox from people that didn't want to participate in the thread directly. Some of them are absolutely valid points IMHO. So here they are: On Mon, 2008-11-10 at 20:09 +0100, Fabio M. Di Nitto wrote: > == Output == > > to_file: > > echo $(date "+%b %d %T") subsystem_or_daemon: entry_to_log > Nov 10 19:46:40 subsystem_or_daemon: entry_to_log Quoting: -- I've not read any of the threads but did you consider a date format that sorts easily i.e. YYYY MM DD-based? And for investigating races across files/nodes using sub-second timestamps (e.g. 19:46:40.123 ) and adding the node name. So you can just extract relevant sections from several files, cat them all together, sort them and then review the sequence of happenings. If you're generalising, make the log format string a customisable option similar to apache. -- I agree that a more precise time and sortable format is a major winner. He also has a good point to add the node name to the log. My reply to the "customisable format" request was based on the discussion we had at the Summit (that we want to avoid parsing lots of different logging etc.) and his reply was: Quoting: -- 1. Most people will use the default or one of the example lines you supply. 2. If it is a problem, you've got the corresponding config line so it should be straightforward to have a utility to convert it back to your preferred canonical output format. (And yes, "Nov 14" format can also be parsed and converted back into a sortable format without much difficulty, so it's no big deal.) 3. If lots of people change the log format, then you probably chose a poor default. 4. There's no reason why log-to-file should exactly match syslog format. If you want syslog format, use syslog. You want the best format to assist you in debugging problems etc. -- Fabio From robejrm at gmail.com Thu Nov 13 09:56:28 2008 From: robejrm at gmail.com (Juan Ramon Martin Blanco) Date: Thu, 13 Nov 2008 10:56:28 +0100 Subject: [Linux-cluster] Limit service restarting times Message-ID: <8a5668960811130156i2ef6f533s1f39332b5dd72195@mail.gmail.com> First of all, hello and many thanks everyone, this list has helped me a lot in the cluster world ;) I have configured a 2 node cluster with RHEL 5.2, shared storage and GFS2. I have configured several services with our company own software. This software evolves fast because we are in active development, so sometimes cores are dumped. 
When this happens, the cluster tries to restart the failing service again and again...filling the service's filesystem with cores. Is there any way to limit the number of retries for a certain service? Thanks in advance, Juan Ram?n Mart?n Blanco -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcasiraghi73 at gmail.com Thu Nov 13 14:14:11 2008 From: mcasiraghi73 at gmail.com (Mauro Casiraghi) Date: Thu, 13 Nov 2008 15:14:11 +0100 Subject: [Linux-cluster] RedHat Cluster Suite cluster resource management Message-ID: <18c35c650811130614i2a58e035w3bb9f574075389ca@mail.gmail.com> I had created this cluster configuration with Redhat Cluster Suite I Have one Service Group with the follow resources Service Group Name : WEB Resources of the service group: 1) IP_ADRESS 2) APACHE The resource dependency are: The Apache resource is dependent of ip_adress How can i stop only apache resource wiyhout stopping all service group ??? What is the cluster.conf to do this ?? Best Regards Mauro Casiraghi -------------- next part -------------- An HTML attachment was scrubbed... URL: From Harri.Paivaniemi at tietoenator.com Thu Nov 13 14:22:05 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Thu, 13 Nov 2008 16:22:05 +0200 Subject: [Linux-cluster] RedHat Cluster Suite cluster resource management References: <18c35c650811130614i2a58e035w3bb9f574075389ca@mail.gmail.com> Message-ID: <41E8D4F07FCE154CBEBAA60FFC92F67709FE46@apollo.eu.tieto.com> Add "exit 0" to the beginning of your apache status-check portion ;) -hjp -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Mauro Casiraghi Sent: Thu 11/13/2008 16:14 To: linux-cluster at redhat.com Subject: [Linux-cluster] RedHat Cluster Suite cluster resource management I had created this cluster configuration with Redhat Cluster Suite I Have one Service Group with the follow resources Service Group Name : WEB Resources of the service group: 1) IP_ADRESS 2) APACHE The resource dependency are: The Apache resource is dependent of ip_adress How can i stop only apache resource wiyhout stopping all service group ??? What is the cluster.conf to do this ?? Best Regards Mauro Casiraghi -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3009 bytes Desc: not available URL: From mcasiraghi73 at gmail.com Thu Nov 13 14:28:17 2008 From: mcasiraghi73 at gmail.com (Mauro Casiraghi) Date: Thu, 13 Nov 2008 15:28:17 +0100 Subject: [Linux-cluster] Service Group Dependency Message-ID: <18c35c650811130628r613fddbfi42bf4d0b2a83e1fe@mail.gmail.com> If i have two or more services how can set dependency for the services using Redhat Cluster Suite?? For example if i have service A B end C how can set those dependency: B depend from A and C depend from B In this case to stop service group A i must stop before C and B What is the cluster.conf setting ?? Best Regards Mauro Casiraghi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mcasiraghi73 at gmail.com Thu Nov 13 14:34:34 2008 From: mcasiraghi73 at gmail.com (Mauro Casiraghi) Date: Thu, 13 Nov 2008 15:34:34 +0100 Subject: [Linux-cluster] RedHat Cluster Suite cluster resource management In-Reply-To: <41E8D4F07FCE154CBEBAA60FFC92F67709FE46@apollo.eu.tieto.com> References: <18c35c650811130614i2a58e035w3bb9f574075389ca@mail.gmail.com> <41E8D4F07FCE154CBEBAA60FFC92F67709FE46@apollo.eu.tieto.com> Message-ID: <18c35c650811130634y3539ac24waf6649dde280db4d@mail.gmail.com> Ok this it works but is a workaround. Normaly in a cluster (veritas or Sun Cluster) you can enable,disable stop and start a resource in a service group without change monitoring script. Is it possible do the same in RHCS ??? Thank you for your help Mauro Casiraghi 2008/11/13 > > Add "exit 0" to the beginning of your apache status-check portion ;) > > -hjp > > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com on behalf of Mauro Casiraghi > Sent: Thu 11/13/2008 16:14 > To: linux-cluster at redhat.com > Subject: [Linux-cluster] RedHat Cluster Suite cluster resource management > > I had created this cluster configuration with Redhat Cluster Suite > > I Have one Service Group with the follow resources > > Service Group Name : WEB > > Resources of the service group: > > 1) IP_ADRESS > 2) APACHE > > The resource dependency are: > > The Apache resource is dependent of ip_adress > > How can i stop only apache resource wiyhout stopping all service group ??? > What is the cluster.conf to do this ?? > > Best Regards > > Mauro Casiraghi > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcasiraghi73 at gmail.com Thu Nov 13 14:51:00 2008 From: mcasiraghi73 at gmail.com (Mauro Casiraghi) Date: Thu, 13 Nov 2008 15:51:00 +0100 Subject: [Linux-cluster] Manual Fencing problem Message-ID: <18c35c650811130651i2931bd0h211a37192390f9e4@mail.gmail.com> I have two cluster nodes with the follow configuration For each node i had setup manual fencing On my messages (node0) i had recived this message Nov 13 12:06:34 lxxxxxxx fenced[2002]: fencing node "node1" Nov 13 12:06:34 lxxxxxxx fenced[2002]: agent "fence_manual" reports: failed: fence_manual no node name How can i fix this problem Best Regards Mauro Casiraghi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jruemker at redhat.com Thu Nov 13 14:58:08 2008 From: jruemker at redhat.com (John Ruemker) Date: Thu, 13 Nov 2008 09:58:08 -0500 Subject: [Linux-cluster] Manual Fencing problem In-Reply-To: <18c35c650811130651i2931bd0h211a37192390f9e4@mail.gmail.com> References: <18c35c650811130651i2931bd0h211a37192390f9e4@mail.gmail.com> Message-ID: <491C4080.3050400@redhat.com> Try adding the nodename attribute to each device as seen here: Mauro Casiraghi wrote: > I have two cluster nodes with the follow configuration > > For each node i had setup manual fencing > > > > post_join_delay="3"/> > > > > > > > > > > > > > > > > > > > > > > restricted="1"> > priority="1"/> > priority="1"/> > > > > > > exclusive="0" name="rhcs-web" recovery="relocate"> > > > > > > On my messages (node0) i had recived this message > > Nov 13 12:06:34 lxxxxxxx fenced[2002]: fencing node "node1" > Nov 13 12:06:34 lxxxxxxx fenced[2002]: agent "fence_manual" reports: > failed: fence_manual no node name > > How can i fix this problem > -John From mcasiraghi73 at gmail.com Thu Nov 13 15:05:53 2008 From: mcasiraghi73 at gmail.com (Mauro Casiraghi) Date: Thu, 13 Nov 2008 16:05:53 +0100 Subject: [Linux-cluster] Resource Restart Message-ID: <18c35c650811130705r6c6628fbmbbef0e9e07be9f64@mail.gmail.com> If i have one resource in a service cluster, how can set the max number of time it can be restarted from the cluster before is considered fault ??? And in this case is it possible stop the service group relocation on other nodes ?? Some time if the restart problem of the resource is an application problem, the best choice is to live the resource faulted on the node and investigate about the problem. In Veritas Cluster this is very easy to implement. Is it possible implement the same configuration using RedHat Cluster suite ??? Best Regards Mauro Casiraghi -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcasiraghi73 at gmail.com Thu Nov 13 15:25:03 2008 From: mcasiraghi73 at gmail.com (Mauro Casiraghi) Date: Thu, 13 Nov 2008 16:25:03 +0100 Subject: [Linux-cluster] Manual Fencing problem In-Reply-To: <491C4080.3050400@redhat.com> References: <18c35c650811130651i2931bd0h211a37192390f9e4@mail.gmail.com> <491C4080.3050400@redhat.com> Message-ID: <18c35c650811130725q7384fb57o414a3743bd9b4d73@mail.gmail.com> Ok i think that it works but now i have another problem On the node0 messages i can see the follow message fence_manual: Node node1 needs to be reset before recovery can procede. Waiting for node1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n node1) so i try to fence_ack_manual -n node1 and i recived this message fence_ack_manual -n node1 Warning: If the node "node1" has not been manually fenced (i.e. power cycled or disconnected from shared storage devices) the GFS file system may become corrupted and all its data unrecoverable! Please verify that the node shown above has been reset or disconnected from storage. Are you certain you want to continue? 
[yN] y can't open /tmp/fence_manual.fifo: No such file or directory Thank you for your help Best Regards Mauro Casiraghi On Thu, Nov 13, 2008 at 3:58 PM, John Ruemker wrote: > Try adding the nodename attribute to each device as seen here: > > Mauro Casiraghi wrote: > >> I have two cluster nodes with the follow configuration >> For each node i had setup manual fencing >> >> >> > post_join_delay="3"/> >> >> >> >> >> > nodename="node0"/> > >> >> >> >> >> >> >> > nodename="node1"/> > > >> >> >> >> >> >> >> >> >> >> >> > restricted="1"> >> > priority="1"/> >> > priority="1"/> >> >> >> >> >> >> > name="rhcs-web" recovery="relocate"> >> >> >> >> >> On my messages (node0) i had recived this message >> Nov 13 12:06:34 lxxxxxxx fenced[2002]: fencing node "node1" >> Nov 13 12:06:34 lxxxxxxx fenced[2002]: agent "fence_manual" reports: >> failed: fence_manual no node name >> How can i fix this problem >> >> > > -John > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.osullivan at auckland.ac.nz Thu Nov 13 16:21:22 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Thu, 13 Nov 2008 09:21:22 -0700 Subject: [Linux-cluster] Clusters with multihomed hosts Message-ID: <491C5402.8060706@auckland.ac.nz> Hi all, I need to know more about using redundant NICs in clusters. I have a 2-node cluster with 2 NICs in each node. The first NICs on each node are connected to one switch, the second NICs on each node are connected to another switch. This is an experimental arrangement so I am using /etc/hosts instead of DNS. It appears that the cluster software becomes confused if I put both NICs for the hosts in the /etc/hosts file, even if they are in different subnets. Here is the /etc/hosts file I would like to use: # localhost line 192.168.10.1 node1 192.168.10.2 node2 192.168.20.1 node1 # Second NIC on node 1 192.168.20.2 node2 # Second NIC on node 2 but this seems to cause the cluster to hang (confused about which NIC to use?), so I have removed the last 2 lines and everything works fine. However, this means if the switch on the 192.168.10.x subnet fails the cluster will "break". If the cluster would recognise that node1 and node2 are available via the second NICs then I wouldn't have to worry about this single point-of-failure. I have thought about bonding the NICs which (I think) would take care of the problem, but I have heard that boding two NICs usually does not give double the bandwidth. I have read a little about high-availability and failing over IP addresses, but this seems to be between different nodes, not different NICs in the same host. Would anyone please be able to give me some direction about the best way to set up my cluster and NICs to make the cluster reliable in the event of switch failure? And keep the full bandwidth of the NICs intact? Thanks in advance for any help you can give. Kind regards, Mike From billpp at gmail.com Thu Nov 13 16:25:52 2008 From: billpp at gmail.com (Flavio Junior) Date: Thu, 13 Nov 2008 14:25:52 -0200 Subject: [Linux-cluster] Clusters with multihomed hosts In-Reply-To: <491C5402.8060706@auckland.ac.nz> References: <491C5402.8060706@auckland.ac.nz> Message-ID: <58aa8d780811130825w64c4476crb64c2704d4e0b2fd@mail.gmail.com> You can use bonding NICs with active-backup mode. Only one NIC is used at a time, the second will only come up if the primary (active) fails. 
Dont forget to configure miimon value for link monitor. -- Fl?vio do Carmo J?nior aka waKKu On Thu, Nov 13, 2008 at 2:21 PM, Michael O'Sullivan < michael.osullivan at auckland.ac.nz> wrote: > Hi all, > > I need to know more about using redundant NICs in clusters. > > I have a 2-node cluster with 2 NICs in each node. The first NICs on each > node are connected to one switch, the second NICs on each node are connected > to another switch. This is an experimental arrangement so I am using > /etc/hosts instead of DNS. It appears that the cluster software becomes > confused if I put both NICs for the hosts in the /etc/hosts file, even if > they are in different subnets. Here is the /etc/hosts file I would like to > use: > > # localhost line > 192.168.10.1 node1 > 192.168.10.2 node2 > 192.168.20.1 node1 # Second NIC on node 1 > 192.168.20.2 node2 # Second NIC on node 2 > > but this seems to cause the cluster to hang (confused about which NIC to > use?), so I have removed the last 2 lines and everything works fine. > However, this means if the switch on the 192.168.10.x subnet fails the > cluster will "break". If the cluster would recognise that node1 and node2 > are available via the second NICs then I wouldn't have to worry about this > single point-of-failure. > > I have thought about bonding the NICs which (I think) would take care of > the problem, but I have heard that boding two NICs usually does not give > double the bandwidth. I have read a little about high-availability and > failing over IP addresses, but this seems to be between different nodes, not > different NICs in the same host. > > Would anyone please be able to give me some direction about the best way to > set up my cluster and NICs to make the cluster reliable in the event of > switch failure? And keep the full bandwidth of the NICs intact? > > Thanks in advance for any help you can give. Kind regards, Mike > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From raju.rajsand at gmail.com Thu Nov 13 16:34:52 2008 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Thu, 13 Nov 2008 22:04:52 +0530 Subject: [Linux-cluster] Manual Fencing problem In-Reply-To: <18c35c650811130725q7384fb57o414a3743bd9b4d73@mail.gmail.com> References: <18c35c650811130651i2931bd0h211a37192390f9e4@mail.gmail.com> <491C4080.3050400@redhat.com> <18c35c650811130725q7384fb57o414a3743bd9b4d73@mail.gmail.com> Message-ID: <8786b91c0811130834h4f9cd5f2hce49028b4df42c97@mail.gmail.com> Greetings, 2008/11/13 Mauro Casiraghi : > Ok i think that it works but now i have another problem > > On the node0 messages i can see the follow message > Apologies to remind of a basic fact of life in network. Do you have both node's name in the /etc/hosts file or DNS server oin your environment Regards, Rajagopal From Harri.Paivaniemi at tietoenator.com Thu Nov 13 16:34:31 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Thu, 13 Nov 2008 18:34:31 +0200 Subject: [Linux-cluster] Clusters with multihomed hosts References: <491C5402.8060706@auckland.ac.nz> <58aa8d780811130825w64c4476crb64c2704d4e0b2fd@mail.gmail.com> Message-ID: <41E8D4F07FCE154CBEBAA60FFC92F67709FE49@apollo.eu.tieto.com> ... and you could also use mode 0 in bonding to get roundrobin-lb. 
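As a very rough RHEL 5 style sketch of either setup (the bond name, interface
names and address below are assumptions, loosely taken from the original
post -- adjust to your own NICs and subnet):

# /etc/modprobe.conf -- mode=1 is the active-backup behaviour described
# above, mode=0 gives round-robin; miimon=100 polls link state every
# 100 ms so a dead switch port is noticed quickly
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
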
-hpj -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Flavio Junior Sent: Thu 11/13/2008 18:25 To: linux clustering Subject: Re: [Linux-cluster] Clusters with multihomed hosts You can use bonding NICs with active-backup mode. Only one NIC is used at a time, the second will only come up if the primary (active) fails. Dont forget to configure miimon value for link monitor. -- Fl?vio do Carmo J?nior aka waKKu On Thu, Nov 13, 2008 at 2:21 PM, Michael O'Sullivan < michael.osullivan at auckland.ac.nz> wrote: > Hi all, > > I need to know more about using redundant NICs in clusters. > > I have a 2-node cluster with 2 NICs in each node. The first NICs on each > node are connected to one switch, the second NICs on each node are connected > to another switch. This is an experimental arrangement so I am using > /etc/hosts instead of DNS. It appears that the cluster software becomes > confused if I put both NICs for the hosts in the /etc/hosts file, even if > they are in different subnets. Here is the /etc/hosts file I would like to > use: > > # localhost line > 192.168.10.1 node1 > 192.168.10.2 node2 > 192.168.20.1 node1 # Second NIC on node 1 > 192.168.20.2 node2 # Second NIC on node 2 > > but this seems to cause the cluster to hang (confused about which NIC to > use?), so I have removed the last 2 lines and everything works fine. > However, this means if the switch on the 192.168.10.x subnet fails the > cluster will "break". If the cluster would recognise that node1 and node2 > are available via the second NICs then I wouldn't have to worry about this > single point-of-failure. > > I have thought about bonding the NICs which (I think) would take care of > the problem, but I have heard that boding two NICs usually does not give > double the bandwidth. I have read a little about high-availability and > failing over IP addresses, but this seems to be between different nodes, not > different NICs in the same host. > > Would anyone please be able to give me some direction about the best way to > set up my cluster and NICs to make the cluster reliable in the event of > switch failure? And keep the full bandwidth of the NICs intact? > > Thanks in advance for any help you can give. Kind regards, Mike > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4011 bytes Desc: not available URL: From raju.rajsand at gmail.com Thu Nov 13 16:38:04 2008 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Thu, 13 Nov 2008 22:08:04 +0530 Subject: [Linux-cluster] Clusters with multihomed hosts In-Reply-To: <491C5402.8060706@auckland.ac.nz> References: <491C5402.8060706@auckland.ac.nz> Message-ID: <8786b91c0811130838gd857579s962212dfc378efb7@mail.gmail.com> Greetings, On Thu, Nov 13, 2008 at 9:51 PM, Michael O'Sullivan wrote: > Hi all, > > I need to know more about using redundant NICs in clusters. 1. Have you enabled the multicast feature in the switch carrying the Cluster heartbeat traffic? 2. HAve you tried 802.3ad in the bonding mode? 
Regards Rajagopal From swhiteho at redhat.com Thu Nov 13 16:54:24 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Thu, 13 Nov 2008 16:54:24 +0000 Subject: [Linux-cluster] Clusters with multihomed hosts In-Reply-To: <491C5402.8060706@auckland.ac.nz> References: <491C5402.8060706@auckland.ac.nz> Message-ID: <1226595264.9571.75.camel@quoit> Hi, On Thu, 2008-11-13 at 09:21 -0700, Michael O'Sullivan wrote: > Hi all, > > I need to know more about using redundant NICs in clusters. > > I have a 2-node cluster with 2 NICs in each node. The first NICs on each > node are connected to one switch, the second NICs on each node are > connected to another switch. This is an experimental arrangement so I am > using /etc/hosts instead of DNS. It appears that the cluster software > becomes confused if I put both NICs for the hosts in the /etc/hosts > file, even if they are in different subnets. Here is the /etc/hosts file > I would like to use: > > # localhost line > 192.168.10.1 node1 > 192.168.10.2 node2 > 192.168.20.1 node1 # Second NIC on node 1 > 192.168.20.2 node2 # Second NIC on node 2 > > but this seems to cause the cluster to hang (confused about which NIC to > use?), so I have removed the last 2 lines and everything works fine. > However, this means if the switch on the 192.168.10.x subnet fails the > cluster will "break". If the cluster would recognise that node1 and > node2 are available via the second NICs then I wouldn't have to worry > about this single point-of-failure. > The trouble with this kind of thing is that you find that its not easy to control which external IP address a particular application uses as you have discovered. It can be done though, with the aid of iproute2. The kernel will look at the routing table to work out where to send a particular packet, and once its found a suitable destination interface it will then look at the various possible source IPs on that interface in order to work out which one to use. It tries to use the source address which has most bits matching with the destination (counting from the network end to the host end of the IP address) so that if the destination address is on a particular subnet, it will try to use an IP from the same subnet as the source address if one is available. You can alter this quite easily though, you just set up a second routing table and use routing rules in order to select the correct table according to your network. Thats where iproute2 comes in and there is a set of docs here: http://lartc.org/ Also, just because you have two NIC's connected to different switches doesn't mean that you need to give them different IP addresses/subnets. The Linux IP stack can easily cope with them being the same, which would also simplify the situation that you have, where, I suspect the cluster stack has replied via a different NIC and thus received a different source address. > I have thought about bonding the NICs which (I think) would take care of > the problem, but I have heard that boding two NICs usually does not > give double the bandwidth. I have read a little about high-availability > and failing over IP addresses, but this seems to be between different > nodes, not different NICs in the same host. > > Would anyone please be able to give me some direction about the best way > to set up my cluster and NICs to make the cluster reliable in the event > of switch failure? And keep the full bandwidth of the NICs intact? > > Thanks in advance for any help you can give. 
Kind regards, Mike > The problem with bonding is that a single packet can only use a single one of the parallel links. Also, by using multiple links on a single stream (i.e. a TCP connection) you run the risk, if you are not careful, of reordering the packets and that can cause slow downs at the receiving end, and possibly generation of out of order ACK packets which might cause retransmissions at the sending end, further slowing things down. The Linux bonding driver has various modes to try and avoid that, and in addition it also has 802.3ad mode which allows it to automatically negotiate settings with a switch. Thats ideal if all the bonded links for a particular node go to the same switch, but won't work across switches as in your situation. I suspect that the choice will come down to one of the following: 1. something easy to set up & not very efficient in terms of bandwidth, but probably not too bad either. -> choose bonding (just be sure to select the right mode) 2. something more complex to set up, but which can be made to make full use all of the available bandwidth, and be extended into more complicated setups (dynamic routing, etc), given enough application support & tweeking. -> choose the IP based solution Steve. From achievement.hk at gmail.com Thu Nov 13 18:31:32 2008 From: achievement.hk at gmail.com (Achievement Chan) Date: Fri, 14 Nov 2008 02:31:32 +0800 Subject: [Linux-cluster] Is GFS2 stable for production system? Message-ID: Dear All, Is GFS2 stable for production system? Is it still not defined as stable by redhat? I would like to use it with apache, and courier-imap (Maildir format mailbox) reagrds, Achievement Chan From a.holstvoogd at nedforce.nl Thu Nov 13 19:09:51 2008 From: a.holstvoogd at nedforce.nl (Arthur Holstvoogd) Date: Thu, 13 Nov 2008 20:09:51 +0100 Subject: [Linux-cluster] Is GFS2 stable for production system? In-Reply-To: References: Message-ID: Hi, In my experience, it is not really. We have been using the 'stable' version that comes with centos 5.2 and had to upgrade to beta releases to get it running properly. Now we have some fs corruption we can't solve because fsck keeps segfaulting. Some of the other tools don't work either, especially if your mixing versions of the different parts. (which is to be expected of course) I guess if your using the latest beta versions it runs stable, but some tooling just doesn't work properly yet. Might be out specific case of bad luck, but still... I'm considering moving back to gfs1 for production. Cheers Arthur On Nov 13, 2008, at 19:31 , Achievement Chan wrote: > Dear All, > Is GFS2 stable for production system? Is it still not defined as > stable by redhat? > > I would like to use it with apache, and courier-imap (Maildir format > mailbox) > > reagrds, > Achievement Chan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From kanderso at redhat.com Thu Nov 13 19:54:24 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Thu, 13 Nov 2008 13:54:24 -0600 Subject: [Linux-cluster] Is GFS2 stable for production system? In-Reply-To: References: Message-ID: <1226606064.4108.60.camel@dhcp80-204.msp.redhat.com> For anything prior to the latest Fedora 9/10 kernels and RHEL 5.3, GFS2 is not considered stable and has known blatant issues with cluster coherent operations. RHEL 5.3 beta has been released with a working GFS2. This will move from our tech preview status to supported when RHEL 5.3 GA version. 
No one should be running GFS2 in a cluster production environment prior to these versions. Kevin On Fri, 2008-11-14 at 02:31 +0800, Achievement Chan wrote: > Dear All, > Is GFS2 stable for production system? Is it still not defined as > stable by redhat? > > I would like to use it with apache, and courier-imap (Maildir format mailbox) > > reagrds, > Achievement Chan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From andrew at ntsg.umt.edu Thu Nov 13 20:01:18 2008 From: andrew at ntsg.umt.edu (Andrew Neuschwander) Date: Thu, 13 Nov 2008 13:01:18 -0700 Subject: [Linux-cluster] Is GFS2 stable for production system? In-Reply-To: <1226606064.4108.60.camel@dhcp80-204.msp.redhat.com> References: <1226606064.4108.60.camel@dhcp80-204.msp.redhat.com> Message-ID: <491C878E.70607@ntsg.umt.edu> Does this mean that it should work fine (i.e. no known issues) as a local only file system (one node, with lock_nolock)? -A -- Andrew A. Neuschwander, RHCE Linux Systems/Software Engineer College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 Kevin Anderson wrote: > For anything prior to the latest Fedora 9/10 kernels and RHEL 5.3, GFS2 > is not considered stable and has known blatant issues with cluster > coherent operations. RHEL 5.3 beta has been released with a working > GFS2. This will move from our tech preview status to supported when RHEL > 5.3 GA version. No one should be running GFS2 in a cluster production > environment prior to these versions. > > Kevin > > On Fri, 2008-11-14 at 02:31 +0800, Achievement Chan wrote: >> Dear All, >> Is GFS2 stable for production system? Is it still not defined as >> stable by redhat? >> >> I would like to use it with apache, and courier-imap (Maildir format mailbox) >> >> reagrds, >> Achievement Chan >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From kanderso at redhat.com Thu Nov 13 20:07:40 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Thu, 13 Nov 2008 14:07:40 -0600 Subject: [Linux-cluster] Is GFS2 stable for production system? In-Reply-To: <491C878E.70607@ntsg.umt.edu> References: <1226606064.4108.60.camel@dhcp80-204.msp.redhat.com> <491C878E.70607@ntsg.umt.edu> Message-ID: <1226606860.4108.64.camel@dhcp80-204.msp.redhat.com> Single node GFS2 has been pretty stable since May/June timeframe of this year, not sure which kernel version this mapped into. We did an errata release post RHEL 5.2 with a couple of fixes in a special gfs2-kmod rpm, but still is considered tech preview from a support perspective. Kevin On Thu, 2008-11-13 at 13:01 -0700, Andrew Neuschwander wrote: > Does this mean that it should work fine (i.e. no known issues) as a > local only file system (one node, with lock_nolock)? > > -A > -- > Andrew A. Neuschwander, RHCE > Linux Systems/Software Engineer > College of Forestry and Conservation > The University of Montana > http://www.ntsg.umt.edu > andrew at ntsg.umt.edu - 406.243.6310 > > Kevin Anderson wrote: > > For anything prior to the latest Fedora 9/10 kernels and RHEL 5.3, GFS2 > > is not considered stable and has known blatant issues with cluster > > coherent operations. RHEL 5.3 beta has been released with a working > > GFS2. 
This will move from our tech preview status to supported when RHEL > > 5.3 GA version. No one should be running GFS2 in a cluster production > > environment prior to these versions. > > > > Kevin > > > > On Fri, 2008-11-14 at 02:31 +0800, Achievement Chan wrote: > >> Dear All, > >> Is GFS2 stable for production system? Is it still not defined as > >> stable by redhat? > >> > >> I would like to use it with apache, and courier-imap (Maildir format mailbox) > >> > >> reagrds, > >> Achievement Chan > >> > >> -- > >> Linux-cluster mailing list > >> Linux-cluster at redhat.com > >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From treed at ultraviolet.org Thu Nov 13 20:16:34 2008 From: treed at ultraviolet.org (Tracy Reed) Date: Thu, 13 Nov 2008 12:16:34 -0800 Subject: [Linux-cluster] Is GFS2 stable for production system? In-Reply-To: References: Message-ID: <20081113201634.GA10575@tracyreed.org> On Fri, Nov 14, 2008 at 02:31:32AM +0800, Achievement Chan spake thusly: > I would like to use it with apache, and courier-imap (Maildir format mailbox) See the thread: [Linux-cluster] GFS performance of imap service (Maildir) from yesterday. It seems that GFS performance may not be so good with lots of small files due to locking issues. Not sure if Kevin Anderson 's comments about using RHEL5.3 will have any bearing on this issue or not. I've been following the GFS project since it was first created at the University of Minnesota, went to Sistina, DotHill and other SAN vendors looked into it and built special disk firmware which implemented locking, went closed source, OpenGFS forked it, Sistina got bought by Redhat, OpenGFS sorta died, went open-source by RedHat again...and through all of this I'm still waiting for a stable GFS with decent performance. :) Hopefully RHEL5.3 finally provides it. -- Tracy Reed http://tracyreed.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From kanderso at redhat.com Thu Nov 13 20:50:27 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Thu, 13 Nov 2008 14:50:27 -0600 Subject: [Linux-cluster] Is GFS2 stable for production system? In-Reply-To: <20081113201634.GA10575@tracyreed.org> References: <20081113201634.GA10575@tracyreed.org> Message-ID: <1226609427.4108.77.camel@dhcp80-204.msp.redhat.com> On Thu, 2008-11-13 at 12:16 -0800, Tracy Reed wrote: > On Fri, Nov 14, 2008 at 02:31:32AM +0800, Achievement Chan spake thusly: > > I would like to use it with apache, and courier-imap (Maildir format mailbox) > > See the thread: [Linux-cluster] GFS performance of imap service > (Maildir) from yesterday. It seems that GFS performance may not be so > good with lots of small files due to locking issues. Not sure if Kevin > Anderson 's comments about using RHEL5.3 will > have any bearing on this issue or not. 
> > I've been following the GFS project since it was first created at the > University of Minnesota, went to Sistina, DotHill and other SAN > vendors looked into it and built special disk firmware which > implemented locking, went closed source, OpenGFS forked it, Sistina > got bought by Redhat, OpenGFS sorta died, went open-source by RedHat > again...and through all of this I'm still waiting for a stable GFS > with decent performance. :) Hopefully RHEL5.3 finally provides it. > We have done what we can from an internal testing standpoint. Would really love some real life feedback on the beta bits or the latest Fedora releases to see if we have been successful. So, how about it, participate by pulling the RHEL 5.3 beta version, configure courier-imap with real data and provide feedback. This is an opensource community effort, am sure Steve Whitehouse would love to have feedback, analysis, patches, etc..... :) Thanks Kevin From mwill at penguincomputing.com Thu Nov 13 21:31:54 2008 From: mwill at penguincomputing.com (Michael Will) Date: Thu, 13 Nov 2008 13:31:54 -0800 Subject: [Linux-cluster] Clusters with multihomed hosts In-Reply-To: <1226595264.9571.75.camel@quoit> References: <491C5402.8060706@auckland.ac.nz> <1226595264.9571.75.camel@quoit> Message-ID: <20081113213154.GI17008@miwi.penguincomputing.com> Double check that the netmask on the two interfaces when you inspect it with ifconfig is 255.255.255.0 and not 255.255.0.0 so that you really are on two separate networks. Michael On Thu, Nov 13, 2008 at 04:54:24PM +0000, Steven Whitehouse wrote: > Hi, > > On Thu, 2008-11-13 at 09:21 -0700, Michael O'Sullivan wrote: > > Hi all, > > > > I need to know more about using redundant NICs in clusters. > > > > I have a 2-node cluster with 2 NICs in each node. The first NICs on each > > node are connected to one switch, the second NICs on each node are > > connected to another switch. This is an experimental arrangement so I am > > using /etc/hosts instead of DNS. It appears that the cluster software > > becomes confused if I put both NICs for the hosts in the /etc/hosts > > file, even if they are in different subnets. Here is the /etc/hosts file > > I would like to use: > > > > # localhost line > > 192.168.10.1 node1 > > 192.168.10.2 node2 > > 192.168.20.1 node1 # Second NIC on node 1 > > 192.168.20.2 node2 # Second NIC on node 2 > > > > but this seems to cause the cluster to hang (confused about which NIC to > > use?), so I have removed the last 2 lines and everything works fine. > > However, this means if the switch on the 192.168.10.x subnet fails the > > cluster will "break". If the cluster would recognise that node1 and > > node2 are available via the second NICs then I wouldn't have to worry > > about this single point-of-failure. > > > The trouble with this kind of thing is that you find that its not easy > to control which external IP address a particular application uses as > you have discovered. It can be done though, with the aid of iproute2. > > The kernel will look at the routing table to work out where to send a > particular packet, and once its found a suitable destination interface > it will then look at the various possible source IPs on that interface > in order to work out which one to use. 
It tries to use the source > address which has most bits matching with the destination (counting from > the network end to the host end of the IP address) so that if the > destination address is on a particular subnet, it will try to use an IP > from the same subnet as the source address if one is available. > > You can alter this quite easily though, you just set up a second routing > table and use routing rules in order to select the correct table > according to your network. Thats where iproute2 comes in and there is a > set of docs here: http://lartc.org/ > > Also, just because you have two NIC's connected to different switches > doesn't mean that you need to give them different IP addresses/subnets. > The Linux IP stack can easily cope with them being the same, which would > also simplify the situation that you have, where, I suspect the cluster > stack has replied via a different NIC and thus received a different > source address. > > > I have thought about bonding the NICs which (I think) would take care of > > the problem, but I have heard that boding two NICs usually does not > > give double the bandwidth. I have read a little about high-availability > > and failing over IP addresses, but this seems to be between different > > nodes, not different NICs in the same host. > > > > Would anyone please be able to give me some direction about the best way > > to set up my cluster and NICs to make the cluster reliable in the event > > of switch failure? And keep the full bandwidth of the NICs intact? > > > > Thanks in advance for any help you can give. Kind regards, Mike > > > The problem with bonding is that a single packet can only use a single > one of the parallel links. Also, by using multiple links on a single > stream (i.e. a TCP connection) you run the risk, if you are not careful, > of reordering the packets and that can cause slow downs at the receiving > end, and possibly generation of out of order ACK packets which might > cause retransmissions at the sending end, further slowing things down. > > The Linux bonding driver has various modes to try and avoid that, and in > addition it also has 802.3ad mode which allows it to automatically > negotiate settings with a switch. Thats ideal if all the bonded links > for a particular node go to the same switch, but won't work across > switches as in your situation. > > I suspect that the choice will come down to one of the following: > > 1. something easy to set up & not very efficient in terms of bandwidth, > but probably not too bad either. > -> choose bonding (just be sure to select the right mode) > > 2. something more complex to set up, but which can be made to make full > use all of the available bandwidth, and be extended into more > complicated setups (dynamic routing, etc), given enough application > support & tweeking. > -> choose the IP based solution > > Steve. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From diegoliz at gmail.com Thu Nov 13 23:03:54 2008 From: diegoliz at gmail.com (Diego Liziero) Date: Fri, 14 Nov 2008 00:03:54 +0100 Subject: [Linux-cluster] Is GFS2 stable for production system? In-Reply-To: References: Message-ID: <68fe87e60811131503m25acf15ei7175c93d6ff95453@mail.gmail.com> On Thu, Nov 13, 2008 at 7:31 PM, Achievement Chan wrote: > Dear All, > Is GFS2 stable for production system? Not as regards the one in current 5.2. 
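A minimal sketch of the second routing table approach from the multihomed-hosts thread above, using iproute2; the table number, interface name and addresses are placeholders, and each node would carry the equivalent rules for its own second NIC:

ip route add 192.168.20.0/24 dev eth1 src 192.168.20.1 table 20
ip rule add from 192.168.20.1 table 20
ip route flush cache

With that in place, anything sourced from the second address is looked up in table 20 and leaves via eth1, independently of what the main routing table would choose.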
I had this issues: - locking with samba running on a single node that caused continuous freezes of the shares, even when exported read-only after the latest kernel+cman update (this could have been cased by the fact that not all nodes have been rebooted after the update). After a reboot of all nodes and a switch to the old stable gfs this hasn't happened any longer. - the last modification time of a file is not always updated on all nodes (doing an ls of the same file on different nodes may show different modification time after it has been edited on a node). - sometimes the space used by deleted files is not freed. Launching gfs2_fsck -y on unmounted filesystem detect lots of "Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)" messages, but, despite that, the space is still not freed. BTW I've still a corrupted gfs2 empty filesystem that shows incorrect free space if someone feels like to debug it. Regards, Diego. From npf-mlists at eurotux.com Fri Nov 14 09:48:20 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Fri, 14 Nov 2008 09:48:20 +0000 Subject: [Linux-cluster] Fence VirtualIron - i have the script but... In-Reply-To: References: Message-ID: <200811140948.20717.npf-mlists@eurotux.com> On Tuesday 11 November 2008 08:26:05 Maurizio Rottin wrote: > Hello everyone, > I need to fence VirtualIron VM in order to GFS to work when a node is > not responding. > > Actually, i wrote a simple python script that fences the node, but...i > don't understand how to integrate it in the cluster suite! > > What files should i touch in order to have this fence method in luci? Hi, We also use virtual iron. Could you please post your script? Thanks, Nuno Fernandes -------------- next part -------------- An HTML attachment was scrubbed... URL: From npf-mlists at eurotux.com Fri Nov 14 10:00:13 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Fri, 14 Nov 2008 10:00:13 +0000 Subject: [Linux-cluster] Problem in clvmd/dlm_recoverd Message-ID: <200811141000.13623.npf-mlists@eurotux.com> Hi, we have an cluster with 7 machines with a SAN. We are using them to provide virtual machines, so we are using clvmd. At some point we are unable to access any of the pv/lv/vg tools. They are all stuck. From stracing them i've come to the conclusion that they are waiting for clvmd. Has anyone been in this situation? Thanks for any help, Nuno Fernandes in host xen1: Linux blade01.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux lvm2-cluster-2.02.32-4.el5 cman-2.0.84-2.el5_2.1 PID TTY STAT TIME COMMAND 20874 ? D< 0:00 \_ [dlm_recoverd] 20854 pts/1 S+ 0:00 \_ /bin/sh /sbin/service clvmd start 20861 pts/1 S+ 0:00 \_ /bin/bash /etc/init.d/clvmd start 20931 pts/1 S+ 0:00 \_ /usr/sbin/vgscan -d 20869 ? Ssl 0:00 clvmd -T40 ps ax -o pid,cmd,wchan 20874 [dlm_recoverd] - ------------------------------ Connection to xen1 closed. in host xen2: Linux blade02.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST 2007 x86_64 x86_64 x86_64 GNU/Linux lvm2-cluster-2.02.16-3.el5 cman-2.0.64-1.0.1.el5 PID TTY STAT TIME COMMAND 22662 ? D< 0:00 \_ [dlm_recoverd] 22613 ? Ssl 0:02 clvmd -T40 ps ax -o pid,cmd,wchan 22662 [dlm_recoverd] - ------------------------------ Connection to xen2 closed. in host xen3: Linux blade03.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST 2007 x86_64 x86_64 x86_64 GNU/Linux lvm2-cluster-2.02.16-3.el5 cman-2.0.64-1.0.1.el5 PID TTY STAT TIME COMMAND 22236 ? D< 0:00 \_ [dlm_recoverd] 22231 ? 
Ssl 0:02 clvmd -T40 ps ax -o pid,cmd,wchan Connection to xen3 closed. 22236 [dlm_recoverd] dlm_wait_function ------------------------------ in host xen4: Linux blade04.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST 2007 x86_64 x86_64 x86_64 GNU/Linux lvm2-cluster-2.02.16-3.el5 cman-2.0.64-1.0.1.el5 PID TTY STAT TIME COMMAND 25097 ? D< 0:00 \_ [dlm_recoverd] 25092 ? Ssl 0:02 clvmd -T40 ps ax -o pid,cmd,wchan 25097 [dlm_recoverd] dlm_wait_function ------------------------------ Connection to xen4 closed. in host xen5: Linux blade05.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux lvm2-cluster-2.02.32-4.el5 cman-2.0.84-2.el5_2.1 PID TTY STAT TIME COMMAND 22333 ? D< 0:00 \_ [dlm_recoverd] 22328 ? Ssl 0:02 clvmd -T40 ps ax -o pid,cmd,wchan 22333 [dlm_recoverd] - ------------------------------ Connection to xen5 closed. in host xen6: Linux blade06.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux lvm2-cluster-2.02.32-4.el5 cman-2.0.84-2.el5_2.1 PID TTY STAT TIME COMMAND ps ax -o pid,cmd,wchan ------------------------------ Connection to xen6 closed. in host xen7: Linux blade07.dc.xpto.com 2.6.18-92.1.13.el5xen #1 SMP Wed Sep 24 20:01:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux lvm2-cluster-2.02.32-4.el5 cman-2.0.84-2.el5 cman-2.0.84-2.el5_2.1 PID TTY STAT TIME COMMAND 19793 ? D< 0:00 \_ [dlm_recoverd] 19788 ? Ssl 0:01 clvmd -T40 ps ax -o pid,cmd,wchan 19793 [dlm_recoverd] - ------------------------------ Connection to xen7 closed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccaulfie at redhat.com Fri Nov 14 10:29:50 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 14 Nov 2008 10:29:50 +0000 Subject: [Linux-cluster] Problem in clvmd/dlm_recoverd In-Reply-To: <200811141000.13623.npf-mlists@eurotux.com> References: <200811141000.13623.npf-mlists@eurotux.com> Message-ID: <491D531E.9010104@redhat.com> Nuno Fernandes wrote: > Hi, > > we have an cluster with 7 machines with a SAN. We are using them to > provide virtual machines, so we are using clvmd. > > At some point we are unable to access any of the pv/lv/vg tools. They > are all stuck. From stracing them i've come to the conclusion that they > are waiting for clvmd. > They could be waiting for fencing to complete. Have a look at the output from group_tool, that will tell you which services have recovered after a node has joined or left the cluster Chrissie > Nuno Fernandes > > in host xen1: > > Linux blade01.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 > 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux > > lvm2-cluster-2.02.32-4.el5 > > cman-2.0.84-2.el5_2.1 > > PID TTY STAT TIME COMMAND > > 20874 ? D< 0:00 \_ [dlm_recoverd] > > 20854 pts/1 S+ 0:00 \_ /bin/sh /sbin/service clvmd start > > 20861 pts/1 S+ 0:00 \_ /bin/bash /etc/init.d/clvmd start > > 20931 pts/1 S+ 0:00 \_ /usr/sbin/vgscan -d > > 20869 ? Ssl 0:00 clvmd -T40 > > ps ax -o pid,cmd,wchan > > 20874 [dlm_recoverd] - > > ------------------------------ > > Connection to xen1 closed. > > in host xen2: > > Linux blade02.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 > WEST 2007 x86_64 x86_64 x86_64 GNU/Linux > > lvm2-cluster-2.02.16-3.el5 > > cman-2.0.64-1.0.1.el5 > > PID TTY STAT TIME COMMAND > > 22662 ? D< 0:00 \_ [dlm_recoverd] > > 22613 ? Ssl 0:02 clvmd -T40 > > ps ax -o pid,cmd,wchan > > 22662 [dlm_recoverd] - > > ------------------------------ > > Connection to xen2 closed. 
> > in host xen3: > > Linux blade03.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 > WEST 2007 x86_64 x86_64 x86_64 GNU/Linux > > lvm2-cluster-2.02.16-3.el5 > > cman-2.0.64-1.0.1.el5 > > PID TTY STAT TIME COMMAND > > 22236 ? D< 0:00 \_ [dlm_recoverd] > > 22231 ? Ssl 0:02 clvmd -T40 > > ps ax -o pid,cmd,wchan > > Connection to xen3 closed. > > 22236 [dlm_recoverd] dlm_wait_function > > ------------------------------ > > in host xen4: > > Linux blade04.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 > WEST 2007 x86_64 x86_64 x86_64 GNU/Linux > > lvm2-cluster-2.02.16-3.el5 > > cman-2.0.64-1.0.1.el5 > > PID TTY STAT TIME COMMAND > > 25097 ? D< 0:00 \_ [dlm_recoverd] > > 25092 ? Ssl 0:02 clvmd -T40 > > ps ax -o pid,cmd,wchan > > 25097 [dlm_recoverd] dlm_wait_function > > ------------------------------ > > Connection to xen4 closed. > > in host xen5: > > Linux blade05.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 > 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux > > lvm2-cluster-2.02.32-4.el5 > > cman-2.0.84-2.el5_2.1 > > PID TTY STAT TIME COMMAND > > 22333 ? D< 0:00 \_ [dlm_recoverd] > > 22328 ? Ssl 0:02 clvmd -T40 > > ps ax -o pid,cmd,wchan > > 22333 [dlm_recoverd] - > > ------------------------------ > > Connection to xen5 closed. > > in host xen6: > > Linux blade06.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 > 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux > > lvm2-cluster-2.02.32-4.el5 > > cman-2.0.84-2.el5_2.1 > > PID TTY STAT TIME COMMAND > > ps ax -o pid,cmd,wchan > > ------------------------------ > > Connection to xen6 closed. > > in host xen7: > > Linux blade07.dc.xpto.com 2.6.18-92.1.13.el5xen #1 SMP Wed Sep 24 > 20:01:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux > > lvm2-cluster-2.02.32-4.el5 > > cman-2.0.84-2.el5 > > cman-2.0.84-2.el5_2.1 > > PID TTY STAT TIME COMMAND > > 19793 ? D< 0:00 \_ [dlm_recoverd] > > 19788 ? Ssl 0:01 clvmd -T40 > > ps ax -o pid,cmd,wchan > > 19793 [dlm_recoverd] - > > ------------------------------ > > Connection to xen7 closed. > From npf-mlists at eurotux.com Fri Nov 14 11:02:36 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Fri, 14 Nov 2008 11:02:36 +0000 Subject: [Linux-cluster] Problem in clvmd/dlm_recoverd In-Reply-To: <491D531E.9010104@redhat.com> References: <200811141000.13623.npf-mlists@eurotux.com> <491D531E.9010104@redhat.com> Message-ID: <200811141102.37139.npf-mlists@eurotux.com> On Friday 14 November 2008 10:29:50 Christine Caulfield wrote: > Nuno Fernandes wrote: > > Hi, > > > > we have an cluster with 7 machines with a SAN. We are using them to > > provide virtual machines, so we are using clvmd. > > > > At some point we are unable to access any of the pv/lv/vg tools. They > > are all stuck. From stracing them i've come to the conclusion that they > > are waiting for clvmd. > > They could be waiting for fencing to complete. > > Have a look at the output from group_tool, that will tell you which > services have recovered after a node has joined or left the cluster I don't think that is the reason.. # group_tool type level name id state fence 0 default 00010002 none [1 2 3 4 5 7] dlm 1 clvmd 00010004 none [1 2 3 4 5 7] Any other ideas? Best regards, Nuno Fernandes > > Chrissie > > > Nuno Fernandes > > > > in host xen1: > > > > Linux blade01.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 > > 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux > > > > lvm2-cluster-2.02.32-4.el5 > > > > cman-2.0.84-2.el5_2.1 > > > > PID TTY STAT TIME COMMAND > > > > 20874 ? 
D< 0:00 \_ [dlm_recoverd] > > > > 20854 pts/1 S+ 0:00 \_ /bin/sh /sbin/service clvmd start > > > > 20861 pts/1 S+ 0:00 \_ /bin/bash /etc/init.d/clvmd start > > > > 20931 pts/1 S+ 0:00 \_ /usr/sbin/vgscan -d > > > > 20869 ? Ssl 0:00 clvmd -T40 > > > > ps ax -o pid,cmd,wchan > > > > 20874 [dlm_recoverd] - > > > > ------------------------------ > > > > Connection to xen1 closed. > > > > in host xen2: > > > > Linux blade02.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 > > WEST 2007 x86_64 x86_64 x86_64 GNU/Linux > > > > lvm2-cluster-2.02.16-3.el5 > > > > cman-2.0.64-1.0.1.el5 > > > > PID TTY STAT TIME COMMAND > > > > 22662 ? D< 0:00 \_ [dlm_recoverd] > > > > 22613 ? Ssl 0:02 clvmd -T40 > > > > ps ax -o pid,cmd,wchan > > > > 22662 [dlm_recoverd] - > > > > ------------------------------ > > > > Connection to xen2 closed. > > > > in host xen3: > > > > Linux blade03.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 > > WEST 2007 x86_64 x86_64 x86_64 GNU/Linux > > > > lvm2-cluster-2.02.16-3.el5 > > > > cman-2.0.64-1.0.1.el5 > > > > PID TTY STAT TIME COMMAND > > > > 22236 ? D< 0:00 \_ [dlm_recoverd] > > > > 22231 ? Ssl 0:02 clvmd -T40 > > > > ps ax -o pid,cmd,wchan > > > > Connection to xen3 closed. > > > > 22236 [dlm_recoverd] dlm_wait_function > > > > ------------------------------ > > > > in host xen4: > > > > Linux blade04.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 > > WEST 2007 x86_64 x86_64 x86_64 GNU/Linux > > > > lvm2-cluster-2.02.16-3.el5 > > > > cman-2.0.64-1.0.1.el5 > > > > PID TTY STAT TIME COMMAND > > > > 25097 ? D< 0:00 \_ [dlm_recoverd] > > > > 25092 ? Ssl 0:02 clvmd -T40 > > > > ps ax -o pid,cmd,wchan > > > > 25097 [dlm_recoverd] dlm_wait_function > > > > ------------------------------ > > > > Connection to xen4 closed. > > > > in host xen5: > > > > Linux blade05.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 > > 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux > > > > lvm2-cluster-2.02.32-4.el5 > > > > cman-2.0.84-2.el5_2.1 > > > > PID TTY STAT TIME COMMAND > > > > 22333 ? D< 0:00 \_ [dlm_recoverd] > > > > 22328 ? Ssl 0:02 clvmd -T40 > > > > ps ax -o pid,cmd,wchan > > > > 22333 [dlm_recoverd] - > > > > ------------------------------ > > > > Connection to xen5 closed. > > > > in host xen6: > > > > Linux blade06.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 > > 14:13:09 EST 2008 x86_64 x86_64 x86_64 GNU/Linux > > > > lvm2-cluster-2.02.32-4.el5 > > > > cman-2.0.84-2.el5_2.1 > > > > PID TTY STAT TIME COMMAND > > > > ps ax -o pid,cmd,wchan > > > > ------------------------------ > > > > Connection to xen6 closed. > > > > in host xen7: > > > > Linux blade07.dc.xpto.com 2.6.18-92.1.13.el5xen #1 SMP Wed Sep 24 > > 20:01:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux > > > > lvm2-cluster-2.02.32-4.el5 > > > > cman-2.0.84-2.el5 > > > > cman-2.0.84-2.el5_2.1 > > > > PID TTY STAT TIME COMMAND > > > > 19793 ? D< 0:00 \_ [dlm_recoverd] > > > > 19788 ? Ssl 0:01 clvmd -T40 > > > > ps ax -o pid,cmd,wchan > > > > 19793 [dlm_recoverd] - > > > > ------------------------------ > > > > Connection to xen7 closed. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mcasiraghi73 at gmail.com Fri Nov 14 13:45:46 2008 From: mcasiraghi73 at gmail.com (Mauro Casiraghi) Date: Fri, 14 Nov 2008 14:45:46 +0100 Subject: [Linux-cluster] Service Dependency Message-ID: <18c35c650811140545v4d8b8b0cob050e2964965ef6c@mail.gmail.com> How can i set service dependency ?? I need to set service dependency for to services service A depend from service B How is the file cluster.conf to do this ?? Best Regards Mauro Casiraghi -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcasiraghi73 at gmail.com Fri Nov 14 13:46:03 2008 From: mcasiraghi73 at gmail.com (Mauro Casiraghi) Date: Fri, 14 Nov 2008 14:46:03 +0100 Subject: [Linux-cluster] Service Dependency Message-ID: <18c35c650811140546m14e40adcuc4b37fcc9b739a6b@mail.gmail.com> How can i set service dependency ?? I need to set service dependency for to services service A depend from service B How is the file cluster.conf to do this ?? Best Regards Mauro Casiraghi -------------- next part -------------- An HTML attachment was scrubbed... URL: From veliogluh at itu.edu.tr Fri Nov 14 14:31:49 2008 From: veliogluh at itu.edu.tr (Hakan VELIOGLU) Date: Fri, 14 Nov 2008 16:31:49 +0200 Subject: [Linux-cluster] Cman doesn't realize the failed node In-Reply-To: <20081112131700.16801d3do17enme8@webmail.beta.itu.edu.tr> References: <91632af9d61397469c647a809455d908@mail.bardiauto.hu> <20081112131700.16801d3do17enme8@webmail.beta.itu.edu.tr> Message-ID: <20081114163149.27785czyk7jxdfbs@webmail.beta.itu.edu.tr> Hi, I solved my problem. When the kernel IP forwarding feature (/proc/sys/net/ipv4/ip_forward) is 0, then cluster nodes don't realize the failure. I write this solution to help others. However, I am curious about that is all of your RedHat 5 OS default ip_forward settting is enabled? Are all your failover clusters working as expected ? Have a nice day list. PS: This change is included just in Red Hat 4 Cluster Suite documentation not in Red Hat 5 cluster suite. Interesting!!! ----- veliogluh at itu.edu.tr den ileti --------- Tarih: Wed, 12 Nov 2008 13:17:00 +0200 Kimden: Hakan VELIOGLU Yan?t Adresi:linux clustering Konu: [Linux-cluster] Cman doesn't realize the failed node Kime: linux clustering > Hi, > > I am testing and trying to understand the cluster environment. I ve > built a two node cluster system without any service (Red Hat EL 5.2 > x64). I run the cman and rgmanager services succesfully and then > poweroff one node suddenly. After thsi I excpect that the other node > realize this failure and take up all the resources however running > node doesn't realize this failure. I use "cman_tool nodes" and > "clustat" commands and they say the failed node is active and > online. What am i missing? Why cman doesn't realize the failure? > > [root at cl1 ~]# cat /etc/cluster/cluster.conf > > > > > > > > > > > > > > > > > restricted="1"> > name="cl2.cc.itu.edu.tr" priority="1"/> > name="cl1.cc.itu.edu.tr" priority="2"/> > > > > name="veritabani" recovery="restart"/> > > > [root at cl1 ~]# > > > When the node gows down, the TOTEM repeastedly logs messages like this. > Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] The consensus timeout expired. > Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] entering GATHER state from 3. > Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] The consensus timeout expired. > Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] entering GATHER state from 3. > Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] The consensus timeout expired. 
> Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] entering GATHER state from 3. > Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] The consensus timeout expired. > Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] entering GATHER state from 3. > > > > Hakan > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ----- veliogluh at itu.edu.tr den iletiyi bitir ----- From maurizio.rottin at gmail.com Fri Nov 14 14:31:37 2008 From: maurizio.rottin at gmail.com (Maurizio Rottin) Date: Fri, 14 Nov 2008 15:31:37 +0100 Subject: Fwd: [Linux-cluster] Fence VirtualIron - i have the script but... In-Reply-To: References: <200811140948.20717.npf-mlists@eurotux.com> Message-ID: it seems that i lost the mailing list address...then i forward the answer to Nuno Fernandes. And Nuno, please, write after this thread if you make some improvments, i'll be glad to discuss bugs/improvements/ideas. Bye! ---------- Forwarded message ---------- From: Maurizio Rottin Date: 2008/11/14 Subject: Re: [Linux-cluster] Fence VirtualIron - i have the script but... To: Nuno Fernandes 2008/11/14 Nuno Fernandes : > We also use virtual iron. Could you please post your script? > > Thanks, > > Nuno Fernandes first of all you must have a working java>1.5.0 than (mind that my scripts are little "temporary") vim /sbin/fence_vivm #!/bin/bash # Maurizio Rottin 2008-11-11 # # fence a VirtualIron VirtualServer # ############################################################################### # # Copyright (C) 2008 Maurizio Rottin. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License, version 2, as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # or visit http://www.gnu.org/licenses/gpl.txt # ############################################################################### #must parse arguments passed from stdin #in format name=param #must ignore #name=param #there should not be two name=param string passed, as the xml should be considered bad! name="" vm="" while read param;do if [ `expr index $param \# ` -ne 0 ];then continue elif expr match $param 'ip=.*' >/dev/null ;then ip=${param#ip=} elif expr match $param 'name=.*' >/dev/null ;then name=${param#name=} elif expr match $param 'vm=.*' >/dev/null ;then vm=${param#vm=} fi done #end parsing if [ "$vm" == "" ];then echo "$$: No name provided!" exit 1 fi basedir="/root/vso" if [ -d $basedir ];then cd /root/vso sh ./runner --vivmgr=http://youripaddress:80 --username=admin --password='yourpassword' --inputfile=vsOperations.py --action=fence --vs="${vm}" retval=$? else retval=1 fi exit $retval ->>>>end of script of course you can pass also password, ip and so on, depending on your needs. 
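Before wiring the script above into the cluster configuration it is worth exercising it by hand. fenced passes a fence agent its arguments as name=value lines on stdin, which is what the parser at the top of fence_vivm expects, so a manual test could look like the following; the virtual server name is a placeholder:

echo "vm=somevirtualserver" | /sbin/fence_vivm
echo $?

A zero exit status means the runner call completed; anything else is treated as a fence failure.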
then you must copy this files from VirtualIron directory into /root/vso (if you change /root/vso, then change it also in the fence_vivm) ./system ./system/resources ./system/resources/lib ./system/resources/lib/myoocomClient.jar ./system/resources/lib/MgmtAPI.jar ./system/resources/lib/MgmtUtil.jar ./system/resources/lib/runner.jar ./system/resources/lib/jython.jar ./system/resources/lib/log4j.jar ./system/resources/lib/MgmtControl.jar ./system/resources/lib/jline.jar ./system/resources/lib/MgmtTools.jar ./system/resources/lib/myoodbClient.jar ./etc ./etc/runner.properties ok, now we need two more scripts This is taken from VirtualIron so i don't know if some licences are involved. vim /root/vso/runner #!/bin/bash export INS_PATH=`pwd` # # run in the context of the install # cd "$INS_PATH" # # usage: runner.sh --mode= # # --mode : default is # --vivmgr : default is # --username : default is # --password : default is # --inputfile : default is # # --help : print usage # java -Xmx512m -Dpython.inclusive.packages="java,javax,org.python" -jar "system/resources/lib/runner.jar" $* exit $? ->>> end script chmod 700 /root/vso/runner AND taken from VirtualIron scripts and modified for our needs: vim /root/vso/vsOperations.py # # vsOperations: perform start/stop/shutdown/restart action on VirtualServer # from com.virtualiron.vce.mgmt.api import VirtualizationManager from com.virtualiron.vce.mgmt.api.virtual import VirtualServer import java.lang import java.util import os import string import sys import traceback def Usage(): if os.name == 'nt': command = 'runner.bat' else: command = 'runner.sh' print 'Usage: %s --inputfile=vsOperations.py --vs="Virtual Server" --action=[start,stop,shutdown,restart]' % (command) print 'or Usage: %s --vivmgr=http://192.168.0.48:80 --username=admin --password=\'dba at iron\' --inputfile=vsOperations.py --action=[start,stop,shutdown,restart,reboot]' % (command) sys.exit(1) # # parse command line arguments # vsName = None action = None for arg in sys.argv[1:]: if string.find(arg, "--vs=") != -1: tokens = string.splitfields(arg, "=") vsName = tokens[1] elif string.find(arg, "--action=") != -1: tokens = string.splitfields(arg, "=") action = tokens[1] # # check if required arguments were specified # if vsName is None or action is None: Usage() # # get connection to Database # configurationManager = VirtualizationManager.getConfigurationManager() # # find virtual server object # vs = configurationManager.findObject(VirtualServer, vsName) if vs is None: print 'FAIL to find VirtualServer %s' % (vsName) sys.exit(1) # # wrap VS action in job control # error = 0 try: jobName = java.lang.Long.toString(configurationManager.getLocalTime()) job = configurationManager.createJob(jobName) job.begin() if action == 'start': vs.start() job.addOperationDescription("Start VirtualServer", vs, vs, vs) elif action == 'stop': vs.stop() job.addOperationDescription("Stop VirtualServer", vs, vs, vs) elif action == 'shutdown': vs.shutdown() job.addOperationDescription("Shutdown VirtualServer", vs, vs, vs) elif action == 'restart': vs.restart() job.addOperationDescription("Restart VirtualServer", vs, vs, vs) elif action == 'reboot': vs.reboot() job.addOperationDescription("Hard reset and boot VirtualServer", vs, vs, vs) elif action == 'fence': statusEvent = vs.getStatusEvent().toString() if string.find(statusEvent, 'VirtualServerStoppedEvent') == 0: print "fence: starting" vs.start() job.addOperationDescription("Fence Start VirtualServer", vs, vs, vs) elif string.find(statusEvent, 
'VirtualServerRunningEvent') == 0: print "fence: rebooting" vs.reboot() job.addOperationDescription("Fence reboot VirtualServer", vs, vs, vs) elif string.find(statusEvent, 'VirtualServerStartingEvent') == 0: print "wait 60 sec reboot()? or do nothing?" else: print "Unknown status %s " % (statusEvent) else: error = 1 print 'Unknown action:', action job.abort() if not error: job.commit() # commit job except java.lang.Throwable, throw: job.abort() # if job fails, rollback throw.printStackTrace() except: job.abort() # if job fails, rollback traceback.print_exc() if error: Usage() ->>> end script chmod 700 /root/vso/vsOperations.py Last thing is to modify the cluster.conf copy your cluster.conf somewhere add plus 1 to "config_version=number" the beginning should look like this: <------begin --------->cut<------- where "bend_02" and "bend_01" are the real names in the VirtualIron manager! update the cluste.conf while the cluster is running: ccs_tool update cluster.conf enjoy it! -- mr -- mr From teigland at redhat.com Fri Nov 14 16:26:49 2008 From: teigland at redhat.com (David Teigland) Date: Fri, 14 Nov 2008 10:26:49 -0600 Subject: [Linux-cluster] Problem in clvmd/dlm_recoverd In-Reply-To: <200811141000.13623.npf-mlists@eurotux.com> References: <200811141000.13623.npf-mlists@eurotux.com> Message-ID: <20081114162649.GA4054@redhat.com> On Fri, Nov 14, 2008 at 10:00:13AM +0000, Nuno Fernandes wrote: > 22236 [dlm_recoverd] dlm_wait_function > 25097 [dlm_recoverd] dlm_wait_function dlm recovery appears to be stuck; this is usually due to a problem at the network level. The recovery seems to be caused by a node starting clvmd. sysrq-t backtraces from all the nodes could confirm some of this, and adding to cluster.conf would give us more information the next time it happens. Dave From curtis at athabascau.ca Fri Nov 14 17:06:51 2008 From: curtis at athabascau.ca (Curtis Collicutt) Date: Fri, 14 Nov 2008 10:06:51 -0700 Subject: [Linux-cluster] Problem in clvmd/dlm_recoverd In-Reply-To: <200811141000.13623.npf-mlists@eurotux.com> References: <200811141000.13623.npf-mlists@eurotux.com> Message-ID: <1226682383-sup-5263@beaker.cs.athabascau.ca> Excerpts from Nuno Fernandes's message of Fri Nov 14 03:00:13 -0700 2008: > Hi, > > we have an cluster with 7 machines with a SAN. We are using them to provide > virtual machines, so we are using clvmd. > > At some point we are unable to access any of the pv/lv/vg tools. They are all > stuck. From stracing them i've come to the conclusion that they are waiting > for clvmd. > > Has anyone been in this situation? This happens to me as well every once and a while. Haven't figure it out yet either. Thanks, Curtis. > > Thanks for any help, > Nuno Fernandes > > in host xen1: > > Linux blade01.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST > 2008 x86_64 x86_64 x86_64 GNU/Linux > lvm2-cluster-2.02.32-4.el5 > > cman-2.0.84-2.el5_2.1 > > PID TTY STAT TIME COMMAND > > 20874 ? D< 0:00 \_ [dlm_recoverd] > > 20854 pts/1 S+ 0:00 \_ /bin/sh /sbin/service clvmd start > > 20861 pts/1 S+ 0:00 \_ /bin/bash /etc/init.d/clvmd start > > 20931 pts/1 S+ 0:00 \_ /usr/sbin/vgscan -d > > 20869 ? Ssl 0:00 clvmd -T40 > > ps ax -o pid,cmd,wchan > > 20874 [dlm_recoverd] - > > ------------------------------ > > Connection to xen1 closed. > > in host xen2: > > Linux blade02.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST > 2007 x86_64 x86_64 x86_64 GNU/Linux > lvm2-cluster-2.02.16-3.el5 > > cman-2.0.64-1.0.1.el5 > > PID TTY STAT TIME COMMAND > > 22662 ? 
D< 0:00 \_ [dlm_recoverd] > > 22613 ? Ssl 0:02 clvmd -T40 > > ps ax -o pid,cmd,wchan > > 22662 [dlm_recoverd] - > > ------------------------------ > > Connection to xen2 closed. > > in host xen3: > > Linux blade03.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST > 2007 x86_64 x86_64 x86_64 GNU/Linux > lvm2-cluster-2.02.16-3.el5 > > cman-2.0.64-1.0.1.el5 > > PID TTY STAT TIME COMMAND > > 22236 ? D< 0:00 \_ [dlm_recoverd] > > 22231 ? Ssl 0:02 clvmd -T40 > > ps ax -o pid,cmd,wchan > > Connection to xen3 closed. > > 22236 [dlm_recoverd] dlm_wait_function > > ------------------------------ > > in host xen4: > > Linux blade04.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST > 2007 x86_64 x86_64 x86_64 GNU/Linux > lvm2-cluster-2.02.16-3.el5 > > cman-2.0.64-1.0.1.el5 > > PID TTY STAT TIME COMMAND > > 25097 ? D< 0:00 \_ [dlm_recoverd] > > 25092 ? Ssl 0:02 clvmd -T40 > > ps ax -o pid,cmd,wchan > > 25097 [dlm_recoverd] dlm_wait_function > > ------------------------------ > > Connection to xen4 closed. > > in host xen5: > Linux blade05.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST > 2008 x86_64 x86_64 x86_64 GNU/Linux > lvm2-cluster-2.02.32-4.el5 > cman-2.0.84-2.el5_2.1 > PID TTY STAT TIME COMMAND > 22333 ? D< 0:00 \_ [dlm_recoverd] > 22328 ? Ssl 0:02 clvmd -T40 > ps ax -o pid,cmd,wchan > 22333 [dlm_recoverd] - > ------------------------------ > Connection to xen5 closed. > in host xen6: > Linux blade06.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST > 2008 x86_64 x86_64 x86_64 GNU/Linux > lvm2-cluster-2.02.32-4.el5 > cman-2.0.84-2.el5_2.1 > PID TTY STAT TIME COMMAND > ps ax -o pid,cmd,wchan > ------------------------------ > Connection to xen6 closed. > in host xen7: > Linux blade07.dc.xpto.com 2.6.18-92.1.13.el5xen #1 SMP Wed Sep 24 20:01:15 EDT > 2008 x86_64 x86_64 x86_64 GNU/Linux > lvm2-cluster-2.02.32-4.el5 > cman-2.0.84-2.el5 > cman-2.0.84-2.el5_2.1 > PID TTY STAT TIME COMMAND > 19793 ? D< 0:00 \_ [dlm_recoverd] > 19788 ? Ssl 0:01 clvmd -T40 > ps ax -o pid,cmd,wchan > 19793 [dlm_recoverd] - __ This communication is intended for the use of the recipient to whom it is addressed, and may contain confidential, personal, and or privileged information. Please contact us immediately if you are not the intended recipient of this communication, and do not copy, distribute, or take action relying on it. Any communications received in error, or subsequent reply, should be deleted or destroyed. --- From lhh at redhat.com Fri Nov 14 21:49:43 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 14 Nov 2008 16:49:43 -0500 Subject: [Linux-cluster] Limit service restarting times In-Reply-To: <8a5668960811130156i2ef6f533s1f39332b5dd72195@mail.gmail.com> References: <8a5668960811130156i2ef6f533s1f39332b5dd72195@mail.gmail.com> Message-ID: <1226699383.25751.66.camel@ayanami> On Thu, 2008-11-13 at 10:56 +0100, Juan Ramon Martin Blanco wrote: > First of all, hello and many thanks everyone, this list has helped me > a lot in the cluster world ;) > > I have configured a 2 node cluster with RHEL 5.2, shared storage and > GFS2. > I have configured several services with our company own software. This > software evolves fast because we are in active development, so > sometimes cores are dumped. When this happens, the cluster tries to > restart the failing service again and again...filling the service's > filesystem with cores. > Is there any way to limit the number of retries for a certain service? max_restarts="x" * Maximum tolerated. 
Ex: 3 means the *4th* restart will fail restart_expire="y" * After this # of seconds time, a restart is forgotten. -- Lon From npf-mlists at eurotux.com Fri Nov 14 21:53:13 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Fri, 14 Nov 2008 21:53:13 +0000 Subject: [Linux-cluster] Problem in clvmd/dlm_recoverd In-Reply-To: <20081114162649.GA4054@redhat.com> References: <200811141000.13623.npf-mlists@eurotux.com> <20081114162649.GA4054@redhat.com> Message-ID: <200811142153.13752.npf-mlists@eurotux.com> On Friday 14 November 2008 16:26:49 David Teigland wrote: > On Fri, Nov 14, 2008 at 10:00:13AM +0000, Nuno Fernandes wrote: > > 22236 [dlm_recoverd] dlm_wait_function > > 25097 [dlm_recoverd] dlm_wait_function > > dlm recovery appears to be stuck; this is usually due to a problem at the > network level. The recovery seems to be caused by a node starting clvmd. Hi, I don't know if it helps, but groupd is using all available CPU, but only in 2 of the nodes. I don't know if it's required to be up.. but we've disabled IPV6.. snip of modprobe.conf: alias net-pf-10 off Best regards, ./npf > > sysrq-t backtraces from all the nodes could confirm some of this, and > adding to cluster.conf would give us more information > the next time it happens. > > Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Fri Nov 14 21:54:45 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 14 Nov 2008 16:54:45 -0500 Subject: [Linux-cluster] RHEL3 Cluster Broken Pipe error and Heartbeat configuration In-Reply-To: <29e045b80811120544j1a85eeay237b72daf8de3e16@mail.gmail.com> References: <29e045b80811120544j1a85eeay237b72daf8de3e16@mail.gmail.com> Message-ID: <1226699685.25751.72.camel@ayanami> On Wed, 2008-11-12 at 19:14 +0530, lingu wrote: > cluquorumd[1921]: Disk-TB: Detected I/O Hang! Eep. This means that I/O to shared storage has gotten slow. Strange. I heard reports of this on another cluster (after going from U3->U8), but I don't know what the cause is. With this cluster, we straced the cluquorumd process and found that it was slowing down *a lot* in the write() call when writing to shared storage. You can try the current U9+erratum clumanager or the test release if you want to (it makes unlock more robust when I/O performance is slow for some reason). However, someone really needs to profile the kernel if you're seeing slow write times while stracing cluquorumd... -- Lon From lhh at redhat.com Fri Nov 14 22:02:03 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 14 Nov 2008 17:02:03 -0500 Subject: [Linux-cluster] RedHat Cluster Suite cluster resource management In-Reply-To: <18c35c650811130614i2a58e035w3bb9f574075389ca@mail.gmail.com> References: <18c35c650811130614i2a58e035w3bb9f574075389ca@mail.gmail.com> Message-ID: <1226700123.25751.80.camel@ayanami> On Thu, 2008-11-13 at 15:14 +0100, Mauro Casiraghi wrote: > I had created this cluster configuration with Redhat Cluster Suite > > I Have one Service Group with the follow resources > > Service Group Name : WEB > > Resources of the service group: > > 1) IP_ADRESS > 2) APACHE > > The resource dependency are: > > The Apache resource is dependent of ip_adress Assuming you are running RHEL 5.3 beta or RHEL5.2 (or using stable2 branch of linux-cluster):
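For the earlier question about limiting service restarts, a hedged sketch of where the max_restarts and restart_expire attributes would go in cluster.conf; the service and script names are placeholders, and this assumes an rgmanager new enough to honour both attributes:

<rm>
  <service name="myservice" autostart="1" recovery="restart" max_restarts="3" restart_expire="600">
    <script name="myapp" file="/etc/init.d/myapp"/>
  </service>
</rm>

Per the description above, with max_restarts="3" the fourth restart attempt inside the 600 second window is refused rather than retried in place.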