From xavier.montagutelli at unilim.fr  Wed Dec  1 08:42:51 2010
From: xavier.montagutelli at unilim.fr (Xavier Montagutelli)
Date: Wed, 1 Dec 2010 09:42:51 +0100
Subject: [Linux-cluster] cluster without fencing device
In-Reply-To: <4CF3BC18.4020405@alteeve.com>
References: <AANLkTimXZazDxfUYyYB6vtSev8Sq_i2+Y16g69NOLoCz@mail.gmail.com>
	<4CF3BC18.4020405@alteeve.com>
Message-ID: <201012010942.51213.xavier.montagutelli@unilim.fr>

On Monday 29 November 2010 15:43:36 Digimer wrote:
> On 11/29/2010 03:42 AM, Mohamed Arif Khan wrote:
> > How to configure cluster without fencing device ?
> 
> In RHCS, it is not possible.
> 
> http://wiki.alteeve.com/index.php/Red_Hat_Cluster_Service_3_Tutorial#Concep
> t.3B_Fencing
> 

I suppose you can create a "fake" fence device which responds "ok" (/bin/true 
?). But you are warned, you will live in a dangerous, unsupported configuration 
;-)

-- 
Xavier Montagutelli                      Tel : +33 (0)5 55 45 77 20
Service Commun Informatique              Fax : +33 (0)5 55 45 75 95
Universite de Limoges
123, avenue Albert Thomas
87060 Limoges cedex



From laszlo at beres.me  Wed Dec  1 12:45:01 2010
From: laszlo at beres.me (Laszlo Beres)
Date: Wed, 1 Dec 2010 13:45:01 +0100
Subject: [Linux-cluster] OT: where is the wiki?
Message-ID: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>

Hi,

just recognized that http://sources.redhat.com/cluster/wiki/ does not
exist anymore. Is there a new location?

Regards,

-- 
L?szl? B?res? ? ? ? ? ? Unix system engineer
http://www.google.com/profiles/beres.laszlo



From bmr at redhat.com  Wed Dec  1 13:30:15 2010
From: bmr at redhat.com (Bryn M. Reeves)
Date: Wed, 01 Dec 2010 13:30:15 +0000
Subject: [Linux-cluster] OT: where is the wiki?
In-Reply-To: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
Message-ID: <4CF64DE7.6010101@redhat.com>

On 12/01/2010 12:45 PM, Laszlo Beres wrote:
> Hi,
> 
> just recognized that http://sources.redhat.com/cluster/wiki/ does not
> exist anymore. Is there a new location?
> 
> Regards,
> 

It's working for me (also via the redirect from http://sourceware.org/cluster).

Are you seeing an error loading the page?

Cheers,
Bryn.



From fdinitto at redhat.com  Wed Dec  1 13:34:31 2010
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 01 Dec 2010 14:34:31 +0100
Subject: [Linux-cluster] OT: where is the wiki?
In-Reply-To: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
Message-ID: <4CF64EE7.1010200@redhat.com>

On 12/1/2010 1:45 PM, Laszlo Beres wrote:
> Hi,
> 
> just recognized that http://sources.redhat.com/cluster/wiki/ does not
> exist anymore. Is there a new location?
> 
> Regards,
> 

Sorry? what do you mean it doesn?t exist....?

I just opened it after reading this email..

Fabio



From thomas at sjolshagen.net  Wed Dec  1 13:56:59 2010
From: thomas at sjolshagen.net (Thomas Sjolshagen)
Date: Wed, 01 Dec 2010 08:56:59 -0500
Subject: [Linux-cluster] =?utf-8?q?OT=3A_where_is_the_wiki=3F?=
In-Reply-To: <4CF64EE7.1010200@redhat.com>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
	<4CF64EE7.1010200@redhat.com>
Message-ID: <ddfd6eb4c9a4960c84a180391c5bb032@www.sjolshagen.net>

 On Wed, 01 Dec 2010 14:34:31 +0100, "Fabio M. Di Nitto" 
 <fdinitto at redhat.com> wrote:
 [SNIP]
>
> Sorry? what do you mean it doesn?t exist....?
>
> I just opened it after reading this email..
>
> Fabio
>

 Attached is what I see.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster-wiki-page-404.png
Type: image/png
Size: 186313 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101201/4abb7e33/attachment.png>

From linko22 at gmail.com  Wed Dec  1 14:01:48 2010
From: linko22 at gmail.com (Lynx Ginger)
Date: Wed, 1 Dec 2010 17:01:48 +0300
Subject: [Linux-cluster] OT: where is the wiki?
In-Reply-To: <4CF64EE7.1010200@redhat.com>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
	<4CF64EE7.1010200@redhat.com>
Message-ID: <AANLkTikMfH8VUPVApYnLREObDZ81sQmhaZqD8ZgRJ39-@mail.gmail.com>

404 - not found.

2010/12/1 Fabio M. Di Nitto <fdinitto at redhat.com>

> On 12/1/2010 1:45 PM, Laszlo Beres wrote:
> > Hi,
> >
> > just recognized that http://sources.redhat.com/cluster/wiki/ does not
> > exist anymore. Is there a new location?
> >
> > Regards,
> >
>
> Sorry? what do you mean it doesn?t exist....?
>
> I just opened it after reading this email..
>
> Fabio
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101201/bee94d1c/attachment.htm>

From oheinz at fbihome.de  Wed Dec  1 14:04:52 2010
From: oheinz at fbihome.de (Oliver Heinz)
Date: Wed, 1 Dec 2010 15:04:52 +0100
Subject: [Linux-cluster] OT: where is the wiki?
In-Reply-To: <4CF64EE7.1010200@redhat.com>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
	<4CF64EE7.1010200@redhat.com>
Message-ID: <201012011504.52518.oheinz@fbihome.de>

Am Mittwoch, 1. Dezember 2010, um 14:34:31 schrieb Fabio M. Di Nitto:
> On 12/1/2010 1:45 PM, Laszlo Beres wrote:
> > Hi,
> > 
> > just recognized that http://sources.redhat.com/cluster/wiki/ does not
> > exist anymore. Is there a new location?
> > 
> > Regards,
> 
> Sorry? what do you mean it doesn?t exist....?

I get a 404:

Page Not Found (404)

Sorry! The page you are looking for has been moved or no longer exists. You 
may search for it, or try looking in one of these areas:


Oliver


> 
> I just opened it after reading this email..
> 
> Fabio



From crosa at redhat.com  Wed Dec  1 14:11:33 2010
From: crosa at redhat.com (Cleber Rosa)
Date: Wed, 01 Dec 2010 12:11:33 -0200
Subject: [Linux-cluster] OT: where is the wiki?
In-Reply-To: <AANLkTikMfH8VUPVApYnLREObDZ81sQmhaZqD8ZgRJ39-@mail.gmail.com>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>	<4CF64EE7.1010200@redhat.com>
	<AANLkTikMfH8VUPVApYnLREObDZ81sQmhaZqD8ZgRJ39-@mail.gmail.com>
Message-ID: <4CF65795.9090401@redhat.com>

Works from inside (our firewall, via VPN, etc). Does *not* work from 
outside.

On 12/01/2010 12:01 PM, Lynx Ginger wrote:
> 404 - not found.
>
> 2010/12/1 Fabio M. Di Nitto <fdinitto at redhat.com 
> <mailto:fdinitto at redhat.com>>
>
>     On 12/1/2010 1:45 PM, Laszlo Beres wrote:
>     > Hi,
>     >
>     > just recognized that http://sources.redhat.com/cluster/wiki/
>     does not
>     > exist anymore. Is there a new location?
>     >
>     > Regards,
>     >
>
>     Sorry? what do you mean it doesn?t exist....?
>
>     I just opened it after reading this email..
>
>     Fabio
>
>     --
>     Linux-cluster mailing list
>     Linux-cluster at redhat.com <mailto:Linux-cluster at redhat.com>
>     https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101201/861c1d03/attachment.htm>

From cos at aaaaa.org  Wed Dec  1 14:12:07 2010
From: cos at aaaaa.org (Ofer Inbar)
Date: Wed, 1 Dec 2010 09:12:07 -0500
Subject: [Linux-cluster] OT: where is the wiki?
In-Reply-To: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
Message-ID: <20101201141207.GZ18254@mip.aaaaa.org>

Laszlo Beres <laszlo at beres.me> wrote:
> just recognized that http://sources.redhat.com/cluster/wiki/ does not
> exist anymore. Is there a new location?

The new location appears to be: http://sourceware.org/cluster/wiki

Unfortunately http://sourceware.org/cluster/ redirects to redhat.com
which gives the 404.  But if you add /wiki/ you get the wiki.
  -- Cos



From arif4linux at gmail.com  Wed Dec  1 14:17:13 2010
From: arif4linux at gmail.com (Mohamed Arif Khan)
Date: Wed, 1 Dec 2010 19:47:13 +0530
Subject: [Linux-cluster] cluster without fencing device
Message-ID: <AANLkTinCNz2Z1rj1WoT_kZMgoxp8tpYvROXrKAZn6tcM@mail.gmail.com>

Thanks for reply

Can we configure cluster without shared storage, means can we make
replicated database on individual nodes ?

-- 
*Thanks & Regards*
*M.Arif Khan*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101201/65c28543/attachment.htm>

From thomas at sjolshagen.net  Wed Dec  1 14:27:46 2010
From: thomas at sjolshagen.net (Thomas Sjolshagen)
Date: Wed, 01 Dec 2010 09:27:46 -0500
Subject: [Linux-cluster] cluster without fencing device
In-Reply-To: <AANLkTinCNz2Z1rj1WoT_kZMgoxp8tpYvROXrKAZn6tcM@mail.gmail.com>
References: <AANLkTinCNz2Z1rj1WoT_kZMgoxp8tpYvROXrKAZn6tcM@mail.gmail.com>
Message-ID: <6be5cdc415f018b0e5f0b3378be27193@www.sjolshagen.net>



On Wed, 1 Dec 2010 19:47:13 +0530, Mohamed Arif Khan  wrote: 


Thanks for reply

Can we configure cluster without shared storage,
means can we make replicated database on individual nodes ? 


Absolutely, but why would you even need to use the cluster stack if
your only purpose is to have a group (cluster) of DB servers that
replicate between them? If you've got DB replication configured and
using a DB proxy, you'll get the same result with (much) less overhead -
imho - in terms of system management overhead, etc. 

// Thomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101201/5eb01806/attachment.htm>

From bmr at redhat.com  Wed Dec  1 14:33:12 2010
From: bmr at redhat.com (Bryn M. Reeves)
Date: Wed, 01 Dec 2010 14:33:12 +0000
Subject: [Linux-cluster] OT: where is the wiki?
In-Reply-To: <4CF65795.9090401@redhat.com>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>	<4CF64EE7.1010200@redhat.com>	<AANLkTikMfH8VUPVApYnLREObDZ81sQmhaZqD8ZgRJ39-@mail.gmail.com>
	<4CF65795.9090401@redhat.com>
Message-ID: <4CF65CA8.10701@redhat.com>

On 12/01/2010 02:11 PM, Cleber Rosa wrote:
>   Works from inside (our firewall, via VPN, etc). Does *not* work from outside.

Confirmed; same failure via my 3G provider.

Regards,
Bryn.



From linux at alteeve.com  Wed Dec  1 15:38:33 2010
From: linux at alteeve.com (Digimer)
Date: Wed, 01 Dec 2010 10:38:33 -0500
Subject: [Linux-cluster] cluster without fencing device
In-Reply-To: <201012010942.51213.xavier.montagutelli@unilim.fr>
References: <AANLkTimXZazDxfUYyYB6vtSev8Sq_i2+Y16g69NOLoCz@mail.gmail.com>
	<4CF3BC18.4020405@alteeve.com>
	<201012010942.51213.xavier.montagutelli@unilim.fr>
Message-ID: <4CF66BF9.5050100@alteeve.com>

On 12/01/2010 03:42 AM, Xavier Montagutelli wrote:
> On Monday 29 November 2010 15:43:36 Digimer wrote:
>> On 11/29/2010 03:42 AM, Mohamed Arif Khan wrote:
>>> How to configure cluster without fencing device ?
>>
>> In RHCS, it is not possible.
>>
>> http://wiki.alteeve.com/index.php/Red_Hat_Cluster_Service_3_Tutorial#Concep
>> t.3B_Fencing
>>
>
> I suppose you can create a "fake" fence device which responds "ok" (/bin/true
> ?). But you are warned, you will live in a dangerous, unsupported configuration
> ;-)

That is exceedingly unwise.

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org



From linux at alteeve.com  Wed Dec  1 20:12:18 2010
From: linux at alteeve.com (Digimer)
Date: Wed, 01 Dec 2010 15:12:18 -0500
Subject: [Linux-cluster] OT: where is the wiki?
In-Reply-To: <20101201141207.GZ18254@mip.aaaaa.org>
References: <AANLkTi=zOXL3jGBVj3e=tqxBy6qb=pei4f_R3dhuqD3b@mail.gmail.com>
	<20101201141207.GZ18254@mip.aaaaa.org>
Message-ID: <4CF6AC22.9000502@alteeve.com>

On 12/01/2010 09:12 AM, Ofer Inbar wrote:
> Laszlo Beres<laszlo at beres.me>  wrote:
>> just recognized that http://sources.redhat.com/cluster/wiki/ does not
>> exist anymore. Is there a new location?
>
> The new location appears to be: http://sourceware.org/cluster/wiki
>
> Unfortunately http://sourceware.org/cluster/ redirects to redhat.com
> which gives the 404.  But if you add /wiki/ you get the wiki.
>    -- Cos

Might want to put some forwarders into your web server. :)

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org



From fdinitto at redhat.com  Thu Dec  2 13:24:06 2010
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 02 Dec 2010 14:24:06 +0100
Subject: [Linux-cluster] Announcing 3.1.0 releases (cluster, fence-agents,
 resource-agents, gfs2-utils)
Message-ID: <4CF79DF6.8060605@redhat.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

The cluster team and its community are proud to announce the 3.1.0
stable releases.

As previously announced
(https://www.redhat.com/archives/linux-cluster/2010-October/msg00012.html),
this release is the first step towards the split of the main source tree
into separate trees.

The cluster, fence-agents, resource-agents and gfs2-utils projects will
be released independently from each other from now on, so stay tuned for
announcements from the different maintainers (see also wiki for details).

cluster 3.1.0:

requires:
- - corosync 1.3.0 (or higher)
- - openais 1.1.4 (or higher)
- - any recent kernel header will work just fine (required for dlm)

download:
https://fedorahosted.org/releases/c/l/cluster/cluster-3.1.0.tar.xz

fence-agents 3.1.0:

requires:
- - cluster 3.1.0 (or higher)

download:
https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-3.1.0.tar.xz

resource-agents 3.1.0:

requires:
- - cluster 3.1.0 (or higher)

download:
https://fedorahosted.org/releases/r/e/resource-agents/resource-agents-3.1.0.tar.xz

gfs2-utils 3.1.0:

requires:
- - cluster 3.1.0 (or higher)
- - openais 1.1.4 (or higher)
- - corosync 1.3.0 (or higher)

download:
http://www.kernel.org/pub/linux/kernel/people/steve/gfs2/gfs2-utils-3.1.0.tar.bz2

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience  with other sysadministrators or power users.

Thanks/congratulations to all people that contributed to achieve this
great milestone.

Happy clustering,
Fabio
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBCAAGBQJM953zAAoJEFA6oBJjVJ+Ow/kP/jEOMFM3+SSZ/jlTrVGQY2YT
61FVmE/CMWfjNLTe1blaMGQqqXBxhl3gjZ1fTqZ600fH/F2Ge7dpsaX7tPM/O2ug
9olXqg+/5prjUTeLOMpwsoQ5gBNNoVOYzAFxR75gtjDsgONMeFQLI//SYIRrdeJ0
3KYHsmQozwEwRYDfvWxO0saUj5HdOLvdFksGlgkpeOAEP3SwcC5gWo4vhKlF9jf8
CCMxu4/WWQyAReCv2kvIYgAqYAKbljG1UZDVe4GKZl9TORN7JabCZEEXmex6K5Nk
Rn/yn/Jvo1eZMF+n3ZzjF084GtznUipfKEWBLBJmcxUXUTsBvl2hYm28Ky+ZoUxd
5tbe772bIHzOvy2hCMNy97C+OoMkyJhaHVJfqXclwCS2YqYTeHXJw4OFMeAz3KJh
pA5ECbUqqWpOmssatPnohV3UFs3qo3vY3vOogLCqe9edPVD0lfZyTvHrRoZOEUUx
k14f67c2o7KqSymz6+hdbiNZrTh9FAu9Kit/j1gN+gv1AgSUPLcjb2hDEOLbz00I
5w41NhKBIcF7jkdfdAgD3q/pCnCIWEV17XLG5IOuoDdSMjxxxQ917V2Uv1bbMyRR
/S1F4+rRSCLdxZxoU5TlvLwnzYBcqG68BJi3Pj7ro/zyRLxMte333BdiIazngLmS
n7dhtmNm9pftHqliV+7V
=gsuQ
-----END PGP SIGNATURE-----



From scooter at cgl.ucsf.edu  Thu Dec  2 18:27:59 2010
From: scooter at cgl.ucsf.edu (Scooter Morris)
Date: Thu, 02 Dec 2010 10:27:59 -0800
Subject: [Linux-cluster] Question about gfs2_tools lockdump
Message-ID: <4CF7E52F.9010501@cgl.ucsf.edu>

Hi all,
     I've got a 5 node RHEL 5.5 cluster with a number of gfs2 
filesystems.  After a lot of effort (and help from RedHat) we've gotten 
to the stage where the cluster is quite stable, but now we're starting 
to see some performance degradation.   In investigating this, I've been 
poking around and I'm seeing some things that I can't explain.  In 
particular, on a quite filesystem (no processes according to lsof on all 
nodes), a gfs2_tool lockdump gives 1,000's of lock entries (G: lines).  
Of those several have R: entries (resource group?) and several have H: 
entries.  The H: entries are particularly strange because all H: entries 
are of the form:
     H: s:EX f:H e:0 p:8953 [(ended)] ...

My understanding is that this indicates a lock holder with an exclusive 
lock, but the process has ended (?).   Why aren't these locks going 
away?  Shouldn't they be cleared after the process ends (particularly 
since some of them are exclusive locks...)?  Any help in understanding 
these entries would be very helpful.

-- scooter



From rpeterso at redhat.com  Thu Dec  2 19:22:49 2010
From: rpeterso at redhat.com (Bob Peterson)
Date: Thu, 2 Dec 2010 14:22:49 -0500 (EST)
Subject: [Linux-cluster] Question about gfs2_tools lockdump
In-Reply-To: <1551976847.1121861291317464590.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID: <1374982786.1122461291317769916.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>

----- "Scooter Morris" <scooter at cgl.ucsf.edu> wrote:
| Hi all,
|      I've got a 5 node RHEL 5.5 cluster with a number of gfs2 
| filesystems.  After a lot of effort (and help from RedHat) we've
| gotten 
| to the stage where the cluster is quite stable, but now we're starting
| 
| to see some performance degradation.   In investigating this, I've
| been 
| poking around and I'm seeing some things that I can't explain.  In 
| particular, on a quite filesystem (no processes according to lsof on
| all 
| nodes), a gfs2_tool lockdump gives 1,000's of lock entries (G: lines).
|  
| Of those several have R: entries (resource group?) and several have H:
| 
| entries.  The H: entries are particularly strange because all H:
| entries 
| are of the form:
|      H: s:EX f:H e:0 p:8953 [(ended)] ...
| 
| My understanding is that this indicates a lock holder with an
| exclusive 
| lock, but the process has ended (?).   Why aren't these locks going 
| away?  Shouldn't they be cleared after the process ends (particularly
| 
| since some of them are exclusive locks...)?  Any help in understanding
| 
| these entries would be very helpful.
| 
| -- scooter
| 
| --
| Linux-cluster mailing list
| Linux-cluster at redhat.com
| https://www.redhat.com/mailman/listinfo/linux-cluster

Hi Scooter,

There are lots of different types of glocks, and the type is given
before the slash.  Type 2 is inode, so 2/9009 is for a disk inode
located at block 0x9009 (in hex).  Type 3 is for resource groups,
so 3/170003 is for the resource group starting at block 0x170003.
Type 5 is for i_open glocks, which also correspond mostly to files.
So if you open a file and write some data, you can get both a inode
glock for 2/9009 and a corresponding i_open glock for 5/9009.
The inode glocks will also have a corresponding "I:" entry.
The resource group glocks may have an R: entry as well.

Each "H:" corresponds to a process that is holding or trying to hold
that particular glock.  A holder may persist even after a process
has ended.  For example, if I'm the first process to write to a
gfs2 file system, I could cause all the resource groups to be read in,
but the resource groups and their corresponding glocks will stay
in memory.

A holder record is said to be holding the glock if it has the
f:H flag.  It's waiting for the lock if it has the f:W flag.
If it says "s:SH", that's a shared hold.  If it says "s:EX"
that's an exclusive hold on the glock, etc.  So for example,
"s:EX f:W" corresponds to someone waiting for an exclusive lock
for that glock.

Another complication is that some versions of gfs2 sometimes
did not keep track of the process id (pid) when a glock was
transferred.  So some older versions report the pid as the
old pid, which would have ended, and not the correct holder.
That made debugging glock issues difficult, but it didn't hurt
anything.  I think that issue is fixed in 5.5 or 5.6.

It's a lot more complicated than that, but those are the basics.

I think Steve Whitehouse wrote a paper on glocks, but I don't
have the info handy.

Regards,

Bob Peterson
Red Hat File Systems



From corey.kovacs at gmail.com  Thu Dec  2 21:31:13 2010
From: corey.kovacs at gmail.com (Corey Kovacs)
Date: Thu, 2 Dec 2010 21:31:13 +0000
Subject: [Linux-cluster] Clarification...
Message-ID: <AANLkTiksKtMr7bVUmSo=JrZdZ+6GPPo0-zo-jVCp2Yvv@mail.gmail.com>

I've been watching the development of the cluster stack from the
sidelines for quite some time but somewhere things got a bit mixed up
for me.

It appears to me the following is true...

openais, heartbeat and corosync are equivalent in terms of purpose.
rgmanager and pacemaker are equivalent in terms of purpose.

If these are true, can someone point me to a run-down of the
differences and similarities or point me to a document?

How does this all relate to what ships with RHEL6?

Finally, is the wiki woefully out of date are is there a better place
to be getting information other than git repos?





Corey



From zachar at awst.at  Thu Dec  2 22:14:44 2010
From: zachar at awst.at (Balazs Zachar)
Date: Thu, 02 Dec 2010 23:14:44 +0100
Subject: [Linux-cluster] Clarification...
In-Reply-To: <AANLkTiksKtMr7bVUmSo=JrZdZ+6GPPo0-zo-jVCp2Yvv@mail.gmail.com>
References: <AANLkTiksKtMr7bVUmSo=JrZdZ+6GPPo0-zo-jVCp2Yvv@mail.gmail.com>
Message-ID: <4CF81A54.80103@awst.at>


On 12/02/2010 10:31 PM, Corey Kovacs wrote:
> I've been watching the development of the cluster stack from the
> sidelines for quite some time but somewhere things got a bit mixed up
> for me.
>
> It appears to me the following is true...
>
> openais, heartbeat and corosync are equivalent in terms of purpose.
>    
AFAIK not true anymore:
heartbeat is equivalent with corosync + openais (corosync was a fork of 
openais but now openais is an additional part for corosync)
Corosync + openais is recommended. (pacemaker website)
> rgmanager and pacemaker are equivalent in terms of purpose.
>    
True:
http://sources.redhat.com/cluster/wiki/RGManagerVsPacemaker

> If these are true, can someone point me to a run-down of the
> differences and similarities or point me to a document?
>
> How does this all relate to what ships with RHEL6?
>    
RHCS is using rgmanager in RHEL6. Pacemaker is in technology preview 
state (from release notes: "not fully integrated with the RHCS stack").

I heard that Pacemaker is going to replace rgmanager in the future. (the 
source wasn't official! Maybe we will get some more official answer for 
this here :) )

Regards,
Bal?zs
> Finally, is the wiki woefully out of date are is there a better place
> to be getting information other than git repos?
>
>
>
>
>
> Corey
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>    



From Chris.Jankowski at hp.com  Fri Dec  3 05:33:37 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Fri, 3 Dec 2010 05:33:37 +0000
Subject: [Linux-cluster] Validation failure of cluster.conf.
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A04EBC2@GVW1113EXC.americas.hpqcorp.net>

Hi,

I am in a process of building a cluster on RHEL6.
I elected to build the /etc/cluster/cluster.conf (attached) by hand i.e. no Conga.
After I added fencing and fence devices the configuration file no longer passes validation check.

ccs_config_validate reports the following error:

[root at booboo1 cluster]# ccs_config_validate -f cluster.conf.3.XX
Relax-NG validity error : Extra element fencedevices in interleave
tempfile:27: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:18: element device: validity error : IDREF attribute name references an unknown ID "booboo2-ilo"
Configuration fails to validate

No matter how long I look at the file I cannot find any mistake in it.

I would appreciate if you could run the file through your validation tools and tell me what am I doing wrong.

Thanks and regards,

Chris Jankowski

<cluster name="auslab-test" config_version="3">

  <fence_daemon clean_start="0" post_fail_delay="1" post_join_delay="60"/>

  <clusternodes>

    <clusternode name="booboo1-ci.XXXX" nodeid="1" votes="1">
      <fence>
        <method name="iLO">
          <device name="booboo1-ilo"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="booboo2-ci.XXXX" nodeid="2" votes="1">
      <fence>
        <method name="iLO">
          <device name="booboo2-ilo"/>
        </method>
      </fence>
    </clusternode>

  </clusternodes>

  <cman expected_votes="3" two_node="0"/>

  <fencedevices>
    <fencedevice name="booboo1-ilo" agent="fence_ilo" hostname="booboo1-ilo.XXXX" login="XXXXX" passwd="XXXXX"/>
    <fencedevice name="booboo2-ilo" agent="fence_ilo" hostname="booboo2-ilo.XXXX" login="XXXXX" passwd="XXXXX"/>
  </fencedevices>

  <rm>

    <failoverdomains>
    </failoverdomains>

    <resources>
    </resources>

  </rm>

  <quorumd interval="3" label="auslab-test-quorum-disk" min_score="1" tko="7" votes="1">
    <heuristic interval="2" program="/bin/true" score="1"/>
  </quorumd>

  <totem token="43200"/>

</cluster>








-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101203/121db22b/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ccs_config_validate.out
Type: application/octet-stream
Size: 374 bytes
Desc: ccs_config_validate.out
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101203/121db22b/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf.3.XX
Type: application/octet-stream
Size: 1156 bytes
Desc: cluster.conf.3.XX
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101203/121db22b/attachment-0001.obj>

From andrew at beekhof.net  Fri Dec  3 07:14:09 2010
From: andrew at beekhof.net (Andrew Beekhof)
Date: Fri, 3 Dec 2010 08:14:09 +0100
Subject: [Linux-cluster] Clarification...
In-Reply-To: <4CF81A54.80103@awst.at>
References: <AANLkTiksKtMr7bVUmSo=JrZdZ+6GPPo0-zo-jVCp2Yvv@mail.gmail.com>
	<4CF81A54.80103@awst.at>
Message-ID: <AANLkTinw41PN6_ELQEQUh4u54=zdNCiXcY9zmmvDEByf@mail.gmail.com>

On Thu, Dec 2, 2010 at 11:14 PM, Balazs Zachar <zachar at awst.at> wrote:
>
> On 12/02/2010 10:31 PM, Corey Kovacs wrote:
>>
>> I've been watching the development of the cluster stack from the
>> sidelines for quite some time but somewhere things got a bit mixed up
>> for me.
>>
>> It appears to me the following is true...
>>
>> openais, heartbeat and corosync are equivalent in terms of purpose.
>>
>
> AFAIK not true anymore:
> heartbeat is equivalent with corosync + openais (corosync was a fork of
> openais but now openais is an additional part for corosync)
> Corosync + openais is recommended. (pacemaker website)
>>
>> rgmanager and pacemaker are equivalent in terms of purpose.
>>
>
> True:
> http://sources.redhat.com/cluster/wiki/RGManagerVsPacemaker
>
>> If these are true, can someone point me to a run-down of the
>> differences and similarities or point me to a document?
>>
>> How does this all relate to what ships with RHEL6?
>>
>
> RHCS is using rgmanager in RHEL6. Pacemaker is in technology preview state
> (from release notes: "not fully integrated with the RHCS stack").

Specifically there is no integration with luci yet.
Other than that its works just fine with the rest of the stack

> I heard that Pacemaker is going to replace rgmanager in the future. (the
> source wasn't official! Maybe we will get some more official answer for this
> here :) )

That is the current intention

>
> Regards,
> Bal?zs
>>
>> Finally, is the wiki woefully out of date are is there a better place
>> to be getting information other than git repos?
>>
>>
>>
>>
>>
>> Corey
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From corey.kovacs at gmail.com  Fri Dec  3 07:55:22 2010
From: corey.kovacs at gmail.com (Corey Kovacs)
Date: Fri, 3 Dec 2010 07:55:22 +0000
Subject: [Linux-cluster] Clarification...
In-Reply-To: <AANLkTinw41PN6_ELQEQUh4u54=zdNCiXcY9zmmvDEByf@mail.gmail.com>
References: <AANLkTiksKtMr7bVUmSo=JrZdZ+6GPPo0-zo-jVCp2Yvv@mail.gmail.com>
	<4CF81A54.80103@awst.at>
	<AANLkTinw41PN6_ELQEQUh4u54=zdNCiXcY9zmmvDEByf@mail.gmail.com>
Message-ID: <AANLkTimaBvA7ejWSXxmnuCD8Go-Bo+V8BQejUzwjWreF@mail.gmail.com>

Folks, thanks for the info.

If openais and corosync were at one time, serving the same function
but now are separate, what is the division?


Thanks again

-C

On Fri, Dec 3, 2010 at 7:14 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Thu, Dec 2, 2010 at 11:14 PM, Balazs Zachar <zachar at awst.at> wrote:
>>
>> On 12/02/2010 10:31 PM, Corey Kovacs wrote:
>>>
>>> I've been watching the development of the cluster stack from the
>>> sidelines for quite some time but somewhere things got a bit mixed up
>>> for me.
>>>
>>> It appears to me the following is true...
>>>
>>> openais, heartbeat and corosync are equivalent in terms of purpose.
>>>
>>
>> AFAIK not true anymore:
>> heartbeat is equivalent with corosync + openais (corosync was a fork of
>> openais but now openais is an additional part for corosync)
>> Corosync + openais is recommended. (pacemaker website)
>>>
>>> rgmanager and pacemaker are equivalent in terms of purpose.
>>>
>>
>> True:
>> http://sources.redhat.com/cluster/wiki/RGManagerVsPacemaker
>>
>>> If these are true, can someone point me to a run-down of the
>>> differences and similarities or point me to a document?
>>>
>>> How does this all relate to what ships with RHEL6?
>>>
>>
>> RHCS is using rgmanager in RHEL6. Pacemaker is in technology preview state
>> (from release notes: "not fully integrated with the RHCS stack").
>
> Specifically there is no integration with luci yet.
> Other than that its works just fine with the rest of the stack
>
>> I heard that Pacemaker is going to replace rgmanager in the future. (the
>> source wasn't official! Maybe we will get some more official answer for this
>> here :) )
>
> That is the current intention
>
>>
>> Regards,
>> Bal?zs
>>>
>>> Finally, is the wiki woefully out of date are is there a better place
>>> to be getting information other than git repos?
>>>
>>>
>>>
>>>
>>>
>>> Corey
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From fdinitto at redhat.com  Fri Dec  3 08:26:49 2010
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Fri, 03 Dec 2010 09:26:49 +0100
Subject: [Linux-cluster] Validation failure of cluster.conf.
In-Reply-To: <036B68E61A28CA49AC2767596576CD596F5A04EBC2@GVW1113EXC.americas.hpqcorp.net>
References: <036B68E61A28CA49AC2767596576CD596F5A04EBC2@GVW1113EXC.americas.hpqcorp.net>
Message-ID: <4CF8A9C9.80307@redhat.com>

On 12/3/2010 6:33 AM, Jankowski, Chris wrote:
> Hi,
> I am in a process of building a cluster on RHEL6.
> I elected to build the /etc/cluster/cluster.conf (attached) by hand i.e.
> no Conga.
> After I added fencing and fence devices the configuration file no longer
> passes validation check.
>  
> ccs_config_validate reports the following error:
>  
> [root at booboo1 cluster]# ccs_config_validate -f cluster.conf.3.XX
> Relax-NG validity error : Extra element fencedevices in interleave
> tempfile:27: element fencedevices: Relax-NG validity error : Element
> cluster failed to validate content
> tempfile:18: element device: validity error : IDREF attribute name
> references an unknown ID "booboo2-ilo"
> Configuration fails to validate
>  
> No matter how long I look at the file I cannot find any mistake in it. 
>  
> I would appreciate if you could run the file through your validation
> tools and tell me what am I doing wrong.
>  
> Thanks and regards,
>  
> Chris Jankowski
>  

>   <fencedevices>
>     <fencedevice name="booboo1-ilo" agent="fence_ilo"
> hostname="booboo1-ilo.XXXX" login="XXXXX" passwd="XXXXX"/>
>     <fencedevice name="booboo2-ilo" agent="fence_ilo"
> hostname="booboo2-ilo.XXXX" login="XXXXX" passwd="XXXXX"/>
>   </fencedevices>
>  

looking at man fence_ilo.8 (STDIN PARAMETERS section), you probably want
(untested as I don?t have ilo here):

<fencedevice name="booboo1-ilo" agent="fence_ilo" ipaddr="booboo..."
login="XXXXX" passwd="XXXXX"/>

Fabio



From Chris.Jankowski at hp.com  Fri Dec  3 09:08:38 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Fri, 3 Dec 2010 09:08:38 +0000
Subject: [Linux-cluster] Validation failure of cluster.conf.
In-Reply-To: <4CF8A9C9.80307@redhat.com>
References: <036B68E61A28CA49AC2767596576CD596F5A04EBC2@GVW1113EXC.americas.hpqcorp.net>
	<4CF8A9C9.80307@redhat.com>
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A04ED45@GVW1113EXC.americas.hpqcorp.net>

Fabio,

Indeed, you are 100% right. I should have ipaddr= and instead I had hostname= in the list of attributes for the fence_ilo device.

Syntax must have changed between RHEL5 and RHEL6.
I changed hostname= to ipaddr= and everything works as expected.

Thank you very much for your help. I really appreciate it.

Regards,

Chris Jankowski


-----Original Message-----
From: Fabio M. Di Nitto [mailto:fdinitto at redhat.com] 
Sent: Friday, 3 December 2010 19:27
To: linux clustering
Cc: Jankowski, Chris
Subject: Re: [Linux-cluster] Validation failure of cluster.conf.

On 12/3/2010 6:33 AM, Jankowski, Chris wrote:
> Hi,
> I am in a process of building a cluster on RHEL6.
> I elected to build the /etc/cluster/cluster.conf (attached) by hand i.e.
> no Conga.
> After I added fencing and fence devices the configuration file no longer
> passes validation check.
>  
> ccs_config_validate reports the following error:
>  
> [root at booboo1 cluster]# ccs_config_validate -f cluster.conf.3.XX
> Relax-NG validity error : Extra element fencedevices in interleave
> tempfile:27: element fencedevices: Relax-NG validity error : Element
> cluster failed to validate content
> tempfile:18: element device: validity error : IDREF attribute name
> references an unknown ID "booboo2-ilo"
> Configuration fails to validate
>  
> No matter how long I look at the file I cannot find any mistake in it. 
>  
> I would appreciate if you could run the file through your validation
> tools and tell me what am I doing wrong.
>  
> Thanks and regards,
>  
> Chris Jankowski
>  

>   <fencedevices>
>     <fencedevice name="booboo1-ilo" agent="fence_ilo"
> hostname="booboo1-ilo.XXXX" login="XXXXX" passwd="XXXXX"/>
>     <fencedevice name="booboo2-ilo" agent="fence_ilo"
> hostname="booboo2-ilo.XXXX" login="XXXXX" passwd="XXXXX"/>
>   </fencedevices>
>  

looking at man fence_ilo.8 (STDIN PARAMETERS section), you probably want
(untested as I don?t have ilo here):

<fencedevice name="booboo1-ilo" agent="fence_ilo" ipaddr="booboo..."
login="XXXXX" passwd="XXXXX"/>

Fabio



From andrew at beekhof.net  Fri Dec  3 09:16:32 2010
From: andrew at beekhof.net (Andrew Beekhof)
Date: Fri, 3 Dec 2010 10:16:32 +0100
Subject: [Linux-cluster] Clarification...
In-Reply-To: <AANLkTimaBvA7ejWSXxmnuCD8Go-Bo+V8BQejUzwjWreF@mail.gmail.com>
References: <AANLkTiksKtMr7bVUmSo=JrZdZ+6GPPo0-zo-jVCp2Yvv@mail.gmail.com>
	<4CF81A54.80103@awst.at>
	<AANLkTinw41PN6_ELQEQUh4u54=zdNCiXcY9zmmvDEByf@mail.gmail.com>
	<AANLkTimaBvA7ejWSXxmnuCD8Go-Bo+V8BQejUzwjWreF@mail.gmail.com>
Message-ID: <AANLkTimmZo5oa2=Be3Ji2S+Ffje=TH3kqknsN2tNk+Uu@mail.gmail.com>

On Fri, Dec 3, 2010 at 8:55 AM, Corey Kovacs <corey.kovacs at gmail.com> wrote:
> Folks, thanks for the info.
>
> If openais and corosync were at one time, serving the same function
> but now are separate, what is the division?

Different parts of the puzzle.
Corosync is core functionality, Openais has the implementation of the SAF APIs:
   http://www.openais.org/doku.php

>
>
> Thanks again
>
> -C
>
> On Fri, Dec 3, 2010 at 7:14 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>> On Thu, Dec 2, 2010 at 11:14 PM, Balazs Zachar <zachar at awst.at> wrote:
>>>
>>> On 12/02/2010 10:31 PM, Corey Kovacs wrote:
>>>>
>>>> I've been watching the development of the cluster stack from the
>>>> sidelines for quite some time but somewhere things got a bit mixed up
>>>> for me.
>>>>
>>>> It appears to me the following is true...
>>>>
>>>> openais, heartbeat and corosync are equivalent in terms of purpose.
>>>>
>>>
>>> AFAIK not true anymore:
>>> heartbeat is equivalent with corosync + openais (corosync was a fork of
>>> openais but now openais is an additional part for corosync)
>>> Corosync + openais is recommended. (pacemaker website)
>>>>
>>>> rgmanager and pacemaker are equivalent in terms of purpose.
>>>>
>>>
>>> True:
>>> http://sources.redhat.com/cluster/wiki/RGManagerVsPacemaker
>>>
>>>> If these are true, can someone point me to a run-down of the
>>>> differences and similarities or point me to a document?
>>>>
>>>> How does this all relate to what ships with RHEL6?
>>>>
>>>
>>> RHCS is using rgmanager in RHEL6. Pacemaker is in technology preview state
>>> (from release notes: "not fully integrated with the RHCS stack").
>>
>> Specifically there is no integration with luci yet.
>> Other than that its works just fine with the rest of the stack
>>
>>> I heard that Pacemaker is going to replace rgmanager in the future. (the
>>> source wasn't official! Maybe we will get some more official answer for this
>>> here :) )
>>
>> That is the current intention
>>
>>>
>>> Regards,
>>> Bal?zs
>>>>
>>>> Finally, is the wiki woefully out of date are is there a better place
>>>> to be getting information other than git repos?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Corey
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From zachar at awst.at  Fri Dec  3 09:49:38 2010
From: zachar at awst.at (zachar at awst.at)
Date: Fri, 03 Dec 2010 10:49:38 +0100 (CET)
Subject: [Linux-cluster] =?utf-8?q?Clarification=2E=2E=2E?=
Message-ID: <mtranet.20101203104938.2104901161@telekom.at>

Andrew Beekhof schrieb:
> On Thu, Dec 2, 2010 at 11:14 PM, Balazs Zachar <zachar at awst.at> wrote:
> >
> > On 12/02/2010 10:31 PM, Corey Kovacs wrote:
> >>
> >> I've been watching the development of the cluster stack from the
> >> sidelines for quite some time but somewhere things got a bit mixed up
> >> for me.
> >>
> >> It appears to me the following is true...
> >>
> >> openais, heartbeat and corosync are equivalent in terms of purpose.
> >>
> >
> > AFAIK not true anymore:
> > heartbeat is equivalent with corosync + openais (corosync was a fork 
> of
> > openais but now openais is an additional part for corosync)
> > Corosync + openais is recommended. (pacemaker website)
> >>
> >> rgmanager and pacemaker are equivalent in terms of purpose.
> >>
> >
> > True:
> > http://sources.redhat.com/cluster/wiki/RGManagerVsPacemaker
> >
> >> If these are true, can someone point me to a run-down of the
> >> differences and similarities or point me to a document?
> >>
> >> How does this all relate to what ships with RHEL6?
> >>
> >
> > RHCS is using rgmanager in RHEL6. Pacemaker is in technology preview 
> state
> > (from release notes: "not fully integrated with the RHCS stack").
> 
> Specifically there is no integration with luci yet.
> Other than that its works just fine with the rest of the stack
> 
> > I heard that Pacemaker is going to replace rgmanager in the future. 
> (the
> > source wasn't official! Maybe we will get some more official answer 
> for this
> > here :) )
> 
> That is the current intention

Andrew,
What are the plans about in which version of RHEL will RedHat support pacemaker?

By the way, nice job ;)

> 
> >
> > Regards,
> > Bal?zs
> >>
> >> Finally, is the wiki woefully out of date are is there a better place
> >> to be getting information other than git repos?
> >>
> >>
> >>
> >>
> >>
> >> Corey
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From andrew at beekhof.net  Fri Dec  3 10:01:26 2010
From: andrew at beekhof.net (Andrew Beekhof)
Date: Fri, 3 Dec 2010 11:01:26 +0100
Subject: [Linux-cluster] Clarification...
In-Reply-To: <mtranet.20101203104938.2104901161@telekom.at>
References: <mtranet.20101203104938.2104901161@telekom.at>
Message-ID: <AANLkTi=hAusAKocFwpHpP70jkmeP2LwAgY3JkzbCfjM4@mail.gmail.com>

On Fri, Dec 3, 2010 at 10:49 AM,  <zachar at awst.at> wrote:
> Andrew Beekhof schrieb:
>> On Thu, Dec 2, 2010 at 11:14 PM, Balazs Zachar <zachar at awst.at> wrote:
>> >
>> > On 12/02/2010 10:31 PM, Corey Kovacs wrote:
>> >>
>> >> I've been watching the development of the cluster stack from the
>> >> sidelines for quite some time but somewhere things got a bit mixed up
>> >> for me.
>> >>
>> >> It appears to me the following is true...
>> >>
>> >> openais, heartbeat and corosync are equivalent in terms of purpose.
>> >>
>> >
>> > AFAIK not true anymore:
>> > heartbeat is equivalent with corosync + openais (corosync was a fork
>> of
>> > openais but now openais is an additional part for corosync)
>> > Corosync + openais is recommended. (pacemaker website)
>> >>
>> >> rgmanager and pacemaker are equivalent in terms of purpose.
>> >>
>> >
>> > True:
>> > http://sources.redhat.com/cluster/wiki/RGManagerVsPacemaker
>> >
>> >> If these are true, can someone point me to a run-down of the
>> >> differences and similarities or point me to a document?
>> >>
>> >> How does this all relate to what ships with RHEL6?
>> >>
>> >
>> > RHCS is using rgmanager in RHEL6. Pacemaker is in technology preview
>> state
>> > (from release notes: "not fully integrated with the RHCS stack").
>>
>> Specifically there is no integration with luci yet.
>> Other than that its works just fine with the rest of the stack
>>
>> > I heard that Pacemaker is going to replace rgmanager in the future.
>> (the
>> > source wasn't official! Maybe we will get some more official answer
>> for this
>> > here :) )
>>
>> That is the current intention
>
> Andrew,
> What are the plans about in which version of RHEL will RedHat support pacemaker?

Alas we're not allowed to publicly discuss those kinds of details.

>
> By the way, nice job ;)

Thanks :)

>
>>
>> >
>> > Regards,
>> > Bal?zs
>> >>
>> >> Finally, is the wiki woefully out of date are is there a better place
>> >> to be getting information other than git repos?
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> Corey
>> >>
>> >> --
>> >> Linux-cluster mailing list
>> >> Linux-cluster at redhat.com
>> >> https://www.redhat.com/mailman/listinfo/linux-cluster
>> >>
>> >
>> > --
>> > Linux-cluster mailing list
>> > Linux-cluster at redhat.com
>> > https://www.redhat.com/mailman/listinfo/linux-cluster
>> >
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From Chris.Jankowski at hp.com  Fri Dec  3 10:10:10 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Fri, 3 Dec 2010 10:10:10 +0000
Subject: [Linux-cluster] Heuristics for quorum disk used as a tiebreaker in
 a two node cluster.
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A04ED98@GVW1113EXC.americas.hpqcorp.net>

Hi,

I am configuring a two node HA cluster that has only one service.
The sole purpose of the cluster is to keep the service up with minimum disruption for the widest possible range of failure scenarios.

I configured a quorum disk to make sure that after a failure of a node, the cluster (now consisting of only one node) continues to have quorum.

I am considering a partitioned cluster scenario.  Partitioned means to me that the cluster nodes lost the cluster communication path.  Without quorum disk each of the nodes in the cluster will fence the other.

However the manual page for qdisk gives premise of solving the problem in the list of design requirement that it apparently fulfils:

Quote:
Ability to use the external reasons for deciding which partition is the quorate partition in a partitioned cluster.  For example, a user may have a service running on one node, and that node must always be the master in the event of a network partition.
Unquote.

This is exactly what I would like to achieve.  I know which node should stay alive - the one running my service, and it is trivial for me to find this out directly, as I can query for its status locally on a node. I do not have use the network.  This can be used as a heuristic for the quorum disc.

What I am missing is how to make that into a workable whole.  Specifically the following aspects are of concern:

1.
I do not want the other node to be ejected from the cluster just because it does not run the service.  But the test is binary, so it looks like it will be ejected.

2.
Startup time, before the service started.  As no node has the service, both will be candidates for ejection.

3.
Service migration time.
During service migration from one node to another, there is a transient period of time when the service is not active on either node.

Questions:

1.
How do I put all of this together to achieve the overall objective of the node with the service surviving the partitioning event uninterrupted?

2.
What is the relationship between  fencing and node suicide due to communication through quorum disk?

3.
How does the master election relate to this?

I would be grateful for any insights, pointers to documentation, etc.

Thanks and regards,

Chris Jankowski





-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101203/77d4447d/attachment.htm>

From linux-cluster at redhat.com  Sat Dec  4 07:49:08 2010
From: linux-cluster at redhat.com (Mailbot for etexusa.com)
Date: Fri, 3 Dec 2010 23:49:08 -0800
Subject: [Linux-cluster] DSN: failed (mspss@gto.net.om)
Message-ID: <mAWtxIeCDcrRuWlID02@etexusa.com>


This is a Delivery Status Notification (DSN).

I was unable to deliver your message to
mspss at gto.net.om.

The error was;
  Domain "gto.net.om" can't receive email

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/rfc822-headers
Size: 483 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101203/38f53761/attachment.bin>

From Chris.Jankowski at hp.com  Mon Dec  6 01:23:42 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Mon, 6 Dec 2010 01:23:42 +0000
Subject: [Linux-cluster] Difference between -d and -s options of clusvcadm
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A04EF6D@GVW1113EXC.americas.hpqcorp.net>

Hi,

What is the difference between -d and -s options of clusvcadm?  When would I prefer using one over the other?

The manual page for clusvcadm(8) says:

-d  <service>       Stops and disables the user service named <service>
-s  <service>        Stops the service named <service> until a member transition or until it is enabled again.

I also read the manual page for rgmanager(8), but the usefulness of the distinction between stopped and disabled states escapes me.

Thanks and regards,

Chris


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101206/3d4c2419/attachment.htm>

From gcharles at ups.com  Mon Dec  6 12:18:26 2010
From: gcharles at ups.com (gcharles at ups.com)
Date: Mon, 6 Dec 2010 07:18:26 -0500
Subject: [Linux-cluster] Difference between -d and -s options of
	clusvcadm
In-Reply-To: <036B68E61A28CA49AC2767596576CD596F5A04EF6D@GVW1113EXC.americas.hpqcorp.net>
References: <036B68E61A28CA49AC2767596576CD596F5A04EF6D@GVW1113EXC.americas.hpqcorp.net>
Message-ID: <49CCA172B74C1B4D916CB9B71FB952DA27941AF745@njrarsvr3bef.us.ups.com>

If you "disable" a service it won't start up again without manual intervention, like with "clusvcadm -e...".  If you "stop" a service and let's say the node it was running on was rebooted, your service will start up on another node in the cluster if it was configured to do so.


Greg Charles
gcharles at ups.com



________________________________
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jankowski, Chris
Sent: Sunday, December 05, 2010 8:24 PM
To: linux clustering
Subject: [Linux-cluster] Difference between -d and -s options of clusvcadm

Hi,

What is the difference between -d and -s options of clusvcadm?  When would I prefer using one over the other?

The manual page for clusvcadm(8) says:

-d <service>       Stops and disables the user service named <service>
-s  <service>        Stops the service named <service> until a member transition or until it is enabled again.

I also read the manual page for rgmanager(8), but the usefulness of the distinction between stopped and disabled states escapes me.

Thanks and regards,

Chris


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101206/88cfb6df/attachment.htm>

From Chris.Jankowski at hp.com  Mon Dec  6 12:27:58 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Mon, 6 Dec 2010 12:27:58 +0000
Subject: [Linux-cluster] How do I implement an unmount only filesystem
	resource agent
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A04F2FF@GVW1113EXC.americas.hpqcorp.net>

Hi,

I am configuring a service that uses HA-LVM and XFS filesystem on top of it.
The filesystem will be backed up by a separate script run from cron(8) creating an LVM snapshot of the filesystem and mounting it on a mountpoint.

To have a foolproof HA service I need to:

-       Check, if the snapshot filesystem is mounted
-       If it is, all processes running in it need to be killed
-       Then the snapshot filesystem needs to be unmounted.

All of that is a prerequisite for HA-LVM to be able to do its work on the volume group.  HA-LVM needs to deactivate the volume group.  Once this is done the relocation of the service to another node will succeed

I could configure a script resource with a script that would do the 3 steps listed above as part of its stop action.  It would have essentially null start and status actions.

Is there a better, more elegant way of achieving the same result e.g. using the filesystem resource?

Thanks and regards,

Chris Jankowski


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101206/b40f0287/attachment.htm>

From alvaro.fernandez at sivsa.com  Mon Dec  6 21:11:24 2010
From: alvaro.fernandez at sivsa.com (Alvaro Jose Fernandez)
Date: Mon, 6 Dec 2010 22:11:24 +0100
Subject: [Linux-cluster] question about number of fencing devices needed for
	a two node cluster
Message-ID: <607D6181D9919041BE792D70EF2AEC48014D4071@LIMENS.sivsa.int>

Hi,

 

I would like to know about wheter it would suffice for a two-node RHCS cluster a single power switch (APC 7921) fencing device. The power switch has 8 power outlets and I intend to use four of them for each node's dual power supplies. 

 

I know it would be desirable to have two devices for a fully redundant configuration, but after reading some examples from the docs (they are meant for two power switch), I still cannot understand why a single power switch connected to both servers and the switch taking power from the UPS, would not be a good configuration. There is a single UPS in the environment.

 

?any experiences over this issue? 

 

regards,

 

alvaro

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101206/e864b484/attachment.htm>

From linux at alteeve.com  Mon Dec  6 21:35:16 2010
From: linux at alteeve.com (Digimer)
Date: Mon, 06 Dec 2010 16:35:16 -0500
Subject: [Linux-cluster] question about number of fencing devices needed
 for	a two node cluster
In-Reply-To: <607D6181D9919041BE792D70EF2AEC48014D4071@LIMENS.sivsa.int>
References: <607D6181D9919041BE792D70EF2AEC48014D4071@LIMENS.sivsa.int>
Message-ID: <4CFD5714.8080305@alteeve.com>

On 12/06/2010 04:11 PM, Alvaro Jose Fernandez wrote:
> Hi,
>
> I would like to know about wheter it would suffice for a two-node RHCS
> cluster a single power switch (APC 7921) fencing device. The power
> switch has 8 power outlets and I intend to use four of them for each
> node's dual power supplies.
>
> I know it would be desirable to have two devices for a fully redundant
> configuration, but after reading some examples from the docs (they are
> meant for two power switch), I still cannot understand why a single
> power switch connected to both servers and the switch taking power from
> the UPS, would not be a good configuration. There is a single UPS in the
> environment.
>
> ?any experiences over this issue?
>
> regards,
>
> alvaro

That is sufficient. The only concern is that the PDU doesn't verify node 
death, so success is returned when the power is cut. This requires a 
little extra testing to make sure that your config is accurate. Once 
setup, use 'fence_node <name>' against either node and ensure that they 
really do go down.

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org



From jakov.sosic at srce.hr  Mon Dec  6 23:40:40 2010
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Tue, 07 Dec 2010 00:40:40 +0100
Subject: [Linux-cluster] question about number of fencing devices needed
 for	a two node cluster
In-Reply-To: <607D6181D9919041BE792D70EF2AEC48014D4071@LIMENS.sivsa.int>
References: <607D6181D9919041BE792D70EF2AEC48014D4071@LIMENS.sivsa.int>
Message-ID: <4CFD7478.8050009@srce.hr>

On 12/06/2010 10:11 PM, Alvaro Jose Fernandez wrote:
> Hi,
> 
>  
> 
> I would like to know about wheter it would suffice for a two-node RHCS
> cluster a single power switch (APC 7921) fencing device. The power
> switch has 8 power outlets and I intend to use four of them for each
> node's dual power supplies.
> 
>  
> 
> I know it would be desirable to have two devices for a fully redundant
> configuration, but after reading some examples from the docs (they are
> meant for two power switch), I still cannot understand why a single
> power switch connected to both servers and the switch taking power from
> the UPS, would not be a good configuration. There is a single UPS in the
> environment.
> 
>  
> 
> ?any experiences over this issue?

It's because you still have SPOF. In this case, SPOF is the electronic
module of the powerswitch, so, if the electronics go down, there's no
way to fence the node. It would be better to have for example iDRAC or
IPMI as primary fencing device and APC as secondary.

But, as in many things in IT, you are back to price/performance ratio.
If you really must achieve 5x9 uptime, or else you have money penalty,
then you'll invest in secondary fencing device. For most clusters, one
fencing device is enough, though.


-- 
Jakov Sosic



From alvaro.fernandez at sivsa.com  Tue Dec  7 00:37:45 2010
From: alvaro.fernandez at sivsa.com (Alvaro Jose Fernandez)
Date: Tue, 7 Dec 2010 01:37:45 +0100
Subject: [Linux-cluster] question about number of fencing devices needed
	for	a two node cluster
References: <607D6181D9919041BE792D70EF2AEC48014D4071@LIMENS.sivsa.int>
	<4CFD7478.8050009@srce.hr>
Message-ID: <607D6181D9919041BE792D70EF2AEC48014D407A@LIMENS.sivsa.int>

Thanks for the tip, Jakov. 
regards,

alvaro
> ?any experiences over this issue?

It's because you still have SPOF. In this case, SPOF is the electronic
module of the powerswitch, so, if the electronics go down, there's no
way to fence the node. It would be better to have for example iDRAC or
IPMI as primary fencing device and APC as secondary.

But, as in many things in IT, you are back to price/performance ratio.
If you really must achieve 5x9 uptime, or else you have money penalty,
then you'll invest in secondary fencing device. For most clusters, one
fencing device is enough, though.


-- 
Jakov Sosic

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From alvaro.fernandez at sivsa.com  Tue Dec  7 00:40:12 2010
From: alvaro.fernandez at sivsa.com (Alvaro Jose Fernandez)
Date: Tue, 7 Dec 2010 01:40:12 +0100
Subject: [Linux-cluster] question about number of fencing devices needed
	for	a two node cluster
References: <607D6181D9919041BE792D70EF2AEC48014D4071@LIMENS.sivsa.int>
	<4CFD5714.8080305@alteeve.com>
Message-ID: <607D6181D9919041BE792D70EF2AEC48014D407B@LIMENS.sivsa.int>

Many thanks for the advice, Digimer. 

regards.

> I know it would be desirable to have two devices for a fully redundant
> configuration, but after reading some examples from the docs (they are
> meant for two power switch), I still cannot understand why a single
> power switch connected to both servers and the switch taking power from
> the UPS, would not be a good configuration. There is a single UPS in the
> environment.
>
> ?any experiences over this issue?
>
> regards,
>
> alvaro

That is sufficient. The only concern is that the PDU doesn't verify node 
death, so success is returned when the power is cut. This requires a 
little extra testing to make sure that your config is accurate. Once 
setup, use 'fence_node <name>' against either node and ensure that they 
really do go down.

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org



From lamshuyin at gmail.com  Tue Dec  7 07:48:10 2010
From: lamshuyin at gmail.com (Jacky Lam)
Date: Tue, 7 Dec 2010 15:48:10 +0800
Subject: [Linux-cluster] GFS on AOE
Message-ID: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>

Dear all,

     I am new to GFS. I search through web but could not get a definite
answer.

     I have 2 pc (A and B) connecting by Ethernet. 1 harddisk is attaching
on A and sharing through ATA over Ethernet. Is it possible for B to access
hardisk using GFS over AOE? Any know issue (like caching). I suppose A must
need to access the harddisk through GFS as well, am I correct?

     If any, is there comparison between GFS (on AOE?) and NFS on throughput
and CPU loading?
     Thanks a lot.

Best Regards,
Jacky
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101207/60411474/attachment.htm>

From swhiteho at redhat.com  Tue Dec  7 09:52:35 2010
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Tue, 07 Dec 2010 09:52:35 +0000
Subject: [Linux-cluster] GFS on AOE
In-Reply-To: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>
References: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>
Message-ID: <1291715555.2451.2.camel@dolmen>

Hi,

On Tue, 2010-12-07 at 15:48 +0800, Jacky Lam wrote:
> Dear all,
> 
>      I am new to GFS. I search through web but could not get a
> definite answer.
> 
>      I have 2 pc (A and B) connecting by Ethernet. 1 harddisk is
> attaching on A and sharing through ATA over Ethernet. Is it possible
> for B to access hardisk using GFS over AOE? Any know issue (like
> caching). I suppose A must need to access the harddisk through GFS as
> well, am I correct?
> 
Yes. Both machines would need direct access to the shared disk. That
should be possible using AoE, although I've not tried it myself.

>      If any, is there comparison between GFS (on AOE?) and NFS on
> throughput and CPU loading?
>      Thanks a lot.
> 
> Best Regards,
> Jacky
> --
Bearing in mind that AoE is a really simple protocol, I'd expect that
NFS would create more cpu loading. However, that is a bit of an odd way
to compare the two solutions. Normally the cpu is not the limiting
factor, especially with lower end solutions such as AoE, it is more
likely that the shared disk will be the bottleneck,

Steve.




From fdinitto at redhat.com  Tue Dec  7 10:41:39 2010
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 07 Dec 2010 11:41:39 +0100
Subject: [Linux-cluster] GFS on AOE
In-Reply-To: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>
References: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>
Message-ID: <4CFE0F63.4040009@redhat.com>

On 12/07/2010 08:48 AM, Jacky Lam wrote:
> Dear all,
> 
>      I am new to GFS. I search through web but could not get a definite
> answer.
> 
>      I have 2 pc (A and B) connecting by Ethernet. 1 harddisk is
> attaching on A and sharing through ATA over Ethernet. Is it possible for
> B to access hardisk using GFS over AOE? Any know issue (like caching). I
> suppose A must need to access the harddisk through GFS as well, am I
> correct?

This won't work. It is also part of the official FAQ.

The problem being that AOE (as you suspect) adds a different level of
caching.

All nodes need to have a consistent access path to the disk.

Fabio



From mad at wol.de  Tue Dec  7 11:18:01 2010
From: mad at wol.de (Marc - A. Dahlhaus)
Date: Tue, 07 Dec 2010 12:18:01 +0100
Subject: [Linux-cluster] GFS on AOE
In-Reply-To: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>
References: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>
Message-ID: <1291720682.7239.68.camel@marc>

Hello Jacky,

Am Dienstag, den 07.12.2010, 15:48 +0800 schrieb Jacky Lam:
> Dear all,
> 
>      I am new to GFS. I search through web but could not get a
> definite answer.
> 
>      I have 2 pc (A and B) connecting by Ethernet. 1 harddisk is
> attaching on A and sharing through ATA over Ethernet. Is it possible
> for B to access hardisk using GFS over AOE? Any know issue (like
> caching). I suppose A must need to access the harddisk through GFS as
> well, am I correct?

Should work without problems.

My test-clusters are using this setup and i faced no problems even under
bonnie++ load...

I use ggaoed because other target-creators didn't allowed (i last
checked this over a year ago) access to the same target over lo and eth
interfaces...

I can give more details (eg. configs) if you need them.

>      If any, is there comparison between GFS (on AOE?) and NFS on
> throughput and CPU loading?
>      Thanks a lot.

GFS as blockdevice filesystem and NFS as network protocol can't be
compared easily...

An NFS-share is hosted on some random (even GFS is possible) blockdevice
filesystem hidden behind the protocol of the NFS-server. So this
NFS-servers architecture plays a huge role in such a comparison of
client-performance...

> Best Regards,
> Jacky

Marc



From linux-cluster at redhat.com  Tue Dec  7 11:56:01 2010
From: linux-cluster at redhat.com (Mailbot for etexusa.com)
Date: Tue, 7 Dec 2010 03:56:01 -0800
Subject: [Linux-cluster] DSN: failed (delivery failed)
Message-ID: <mAWtxIeCDcrRuX3Cb02@etexusa.com>


This is a Delivery Status Notification (DSN).

I was unable to deliver your message to
vallalar2006 at vsnl.com.

I said 
  RCPT TO:<vallalar2006 at vsnl.com>

And they gave me the error;
  550 5.1.1 unknown or illegal alias: vallalar2006 at vsnl.com

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/rfc822-headers
Size: 492 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101207/25c04385/attachment.bin>

From jeff.sturm at eprize.com  Tue Dec  7 15:39:13 2010
From: jeff.sturm at eprize.com (Jeff Sturm)
Date: Tue, 7 Dec 2010 10:39:13 -0500
Subject: [Linux-cluster] GFS on AOE
In-Reply-To: <4CFE0F63.4040009@redhat.com>
References: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>
	<4CFE0F63.4040009@redhat.com>
Message-ID: <64D0546C5EBBD147B75DE133D798665F06A12877@hugo.eprize.local>

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com]
> On Behalf Of Fabio M. Di Nitto
> Sent: Tuesday, December 07, 2010 5:42 AM
> To: linux-cluster at redhat.com
> Subject: Re: [Linux-cluster] GFS on AOE
> 
> The problem being that AOE (as you suspect) adds a different level of
caching.

Note however that the AoE protocol does not specify caching, except for
optional asynchronous writes.  (The aoe Linux module does not utilize
asynchronous writes.)

Nevertheless, the configuration suggested by the OP is unusual, and
won't be very useful in my opinion.  Having node B rely on a hard disk
in node A leaves node A as a single point of failure.

We use GFS over AoE extensively, and find it works well.  However we use
an AoE target that runs independent of the cluster and provides
high-availability on its own.

-Jeff





From gordan at bobich.net  Tue Dec  7 16:24:17 2010
From: gordan at bobich.net (Gordan Bobic)
Date: Tue, 07 Dec 2010 16:24:17 +0000
Subject: [Linux-cluster] GFS on AOE
In-Reply-To: <64D0546C5EBBD147B75DE133D798665F06A12877@hugo.eprize.local>
References: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>	<4CFE0F63.4040009@redhat.com>
	<64D0546C5EBBD147B75DE133D798665F06A12877@hugo.eprize.local>
Message-ID: <4CFE5FB1.5010404@bobich.net>

Jeff Sturm wrote:
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com]
>> On Behalf Of Fabio M. Di Nitto
>> Sent: Tuesday, December 07, 2010 5:42 AM
>> To: linux-cluster at redhat.com
>> Subject: Re: [Linux-cluster] GFS on AOE
>>
>> The problem being that AOE (as you suspect) adds a different level of
> caching.
> 
> Note however that the AoE protocol does not specify caching, except for
> optional asynchronous writes.  (The aoe Linux module does not utilize
> asynchronous writes.)

It's still an unusual setup. Rather than use a lopsided setup of one 
node using the disk directly and the other via AoE, it would probably be 
safer and more reasonable to have the physical disk only accessed by the 
AoE server daemon and have both nodes connect to that..

> Nevertheless, the configuration suggested by the OP is unusual, and
> won't be very useful in my opinion.  Having node B rely on a hard disk
> in node A leaves node A as a single point of failure.

Arguably a "proper" SAN would also be a SPOF itself - unless you have 
two mirrored in real-time.

DRBD is good for a "poor man's SAN" that does away with the SPOF, unlike 
most "enterprise grade" SANs that are based on the assumption that the 
SAN will never fail.

Gordan



From jeff.sturm at eprize.com  Tue Dec  7 18:18:08 2010
From: jeff.sturm at eprize.com (Jeff Sturm)
Date: Tue, 7 Dec 2010 13:18:08 -0500
Subject: [Linux-cluster] GFS on AOE
In-Reply-To: <4CFE5FB1.5010404@bobich.net>
References: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>	<4CFE0F63.4040009@redhat.com><64D0546C5EBBD147B75DE133D798665F06A12877@hugo.eprize.local>
	<4CFE5FB1.5010404@bobich.net>
Message-ID: <64D0546C5EBBD147B75DE133D798665F06A1287C@hugo.eprize.local>

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com]
> On Behalf Of Gordan Bobic
> Sent: Tuesday, December 07, 2010 11:24 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] GFS on AOE
> 
> > Note however that the AoE protocol does not specify caching, except
> > for optional asynchronous writes.  (The aoe Linux module does not
> > utilize asynchronous writes.)
> 
> It's still an unusual setup. Rather than use a lopsided setup of one
node using the disk
> directly and the other via AoE, it would probably be safer and more
reasonable to have
> the physical disk only accessed by the AoE server daemon and have both
nodes
> connect to that..

No question about it... I was commenting on one aspect of AoE, while
you're giving the OP better advice as to how he can configure a good
2-node cluster.

> DRBD is good for a "poor man's SAN" that does away with the SPOF,
unlike most
> "enterprise grade" SANs that are based on the assumption that the SAN
will never fail.

Agreed, DRBD works well for that.  If you need more than a 2-node
cluster, it might make sense to run AoE (or iSCSI) over DRBD.

Most "enterprise grade" SANs have some provisions for
failover/redundancy, but you make a good point--even if a single SAN
chassis is indeed bulletproof, you'll need to take them offline for
maintenance (e.g. firmware updates) from time to time.  (Then, there's
human error to deal with as well.)

-Jeff





From yvette at dbtgroup.com  Tue Dec  7 19:03:24 2010
From: yvette at dbtgroup.com (yvette hirth)
Date: Tue, 07 Dec 2010 19:03:24 +0000
Subject: [Linux-cluster] gfs2 tuning
Message-ID: <4CFE84FC.3090307@dbtgroup.com>

hi all,

we've now defined three nodes with two more being added soon, and the 
GFS2 filesystems are shared between all nodes.

and, of course, i have questions.  8^O

the servers are HP DL380 G6's.  initially i used ipmi_lan as the fence 
manager, with limited success; now i'm using ILO as the fence manager, 
and at boot, fenced takes forever (well, 5 min or so, which in IT time 
is forever) to start.  is this normal?  the ilo2 connections are all on 
a separate unmanaged dell 2624 switch, which has only the three ILO2 
node connections, and nothing else.

next, we've added SANbox2 as a backup fencing agent, and the fibre 
switch is an HP 8/20q (QLogic).  i'm not sure if the SANbox2 support is 
usable on the 8/20q.  anyone have any experience with this?  if this is 
supported, wouldn't it be faster to fence/unfence than ip-based fencing?

we've got ping_pong downloaded and tested the cluster.  we're getting 
about 2500-3000 locks/sec when ping_pong runs on one node; on two, the 
locks/sec drops a bit; and on all three nodes, the most we've seen with 
ping_pong running on all three nodes is ~1800 locks/sec.  googling has 
produced claims of 200k-300k locks/sec when running ping_pong on one node...

most of the GFS2 filesystems (600-6000 resource groups) store a 
relatively small number of very large (2GB+) files.  the extremes among 
the GFS2 filesystems are:  86 files comprising 800GB, to ~98k files 
comprising 256GB.  we've googled "gfs2 tuning" but don't seem to be 
coming up with anything specific, and rather than "experiment" - which 
on GFS2 filesystems can take "a while" - i thought i'd ask, "have we 
done something wrong?"

finally, how does the cluster.conf resource definitions interact with 
GFS2?  is it only for "cluster operation"; i.e., only when fencing / 
unfencing?  we specified "noatime,noquota,data=writeback" on all GFS2 
filesytems (journals = 5).  is this causing our lock rate to fall?  and 
even tho we've changed the resource definition in cluster.conf and set 
the same parms on /etc/fstab, when mounts are displayed, we do not see 
"noquota" anywhere...

thanks in advance for any info y'all can provide us!

yvette



From swhiteho at redhat.com  Tue Dec  7 20:03:14 2010
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Tue, 07 Dec 2010 20:03:14 +0000
Subject: [Linux-cluster] gfs2 tuning
In-Reply-To: <4CFE84FC.3090307@dbtgroup.com>
References: <4CFE84FC.3090307@dbtgroup.com>
Message-ID: <1291752194.2451.58.camel@dolmen>

Hi,

On Tue, 2010-12-07 at 19:03 +0000, yvette hirth wrote:
> hi all,
> 
> we've now defined three nodes with two more being added soon, and the 
> GFS2 filesystems are shared between all nodes.
> 
> and, of course, i have questions.  8^O
> 
> the servers are HP DL380 G6's.  initially i used ipmi_lan as the fence 
> manager, with limited success; now i'm using ILO as the fence manager, 
> and at boot, fenced takes forever (well, 5 min or so, which in IT time 
> is forever) to start.  is this normal?  the ilo2 connections are all on 
> a separate unmanaged dell 2624 switch, which has only the three ILO2 
> node connections, and nothing else.
> 
> next, we've added SANbox2 as a backup fencing agent, and the fibre 
> switch is an HP 8/20q (QLogic).  i'm not sure if the SANbox2 support is 
> usable on the 8/20q.  anyone have any experience with this?  if this is 
> supported, wouldn't it be faster to fence/unfence than ip-based fencing?
> 
> we've got ping_pong downloaded and tested the cluster.  we're getting 
> about 2500-3000 locks/sec when ping_pong runs on one node; on two, the 
> locks/sec drops a bit; and on all three nodes, the most we've seen with 
> ping_pong running on all three nodes is ~1800 locks/sec.  googling has 
> produced claims of 200k-300k locks/sec when running ping_pong on one node...
> 
Don't worry too much about the performance of this test. It probably
isn't that important for most real applications, particularly since you
seem to be using larger files. The total time is likely to be dominated
by the actual data operation on the file, rather than fcntl locking
overhead.

> most of the GFS2 filesystems (600-6000 resource groups) store a 
> relatively small number of very large (2GB+) files.  the extremes among 
> the GFS2 filesystems are:  86 files comprising 800GB, to ~98k files 
> comprising 256GB.  we've googled "gfs2 tuning" but don't seem to be 
> coming up with anything specific, and rather than "experiment" - which 
> on GFS2 filesystems can take "a while" - i thought i'd ask, "have we 
> done something wrong?"
> 
Normally performance issues tend to relate to the way in which the
workload is distributed across the nodes and the I/O pattern which
arises. That can result in a bottleneck of a single resource. The
locking is done on a per-inode basis, so sometimes directories can be
the source of contention if there are lots of creates/deletes in that
directory from multiple nodes in a relatively short period.

> finally, how does the cluster.conf resource definitions interact with 
> GFS2?  is it only for "cluster operation"; i.e., only when fencing / 
> unfencing?  we specified "noatime,noquota,data=writeback" on all GFS2 
> filesytems (journals = 5).  is this causing our lock rate to fall?  and 
> even tho we've changed the resource definition in cluster.conf and set 
> the same parms on /etc/fstab, when mounts are displayed, we do not see 
> "noquota" anywhere...
> 
> thanks in advance for any info y'all can provide us!
> 
> yvette
> 
You might find that the default data=ordered is faster than writeback,
depending on the workload. There shouldn't be anything in cluster.conf
which is likely to affect the filesystem's performance beyond the limit
on fcntl locks, which you must have already set correctly in order to
get the fcntl locking rates that you mention above,

Steve.

> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From fdinitto at redhat.com  Wed Dec  8 02:53:19 2010
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 08 Dec 2010 03:53:19 +0100
Subject: [Linux-cluster] GFS on AOE
In-Reply-To: <64D0546C5EBBD147B75DE133D798665F06A12877@hugo.eprize.local>
References: <AANLkTi=zZ7O4czV5e3J_FLAgaGssVjgnqXHdMWq=GZy1@mail.gmail.com>	<4CFE0F63.4040009@redhat.com>
	<64D0546C5EBBD147B75DE133D798665F06A12877@hugo.eprize.local>
Message-ID: <4CFEF31F.6000405@redhat.com>

On 12/07/2010 04:39 PM, Jeff Sturm wrote:
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com]
>> On Behalf Of Fabio M. Di Nitto
>> Sent: Tuesday, December 07, 2010 5:42 AM
>> To: linux-cluster at redhat.com
>> Subject: Re: [Linux-cluster] GFS on AOE
>>
>> The problem being that AOE (as you suspect) adds a different level of
> caching.
> 
> Note however that the AoE protocol does not specify caching, except for
> optional asynchronous writes.  (The aoe Linux module does not utilize
> asynchronous writes.)

In our testing we did have several issues with the setup described above
and trimmed down the problem to have:

node A -> controller/driver X -> harddisk
node B -> (any network block device, including AOE) -> controller/driver
X -> harddisk.

And isolated the issue to the asymmetry of the setup.

> 
> Nevertheless, the configuration suggested by the OP is unusual, and
> won't be very useful in my opinion.  Having node B rely on a hard disk
> in node A leaves node A as a single point of failure.

Yes absolutely. It does not make any sense, but for basic testing is
"good enough".


> 
> We use GFS over AoE extensively, and find it works well.  However we use
> an AoE target that runs independent of the cluster and provides
> high-availability on its own.

Yes, this is also tested and works fine. As you might have noticed in
the FAQ, we only describe the asymmetric setup as "not-working".

Fabio

> 
> -Jeff
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From Chris.Jankowski at hp.com  Wed Dec  8 03:11:39 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Wed, 8 Dec 2010 03:11:39 +0000
Subject: [Linux-cluster] rgmanager gets stuck on shutdown,
 if no services are running on its node.
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>

Hi,

I configured a cluster of 2 RHEL6 nodes.
The cluster has only one HA service defined.

I have a problem with rgmanager getting stuck on shutdown when certain set of conditions are met.  The details follow.

1.
If I execute "shutdown -h now" on the node that is *not* running the HA service then the shutdown process gets stuck with the last message in the /var/log/messages being:

'date' my_node_name  rgmanager[PID#]: Shutting down

The shutdown never completes, until I send terminate signal to the two instances of the rgmanager process.  Then shutdown completes normally.

2.
By comparison, if I execute "shutdown -h now" on a node that *is* running the HA service, then shutdown proceeds normally.

3.
The problem walks with the absence of the service i.e. each of the two nodes has the problem when the service is *not* running on it and does not have the problem when the service *is* running on it.

4.
I have set the following debug level in the cluster.conf:

<logging>
        <logging_daemon debug="on" name="rgmanager"/>
</logging>

But I am not getting any additional messages when the rgmanager is stuck during shutdown.

Questions:
Is this a known problem?
How can I avoid it short of having some dummy service running on each node, as a workaround?

Thanks and regards,

Chris Jankowski




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101208/e12dfdd4/attachment.htm>

From fdinitto at redhat.com  Wed Dec  8 03:59:36 2010
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 08 Dec 2010 04:59:36 +0100
Subject: [Linux-cluster] rgmanager gets stuck on shutdown,
 if no services are running on its node.
In-Reply-To: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>
References: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>
Message-ID: <4CFF02A8.1080407@redhat.com>

Hi,

On 12/08/2010 04:11 AM, Jankowski, Chris wrote:
> Hi,
>  
> I configured a cluster of 2 RHEL6 nodes. 
> The cluster has only one HA service defined.
>  
> I have a problem with rgmanager getting stuck on shutdown when certain
> set of conditions are met.  The details follow.
>  
> 1.
> If I execute ?shutdown ?h now? on the node that is **not** running the
> HA service then the shutdown process gets stuck with the last message in
> the /var/log/messages being:
>  
> ?date? my_node_name  rgmanager[PID#]: Shutting down
>  
> The shutdown never completes, until I send terminate signal to the two
> instances of the rgmanager process. Then shutdown completes normally.
>  
> 2.
> By comparison, if I execute ?shutdown ?h now? on a node that **is**
> running the HA service, then shutdown proceeds normally.
>  
> 3.
> The problem walks with the absence of the service i.e. each of the two
> nodes has the problem when the service is **not** running on it and does
> not have the problem when the service **is** running on it.
>  
> 4.
> I have set the following debug level in the cluster.conf:
>  
> <logging>
>         <logging_daemon debug=?on? name=?rgmanager?/>
> </logging>

Try also:
<logging debug="on"/>

and collect logs from all daemons. The rgmanager being stuck could be
only a consequence of something else being blocked and not necessarily
the root cause of the problem.

>  
> But I am not getting any additional messages when the rgmanager is stuck
> during shutdown.
>  
> Questions:
> Is this a known problem?

No, can you please follow the standard procedure and report the issue
through support? or file at least a bugzilla?

> How can I avoid it short of having some dummy service running on each
> node, as a workaround?

Send us all debugging logs and cluster.conf so we can actually fix the
problem asap.

Fabio



From Chris.Jankowski at hp.com  Wed Dec  8 04:55:21 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Wed, 8 Dec 2010 04:55:21 +0000
Subject: [Linux-cluster] rgmanager gets stuck on shutdown,
 if no services are running on its node.
In-Reply-To: <4CFF02A8.1080407@redhat.com>
References: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>
	<4CFF02A8.1080407@redhat.com>
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A0DEFC0@GVW1113EXC.americas.hpqcorp.net>

Fabio,

Thank you. I asked the customer to log a support call with HP, who are providing 1st and 2nd level of support for them.

In the meantime, I followed your advice and configured debug level of logging for all daemons. However, this did not produce any new information when I tested the scenario again.

Regards,

Chris Jankowski



-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Fabio M. Di Nitto
Sent: Wednesday, 8 December 2010 15:00
To: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] rgmanager gets stuck on shutdown, if no services are running on its node.

Hi,

On 12/08/2010 04:11 AM, Jankowski, Chris wrote:
> Hi,
>  
> I configured a cluster of 2 RHEL6 nodes. 
> The cluster has only one HA service defined.
>  
> I have a problem with rgmanager getting stuck on shutdown when certain
> set of conditions are met.  The details follow.
>  
> 1.
> If I execute "shutdown -h now" on the node that is **not** running the
> HA service then the shutdown process gets stuck with the last message in
> the /var/log/messages being:
>  
> 'date' my_node_name  rgmanager[PID#]: Shutting down
>  
> The shutdown never completes, until I send terminate signal to the two
> instances of the rgmanager process. Then shutdown completes normally.
>  
> 2.
> By comparison, if I execute "shutdown -h now" on a node that **is**
> running the HA service, then shutdown proceeds normally.
>  
> 3.
> The problem walks with the absence of the service i.e. each of the two
> nodes has the problem when the service is **not** running on it and does
> not have the problem when the service **is** running on it.
>  
> 4.
> I have set the following debug level in the cluster.conf:
>  
> <logging>
>         <logging_daemon debug="on" name="rgmanager"/>
> </logging>

Try also:
<logging debug="on"/>

and collect logs from all daemons. The rgmanager being stuck could be
only a consequence of something else being blocked and not necessarily
the root cause of the problem.

>  
> But I am not getting any additional messages when the rgmanager is stuck
> during shutdown.
>  
> Questions:
> Is this a known problem?

No, can you please follow the standard procedure and report the issue
through support? or file at least a bugzilla?

> How can I avoid it short of having some dummy service running on each
> node, as a workaround?

Send us all debugging logs and cluster.conf so we can actually fix the
problem asap.

Fabio

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From lhh at redhat.com  Wed Dec  8 19:46:09 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 08 Dec 2010 14:46:09 -0500
Subject: [Linux-cluster] rgmanager gets stuck on shutdown,
 if no services are running on its node.
In-Reply-To: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>
References: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>
Message-ID: <1291837569.3865.3.camel@ayanami.boston.devel.redhat.com>

On Wed, 2010-12-08 at 03:11 +0000, Jankowski, Chris wrote:
> Hi,
>  
> I configured a cluster of 2 RHEL6 nodes.  
> The cluster has only one HA service defined.
>  
> I have a problem with rgmanager getting stuck on shutdown when certain
> set of conditions are met.  The details follow.
>  
> 1.
> If I execute ?shutdown ?h now? on the node that is *not* running the
> HA service then the shutdown process gets stuck with the last message
> in the /var/log/messages being:
>  

Is this reproducible outside of 'shutdown -h now', ex: does 'service
rgmanager stop' work in your configuration?

If you can still reach the machine (ssh or whatever) after executing
'shutdown -h now':

1) Install 'rgmanager-debuginfo' and gdb.

2) When rgmanager hangs on shutdown, run:

  - gdb /usr/sbin/rgmanager `pidof -s rgmanager`

3) When inside gdb, run:

  - thr a a bt

There's a related bug in RHEL5 related to releasing the lockspace if
CMAN exits before rgmanager, but I was unable to reproduce it on the
STABLE3/31 branches when I tested.

-- Lon




From lhh at redhat.com  Wed Dec  8 19:49:27 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 08 Dec 2010 14:49:27 -0500
Subject: [Linux-cluster] How do I implement an unmount only filesystem
 resource agent
In-Reply-To: <036B68E61A28CA49AC2767596576CD596F5A04F2FF@GVW1113EXC.americas.hpqcorp.net>
References: <036B68E61A28CA49AC2767596576CD596F5A04F2FF@GVW1113EXC.americas.hpqcorp.net>
Message-ID: <1291837767.3865.7.camel@ayanami.boston.devel.redhat.com>

On Mon, 2010-12-06 at 12:27 +0000, Jankowski, Chris wrote:
>  
> To have a foolproof HA service I need to:
>  
>       * Check, if the snapshot filesystem is mounted
>       * If it is, all processes running in it need to be killed
>       * Then the snapshot filesystem needs to be unmounted.


>  
> I could configure a script resource with a script that would do the 3
> steps listed above as part of its stop action.  It would have
> essentially null start and status actions.
>  
> Is there a better, more elegant way of achieving the same result e.g.
> using the filesystem resource?

In theory you could delete the 'start' operation from the agent, but I
think rgmanager will ignore that and try to start it anyway...

You could edit the 'fs' agent and make the 'stop' and 'status'
operations return 0 immediately, though.

-- Lon




From lhh at redhat.com  Wed Dec  8 20:33:14 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 08 Dec 2010 15:33:14 -0500
Subject: [Linux-cluster] Heuristics for quorum disk used as a tiebreaker
 in a two node cluster.
In-Reply-To: <036B68E61A28CA49AC2767596576CD596F5A04ED98@GVW1113EXC.americas.hpqcorp.net>
References: <036B68E61A28CA49AC2767596576CD596F5A04ED98@GVW1113EXC.americas.hpqcorp.net>
Message-ID: <1291840394.3865.51.camel@ayanami.boston.devel.redhat.com>

On Fri, 2010-12-03 at 10:10 +0000, Jankowski, Chris wrote:

> This is exactly what I would like to achieve.  I know which node
> should stay alive - the one running my service, and it is trivial for
> me to find this out directly, as I can query for its status locally on
> a node. I do not have use the network. This can be used as a heuristic
> for the quorum disc.
>  
> What I am missing is how to make that into a workable whole.
> Specifically the following aspects are of concern:
>  
> 1.
> I do not want the other node to be ejected from the cluster just
> because it does not run the service.  But the test is binary, so it
> looks like it will be ejected.

When a two node cluster partitions, someone has to die.

> 2.
> Startup time, before the service started.  As no node has the service,
> both will be candidates for ejection.

One node will die and the other will start the service.

> 3.
> Service migration time.
> During service migration from one node to another, there is a
> transient period of time when the service is not active on either
> node.

If you partition during a 'relocation' operation, rgmanager will
evaluate the service and start it after fencing completes.


> 1.
> How do I put all of this together to achieve the overall objective of
> the node with the service surviving the partitioning event
> uninterrupted?

As it turns out, using qdiskd to do this is not the easiest thing in the
world.  This has to do with a variety of factors, but the biggest is
that qdiskd has to make choices -before- CMAN/corosync do, so it's hard
to ensure correct behavior in this particular case.

The simplest thing I know of to do this is to selectively delay fencing.
It's a bit of a hack (though less so than using qdiskd, as it turns
out).

NOTE: This agent _MUST_ be used in conjunction with a real fencing
agent.  Put the reference to the agent before the real fencing agent
within the same method.

It might look like this:

#!/bin/sh

me=$(hostname)
service=empty1

owner=$(clustat -lfs $service | grep '^  Owner' | cut -f2 -d: ; exit
${PIPESTATUS[0]})
state=$?

echo Eval $service state $state $owner

if [ $state -eq 0 ] && [ "$owner" != "$me" ]; then
        echo Not the owner - Delaying 30 seconds
        sleep 30
fi

exit 0

What it does is give preference to the node running the service by
making the non-owner delay a bit before trying to perform real fencing
operation.  If the real owner is alive, it will fence first.  If the
service was not running before the partition, no node gets preference.

If the primary driving reason for using qdiskd was to solve this
problem, then you can you can avoid using qdiskd.

 
> 2.
> What is the relationship between  fencing and node suicide due to
> communication through quorum disk?

None.  Both occur.

> 3.
> How does the master election relate to this?

It doesn't, really.  To get a node to drop master, you have to turn
'reboot' off.  After 'reboot' is off, a node will abdicate 'master' mode
if its score drops.

-- Lon




From Chris.Jankowski at hp.com  Thu Dec  9 03:57:33 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Thu, 9 Dec 2010 03:57:33 +0000
Subject: [Linux-cluster] Heuristics for quorum disk used as a tiebreaker
 in a two node cluster.
In-Reply-To: <1291840394.3865.51.camel@ayanami.boston.devel.redhat.com>
References: <036B68E61A28CA49AC2767596576CD596F5A04ED98@GVW1113EXC.americas.hpqcorp.net>
	<1291840394.3865.51.camel@ayanami.boston.devel.redhat.com>
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A0DF47C@GVW1113EXC.americas.hpqcorp.net>

Lon,

Thank you for your suggestions. 

1.
I like very much your idea of having additional fencing agent (called as the first one in the chain) with delay dependent on the presence of the service on the node.  I understand the code.  What I do not know is what are the steps in adding my own fencing agents. They all live in /usr/sbin.  

Is it as simple as placing the new fencing agent in /usr/bin?  Is some kind of registration required e.g. so ccs_config_validate will recognise it?

2.
I'd guess that the extra fencing agent can also solve the problem of both nodes being fenced when the inter-node link goes down.  This is a distinct from the scenario where the communication through quorum disk ceases.  This will be a bonus.

3.
I am using quorum disk as a natural way to assure that the cluster of 2 nodes has quorum with just one node. I am aware of the <cman two_node="1"/> option. 

What are the advantages or disadvantages of using quorum disk for two nodes compared with no quorum disk and the two_node="1" attribute set?

Thanks and regards,

Chris Jankowski

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Thursday, 9 December 2010 07:33
To: linux clustering
Subject: Re: [Linux-cluster] Heuristics for quorum disk used as a tiebreaker in a two node cluster.

On Fri, 2010-12-03 at 10:10 +0000, Jankowski, Chris wrote:

> This is exactly what I would like to achieve.  I know which node
> should stay alive - the one running my service, and it is trivial for
> me to find this out directly, as I can query for its status locally on
> a node. I do not have use the network. This can be used as a heuristic
> for the quorum disc.
>  
> What I am missing is how to make that into a workable whole.
> Specifically the following aspects are of concern:
>  
> 1.
> I do not want the other node to be ejected from the cluster just
> because it does not run the service.  But the test is binary, so it
> looks like it will be ejected.

When a two node cluster partitions, someone has to die.

> 2.
> Startup time, before the service started.  As no node has the service,
> both will be candidates for ejection.

One node will die and the other will start the service.

> 3.
> Service migration time.
> During service migration from one node to another, there is a
> transient period of time when the service is not active on either
> node.

If you partition during a 'relocation' operation, rgmanager will
evaluate the service and start it after fencing completes.


> 1.
> How do I put all of this together to achieve the overall objective of
> the node with the service surviving the partitioning event
> uninterrupted?

As it turns out, using qdiskd to do this is not the easiest thing in the
world.  This has to do with a variety of factors, but the biggest is
that qdiskd has to make choices -before- CMAN/corosync do, so it's hard
to ensure correct behavior in this particular case.

The simplest thing I know of to do this is to selectively delay fencing.
It's a bit of a hack (though less so than using qdiskd, as it turns
out).

NOTE: This agent _MUST_ be used in conjunction with a real fencing
agent.  Put the reference to the agent before the real fencing agent
within the same method.

It might look like this:

#!/bin/sh

me=$(hostname)
service=empty1

owner=$(clustat -lfs $service | grep '^  Owner' | cut -f2 -d: ; exit
${PIPESTATUS[0]})
state=$?

echo Eval $service state $state $owner

if [ $state -eq 0 ] && [ "$owner" != "$me" ]; then
        echo Not the owner - Delaying 30 seconds
        sleep 30
fi

exit 0

What it does is give preference to the node running the service by
making the non-owner delay a bit before trying to perform real fencing
operation.  If the real owner is alive, it will fence first.  If the
service was not running before the partition, no node gets preference.

If the primary driving reason for using qdiskd was to solve this
problem, then you can you can avoid using qdiskd.

 
> 2.
> What is the relationship between  fencing and node suicide due to
> communication through quorum disk?

None.  Both occur.

> 3.
> How does the master election relate to this?

It doesn't, really.  To get a node to drop master, you have to turn
'reboot' off.  After 'reboot' is off, a node will abdicate 'master' mode
if its score drops.

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From Chris.Jankowski at hp.com  Thu Dec  9 05:07:50 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Thu, 9 Dec 2010 05:07:50 +0000
Subject: [Linux-cluster] rgmanager gets stuck on shutdown,
 if no services are running on its node.
In-Reply-To: <1291837569.3865.3.camel@ayanami.boston.devel.redhat.com>
References: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>
	<1291837569.3865.3.camel@ayanami.boston.devel.redhat.com>
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A0DF4B5@GVW1113EXC.americas.hpqcorp.net>

Lon,

The problem is reproducible at will. I do have access to the system after the "shutdown -h now" command is issued and rgmanager blocks.

I have gdb installed, but I do not know how to obtain rgmanager-debuginfo. The system is on an isolated network and I pointed you to an on-disk repository that is a copy of the RHEL6 distribution DVD copied to local disk.

Thanks and regards,

Chris

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Thursday, 9 December 2010 06:46
To: linux clustering
Subject: Re: [Linux-cluster] rgmanager gets stuck on shutdown, if no services are running on its node.

On Wed, 2010-12-08 at 03:11 +0000, Jankowski, Chris wrote:
> Hi,
>  
> I configured a cluster of 2 RHEL6 nodes.  
> The cluster has only one HA service defined.
>  
> I have a problem with rgmanager getting stuck on shutdown when certain
> set of conditions are met.  The details follow.
>  
> 1.
> If I execute ?shutdown ?h now? on the node that is *not* running the
> HA service then the shutdown process gets stuck with the last message
> in the /var/log/messages being:
>  

Is this reproducible outside of 'shutdown -h now', ex: does 'service
rgmanager stop' work in your configuration?

If you can still reach the machine (ssh or whatever) after executing
'shutdown -h now':

1) Install 'rgmanager-debuginfo' and gdb.

2) When rgmanager hangs on shutdown, run:

  - gdb /usr/sbin/rgmanager `pidof -s rgmanager`

3) When inside gdb, run:

  - thr a a bt

There's a related bug in RHEL5 related to releasing the lockspace if
CMAN exits before rgmanager, but I was unable to reproduce it on the
STABLE3/31 branches when I tested.

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From Chris.Jankowski at hp.com  Thu Dec  9 05:09:44 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Thu, 9 Dec 2010 05:09:44 +0000
Subject: [Linux-cluster] How do I implement an unmount only filesystem
 resource agent
In-Reply-To: <1291837767.3865.7.camel@ayanami.boston.devel.redhat.com>
References: <036B68E61A28CA49AC2767596576CD596F5A04F2FF@GVW1113EXC.americas.hpqcorp.net>
	<1291837767.3865.7.camel@ayanami.boston.devel.redhat.com>
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A0DF4B7@GVW1113EXC.americas.hpqcorp.net>

Lon,

Thank you for your suggestion.

In the meantime, I developed a script to do the unmount of a snapshot on stop and configured it as an additional resource agent of the type script.

This works very well.

Regards,

Chris Jankowski

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Thursday, 9 December 2010 06:49
To: linux clustering
Subject: Re: [Linux-cluster] How do I implement an unmount only filesystem resource agent

On Mon, 2010-12-06 at 12:27 +0000, Jankowski, Chris wrote:
>  
> To have a foolproof HA service I need to:
>  
>       * Check, if the snapshot filesystem is mounted
>       * If it is, all processes running in it need to be killed
>       * Then the snapshot filesystem needs to be unmounted.


>  
> I could configure a script resource with a script that would do the 3
> steps listed above as part of its stop action.  It would have
> essentially null start and status actions.
>  
> Is there a better, more elegant way of achieving the same result e.g.
> using the filesystem resource?

In theory you could delete the 'start' operation from the agent, but I
think rgmanager will ignore that and try to start it anyway...

You could edit the 'fs' agent and make the 'stop' and 'status'
operations return 0 immediately, though.

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From Chris.Jankowski at hp.com  Thu Dec  9 06:58:41 2010
From: Chris.Jankowski at hp.com (Jankowski, Chris)
Date: Thu, 9 Dec 2010 06:58:41 +0000
Subject: [Linux-cluster] rgmanager gets stuck on shutdown,
 if no services are running on its node.
References: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>
	<1291837569.3865.3.camel@ayanami.boston.devel.redhat.com> 
Message-ID: <036B68E61A28CA49AC2767596576CD596F5A0DF556@GVW1113EXC.americas.hpqcorp.net>

Lon,

I think that I got to the bottom of the problem:

If there are *no* services running on a node and you issue "shutdown -h now" on the node, then when it comes to shutting down rgmanger, it executes the following sequence:

1. Outputs "Shutting down" message to /var/adm/messages
2. Waits for the "status_poll_interval" value of seconds
3. Outputs the message: "Shutdown complete, exiting" and completes its own shutdown.

In my case, I had <rm status_poll_interval="3600"/>, as my service scripts do not have a viable check of their status, and the status check messages were clogging up the /var/adm/messages file.  So, rgmanager appeared to be stuck, whereas it was just really waiting.

I think this is a bug in logic here.  It should not be waiting in this situation.

------------
By comparison, if there is a service running on a node and you issue "shutdown -h now" on the node, then when it comes to shutting down rgmanger, it executes the following sequence:

1. Outputs "Shutting down" message to /var/adm/messages
2. Proceeds *immediately* (no wait) to shutting down the service
3. When the service is shutdown the rgmanager *immediately* outputs "Shutdown complete, exiting" and completes its own shutdown.

-------------
As a workaround, I set status_poll_interval="10" for the time being, although I believe that I should be forced to rely on short polling interval.

Regards,

Chris Jankowski

-----Original Message-----
From: Jankowski, Chris 
Sent: Thursday, 9 December 2010 16:08
To: linux clustering
Subject: RE: [Linux-cluster] rgmanager gets stuck on shutdown, if no services are running on its node.

Lon,

The problem is reproducible at will. I do have access to the system after the "shutdown -h now" command is issued and rgmanager blocks.

I have gdb installed, but I do not know how to obtain rgmanager-debuginfo. The system is on an isolated network and I pointed you to an on-disk repository that is a copy of the RHEL6 distribution DVD copied to local disk.

Thanks and regards,

Chris

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Thursday, 9 December 2010 06:46
To: linux clustering
Subject: Re: [Linux-cluster] rgmanager gets stuck on shutdown, if no services are running on its node.

On Wed, 2010-12-08 at 03:11 +0000, Jankowski, Chris wrote:
> Hi,
>  
> I configured a cluster of 2 RHEL6 nodes.  
> The cluster has only one HA service defined.
>  
> I have a problem with rgmanager getting stuck on shutdown when certain
> set of conditions are met.  The details follow.
>  
> 1.
> If I execute ?shutdown ?h now? on the node that is *not* running the
> HA service then the shutdown process gets stuck with the last message
> in the /var/log/messages being:
>  

Is this reproducible outside of 'shutdown -h now', ex: does 'service
rgmanager stop' work in your configuration?

If you can still reach the machine (ssh or whatever) after executing
'shutdown -h now':

1) Install 'rgmanager-debuginfo' and gdb.

2) When rgmanager hangs on shutdown, run:

  - gdb /usr/sbin/rgmanager `pidof -s rgmanager`

3) When inside gdb, run:

  - thr a a bt

There's a related bug in RHEL5 related to releasing the lockspace if
CMAN exits before rgmanager, but I was unable to reproduce it on the
STABLE3/31 branches when I tested.

-- Lon


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From rossnick-lists at cybercat.ca  Fri Dec 10 15:22:26 2010
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Fri, 10 Dec 2010 10:22:26 -0500
Subject: [Linux-cluster] To SELinux or not to SELinux ?
Message-ID: <4F027F4093FA4317ACB29F6C64D6C867@versa>

Over the CentOS-users list there is a long on-going thread about SELinux.
Since it's introduction a while back, I alwasy disabled selinux because of
the added complexity and never took the time to learn it.

For our soon to be production cluster of 8 nodes, I will be attempting to at
least set selinux at permissive to see how it works and learn it. Our
services are mostly of 3 type. Database server, apache server, our own
compile, and used in a non-standard locations and java servers, using the
default java, application and data directory on the gfs shared storage.

So, for a cluster, using fencing, gfs, and all the needed tools to run a
cluster, is there any reason not to use selinux ? I am looking to see if
cluster operator use or do not use selinux...

Thanks,
Nicolas 



From deJongm at TEOCO.com  Fri Dec 10 16:29:37 2010
From: deJongm at TEOCO.com (de Jong, Mark-Jan)
Date: Fri, 10 Dec 2010 11:29:37 -0500
Subject: [Linux-cluster] To SELinux or not to SELinux ?
In-Reply-To: <4F027F4093FA4317ACB29F6C64D6C867@versa>
References: <4F027F4093FA4317ACB29F6C64D6C867@versa>
Message-ID: <5E3DCAE61C95FA4397679425D7275D264F66B3A2@HQ-MX03.us.teo.earth>

Hello, 
I've had my share of conversations with the RH cluster folks regarding SELinux. They're answer at the time was (at least regarding RHEL5) that RH cluster suite was not certified to work with SELinux enabled. I HAVE made it work, but there were many instances where kernel or package updates ended up breaking it again. In the end I gave up due to time constraints and set SELinux to permissive in hopes to revisit it again sometime in the future.

Hope that helps.

-M

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Nicolas Ross
Sent: Friday, December 10, 2010 10:22 AM
To: linux clustering
Subject: [Linux-cluster] To SELinux or not to SELinux ?

Over the CentOS-users list there is a long on-going thread about SELinux.
Since it's introduction a while back, I alwasy disabled selinux because of
the added complexity and never took the time to learn it.

For our soon to be production cluster of 8 nodes, I will be attempting to at
least set selinux at permissive to see how it works and learn it. Our
services are mostly of 3 type. Database server, apache server, our own
compile, and used in a non-standard locations and java servers, using the
default java, application and data directory on the gfs shared storage.

So, for a cluster, using fencing, gfs, and all the needed tools to run a
cluster, is there any reason not to use selinux ? I am looking to see if
cluster operator use or do not use selinux...

Thanks,
Nicolas 

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From emsearcy at gmail.com  Fri Dec 10 16:37:41 2010
From: emsearcy at gmail.com (Eric Searcy)
Date: Fri, 10 Dec 2010 08:37:41 -0800
Subject: [Linux-cluster] To SELinux or not to SELinux ?
In-Reply-To: <4F027F4093FA4317ACB29F6C64D6C867@versa>
References: <4F027F4093FA4317ACB29F6C64D6C867@versa>
Message-ID: <AANLkTinJJJs2xMndKeTmQKvzh9KtJGgb+niERKXRpPNk@mail.gmail.com>

On Fri, Dec 10, 2010 at 7:22 AM, Nicolas Ross
<rossnick-lists at cybercat.ca> wrote:
> Over the CentOS-users list there is a long on-going thread about SELinux.
> Since it's introduction a while back, I alwasy disabled selinux because of
> the added complexity and never took the time to learn it.
>
> For our soon to be production cluster of 8 nodes, I will be attempting to at
> least set selinux at permissive to see how it works and learn it. Our
> services are mostly of 3 type. Database server, apache server, our own
> compile, and used in a non-standard locations and java servers, using the
> default java, application and data directory on the gfs shared storage.
>
> So, for a cluster, using fencing, gfs, and all the needed tools to run a
> cluster, is there any reason not to use selinux ? I am looking to see if
> cluster operator use or do not use selinux...

As far as RHCS (at least on 5) is concerned, there are notes that
SELinux isn't supported.  In other words those packages don't set
labels properly or add policy modules that would be needed.  Of
course, that doesn't stop you from using audit2allow to "clean up" the
denies you find while running in permissive (some denies will only
show up during boot).  I also locked myself out of the entire cluster
once and had to use a kernel append option to disable selinux :-)

I decided to run enforcing for greater defense in depth, but for the
time being on everything except RHCS.  For all my other boxes, I
switch it to permissive before minor dist upgrades and then set each
box back to enforcing after the next reboot without denies (I've been
doing this since 5.3, when updates to the enforcing policy broke a
bunch of labeling stuff and I was putting out fires since everything
was in enforcing still).

Eric



From Colin.Simpson at iongeo.com  Fri Dec 10 17:04:37 2010
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Fri, 10 Dec 2010 17:04:37 +0000
Subject: [Linux-cluster] To SELinux or not to SELinux ?
In-Reply-To: <AANLkTinJJJs2xMndKeTmQKvzh9KtJGgb+niERKXRpPNk@mail.gmail.com>
References: <4F027F4093FA4317ACB29F6C64D6C867@versa>
	<AANLkTinJJJs2xMndKeTmQKvzh9KtJGgb+niERKXRpPNk@mail.gmail.com>
Message-ID: <1292000677.6237.71.camel@cowie>

I seem to now be supported on RHEL 6 according to the Cluster Admin
Guide.


Colin


On Fri, 2010-12-10 at 16:37 +0000, Eric Searcy wrote:
> On Fri, Dec 10, 2010 at 7:22 AM, Nicolas Ross
> <rossnick-lists at cybercat.ca> wrote:
> > Over the CentOS-users list there is a long on-going thread about
> SELinux.
> > Since it's introduction a while back, I alwasy disabled selinux
> because of
> > the added complexity and never took the time to learn it.
> >
> > For our soon to be production cluster of 8 nodes, I will be
> attempting to at
> > least set selinux at permissive to see how it works and learn it.
> Our
> > services are mostly of 3 type. Database server, apache server, our
> own
> > compile, and used in a non-standard locations and java servers,
> using the
> > default java, application and data directory on the gfs shared
> storage.
> >
> > So, for a cluster, using fencing, gfs, and all the needed tools to
> run a
> > cluster, is there any reason not to use selinux ? I am looking to
> see if
> > cluster operator use or do not use selinux...
> 
> As far as RHCS (at least on 5) is concerned, there are notes that
> SELinux isn't supported.  In other words those packages don't set
> labels properly or add policy modules that would be needed.  Of
> course, that doesn't stop you from using audit2allow to "clean up" the
> denies you find while running in permissive (some denies will only
> show up during boot).  I also locked myself out of the entire cluster
> once and had to use a kernel append option to disable selinux :-)
> 
> I decided to run enforcing for greater defense in depth, but for the
> time being on everything except RHCS.  For all my other boxes, I
> switch it to permissive before minor dist upgrades and then set each
> box back to enforcing after the next reboot without denies (I've been
> doing this since 5.3, when updates to the enforcing policy broke a
> bunch of labeling stuff and I was putting out fires since everything
> was in enforcing still).
> 
> Eric
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 

This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed.  If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original.





From jeff.sturm at eprize.com  Fri Dec 10 18:03:43 2010
From: jeff.sturm at eprize.com (Jeff Sturm)
Date: Fri, 10 Dec 2010 13:03:43 -0500
Subject: [Linux-cluster] To SELinux or not to SELinux ?
In-Reply-To: <4F027F4093FA4317ACB29F6C64D6C867@versa>
References: <4F027F4093FA4317ACB29F6C64D6C867@versa>
Message-ID: <64D0546C5EBBD147B75DE133D798665F06A128A8@hugo.eprize.local>

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com]
> On Behalf Of Nicolas Ross
> Sent: Friday, December 10, 2010 10:22 AM
> To: linux clustering
> Subject: [Linux-cluster] To SELinux or not to SELinux ?
> 
> So, for a cluster, using fencing, gfs, and all the needed tools to run
a cluster, is there
> any reason not to use selinux ? I am looking to see if cluster
operator use or do not
> use selinux...

Beware that "permissive" mode, far from being benign, can be as
expensive as having SELinux enabled.  See
http://www.mail-archive.com/linux-cluster at redhat.com/msg08317.html for
some details on GFS and extended attributes.

-Jeff





From lhh at redhat.com  Fri Dec 10 18:15:51 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 10 Dec 2010 13:15:51 -0500
Subject: [Linux-cluster] Heuristics for quorum disk used as a tiebreaker
 in a two node cluster.
In-Reply-To: <036B68E61A28CA49AC2767596576CD596F5A0DF47C@GVW1113EXC.americas.hpqcorp.net>
References: <036B68E61A28CA49AC2767596576CD596F5A04ED98@GVW1113EXC.americas.hpqcorp.net>
	<1291840394.3865.51.camel@ayanami.boston.devel.redhat.com>
	<036B68E61A28CA49AC2767596576CD596F5A0DF47C@GVW1113EXC.americas.hpqcorp.net>
Message-ID: <1292004951.7139.113.camel@localhost.localdomain>

On Thu, 2010-12-09 at 03:57 +0000, Jankowski, Chris wrote:
> Lon,
> 
> Thank you for your suggestions. 
> 
> 1.
> I like very much your idea of having additional fencing agent (called as the first one in the chain) with delay dependent on the presence of the service on the node.  I understand the code.  What I do not know is what are the steps in adding my own fencing agents. They all live in /usr/sbin.  
> 
> Is it as simple as placing the new fencing agent in /usr/bin?  Is some kind of registration required e.g. so ccs_config_validate will recognise it?

You can put the absolute path in the fencedevice tag:

  <fencedevice agent="/my/custom/agent" ... />

Your agent should not have extra parameters.

Also, I think my first inclination was wrong; you shouldn't combine it
with other devices in the same level.  My apologies.

Instead:

* Your script should *always* exit 1 (failure).  The only thing we want
this script to do is sleep if the service is running on the other guy;
we do not want it to feed fenced any sort of "success" value - ever.

* If you leave it returning 0 and you delete your "real" fencedevice
later, your data will be at risk.

So, make the script return 1 (always) and the cluster.conf would look
like this:


<clusternode>
  <clusternode name="xxx" >
    <fence>
      <method name="delayservice">
        <device name="delayservice" />
      </method>
      <method name="1">
        <device name="ilo" ... />
      </method>
    </fence>
  </clusternode>
  ...
</clusternode>
...
<fencedevices>
  ...
  <fencedevice name="delayservice" agent="/my/custom/agent.sh" />
</fencedevices>
...


> 2.
> I'd guess that the extra fencing agent can also solve the problem of both nodes being fenced when the inter-node link goes down.  This is a distinct from the scenario where the communication through quorum disk ceases.  This will be a bonus.

That's actually what it does...


> 3.
> I am using quorum disk as a natural way to assure that the cluster of 2 nodes has quorum with just one node. I am aware of the <cman two_node="1"/> option. 

Ok.  With the custom agent, you can use pretty much no heuristics.
Qdiskd will auto-configure everything for you (see below though).

However, when using qdiskd for your configuration (where you are using a
custom, extra fencing agent to delay fencing based on service location),
you should explicitly set master_wins to 0.


> What are the advantages or disadvantages of using quorum disk for two nodes compared with no quorum disk and the two_node="1" attribute set?

If you have a cluster where the fence devices are accessible only over
the same network as the cluster communicates, there is no real advantage
to using qdiskd.

If you have the fencing devices on a separate network than the cluster
uses for communication, then using qdiskd can prevent three fencing
problems: a fence race, fence death, and a fencing loop.

http://people.redhat.com/lhh/ClusterPitfalls.pdf

* The delayservice hack eliminates the fencing race.

* Qdiskd holds off fence-loops, but fence-death can still occur in rare
cases when simultaneously starting both cluster nodes from a total
outage.


-- Lon



From rossnick-lists at cybercat.ca  Fri Dec 10 18:20:52 2010
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Fri, 10 Dec 2010 13:20:52 -0500
Subject: [Linux-cluster] To SELinux or not to SELinux ?
References: <4F027F4093FA4317ACB29F6C64D6C867@versa>
	<64D0546C5EBBD147B75DE133D798665F06A128A8@hugo.eprize.local>
Message-ID: <B00DEE694FB54A0DA1B3ECFBBCC1C940@versa>


>> So, for a cluster, using fencing, gfs, and all the needed tools to run
> a cluster, is there
>> any reason not to use selinux ? I am looking to see if cluster
> operator use or do not
>> use selinux...
>
> Beware that "permissive" mode, far from being benign, can be as
> expensive as having SELinux enabled.  See
> http://www.mail-archive.com/linux-cluster at redhat.com/msg08317.html for
> some details on GFS and extended attributes.

Oh... I didn't tought of performance influence... That alone is enough to 
keep it off completly. We will be hosting a high-volume site where every 
millisecond counts. That site is composed of about a million files of 
different sorts. So, any added delay in accessing a file is not an option. 



From lhh at redhat.com  Fri Dec 10 18:24:56 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 10 Dec 2010 13:24:56 -0500
Subject: [Linux-cluster] rgmanager gets stuck on shutdown,
 if no services are running on its node.
In-Reply-To: <036B68E61A28CA49AC2767596576CD596F5A0DF556@GVW1113EXC.americas.hpqcorp.net>
References: <036B68E61A28CA49AC2767596576CD596F5A0DEF29@GVW1113EXC.americas.hpqcorp.net>
	<1291837569.3865.3.camel@ayanami.boston.devel.redhat.com>
	<036B68E61A28CA49AC2767596576CD596F5A0DF556@GVW1113EXC.americas.hpqcorp.net>
Message-ID: <1292005496.7139.132.camel@localhost.localdomain>

On Thu, 2010-12-09 at 06:58 +0000, Jankowski, Chris wrote:
> Lon,
> 
> I think that I got to the bottom of the problem:
> 
> If there are *no* services running on a node and you issue "shutdown -h now" on the node, then when it comes to shutting down rgmanger, it executes the following sequence:
> 
> 1. Outputs "Shutting down" message to /var/adm/messages
> 2. Waits for the "status_poll_interval" value of seconds
> 3. Outputs the message: "Shutdown complete, exiting" and completes its own shutdown.
> 
> In my case, I had <rm status_poll_interval="3600"/>, as my service scripts do not have a viable check of their status, and the status check messages were clogging up the /var/adm/messages file.  So, rgmanager appeared to be stuck, whereas it was just really waiting.

You should just turn off status checks for your script:

  <rm>
    <service name="foo" >
      <script name="myscript" file... >
        <!-- 1 year status check interval -->
        <action name="status" depth="*" interval="1y" />
      </script>
      ..
    </service
    ..
  </rm>

That should make things work.

-- Lona



From dxh at yahoo.com  Fri Dec 10 19:27:49 2010
From: dxh at yahoo.com (Don Hoover)
Date: Fri, 10 Dec 2010 14:27:49 -0500
Subject: [Linux-cluster] To SELinux or not to SELinux ?
In-Reply-To: <mailman.41.1292000407.32109.linux-cluster@redhat.com>
References: <mailman.41.1292000407.32109.linux-cluster@redhat.com>
Message-ID: <118D4152-E0D7-4456-986E-25695579E436@yahoo.com>

I have been working with RHEL6 and SElinux in targeted and enforcing mode works really well with everything I have tried it with including cluster and KVM. 

They have done a much better job with having policies that just work with most all of the software that comes on the distro.  And the new 'managing secure services' manual on docs.redhat has lots of examples on what you need to do when you step outside of the defaults like how to add non-default directories(eg outside of var/www) for apache, mysql, KVM etc. 

I am shooting for rhel6 to be our first build that has SElinux on by default. 

For the first time I think SElinux might be low enough a hassle that it can be left on. 

Sent from my iPhone

On Dec 10, 2010, at 12:00 PM, linux-cluster-request at redhat.com wrote:

> Re: [Linux-cluster] To SELinux or not to SELinux ?



From pmdyer at ctgcentral2.com  Fri Dec 10 19:34:52 2010
From: pmdyer at ctgcentral2.com (Paul M. Dyer)
Date: Fri, 10 Dec 2010 13:34:52 -0600 (CST)
Subject: [Linux-cluster] To SELinux or not to SELinux ?
In-Reply-To: <B00DEE694FB54A0DA1B3ECFBBCC1C940@versa>
Message-ID: <4690726.2.1292009692820.JavaMail.root@athena>

Hi,

I have used selinux enforcing since RHEL 5.4 on a 3-node RHCS cluster.   I believe it has been supported since that release.   I made some calls back in RHEL 5.3 regarding some issues, but all problems that I experienced have been resolved.   I got plenty of support for my issues.

According to Dan Walsh, performance was addressed early on.   I have not had any performance issues using selinux in RHEL 5, RHCS included.

Paul


----- Original Message -----
From: "Nicolas Ross" <rossnick-lists at cybercat.ca>
To: "linux clustering" <linux-cluster at redhat.com>
Sent: Friday, December 10, 2010 12:20:52 PM
Subject: Re: [Linux-cluster] To SELinux or not to SELinux ?

>> So, for a cluster, using fencing, gfs, and all the needed tools to
>> run
> a cluster, is there
>> any reason not to use selinux ? I am looking to see if cluster
> operator use or do not
>> use selinux...
>
> Beware that "permissive" mode, far from being benign, can be as
> expensive as having SELinux enabled. See
> http://www.mail-archive.com/linux-cluster at redhat.com/msg08317.html for
> some details on GFS and extended attributes.

Oh... I didn't tought of performance influence... That alone is enough
to keep it off completly. We will be hosting a high-volume site where
every millisecond counts. That site is composed of about a million files
of different sorts. So, any added delay in accessing a file is not an
option.

-- Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From jeff.sturm at eprize.com  Fri Dec 10 20:30:57 2010
From: jeff.sturm at eprize.com (Jeff Sturm)
Date: Fri, 10 Dec 2010 15:30:57 -0500
Subject: [Linux-cluster] To SELinux or not to SELinux ?
In-Reply-To: <4690726.2.1292009692820.JavaMail.root@athena>
References: <B00DEE694FB54A0DA1B3ECFBBCC1C940@versa>
	<4690726.2.1292009692820.JavaMail.root@athena>
Message-ID: <64D0546C5EBBD147B75DE133D798665F06A128B3@hugo.eprize.local>

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com]
> On Behalf Of Paul M. Dyer
> Sent: Friday, December 10, 2010 2:35 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] To SELinux or not to SELinux ?
>
> According to Dan Walsh, performance was addressed early on.   I have
not had any
> performance issues using selinux in RHEL 5, RHCS included.

Results will probably vary depending on what components you need, and
what versions you run.  For us, SELinux incurred a 30% overhead with GFS
file operations.  That was on CentOS 5.2 or 5.3, can't remember which.
(We're in the middle of an upgrade to 5.5, but haven't started migrating
to GFS2.)

But don't take my word for it, or anyone else's... always benchmark your
own application.

-Jeff





From linux-cluster at redhat.com  Mon Dec 13 11:51:05 2010
From: linux-cluster at redhat.com (Mailbot for etexusa.com)
Date: Mon, 13 Dec 2010 03:51:05 -0800
Subject: [Linux-cluster] DSN: failed (Delivery reports about your e-mail)
Message-ID: <mAWtxIeCDcrRuXn2p02@etexusa.com>


This is a Delivery Status Notification (DSN).

I was unable to deliver your message to
pgmarshall at worldnet.att.net.

I said 
  RCPT TO:<pgmarshall at worldnet.att.net>

And they gave me the error;
  551 not our customer

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/rfc822-headers
Size: 523 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101213/a4e35b86/attachment.bin>

From paolo.smiraglia at gmail.com  Mon Dec 13 18:17:36 2010
From: paolo.smiraglia at gmail.com (Paolo Smiraglia)
Date: Mon, 13 Dec 2010 19:17:36 +0100
Subject: [Linux-cluster] Clustered LVM locking issues
Message-ID: <AANLkTikmz=5LybhP7X+C-n3vkqCkPmmRoeCF32zdZt4m@mail.gmail.com>

Hi to every ones....

We have configured a shared storage between some nodes with iSCSI, and
we want to use LVM across them.

Our access model is based on syncronized commands remotely executed by
the master node with ssh (authenticated by public key without
password).

The master node is the one that can create/remove logical volumes
which are a snapshot of a "base" logical voume.

Other nodes exclusively access logical volumes created by master node.

In order to do so, we have installed in all nodes

  * RedHat Enterprise 6 beta2
  * Cluster Suite
  * CLVMD

Then we have configured LVM with

   locking_type=3 (lvm.conf)

and fist attept we marked the volume group as "clustered".

Unfortunately we got an error message that is saying snapshot is not
supoprted for clustered volume groups. In order to overcome this issue
we used a workaround that conists in clearing the clustered flag in
the volume group. However, this caused some concurrency problems that
prevent a logical volume to be removed even if not used.

Do you have a solution for this problem? Thanks in advance for replies...


-- 
PAOLO SMIRAGLIA
http://portale.isf.polito.it/paolo-smiraglia



From james.hofmeister at hp.com  Tue Dec 14 06:28:15 2010
From: james.hofmeister at hp.com (Hofmeister, James (WTEC Linux))
Date: Tue, 14 Dec 2010 06:28:15 +0000
Subject: [Linux-cluster] RHCS & snmpd[30622]: Received SNMP packet(s) from
	UDP
Message-ID: <EC61DD7B6048464AB0E1B713AF7521BC191BEF7295@GVW0676EXC.americas.hpqcorp.net>

Hello folks,

RE: SNMP packets from local host.

6-10 per minute 

Dec 10 07:18:57 host1 snmpd[30622]: Connection from UDP: [127.0.0.1]:49231
Dec 10 07:18:57 host1 snmpd[30622]: Received SNMP packet(s) from UDP: [127.0.0.1]:49231

I see this quite often in RHCS clusters and have not determined if the source is a function of RHCS or if this is one of the HP health agents.

I am aware that these messages can be turned off in the SNMP configuration, I am more interested in the source.

Any feedback would be appreciated.

Regards, 
  James Hofmeister? 
  Hewlett Packard Linux Solutions Engineer




From raju.rajsand at gmail.com  Tue Dec 14 08:04:12 2010
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Tue, 14 Dec 2010 13:34:12 +0530
Subject: [Linux-cluster] RHCS & snmpd[30622]: Received SNMP packet(s)
	from UDP
In-Reply-To: <EC61DD7B6048464AB0E1B713AF7521BC191BEF7295@GVW0676EXC.americas.hpqcorp.net>
References: <EC61DD7B6048464AB0E1B713AF7521BC191BEF7295@GVW0676EXC.americas.hpqcorp.net>
Message-ID: <AANLkTi=23Yk6kJKv_TwMB9VKg+KqkWBP5e0++Ha9Aqf=@mail.gmail.com>

Greetings,

On Tue, Dec 14, 2010 at 11:58 AM, Hofmeister, James (WTEC Linux)
<james.hofmeister at hp.com> wrote:
> Hello folks,
>
> Dec 10 07:18:57 host1 snmpd[30622]: Connection from UDP: [127.0.0.1]:49231
> Dec 10 07:18:57 host1 snmpd[30622]: Received SNMP packet(s) from UDP: [127.0.0.1]:49231
>
> I see this quite often in RHCS clusters and have not determined if the source is a function of RHCS or if this is one of the HP health agents.
>
> I am aware that these messages can be turned off in the SNMP configuration, I am more interested in the source.
>
> Any feedback would be appreciated.
>

Looks to me like the HP agent. only the agent shouts peridically
usually in snmp...

/scurries Hmmm.... where are my snmp notes?


Regards,

Rajagopal



From kitgerrits at gmail.com  Tue Dec 14 08:15:07 2010
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Tue, 14 Dec 2010 09:15:07 +0100
Subject: [Linux-cluster] Clustered LVM locking issues
In-Reply-To: <AANLkTikmz=5LybhP7X+C-n3vkqCkPmmRoeCF32zdZt4m@mail.gmail.com>
Message-ID: <4d07278e.1211cc0a.3573.37ab@mx.google.com>


Hello,


I might have misunderstood, but:

I am assuming one machine is exporting local storage via iSCSI.
Might it be easier to use LVM om the base storage device and take your
snapshot there?
(by exporting a LV as iSCSI target)
(keep inmind, this machine would be a SPOF for the entire cluster)


If you are using a Shared Storage Device and exporting the iSCSI target
using a cluster, this would be a problem, as this would also be a Clustered
LV.
In this case, advanced Shared Storage Devices offer internal LVM snapshots,
which do no interfere with the LUN. (HP MSA devices offer this)
It will allow you to chreate a LVM shapshot of your (iSCSI/SCSI/FC)LUN
without interfering with the 'O/S LVM layer', therefore allowing you to
continue exporting your device.


Regards,

Kit

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Paolo Smiraglia
Sent: maandag 13 december 2010 19:18
To: linux-cluster at redhat.com
Subject: [Linux-cluster] Clustered LVM locking issues

Hi to every ones....

We have configured a shared storage between some nodes with iSCSI, and we
want to use LVM across them.

Our access model is based on syncronized commands remotely executed by the
master node with ssh (authenticated by public key without password).

The master node is the one that can create/remove logical volumes which are
a snapshot of a "base" logical voume.

Other nodes exclusively access logical volumes created by master node.

In order to do so, we have installed in all nodes

  * RedHat Enterprise 6 beta2
  * Cluster Suite
  * CLVMD

Then we have configured LVM with

   locking_type=3 (lvm.conf)

and fist attept we marked the volume group as "clustered".

Unfortunately we got an error message that is saying snapshot is not
supoprted for clustered volume groups. In order to overcome this issue we
used a workaround that conists in clearing the clustered flag in the volume
group. However, this caused some concurrency problems that prevent a logical
volume to be removed even if not used.

Do you have a solution for this problem? Thanks in advance for replies...


--
PAOLO SMIRAGLIA
http://portale.isf.polito.it/paolo-smiraglia

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From yamato at redhat.com  Tue Dec 14 12:02:16 2010
From: yamato at redhat.com (Masatake YAMATO)
Date: Tue, 14 Dec 2010 21:02:16 +0900 (JST)
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <20100527.133950.593311767624382812.yamato@redhat.com>
References: <20100527.132034.642044848160830535.yamato@redhat.com>
	<4BFDF51C.4000808@redhat.com>
	<20100527.133950.593311767624382812.yamato@redhat.com>
Message-ID: <20101214.210216.512133496326900668.yamato@redhat.com>

https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232

The patch for wireshark is not merged yet. Reviewing is very slow or
the patch may be rejected implicitly.

So I decide to provide my dissector as a dyanamic loadable plugin. It
can be built on Fedora-14 with wireshark-devel package.


You, working on RHEL6, may not be interested in ccsd.


https://github.com/masatake/wireshark-plugin-rhcs

I will maintain this source tree. Forking are welcome.

I'd like to make it a rpm package and be available as a part of
Fedora. But I cannot find enough time to be a package maintainer 
now.

Masatake YAMATO



From ccaulfie at redhat.com  Tue Dec 14 13:38:02 2010
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Tue, 14 Dec 2010 13:38:02 +0000
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <20101214.210216.512133496326900668.yamato@redhat.com>
References: <20100527.132034.642044848160830535.yamato@redhat.com>	<4BFDF51C.4000808@redhat.com>	<20100527.133950.593311767624382812.yamato@redhat.com>
	<20101214.210216.512133496326900668.yamato@redhat.com>
Message-ID: <4D07733A.9030300@redhat.com>

Awesome!

Thank you :-)

Chrissie

On 14/12/10 12:02, Masatake YAMATO wrote:
> https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232
>
> The patch for wireshark is not merged yet. Reviewing is very slow or
> the patch may be rejected implicitly.
>
> So I decide to provide my dissector as a dyanamic loadable plugin. It
> can be built on Fedora-14 with wireshark-devel package.
>
>
> You, working on RHEL6, may not be interested in ccsd.
>
>
> https://github.com/masatake/wireshark-plugin-rhcs
>
> I will maintain this source tree. Forking are welcome.
>
> I'd like to make it a rpm package and be available as a part of
> Fedora. But I cannot find enough time to be a package maintainer
> now.
>
> Masatake YAMATO
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From jfriesse at redhat.com  Tue Dec 14 13:53:32 2010
From: jfriesse at redhat.com (Jan Friesse)
Date: Tue, 14 Dec 2010 14:53:32 +0100
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <20101214.210216.512133496326900668.yamato@redhat.com>
References: <20100527.132034.642044848160830535.yamato@redhat.com>	<4BFDF51C.4000808@redhat.com>	<20100527.133950.593311767624382812.yamato@redhat.com>
	<20101214.210216.512133496326900668.yamato@redhat.com>
Message-ID: <4D0776DC.9080003@redhat.com>

Masatake,
I'm pretty sure that biggest problem of your code was that it was 
licensed under BSD (three clause, same as Corosync has) license. 
Wireshark is licensed under GPL and even I like BSD licenses much more, 
I would recommend you to try to relicense code under GPL and send them 
this code.

Regards,
   Honza

Masatake YAMATO napsal(a):
> https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232
> 
> The patch for wireshark is not merged yet. Reviewing is very slow or
> the patch may be rejected implicitly.
> 
> So I decide to provide my dissector as a dyanamic loadable plugin. It
> can be built on Fedora-14 with wireshark-devel package.
> 
> 
> You, working on RHEL6, may not be interested in ccsd.
> 
> 
> https://github.com/masatake/wireshark-plugin-rhcs
> 
> I will maintain this source tree. Forking are welcome.
> 
> I'd like to make it a rpm package and be available as a part of
> Fedora. But I cannot find enough time to be a package maintainer 
> now.
> 
> Masatake YAMATO
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais



From yamato at redhat.com  Tue Dec 14 14:15:25 2010
From: yamato at redhat.com (Masatake YAMATO)
Date: Tue, 14 Dec 2010 23:15:25 +0900 (JST)
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <4D0776DC.9080003@redhat.com>
References: <20100527.133950.593311767624382812.yamato@redhat.com>
	<20101214.210216.512133496326900668.yamato@redhat.com>
	<4D0776DC.9080003@redhat.com>
Message-ID: <20101214.231525.648039044490713397.yamato@redhat.com>

I'd like to your advice more detail seriously.
I've been developing this code for three years.
I don't want to make this code garbage.

> Masatake,
> I'm pretty sure that biggest problem of your code was that it was
> licensed under BSD (three clause, same as Corosync has)
> license. Wireshark is licensed under GPL and even I like BSD licenses
> much more, I would recommend you to try to relicense code under GPL
> and send them this code.
> 
> Regards,
>   Honza

I got the similar comment from wireshark developer.

Please, read the discussion:
	https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232


In my understanding there is no legal problem in putting 3-clause BSD
code into GPL code.  Acutally wireshark includes some 3-clause BSD
code:

epan/dissectors/packet-radiotap-defs.h:
/*-
 * Copyright (c) 2003, 2004 David Young.  All rights reserved.
 *
 * $Id: packet-radiotap-defs.h 34554 2010-10-18 13:24:10Z morriss $
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of David Young may not be used to endorse or promote
 *    products derived from this software without specific prior
 *    written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY DAVID YOUNG ``AS IS'' AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
 * PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL DAVID
 * YOUNG BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
 * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
 * OF SUCH DAMAGE.
 */

I'd like to separate the legal issue and preference. 
I think I understand the importance of preference of upstream 
developers. However, I'd like to clear the legal 
issue first.


I can image there are people who prefer to GPL as the license covering
their software. But here I've taken some corosync code in my
dissector. It is essential part of my dissector. And corosync is
licensed in 3-clause BSD, as you know. I'd like to change the license
to merge my code to upstream project. I cannot do it in this context.

See https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232#c13

Thank you.



From jfriesse at redhat.com  Tue Dec 14 14:51:17 2010
From: jfriesse at redhat.com (Jan Friesse)
Date: Tue, 14 Dec 2010 15:51:17 +0100
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <20101214.231525.648039044490713397.yamato@redhat.com>
References: <20100527.133950.593311767624382812.yamato@redhat.com>	<20101214.210216.512133496326900668.yamato@redhat.com>	<4D0776DC.9080003@redhat.com>
	<20101214.231525.648039044490713397.yamato@redhat.com>
Message-ID: <4D078465.3020509@redhat.com>

Masatake,

Masatake YAMATO napsal(a):
> I'd like to your advice more detail seriously.
> I've been developing this code for three years.
> I don't want to make this code garbage.
> 
>> Masatake,
>> I'm pretty sure that biggest problem of your code was that it was
>> licensed under BSD (three clause, same as Corosync has)
>> license. Wireshark is licensed under GPL and even I like BSD licenses
>> much more, I would recommend you to try to relicense code under GPL
>> and send them this code.
>>
>> Regards,
>>   Honza
> 
> I got the similar comment from wireshark developer.
> 
> Please, read the discussion:
> 	https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232
> 
> 

I've read that thread long time before I've sent previous mail, so thats 
reason why I think that Wireshark developers just feel MUCH more 
comfortable with GPL and thats reason why they just ignoring it.

> In my understanding there is no legal problem in putting 3-clause BSD
> code into GPL code.  Acutally wireshark includes some 3-clause BSD
> code:
> 

Actually there is really not. BSD to GPL works without problem, but many 
people just don't know it...

> epan/dissectors/packet-radiotap-defs.h:
> /*-
>  * Copyright (c) 2003, 2004 David Young.  All rights reserved.
>  *
>  * $Id: packet-radiotap-defs.h 34554 2010-10-18 13:24:10Z morriss $
>  *
>  * Redistribution and use in source and binary forms, with or without
>  * modification, are permitted provided that the following conditions
>  * are met:
>  * 1. Redistributions of source code must retain the above copyright
>  *    notice, this list of conditions and the following disclaimer.
>  * 2. Redistributions in binary form must reproduce the above copyright
>  *    notice, this list of conditions and the following disclaimer in the
>  *    documentation and/or other materials provided with the distribution.
>  * 3. The name of David Young may not be used to endorse or promote
>  *    products derived from this software without specific prior
>  *    written permission.
>  *
>  * THIS SOFTWARE IS PROVIDED BY DAVID YOUNG ``AS IS'' AND ANY
>  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
>  * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>  * PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL DAVID
>  * YOUNG BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
>  * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
>  * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
>  * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
>  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
>  * OF SUCH DAMAGE.
>  */
> 
> I'd like to separate the legal issue and preference. 
> I think I understand the importance of preference of upstream 
> developers. However, I'd like to clear the legal 
> issue first.
> 

Legally it's ok. But as you said, developers preference are different. 
And because you are trying to change THEIR code it's sometimes better to 
play they rules.

> 
> I can image there are people who prefer to GPL as the license covering
> their software. But here I've taken some corosync code in my
> dissector. It is essential part of my dissector. And corosync is

^^^ This may be problem. Question is how big is that part and if it can 
be possible to make exception there. Can you point that code?

Steve, we were able to relicense HUGE portion of code in case of libqb, 
are we able to make the same for Wireshark dissector?

> licensed in 3-clause BSD, as you know. I'd like to change the license
> to merge my code to upstream project. I cannot do it in this context.
> 
> See https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232#c13
> 
> Thank you.

Regards,
   Honza



From yamato at redhat.com  Tue Dec 14 15:04:29 2010
From: yamato at redhat.com (Masatake YAMATO)
Date: Wed, 15 Dec 2010 00:04:29 +0900 (JST)
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <4D078465.3020509@redhat.com>
References: <4D0776DC.9080003@redhat.com>
	<20101214.231525.648039044490713397.yamato@redhat.com>
	<4D078465.3020509@redhat.com>
Message-ID: <20101215.000429.721897046580218183.yamato@redhat.com>

Thank you for replying.

> Masatake,
> 
> Masatake YAMATO napsal(a):
>> I'd like to your advice more detail seriously.
>> I've been developing this code for three years.
>> I don't want to make this code garbage.
>> 
>>> Masatake,
>>> I'm pretty sure that biggest problem of your code was that it was
>>> licensed under BSD (three clause, same as Corosync has)
>>> license. Wireshark is licensed under GPL and even I like BSD licenses
>>> much more, I would recommend you to try to relicense code under GPL
>>> and send them this code.
>>>
>>> Regards,
>>>   Honza
>> I got the similar comment from wireshark developer.
>> Please, read the discussion:
>> 	https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232
>> 
> 
> I've read that thread long time before I've sent previous mail, so
> thats reason why I think that Wireshark developers just feel MUCH more
> comfortable with GPL and thats reason why they just ignoring it.

I see.
 
>> In my understanding there is no legal problem in putting 3-clause BSD
>> code into GPL code.  Acutally wireshark includes some 3-clause BSD
>> code:
>> 
> 
> Actually there is really not. BSD to GPL works without problem, but
> many people just don't know it...

...it is too bad. I strongly believe FOSS developers should know the
intent behind of the both licenses.
 
>> epan/dissectors/packet-radiotap-defs.h:
>> /*-
>>  * Copyright (c) 2003, 2004 David Young.  All rights reserved.
>>  *
>>  * $Id: packet-radiotap-defs.h 34554 2010-10-18 13:24:10Z morriss $
>>  *
>>  * Redistribution and use in source and binary forms, with or without
>>  * modification, are permitted provided that the following conditions
>>  * are met:
>>  * 1. Redistributions of source code must retain the above copyright
>>  *    notice, this list of conditions and the following disclaimer.
>>  * 2. Redistributions in binary form must reproduce the above copyright
>>  *    notice, this list of conditions and the following disclaimer in the
>>  *    documentation and/or other materials provided with the distribution.
>>  * 3. The name of David Young may not be used to endorse or promote
>>  *    products derived from this software without specific prior
>>  *    written permission.
>>  *
>>  * THIS SOFTWARE IS PROVIDED BY DAVID YOUNG ``AS IS'' AND ANY
>>  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
>>  * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>>  * PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL DAVID
>>  * YOUNG BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
>>  * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
>>  * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
>>  * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
>>  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>>  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
>>  * OF SUCH DAMAGE.
>>  */
>> I'd like to separate the legal issue and preference. I think I
>> understand the importance of preference of upstream
>> developers. However, I'd like to clear the legal issue first.
>> 
> 
> Legally it's ok. But as you said, developers preference are
> different. And because you are trying to change THEIR code it's
> sometimes better to play they rules.

I see.
 
>> I can image there are people who prefer to GPL as the license covering
>> their software. But here I've taken some corosync code in my
>> dissector. It is essential part of my dissector. And corosync is
> 
> ^^^ This may be problem. Question is how big is that part and if it
> can be possible to make exception there. Can you point that code?
> 
> Steve, we were able to relicense HUGE portion of code in case of
> libqb, are we able to make the same for Wireshark dissector?

Could you see https://github.com/masatake/wireshark-plugin-rhcs/blob/master/src/packet-corosync-totemnet.c#L156
I refer totemnet.c to write dissect_corosynec_totemnet_with_decryption() function. 

>> licensed in 3-clause BSD, as you know. I'd like to change the license
>> to merge my code to upstream project. I cannot do it in this context.
>> See https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232#c13
>> Thank you.
> 
> Regards,
>   Honza

Masatake YAMATO



From sdake at redhat.com  Tue Dec 14 16:28:57 2010
From: sdake at redhat.com (Steven Dake)
Date: Tue, 14 Dec 2010 09:28:57 -0700
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <4D078465.3020509@redhat.com>
References: <20100527.133950.593311767624382812.yamato@redhat.com>	<20101214.210216.512133496326900668.yamato@redhat.com>	<4D0776DC.9080003@redhat.com>
	<20101214.231525.648039044490713397.yamato@redhat.com>
	<4D078465.3020509@redhat.com>
Message-ID: <4D079B49.8070009@redhat.com>

On 12/14/2010 07:51 AM, Jan Friesse wrote:
> Masatake,
> 
> Masatake YAMATO napsal(a):
>> I'd like to your advice more detail seriously.
>> I've been developing this code for three years.
>> I don't want to make this code garbage.
>>
>>> Masatake,
>>> I'm pretty sure that biggest problem of your code was that it was
>>> licensed under BSD (three clause, same as Corosync has)
>>> license. Wireshark is licensed under GPL and even I like BSD licenses
>>> much more, I would recommend you to try to relicense code under GPL
>>> and send them this code.
>>>
>>> Regards,
>>>   Honza
>>
>> I got the similar comment from wireshark developer.
>>
>> Please, read the discussion:
>>     https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232
>>
>>
> 
> I've read that thread long time before I've sent previous mail, so thats
> reason why I think that Wireshark developers just feel MUCH more
> comfortable with GPL and thats reason why they just ignoring it.
> 
>> In my understanding there is no legal problem in putting 3-clause BSD
>> code into GPL code.  Acutally wireshark includes some 3-clause BSD
>> code:
>>
> 
> Actually there is really not. BSD to GPL works without problem, but many
> people just don't know it...
> 
>> epan/dissectors/packet-radiotap-defs.h:
>> /*-
>>  * Copyright (c) 2003, 2004 David Young.  All rights reserved.
>>  *
>>  * $Id: packet-radiotap-defs.h 34554 2010-10-18 13:24:10Z morriss $
>>  *
>>  * Redistribution and use in source and binary forms, with or without
>>  * modification, are permitted provided that the following conditions
>>  * are met:
>>  * 1. Redistributions of source code must retain the above copyright
>>  *    notice, this list of conditions and the following disclaimer.
>>  * 2. Redistributions in binary form must reproduce the above copyright
>>  *    notice, this list of conditions and the following disclaimer in the
>>  *    documentation and/or other materials provided with the
>> distribution.
>>  * 3. The name of David Young may not be used to endorse or promote
>>  *    products derived from this software without specific prior
>>  *    written permission.
>>  *
>>  * THIS SOFTWARE IS PROVIDED BY DAVID YOUNG ``AS IS'' AND ANY
>>  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
>>  * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>>  * PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL DAVID
>>  * YOUNG BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
>>  * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
>>  * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
>>  * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
>>  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>>  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
>>  * OF SUCH DAMAGE.
>>  */
>>
>> I'd like to separate the legal issue and preference. I think I
>> understand the importance of preference of upstream developers.
>> However, I'd like to clear the legal issue first.
>>
> 
> Legally it's ok. But as you said, developers preference are different.
> And because you are trying to change THEIR code it's sometimes better to
> play they rules.
> 
>>
>> I can image there are people who prefer to GPL as the license covering
>> their software. But here I've taken some corosync code in my
>> dissector. It is essential part of my dissector. And corosync is
> 
> ^^^ This may be problem. Question is how big is that part and if it can
> be possible to make exception there. Can you point that code?
> 
> Steve, we were able to relicense HUGE portion of code in case of libqb,
> are we able to make the same for Wireshark dissector?
> 
>> licensed in 3-clause BSD, as you know. I'd like to change the license
>> to merge my code to upstream project. I cannot do it in this context.
>>
>> See https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232#c13
>>
>> Thank you.
> 
> Regards,
>   Honza


I am not changing corosync license to GPL.  I think the separate plugin
works fine, and we can even take up packaging of it in fedora and Red
Hat variants, if it is maintained in an upstream repo.

Regards
-steve



From sdake at redhat.com  Tue Dec 14 16:31:55 2010
From: sdake at redhat.com (Steven Dake)
Date: Tue, 14 Dec 2010 09:31:55 -0700
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <20101214.210216.512133496326900668.yamato@redhat.com>
References: <20100527.132034.642044848160830535.yamato@redhat.com>	<4BFDF51C.4000808@redhat.com>	<20100527.133950.593311767624382812.yamato@redhat.com>
	<20101214.210216.512133496326900668.yamato@redhat.com>
Message-ID: <4D079BFB.9050906@redhat.com>

On 12/14/2010 05:02 AM, Masatake YAMATO wrote:
> https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232
> 
> The patch for wireshark is not merged yet. Reviewing is very slow or
> the patch may be rejected implicitly.
> 
> So I decide to provide my dissector as a dyanamic loadable plugin. It
> can be built on Fedora-14 with wireshark-devel package.
> 
> 
> You, working on RHEL6, may not be interested in ccsd.
> 
> 
> https://github.com/masatake/wireshark-plugin-rhcs
> 
> I will maintain this source tree. Forking are welcome.
> 
> I'd like to make it a rpm package and be available as a part of
> Fedora. But I cannot find enough time to be a package maintainer 
> now.
> 
> Masatake YAMATO
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

Masatake,

I'll hunt around for a package maintainer and get back to you.
Generally this person's role is to package new upstream releases in
Fedora and Red Hat derivatives.  You would do the upstream
releasing/maintenance part.

Regards
-steve



From jfriesse at redhat.com  Tue Dec 14 16:50:20 2010
From: jfriesse at redhat.com (Jan Friesse)
Date: Tue, 14 Dec 2010 17:50:20 +0100
Subject: [Linux-cluster] [Openais] packet dissectors for totempg, cman,
 clvmd, rgmanager, cpg, 
In-Reply-To: <4D079B49.8070009@redhat.com>
References: <20100527.133950.593311767624382812.yamato@redhat.com>	<20101214.210216.512133496326900668.yamato@redhat.com>	<4D0776DC.9080003@redhat.com>
	<20101214.231525.648039044490713397.yamato@redhat.com>
	<4D078465.3020509@redhat.com> <4D079B49.8070009@redhat.com>
Message-ID: <4D07A04C.9020703@redhat.com>

Steven Dake napsal(a):
> On 12/14/2010 07:51 AM, Jan Friesse wrote:
>> Masatake,
>>

....

>>> Thank you.
>> Regards,
>>   Honza
> 
> 
> I am not changing corosync license to GPL.  I think the separate plugin
> works fine, and we can even take up packaging of it in fedora and Red
> Hat variants, if it is maintained in an upstream repo.
> 
> Regards
> -steve

Steve,
I'm not talking about relicensing corosync (it doesn't make any sense 
and I would be first against that), but give permissions to that portion 
of code (seems to be more or less header files) to use GPL (which also 
seems to me like old version without support for NSS). It's same as what 
we did for libqb.

Separate plugin works fine for Fedora, but I'm not sure if it works also 
for other distributions.


Regards,
   Honza



From bernardchew at gmail.com  Wed Dec 15 09:29:01 2010
From: bernardchew at gmail.com (Bernard Chew)
Date: Wed, 15 Dec 2010 17:29:01 +0800
Subject: [Linux-cluster] RHCS & snmpd[30622]: Received SNMP packet(s)
	from UDP
In-Reply-To: <AANLkTi=23Yk6kJKv_TwMB9VKg+KqkWBP5e0++Ha9Aqf=@mail.gmail.com>
References: <EC61DD7B6048464AB0E1B713AF7521BC191BEF7295@GVW0676EXC.americas.hpqcorp.net>
	<AANLkTi=23Yk6kJKv_TwMB9VKg+KqkWBP5e0++Ha9Aqf=@mail.gmail.com>
Message-ID: <AANLkTi=8Q14pv_KKb9NwD=-XvzkFYYf75rY+rVNgEygM@mail.gmail.com>

> On Tue, Dec 14, 2010 at 4:04 PM, Rajagopal Swaminathan <raju.rajsand at gmail.com> wrote:
> Greetings,
>
> On Tue, Dec 14, 2010 at 11:58 AM, Hofmeister, James (WTEC Linux)
> <james.hofmeister at hp.com> wrote:
>> Hello folks,
>>
>> Dec 10 07:18:57 host1 snmpd[30622]: Connection from UDP: [127.0.0.1]:49231
>> Dec 10 07:18:57 host1 snmpd[30622]: Received SNMP packet(s) from UDP: [127.0.0.1]:49231
>>
>> I see this quite often in RHCS clusters and have not determined if the source is a function of RHCS or if this is one of the HP health agents.
>>
>> I am aware that these messages can be turned off in the SNMP configuration, I am more interested in the source.
>>
>> Any feedback would be appreciated.
>>
>
> Looks to me like the HP agent. only the agent shouts peridically
> usually in snmp...
>
> /scurries Hmmm.... where are my snmp notes?
>
>
> Regards,
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

Hi,

Try using tcpdump to check the source

Regards,
Bernard Chew



From lhh at redhat.com  Wed Dec 15 22:14:49 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 15 Dec 2010 17:14:49 -0500
Subject: [Linux-cluster] RHCS & snmpd[30622]: Received SNMP packet(s)
 from UDP
In-Reply-To: <EC61DD7B6048464AB0E1B713AF7521BC191BEF7295@GVW0676EXC.americas.hpqcorp.net>
References: <EC61DD7B6048464AB0E1B713AF7521BC191BEF7295@GVW0676EXC.americas.hpqcorp.net>
Message-ID: <1292451289.3118.2.camel@localhost.localdomain>

On Tue, 2010-12-14 at 06:28 +0000, Hofmeister, James (WTEC Linux) wrote:
> Hello folks,
> 
> RE: SNMP packets from local host.
> 
> 6-10 per minute 
> 
> Dec 10 07:18:57 host1 snmpd[30622]: Connection from UDP: [127.0.0.1]:49231
> Dec 10 07:18:57 host1 snmpd[30622]: Received SNMP packet(s) from UDP: [127.0.0.1]:49231
> 
> I see this quite often in RHCS clusters and have not determined if the source is a function of RHCS or if this is one of the HP health agents.
> 
> I am aware that these messages can be turned off in the SNMP configuration, I am more interested in the source.
> 

Linux-cluster doesn't generate traps/notifications at this point, so I'd
guess the HP agent :)

-- Lon



From james.hofmeister at hp.com  Wed Dec 15 22:41:52 2010
From: james.hofmeister at hp.com (Hofmeister, James (WTEC Linux))
Date: Wed, 15 Dec 2010 22:41:52 +0000
Subject: [Linux-cluster] RHCS & snmpd[30622]: Received SNMP packet(s)
 from UDP
In-Reply-To: <1292451289.3118.2.camel@localhost.localdomain>
References: <EC61DD7B6048464AB0E1B713AF7521BC191BEF7295@GVW0676EXC.americas.hpqcorp.net>
	<1292451289.3118.2.camel@localhost.localdomain>
Message-ID: <EC61DD7B6048464AB0E1B713AF7521BC191C1D177E@GVW0676EXC.americas.hpqcorp.net>

Hello Lon, all,

|Linux-cluster doesn't generate traps/notifications at this point, so I'd
|guess the HP agent :)
|-- Lon

Yep, we found the HP Health agent (cmahostd) that quit sending SNMP messages during the cluster hang:

Dec 10 07:22:24 dm73sr02 kernel: INFO: task cmahostd:31542 blocked for more than 120 seconds.
Dec 10 07:22:24 dm73sr02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 10 07:22:24 dm73sr02 kernel: cmahostd      D ffffffff801508e3     0 
31542      1         31576 31540 (NOTLB)
Dec 10 07:22:24 dm73sr02 kernel:  ffff810c3b889cf8 0000000000000086 0000000000000018 ffffffff884414f8
Dec 10 07:22:24 dm73sr02 kernel:  0000000000000292 000000000000000a ffff810c3f54a820 ffff810c4e1b6040
Dec 10 07:22:24 dm73sr02 kernel:  00007122b167f658 0000000000bb9ecb ffff810c3f54aa08 0000000888442e5f
Call Trace:
[<ffffffff884414f8>] :dlm:request_lock+0x93/0xa0
[<ffffffff8846cee3>] :gfs2:just_schedule+0x0/0xe
[<ffffffff8846ceec>] :gfs2:just_schedule+0x9/0xe
[<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
[<ffffffff8846cee3>] :gfs2:just_schedule+0x0/0xe
[<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff800a0b44>] wake_bit_function+0x0/0x23
[<ffffffff8846cede>] :gfs2:gfs2_glock_wait+0x2b/0x30
[<ffffffff8847b2ba>] :gfs2:gfs2_getattr+0x85/0xc4
[<ffffffff8847b2b2>] :gfs2:gfs2_getattr+0x7d/0xc4
[<ffffffff8000e390>] vfs_getattr+0x2d/0xa9
[<ffffffff800288ec>] vfs_stat_fd+0x32/0x4a
[<ffffffff8000e4db>] free_pages_and_swap_cache+0x67/0x7e
[<ffffffff80083f43>] sys32_stat64+0x11/0x29
[<ffffffff8006153d>] sysenter_tracesys+0x48/0x83
[<ffffffff8006149d>] sysenter_do_call+0x1e/0x76

Regards, 
????? James Hofmeister? 
      Hewlett Packard Linux Solutions Engineer 





From kmaguire at eso.org  Wed Dec 15 23:47:23 2010
From: kmaguire at eso.org (Kevin Maguire)
Date: Thu, 16 Dec 2010 00:47:23 +0100 (CET)
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
Message-ID: <alpine.LFD.2.00.1012160035290.18604@localhost>

Hi

We are running a 20 node cluster, using Scientific Linux 5.3, with a GFS 
shared filesystem hosted on our SAN. Cluster nodes are dual core units 
with 4 GB of RAM, and a standard Qlogic FC HBA.

Most of the 20 nodes form a batch-processing cluster, and our users are 
happy enough with the performance they get, but some nodes are used 
interactively. When the filesystem is under stress due to large batch 
processing jobs running on other nodes, interactive use becomes very slow 
and painful.

Is there any tuning I (the sysadmin) can do that might help in this 
situation?  Would a migration to gfs2 make a difference? Are all nodes 
treated identically, or can hosts mounting the filesystem have any kind of 
priority/QoS? Which tools could I use to track down any bottlenecks?

In theory we could update kernel+gfs bits to a later release, though we 
saw the same issues when using the same cluster with a SL4.x stack, but 
for now it's

kernel-2.6.18-128.1.1.el5.i686
kmod-gfs-0.1.31-3.el5.i686
gfs-utils-0.1.20-7.el5.i386
gfs2-utils-0.1.53-1.el5_3.1.i386

Thanks for any help/suggestions,
Kevin





From swhiteho at redhat.com  Thu Dec 16 10:53:49 2010
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Thu, 16 Dec 2010 10:53:49 +0000
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
In-Reply-To: <alpine.LFD.2.00.1012160035290.18604@localhost>
References: <alpine.LFD.2.00.1012160035290.18604@localhost>
Message-ID: <1292496829.2427.5.camel@dolmen>

Hi,

On Thu, 2010-12-16 at 00:47 +0100, Kevin Maguire wrote:
> Hi
> 
> We are running a 20 node cluster, using Scientific Linux 5.3, with a GFS 
> shared filesystem hosted on our SAN. Cluster nodes are dual core units 
> with 4 GB of RAM, and a standard Qlogic FC HBA.
> 
> Most of the 20 nodes form a batch-processing cluster, and our users are 
> happy enough with the performance they get, but some nodes are used 
> interactively. When the filesystem is under stress due to large batch 
> processing jobs running on other nodes, interactive use becomes very slow 
> and painful.
> 
> Is there any tuning I (the sysadmin) can do that might help in this 
> situation?  Would a migration to gfs2 make a difference? Are all nodes 
> treated identically, or can hosts mounting the filesystem have any kind of 
> priority/QoS? Which tools could I use to track down any bottlenecks?
> 
There are no priority/QoS controls currently available to the users, I'm
afraid. All nodes are treated equally as you say. I suspect that the
reason that interactive use becomes slow is just down to locality of
accesses. The GFS locking is done on a per-inode basis, so where writes
are going on to an inode, ensuring that reads to that same inode are
also done on the same node as much as possible should improve
performance.

In other words, it would be better to divide up jobs in the cluster
according to the data which they access rather than according to whether
they are interactive or not.

Are you using mmap() at all? If so then GFS2 should be significantly
more scalable than GFS,

Steve.


> In theory we could update kernel+gfs bits to a later release, though we 
> saw the same issues when using the same cluster with a SL4.x stack, but 
> for now it's
> 
> kernel-2.6.18-128.1.1.el5.i686
> kmod-gfs-0.1.31-3.el5.i686
> gfs-utils-0.1.20-7.el5.i386
> gfs2-utils-0.1.53-1.el5_3.1.i386
> 
> Thanks for any help/suggestions,
> Kevin
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From rpeterso at redhat.com  Thu Dec 16 15:09:39 2010
From: rpeterso at redhat.com (Bob Peterson)
Date: Thu, 16 Dec 2010 10:09:39 -0500 (EST)
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
In-Reply-To: <1575839559.1257921292511766147.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID: <576470091.1259051292512179726.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>

----- "Kevin Maguire" <kmaguire at eso.org> wrote:
| Hi
| 
| We are running a 20 node cluster, using Scientific Linux 5.3, with a
| GFS 
| shared filesystem hosted on our SAN. Cluster nodes are dual core units
| 
| with 4 GB of RAM, and a standard Qlogic FC HBA.
| 
| Most of the 20 nodes form a batch-processing cluster, and our users
| are 
| happy enough with the performance they get, but some nodes are used 
| interactively. When the filesystem is under stress due to large batch
| 
| processing jobs running on other nodes, interactive use becomes very
| slow 
| and painful.
| 
| Is there any tuning I (the sysadmin) can do that might help in this 
| situation?  Would a migration to gfs2 make a difference? Are all nodes
| 
| treated identically, or can hosts mounting the filesystem have any
| kind of 
| priority/QoS? Which tools could I use to track down any bottlenecks?
| 
| In theory we could update kernel+gfs bits to a later release, though
| we 
| saw the same issues when using the same cluster with a SL4.x stack,
| but 
| for now it's
| 
| kernel-2.6.18-128.1.1.el5.i686
| kmod-gfs-0.1.31-3.el5.i686
| gfs-utils-0.1.20-7.el5.i386
| gfs2-utils-0.1.53-1.el5_3.1.i386
| 
| Thanks for any help/suggestions,
| Kevin

Hi Kevin,

We recently identified a slowdown in RHEL5.x that involves DLM traffic.
There is a patch to speed dlm up, and it's being tested now.  The
patch is built into RHEL5 kernels starting with 2.6.18-232 and newer.
That means it is currently scheduled to be released in RHEL5.6.

It's also being z-streamed back to 5.5.z, but I don't know when that
is scheduled to go out.  Unfortunately, since the problem was
opened by a customer, the bugzilla record is private to protect the
customer's confidential information.  The patch is public though.
If you are a Red Hat customer, you can probably call Red Hat Support
and ask to be put on the list for bugzilla bug 604139 and
maybe find out when the fix will be available.

There is no guarantee this is what your problem is, and there is
no guarantee that the patch will speed you up.  But it might be.

Regards,

Bob Peterson
Red Hat File Systems



From bturner at redhat.com  Thu Dec 16 22:25:56 2010
From: bturner at redhat.com (Ben Turner)
Date: Thu, 16 Dec 2010 17:25:56 -0500 (EST)
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
In-Reply-To: <997438983.986721292537865284.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
Message-ID: <866555507.987141292538356551.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

There is some helpful stuff here on the tuning side:

http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_tuning

-b

----- "Bob Peterson" <rpeterso at redhat.com> wrote:

> ----- "Kevin Maguire" <kmaguire at eso.org> wrote:
> | Hi
> | 
> | We are running a 20 node cluster, using Scientific Linux 5.3, with
> a
> | GFS 
> | shared filesystem hosted on our SAN. Cluster nodes are dual core
> units
> | 
> | with 4 GB of RAM, and a standard Qlogic FC HBA.
> | 
> | Most of the 20 nodes form a batch-processing cluster, and our users
> | are 
> | happy enough with the performance they get, but some nodes are used
> 
> | interactively. When the filesystem is under stress due to large
> batch
> | 
> | processing jobs running on other nodes, interactive use becomes
> very
> | slow 
> | and painful.
> | 
> | Is there any tuning I (the sysadmin) can do that might help in this
> 
> | situation?  Would a migration to gfs2 make a difference? Are all
> nodes
> | 
> | treated identically, or can hosts mounting the filesystem have any
> | kind of 
> | priority/QoS? Which tools could I use to track down any
> bottlenecks?
> | 
> | In theory we could update kernel+gfs bits to a later release,
> though
> | we 
> | saw the same issues when using the same cluster with a SL4.x stack,
> | but 
> | for now it's
> | 
> | kernel-2.6.18-128.1.1.el5.i686
> | kmod-gfs-0.1.31-3.el5.i686
> | gfs-utils-0.1.20-7.el5.i386
> | gfs2-utils-0.1.53-1.el5_3.1.i386
> | 
> | Thanks for any help/suggestions,
> | Kevin
> 
> Hi Kevin,
> 
> We recently identified a slowdown in RHEL5.x that involves DLM
> traffic.
> There is a patch to speed dlm up, and it's being tested now.  The
> patch is built into RHEL5 kernels starting with 2.6.18-232 and newer.
> That means it is currently scheduled to be released in RHEL5.6.
> 
> It's also being z-streamed back to 5.5.z, but I don't know when that
> is scheduled to go out.  Unfortunately, since the problem was
> opened by a customer, the bugzilla record is private to protect the
> customer's confidential information.  The patch is public though.
> If you are a Red Hat customer, you can probably call Red Hat Support
> and ask to be put on the list for bugzilla bug 604139 and
> maybe find out when the fix will be available.
> 
> There is no guarantee this is what your problem is, and there is
> no guarantee that the patch will speed you up.  But it might be.
> 
> Regards,
> 
> Bob Peterson
> Red Hat File Systems
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From kmaguire at eso.org  Fri Dec 17 16:35:16 2010
From: kmaguire at eso.org (Kevin Maguire)
Date: Fri, 17 Dec 2010 17:35:16 +0100 (CET)
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
In-Reply-To: <866555507.987141292538356551.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
References: <866555507.987141292538356551.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
Message-ID: <alpine.OSX.2.00.1012171717001.8808@MacMiniSoest>

Hi

Bob/Steven/Ben - many thanks for responding.

> There is some helpful stuff here on the tuning side:
>
> http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_tuning

Indeed, we have implemented many these suggestions, "fast statfs" is on, 
-r 2048 was used, quotas off, the cluster interconnect is a dedicated 
gigabit LAN, hardware RAID (RAID10) on the SAN, and so on. Maybe we are 
just at the limit of the hardware.

I have also asked and it seems the one issue that might cause slowdown, 
multiple nodes all trying to access the same inode (say all updating files 
in a common directory), should not happen with our application. I am told 
that essentially batch jobs will create their own working directory when 
executing, and work almost exclusively within that subtree. Interactive 
work is in another tree entirely.

However I'd like to double check that - but how? When we looked at Lustre 
for a similar app there was a /proc interface that you could probe to see 
what files were being opened/read/written/closed by each connected node - 
does GFS offer something similar? Would mounting debugfs help me there?

Kevin



From swhiteho at redhat.com  Fri Dec 17 16:50:59 2010
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Fri, 17 Dec 2010 16:50:59 +0000
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
In-Reply-To: <alpine.OSX.2.00.1012171717001.8808@MacMiniSoest>
References: <866555507.987141292538356551.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
	<alpine.OSX.2.00.1012171717001.8808@MacMiniSoest>
Message-ID: <1292604659.2461.14.camel@dolmen>

Hi,

On Fri, 2010-12-17 at 17:35 +0100, Kevin Maguire wrote:
> Hi
> 
> Bob/Steven/Ben - many thanks for responding.
> 
> > There is some helpful stuff here on the tuning side:
> >
> > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_tuning
> 
> Indeed, we have implemented many these suggestions, "fast statfs" is on, 
> -r 2048 was used, quotas off, the cluster interconnect is a dedicated 
> gigabit LAN, hardware RAID (RAID10) on the SAN, and so on. Maybe we are 
> just at the limit of the hardware.
> 
> I have also asked and it seems the one issue that might cause slowdown, 
> multiple nodes all trying to access the same inode (say all updating files 
> in a common directory), should not happen with our application. I am told 
> that essentially batch jobs will create their own working directory when 
> executing, and work almost exclusively within that subtree. Interactive 
> work is in another tree entirely.
> 
> However I'd like to double check that - but how? When we looked at Lustre 
> for a similar app there was a /proc interface that you could probe to see 
> what files were being opened/read/written/closed by each connected node - 
> does GFS offer something similar? Would mounting debugfs help me there?
> 
> Kevin
> 
You can get a glock dump via debugfs which may show up contention, looks
for type 2 glocks which have lots of lock requests queued but not
granted. The lock requests (holders) are tagged with the relevant
process. In rhel6/upstream there are gfs2 tracepoints which can be used
to get information dynamically. These can also give some pointers to the
processes involved,

Steve.
 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From sendmailrajiv at gmail.com  Fri Dec 17 17:38:41 2010
From: sendmailrajiv at gmail.com (Rajiv Yadav)
Date: Fri, 17 Dec 2010 23:08:41 +0530
Subject: [Linux-cluster] how can install GFS-Cluster on rhel5.2
Message-ID: <AANLkTimXgkSwNevvdkN-cdRK_PeySrkJPSvYBtmUgd--@mail.gmail.com>

Hi..

i want to install cluster GFS on two nodes.
which package i need installation GFS on RHEL5.2...
Please provide full installation and configuration server and client based..
how to use Luci and Ricci..
-- 


Rajiv Yadav
CRIS
(An Organization of the Ministry of Railways, Govt. of India)
Chanakyapuri,New Delhi - 110021
website:-  www.cris.org.in
Cell #:  +91-9711175683
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101217/0f1e31b2/attachment.htm>

From bturner at redhat.com  Fri Dec 17 17:53:50 2010
From: bturner at redhat.com (Ben Turner)
Date: Fri, 17 Dec 2010 12:53:50 -0500 (EST)
Subject: [Linux-cluster] how can install GFS-Cluster on rhel5.2
In-Reply-To: <492506111.1047221292608334777.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
Message-ID: <1585989071.1047501292608430660.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

Here is a link to the documentation, thats prolly the best place to start:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/index.html

If you have a support contract with Red Hat I suggest opening a case and one of the support techs can give you more detailed assistance.

-b


----- "Rajiv Yadav" <sendmailrajiv at gmail.com> wrote:

> Hi..
> 
> i want to install cluster GFS on two nodes.
> which package i need installation GFS on RHEL5.2...
> Please provide full installation and configuration server and client
> based..
> how to use Luci and Ricci..
> --
> 
> 
> Rajiv Yadav
> CRIS
> (An Organization of the Ministry of Railways, Govt. of India)
> Chanakyapuri,New Delhi - 110021
> website:- www.cris.org.in
> Cell #: +91-9711175683
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From kmaguire at eso.org  Fri Dec 17 19:06:58 2010
From: kmaguire at eso.org (Kevin Maguire)
Date: Fri, 17 Dec 2010 20:06:58 +0100 (CET)
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
In-Reply-To: <1292604659.2461.14.camel@dolmen>
References: <866555507.987141292538356551.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
	<alpine.OSX.2.00.1012171717001.8808@MacMiniSoest>
	<1292604659.2461.14.camel@dolmen>
Message-ID: <alpine.OSX.2.00.1012171913360.9280@MacMiniSoest>

Hi

> You can get a glock dump via debugfs which may show up contention, looks 
> for type 2 glocks which have lots of lock requests queued but not 
> granted. The lock requests (holders) are tagged with the relevant 
> process.

Note I am currently using GFS, not GFS2. And before going further I ran 
the ping_pong test on my cluster and see only about 100 locks/second even 
on just 1 node.  So maybe I should look at plock_rate_limit parameter, 
though not sure if that is our core problem.

Anyways, As I write this my test cluster is being heavily used with batch 
jobs, and thus I have a window of opportunity to study it under load (but 
not change it).  I have debugfs mounted. There are 10 nodes in this test 
cluster. My filesystem is called mygfs, and was created via

mkfs.gfs -O -t dfoxen-cluster:mygfs -p lock_dlm -j 10 -r 2048 /dev/mapper/vggfs-lvgfs

This is what I have in debugfs:

# find /sys/kernel/debug/ -type f -exec wc -l {} \;
2309 /sys/kernel/debug/dlm/mygfs_locks
0 /sys/kernel/debug/dlm/mygfs_waiters
16258 /sys/kernel/debug/dlm/mygfs
2 /sys/kernel/debug/dlm/clvmd_locks
0 /sys/kernel/debug/dlm/clvmd_waiters
7 /sys/kernel/debug/dlm/clvmd

The lock dump file has content like:

# cat /sys/kernel/debug/dlm/mygfs_locks
id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
14f19eb 0 0 1038 0 0 0 2 3 -1 0 0 24 "       5         cec3e6d"
3da1a67 0 0 31861 0 0 0 2 3 -1 0 0 24 "       5         a0fafc2"
1120003 1 16f0019 3552 0 408 0 2 0 -1 0 1 24 "       3        2d8b9091"
af0002 1 10024 3552 0 408 0 2 0 -1 0 1 24 "       3        2053fbf8"
...

But I don't really see how to work our which type of lock is which from 
this file - sorry. Given $2 is the nodeid I can work our who has locks and 
that leads to a minor strangeness

node1 # awk 'NR>1{print $2}' /sys/kernel/debug/dlm/mygfs_locks | sort | uniq -c | sort -k +2n
    2142 0
    1619 2
    2001 3
    1586 4
    1566 5
    1624 6
    1610 7
    1733 8
    1592 9
    1612 10

These numbers are much bigger than the counts on the 9 other nodes, e.g.

node2 # awk 'NR>1{print $2}' /sys/kernel/debug/dlm/mygfs_locks | sort | uniq -c | sort -k +2n
     441 0
    1630 1
      75 3
       2 4
      10 5
      25 7
      15 8
      38 10

Is that normal ?

Using gfs_tool's lockdump I see

node1 # gfs_tool lockdump /newcache | egrep '^Glock' | sed 's?(\([0-9]*\).*)?\1?g' | sort | uniq -c
       3 Glock 1
     308 Glock 2
    1538 Glock 3
       2 Glock 4
     233 Glock 5
       2 Glock 8

Only type 2 and type 5 counts seem to change. Across the cluster there is 
one node with a lot more (10x more) Glock type 2 and Glock type 5 locks.

# gfs_tool counters /newcache

                                   locks 2313
                              locks held 781
                            freeze count 0
                           incore inodes 230
                        metadata buffers 1061
                         unlinked inodes 28
                               quota IDs 2
                      incore log buffers 28
                          log space used 1.46%
               meta header cache entries 1304
                      glock dependencies 185
                  glocks on reclaim list 0
                               log wraps 91
                    outstanding LM calls 0
                   outstanding BIO calls 0
                        fh2dentry misses 0
                        glocks reclaimed 2125924
                          glock nq calls 801437507
                          glock dq calls 796261692
                    glock prefetch calls 319835
                           lm_lock calls 6396763
                         lm_unlock calls 1031709
                            lm callbacks 7669741
                      address operations 1267096416
                       dentry operations 35815146
                       export operations 0
                         file operations 233333825
                        inode operations 61818196
                        super operations 148712313
                           vm operations 87114
                         block I/O reads 0
                        block I/O writes 0

Not sure if anyone can make anything from all these numbers ...

Thanks,
Kevin



From swhiteho at redhat.com  Fri Dec 17 19:43:53 2010
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Fri, 17 Dec 2010 19:43:53 +0000
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
In-Reply-To: <alpine.OSX.2.00.1012171913360.9280@MacMiniSoest>
References: <866555507.987141292538356551.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
	<alpine.OSX.2.00.1012171717001.8808@MacMiniSoest>
	<1292604659.2461.14.camel@dolmen>
	<alpine.OSX.2.00.1012171913360.9280@MacMiniSoest>
Message-ID: <1292615033.2461.23.camel@dolmen>

Hi,

On Fri, 2010-12-17 at 20:06 +0100, Kevin Maguire wrote:
> Hi
> 
> > You can get a glock dump via debugfs which may show up contention, looks 
> > for type 2 glocks which have lots of lock requests queued but not 
> > granted. The lock requests (holders) are tagged with the relevant 
> > process.
> 
> Note I am currently using GFS, not GFS2. And before going further I ran 
> the ping_pong test on my cluster and see only about 100 locks/second even 
> on just 1 node.  So maybe I should look at plock_rate_limit parameter, 
> though not sure if that is our core problem.
> 
The same thing applies to GFS as GFS2, its just that the format of the
debugfs file is different. GFS2 uses a rather smaller format which makes
a big difference in the dump size for larger machines.

The plock_rate_limit can usually be turned off quite safely, but unless
your app uses plocks, it won't make any difference to the performance.

> Anyways, As I write this my test cluster is being heavily used with batch 
> jobs, and thus I have a window of opportunity to study it under load (but 
> not change it).  I have debugfs mounted. There are 10 nodes in this test 
> cluster. My filesystem is called mygfs, and was created via
> 
> mkfs.gfs -O -t dfoxen-cluster:mygfs -p lock_dlm -j 10 -r 2048 /dev/mapper/vggfs-lvgfs
> 
> This is what I have in debugfs:
> 
> # find /sys/kernel/debug/ -type f -exec wc -l {} \;
> 2309 /sys/kernel/debug/dlm/mygfs_locks
> 0 /sys/kernel/debug/dlm/mygfs_waiters
> 16258 /sys/kernel/debug/dlm/mygfs
> 2 /sys/kernel/debug/dlm/clvmd_locks
> 0 /sys/kernel/debug/dlm/clvmd_waiters
> 7 /sys/kernel/debug/dlm/clvmd
> 
> The lock dump file has content like:
> 
> # cat /sys/kernel/debug/dlm/mygfs_locks
> id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
> 14f19eb 0 0 1038 0 0 0 2 3 -1 0 0 24 "       5         cec3e6d"
> 3da1a67 0 0 31861 0 0 0 2 3 -1 0 0 24 "       5         a0fafc2"
> 1120003 1 16f0019 3552 0 408 0 2 0 -1 0 1 24 "       3        2d8b9091"
> af0002 1 10024 3552 0 408 0 2 0 -1 0 1 24 "       3        2053fbf8"
> ...
> 
> But I don't really see how to work our which type of lock is which from 
> this file - sorry. Given $2 is the nodeid I can work our who has locks and 
> that leads to a minor strangeness
> 
> node1 # awk 'NR>1{print $2}' /sys/kernel/debug/dlm/mygfs_locks | sort | uniq -c | sort -k +2n
>     2142 0
>     1619 2
>     2001 3
>     1586 4
>     1566 5
>     1624 6
>     1610 7
>     1733 8
>     1592 9
>     1612 10
> 
> These numbers are much bigger than the counts on the 9 other nodes, e.g.
> 
> node2 # awk 'NR>1{print $2}' /sys/kernel/debug/dlm/mygfs_locks | sort | uniq -c | sort -k +2n
>      441 0
>     1630 1
>       75 3
>        2 4
>       10 5
>       25 7
>       15 8
>       38 10
> 
> Is that normal ?
> 
> Using gfs_tool's lockdump I see
> 
> node1 # gfs_tool lockdump /newcache | egrep '^Glock' | sed 's?(\([0-9]*\).*)?\1?g' | sort | uniq -c
>        3 Glock 1
>      308 Glock 2
>     1538 Glock 3
>        2 Glock 4
>      233 Glock 5
>        2 Glock 8
> 
> Only type 2 and type 5 counts seem to change. Across the cluster there is 
> one node with a lot more (10x more) Glock type 2 and Glock type 5 locks.
> 
This lock dump is what you want to look at first. The dlm dumps are
really only for when something has gone wrong and you need to check
whether dlm has a different idea of what is going on to gfs. The type 2
glocks relate to inodes (as do type 5, but they don't have any bearing
on the performance in this case). Type 3 glocks relates to resource
groups.

It is this gfs_tool lockup output that contains the info that you need.
The interesting locks are those which have a number of "Holders"
attached to them which are on the "Waiters" queues (i.e. not granted)
and the more of those holder there are on a lock, the more interesting
it is from a performance point of view.

In the case of type 2 glocks, the other part of the glock number, is
also the inode number, so when the system it otherwise idle, a find
-inum will tell you which inode was causing the problems, provided it
wasn't a temporary file, of course :-)

If you have access to the Red Hat kbase system, then this is all
described in the docs on that site.

> # gfs_tool counters /newcache
> 
>                                    locks 2313
>                               locks held 781
>                             freeze count 0
>                            incore inodes 230
>                         metadata buffers 1061
>                          unlinked inodes 28
>                                quota IDs 2
>                       incore log buffers 28
>                           log space used 1.46%
>                meta header cache entries 1304
>                       glock dependencies 185
>                   glocks on reclaim list 0
>                                log wraps 91
>                     outstanding LM calls 0
>                    outstanding BIO calls 0
>                         fh2dentry misses 0
>                         glocks reclaimed 2125924
>                           glock nq calls 801437507
>                           glock dq calls 796261692
>                     glock prefetch calls 319835
>                            lm_lock calls 6396763
>                          lm_unlock calls 1031709
>                             lm callbacks 7669741
>                       address operations 1267096416
>                        dentry operations 35815146
>                        export operations 0
>                          file operations 233333825
>                         inode operations 61818196
>                         super operations 148712313
>                            vm operations 87114
>                          block I/O reads 0
>                         block I/O writes 0
> 
> Not sure if anyone can make anything from all these numbers ...
> 
There is nothing that stands out as being a problem there, but the
counters are generally not very useful,

Steve.

> Thanks,
> Kevin
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From jeff.sturm at eprize.com  Fri Dec 17 22:53:54 2010
From: jeff.sturm at eprize.com (Jeff Sturm)
Date: Fri, 17 Dec 2010 17:53:54 -0500
Subject: [Linux-cluster] GFS block size
Message-ID: <64D0546C5EBBD147B75DE133D798665F06A12904@hugo.eprize.local>

One of our GFS filesystems tends to have a large number of very small
files, on average about 1000 bytes each.

 

I realized this week we'd created our filesystems with default options.
As an experiment on a test system, I've recreated a GFS filesystem with
"-b 1024" to reduce overall disk usage and disk bandwidth.

 

Initially, tests look very good-single file creates are less than one
millisecond on average (down from about 5ms each).  Before I go very far
with this, I wanted to ask:  Has anyone else experimented with the block
size option, and are there any tricks or gotchas to report?

 

(This is with CentOS 5.5, GFS 1.)

 

-Jeff

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101217/f65bdfed/attachment.htm>

From kmaguire at eso.org  Fri Dec 17 23:43:11 2010
From: kmaguire at eso.org (Kevin Maguire)
Date: Sat, 18 Dec 2010 00:43:11 +0100 (CET)
Subject: [Linux-cluster] GFS tuning for combined batch / interactive use
In-Reply-To: <1292615033.2461.23.camel@dolmen>
References: <866555507.987141292538356551.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
	<alpine.OSX.2.00.1012171717001.8808@MacMiniSoest>
	<1292604659.2461.14.camel@dolmen>
	<alpine.OSX.2.00.1012171913360.9280@MacMiniSoest>
	<1292615033.2461.23.camel@dolmen>
Message-ID: <alpine.OSX.2.00.1012172335560.9962@MacMiniSoest>

Hi

Steven: Thanks again.

> If you have access to the Red Hat kbase system, then this is all
> described in the docs on that site.

I do as we have RedHat support for other platforms, just not this one. 
The docs I found that are worthy of a slow reading are probably:

https://access.redhat.com/kb/docs/DOC-41624
https://access.redhat.com/kb/docs/DOC-41485

and maybe

https://access.redhat.com/kb/docs/DOC-6533
https://access.redhat.com/kb/docs/DOC-41609
https://access.redhat.com/kb/docs/DOC-34460
https://access.redhat.com/kb/docs/DOC-34401
https://access.redhat.com/kb/docs/DOC-6479

If I missed one that is particularly helpful please let me know.

I'll take this back to our software group with all that I have learned! 
The biggest TBC is whether we give GFS2 a try - the main reason we are not 
using it now is that we ported all of this from RHEL4 and did not want to 
change the filesystem at the same time.

kevin



From linux-cluster at redhat.com  Sat Dec 18 08:25:03 2010
From: linux-cluster at redhat.com (Mailbot for etexusa.com)
Date: Sat, 18 Dec 2010 00:25:03 -0800
Subject: [Linux-cluster] DSN: delayed ()
Message-ID: <mAWtxIeCDcrRuYWr803@etexusa.com>


This is a Delivery Status Notification (DSN).

After several attempts,
I still haven't been able to deliver your message to
christopher at aillon.com.
I will keep trying for a few more days,
but I thought you would want to know.

The error was;
  Can't connect to domain "aillon.com"

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/rfc822-headers
Size: 481 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101218/20b36eee/attachment.bin>

From linux-cluster at redhat.com  Sun Dec 19 11:25:59 2010
From: linux-cluster at redhat.com (Mailbot for etexusa.com)
Date: Sun, 19 Dec 2010 03:25:59 -0800
Subject: [Linux-cluster] DSN: failed ()
Message-ID: <mAWtxIeCDcrRuYfFY02@etexusa.com>


This is a Delivery Status Notification (DSN).

I was unable to deliver your message to
christopher at aillon.com.

The error was;
  Can't connect to domain "aillon.com"

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/rfc822-headers
Size: 481 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101219/53fcf993/attachment.bin>

From mika68vaan at gmail.com  Tue Dec 21 07:55:25 2010
From: mika68vaan at gmail.com (Mika i)
Date: Tue, 21 Dec 2010 09:55:25 +0200
Subject: [Linux-cluster] Cluster + NFS + GFS/GFS2 experiences
Message-ID: <AANLkTikmuGBQYwGuhjXVytpeyJYQfnCpgvak2RVuB5G6@mail.gmail.com>

Hi.
I am planing to rhel-cluster with nfs service and that's why i am asking
little bit experiences about what
kind configuration falks has done this kind clusters.

About 2-4 servers .. rhel6,cluster, nfs-service.. but how about gfs2, do
recommend it or something else?


-M
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101221/4bbd7010/attachment.htm>

From linux-cluster at redhat.com  Tue Dec 21 10:27:57 2010
From: linux-cluster at redhat.com (Mailbot for etexusa.com)
Date: Tue, 21 Dec 2010 02:27:57 -0800
Subject: [Linux-cluster] DSN: failed (Message could not be delivered)
Message-ID: <mAWtxIeCDcrRuYqJ502@etexusa.com>


This is a Delivery Status Notification (DSN).

I was unable to deliver your message to
irahotel at otenet.gr.

I said 
  (end of message)

And they gave me the error;
  550 5.7.1 Virus Infected W32.Sality.Q-1

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/rfc822-headers
Size: 503 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101221/b9aef047/attachment.bin>

From ram at netcore.co.in  Tue Dec 21 15:08:28 2010
From: ram at netcore.co.in (Ram)
Date: Tue, 21 Dec 2010 20:38:28 +0530
Subject: [Linux-cluster] How do I add a fence_vmware with
	system-config-cluster
Message-ID: <4D10C2EC.9030800@netcore.co.in>

Hello
We are using RHEL 5.4 on VM nodes.
When I configure the cluster using system-config-cluster I do not get an 
option of fence_vmware in the User Interface
I am able to run fence_vmware successfully on commandline , but I am not 
sure how do I put in into the cluster.conf

Can someone please help with a sample cluster.conf  or option to add in 
system-config-cluster.


Thanks
Ram





From yvette at dbtgroup.com  Tue Dec 21 19:31:59 2010
From: yvette at dbtgroup.com (yvette hirth)
Date: Tue, 21 Dec 2010 19:31:59 +0000
Subject: [Linux-cluster] question about network config for fencing
Message-ID: <4D1100AF.8090302@dbtgroup.com>

hi,

i've config'ed my DL 380 G6 four ethernet ports so that they are bonded 
to "bond0", and my ILO devices are on the fence switch (separate switch).

i'm 99.9% sure this is wrong.

i'm thinking that eth0-2 and the ILO2 port should be on the same subnet, 
and eth3 on a separate subnet (for the multicast fencing).  that way i 
have fencing on a non-ILO2 ethernet port, and the ILO2 is accessible 
from my main subnet.

could someone please share with me their network config for an HP 
ILO-based server with four ports?  i'd really appreciate it!

thanks
yvette



From emilews2 at csc.com  Tue Dec 21 21:03:21 2010
From: emilews2 at csc.com (Evan J Milewski)
Date: Tue, 21 Dec 2010 16:03:21 -0500
Subject: [Linux-cluster] Linux-cluster Digest, Vol 80, Issue 20
In-Reply-To: <mailman.41.1292950809.14721.linux-cluster@redhat.com>
References: <mailman.41.1292950809.14721.linux-cluster@redhat.com>
Message-ID: <OF327B218B.D1E1D1E1-ON85257800.00736691-85257800.0073A95A@csc.com>

I am currently doing this with RHEL5 cluster, using ext3 (or ext4 for you 
on RHEL6) and basically
doing an active/passive config for each service group. My benchmarking of 
GFS2 performance was
abysmal compared to straight ext3/ext4 on a busy NFS server.

> Hi.
> I am planing to rhel-cluster with nfs service and that's why i am asking
> little bit experiences about what
> kind configuration falks has done this kind clusters.
> 
> About 2-4 servers .. rhel6,cluster, nfs-service.. but how about gfs2, do
> recommend it or something else?
> 
> 
> -M
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101221/cf4352b9/attachment.htm>

From bturner at redhat.com  Tue Dec 21 21:18:53 2010
From: bturner at redhat.com (Ben Turner)
Date: Tue, 21 Dec 2010 16:18:53 -0500 (EST)
Subject: [Linux-cluster] How do I add a fence_vmware
	with	system-config-cluster
In-Reply-To: <4D10C2EC.9030800@netcore.co.in>
Message-ID: <452994304.27339.1292966333033.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

Fence_vmware is not configurable with s-c-c, you will have to manually edit the cluster.conf.  This doc should work for you:

http://sources.redhat.com/cluster/wiki/VMware_FencingConfig

-Ben

----- Original Message -----
> Hello
> We are using RHEL 5.4 on VM nodes.
> When I configure the cluster using system-config-cluster I do not get
> an
> option of fence_vmware in the User Interface
> I am able to run fence_vmware successfully on commandline , but I am
> not
> sure how do I put in into the cluster.conf
> 
> Can someone please help with a sample cluster.conf or option to add in
> system-config-cluster.
> 
> 
> Thanks
> Ram
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From bturner at redhat.com  Tue Dec 21 21:35:02 2010
From: bturner at redhat.com (Ben Turner)
Date: Tue, 21 Dec 2010 16:35:02 -0500 (EST)
Subject: [Linux-cluster] question about network config for fencing
In-Reply-To: <4D1100AF.8090302@dbtgroup.com>
Message-ID: <1876027540.27523.1292967302370.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

Some notes on management board style fence devices can be found here:

http://sources.redhat.com/cluster/wiki/IPMI_FencingConfig

Here is a DOC with an example iLO config:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_Fence_Devices/index.html

It is also recommended that you have your fence devices on the same network as your cluster heartbeat.  I have heard an explanation on this at some point as to why this is but I can't remember 100%.  I'm sure that you could find it if you dug around on past posts to this list though.  Hopes this helps.

-b


----- Original Message -----
> hi,
> 
> i've config'ed my DL 380 G6 four ethernet ports so that they are
> bonded
> to "bond0", and my ILO devices are on the fence switch (separate
> switch).
> 
> i'm 99.9% sure this is wrong.
> 
> i'm thinking that eth0-2 and the ILO2 port should be on the same
> subnet,
> and eth3 on a separate subnet (for the multicast fencing). that way i
> have fencing on a non-ILO2 ethernet port, and the ILO2 is accessible
> from my main subnet.
> 
> could someone please share with me their network config for an HP
> ILO-based server with four ports? i'd really appreciate it!
> 
> thanks
> yvette
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From jayesh.shinde at netcore.co.in  Wed Dec 22 04:53:17 2010
From: jayesh.shinde at netcore.co.in (jayesh.shinde)
Date: Wed, 22 Dec 2010 10:23:17 +0530
Subject: [Linux-cluster] How do I get reiserfs and xfs filesystem options in
 system-config-cluster ?
Message-ID: <4D11843D.4040104@netcore.co.in>

Hello ,

I am configuring redhat cluster suite with RHEL 5.4 , 32 bit architecture

I am using the *system-config-cluster* tool for configuring , I have my 
one SAN partition with *reiserfs* and *xfs*  filesystem.

While configuring the resources , In *file system *option I am only 
getting only *"ext2" and "ext3"* in drop down.

How do I get reiserfs and xfs  filesystem options in drop down ?
Is there any updated package for this or do i need to edit the 
cluster.conf file manually ?


Regards
Jayesh Shinde

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101222/a18a7919/attachment.htm>

From raju.rajsand at gmail.com  Wed Dec 22 06:36:48 2010
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Wed, 22 Dec 2010 06:36:48 +0000
Subject: [Linux-cluster] How do I get reiserfs and xfs filesystem
 options in system-config-cluster ?
In-Reply-To: <4D11843D.4040104@netcore.co.in>
References: <4D11843D.4040104@netcore.co.in>
Message-ID: <AANLkTi=zYwqfjmk5MvNG3C8iZ3DH5_2k+X1shwRid6PW@mail.gmail.com>

Greetings,

On Wed, Dec 22, 2010 at 4:53 AM, jayesh.shinde
<jayesh.shinde at netcore.co.in> wrote:
> Hello ,
>
> I am configuring redhat cluster suite with RHEL 5.4 , 32 bit architecture

> I have my one SAN partition with reiserfs and xfs? filesystem.

To the best of my knowledge, XFS support has just started on RHEL6.

I am not sure that ReiserFS was ever supported by Redhat.

If you are trying to use those filesystems in the cluster, I don't
think they are cluster aware.

YMMV.

Regards,

Rajagopal



From jayesh.shinde at netcore.co.in  Wed Dec 22 07:27:26 2010
From: jayesh.shinde at netcore.co.in (jayesh.shinde)
Date: Wed, 22 Dec 2010 12:57:26 +0530
Subject: [Linux-cluster] How do I get reiserfs and xfs filesystem
 options in system-config-cluster ?
In-Reply-To: <AANLkTi=zYwqfjmk5MvNG3C8iZ3DH5_2k+X1shwRid6PW@mail.gmail.com>
References: <4D11843D.4040104@netcore.co.in>
	<AANLkTi=zYwqfjmk5MvNG3C8iZ3DH5_2k+X1shwRid6PW@mail.gmail.com>
Message-ID: <4D11A85E.104@netcore.co.in>

Hi  Rajagopal

I am not clear fully.  I will use RHEL 6 . I want some more 
clarification on below points

1) You mean to say I can't use XFS with cluster ?  OR there is no option 
for XFS with system-config-cluster ?
2) If I edited the cluster.conf file manually for "xfs" will the cluster 
server work well ?

3) what is work around solution ?

Regards
Jayesh Shinde

On 12/22/2010 12:06 PM, Rajagopal Swaminathan wrote:
> Greetings,
>
> On Wed, Dec 22, 2010 at 4:53 AM, jayesh.shinde
> <jayesh.shinde at netcore.co.in>  wrote:
>> Hello ,
>>
>> I am configuring redhat cluster suite with RHEL 5.4 , 32 bit architecture
>> I have my one SAN partition with reiserfs and xfs  filesystem.
> To the best of my knowledge, XFS support has just started on RHEL6.
>
> I am not sure that ReiserFS was ever supported by Redhat.
>
> If you are trying to use those filesystems in the cluster, I don't
> think they are cluster aware.
>
> YMMV.
>
> Regards,
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From raju.rajsand at gmail.com  Wed Dec 22 07:50:24 2010
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Wed, 22 Dec 2010 07:50:24 +0000
Subject: [Linux-cluster] How do I get reiserfs and xfs filesystem
 options in system-config-cluster ?
In-Reply-To: <4D11A85E.104@netcore.co.in>
References: <4D11843D.4040104@netcore.co.in>
	<AANLkTi=zYwqfjmk5MvNG3C8iZ3DH5_2k+X1shwRid6PW@mail.gmail.com>
	<4D11A85E.104@netcore.co.in>
Message-ID: <AANLkTi=P2_EE7eswXP9--AK9HT1LtTnPFFdmTJ7hkyxy@mail.gmail.com>

Greetings,

On Wed, Dec 22, 2010 at 7:27 AM, jayesh.shinde
<jayesh.shinde at netcore.co.in> wrote:
> Hi ?Rajagopal
>
> I am not clear fully. ?I will use RHEL 6 . I want some more clarification on
> below points
>
> 1) You mean to say I can't use XFS with cluster ?

XFS/reiserFS is not a cluster aware filesystem like GFS2 or OCFS or GPFS.

AFAIK, You cannot use it for multiple hosts concurrantly accessing the
filesystem

> OR there is no option for XFS with system-config-cluster ?

I do not have cluster in front of me to attempt answering that...

It can be used in the same sense as ext3/4.

> 2) If I edited the cluster.conf file manually for "xfs" will the cluster
> server work well ?
>
> 3) what is work around solution ?
>

You haven't defined your problem clearly enough.

[commercial-plug] available for a fee currently in Mumbai -- mail me
in private :)

Regards,

Rajagopal



From rafagriman at gmail.com  Wed Dec 22 07:58:03 2010
From: rafagriman at gmail.com (Rafa Griman)
Date: Wed, 22 Dec 2010 08:58:03 +0100
Subject: [Linux-cluster] How do I get reiserfs and xfs filesystem
 options in system-config-cluster ?
In-Reply-To: <4D11A85E.104@netcore.co.in>
References: <4D11843D.4040104@netcore.co.in>
	<AANLkTi=zYwqfjmk5MvNG3C8iZ3DH5_2k+X1shwRid6PW@mail.gmail.com>
	<4D11A85E.104@netcore.co.in>
Message-ID: <AANLkTinqy3RWsDvG+406FQVq_XjafTW2oJy7YVtoe8Lp@mail.gmail.com>

Hi :)

On Wed, Dec 22, 2010 at 8:27 AM, jayesh.shinde
<jayesh.shinde at netcore.co.in> wrote:
> Hi ?Rajagopal
>
> I am not clear fully. ?I will use RHEL 6 . I want some more clarification on
> below points
>
> 1) You mean to say I can't use XFS with cluster ? ?OR there is no option for
> XFS with system-config-cluster ?


Depends on the type of cluster:
   - HA cluster: no problem as long as it's active/passive. That is:
one server mounts the FS and the other is on standby. If server 1
fails, it releases the FS and server2 mounts it.
   - shared/clustered filesystem: you'd have to go with CXFS (get in
touch with SGI). That is: both servers mount the filesystem at the
same time.


> 2) If I edited the cluster.conf file manually for "xfs" will the cluster
> server work well ?
>
> 3) what is work around solution ?
>
> Regards
> Jayesh Shinde
>
> On 12/22/2010 12:06 PM, Rajagopal Swaminathan wrote:
>>
>> Greetings,
>>
>> On Wed, Dec 22, 2010 at 4:53 AM, jayesh.shinde
>> <jayesh.shinde at netcore.co.in> ?wrote:
>>>
>>> Hello ,
>>>
>>> I am configuring redhat cluster suite with RHEL 5.4 , 32 bit architecture
>>> I have my one SAN partition with reiserfs and xfs ?filesystem.
>>
>> To the best of my knowledge, XFS support has just started on RHEL6.
>>
>> I am not sure that ReiserFS was ever supported by Redhat.
>>
>> If you are trying to use those filesystems in the cluster, I don't
>> think they are cluster aware.
>>
>> YMMV.
>>
>> Regards,
>>
>> Rajagopal


HTH

   Rafa



From yvette at dbtgroup.com  Wed Dec 22 17:27:35 2010
From: yvette at dbtgroup.com (yvette hirth)
Date: Wed, 22 Dec 2010 17:27:35 +0000
Subject: [Linux-cluster] gfs2.fsck bug
Message-ID: <4D123507.70006@dbtgroup.com>

hi,

our gfs2 datasets are down; when i try to do a mount i get:

[root at DBT1 ~]# mount -a
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm
/sbin/mount.gfs2: node not a member of the default fence domain
/sbin/mount.gfs2: error mounting lockproto lock_dlm

our cluster.conf is consistent across all devices (listed below).

so i thought an fsck would fix this, then i get:

[root at DBT1 ~]# fsck.gfs2 -fnp /dev/NEWvg/NEWlvTemp
(snippage)
RG #4909212 (0x4ae89c) free count inconsistent: is 16846 should be 17157
Resource group counts updated
Unlinked block 8639983 (0x83d5ef) bitmap fixed.
RG #8639976 (0x83d5e8) free count inconsistent: is 65411 should be 65412
Inode count inconsistent: is 20 should be 19
Resource group counts updated
Pass5 complete
The statfs file is wrong:

Current statfs values:
blocks:  43324224 (0x2951340)
free:    38433917 (0x24a747d)
dinodes: 21085 (0x525d)

Calculated statfs values:
blocks:  43324224 (0x2951340)
free:    38466752 (0x24af4c0)
dinodes: 21083 (0x525b)
The statfs file was fixed.

gfs2_fsck: bad write: Bad file descriptor on line 44 of file buf.c

i read in https://bugzilla.redhat.com/show_bug.cgi?id=457557 that there 
is some way of fixing this with gfs2_edit - are there docs available?

as we've been having fencing issues, i removed two servers (DBT2/DBT3) 
from the cluster fencing, and they are not active at this time.  would 
this cause the mount issues?

tia for any advice / guidance.

yvette

our cluster.conf:

<?xml version="1.0"?>
<cluster alias="DBT0_DBT1_HA" config_version="85" name="DBT0_DBT1_HA">
	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="1"/>
	<clusternodes>
		<clusternode name="DBT0" nodeid="1" votes="3">
			<fence>
				<method name="1">
					<device name="DBT0_ILO2"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="DBT1" nodeid="2" votes="3">
			<fence>
				<method name="1">
					<device name="DBT1_ILO2"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="DEV" nodeid="3" votes="3">
			<fence>
				<method name="1">
					<device name="DEV_ILO2"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="DBT2" nodeid="4" votes="1">
			<fence>
				<method name="1"/>
			</fence>
		</clusternode>
		<clusternode name="DBT3" nodeid="5" votes="1">
			<fence>
				<method name="1"/>
			</fence>
		</clusternode>
	</clusternodes>
	<cman/>
	<fencedevices>
		<fencedevice agent="fence_ilo" hostname="192.168.200.140" login="foo" 
name="DBT0_ILO2" passwd="foo"/>
		<fencedevice agent="fence_ilo" hostname="192.168.200.150" login="foo" 
name="DEV_ILO2" passwd="foo"/>
		<fencedevice agent="fence_ilo" hostname="192.168.200.141" login="foo" 
name="DBT1_ILO2" passwd="foo"/>
	</fencedevices>
	<rm>
		<failoverdomains/>
		<resources>
			<clusterfs device="/dev/foo0vg/foo0vol002" force_unmount="1" 
fsid="19150" fstype="gfs2" mountpoint="/foo0vol002" name="foo0vol002" 
options="data=writeback" self_fence="0"/>
			<clusterfs device="/dev/foo0vg/foo0lvvol003" force_unmount="1" 
fsid="51633" fstype="gfs2" mountpoint="/foo0vol003" name="foo0vol003" 
options="data=writeback" self_fence="0"/>
			<clusterfs device="/dev/foo0vg/foo0lvvol004" force_unmount="1" 
fsid="36294" fstype="gfs2" mountpoint="/foo0vol004" name="foo0vol004" 
options="data=writeback" self_fence="0"/>
			<clusterfs device="/dev/foo0vg/foo0vol005" force_unmount="1" 
fsid="48920" fstype="gfs2" mountpoint="/foo0vol005" name="foo0vol005" 
options="noatime,noquota,data=writeback" self_fence="0"/>
			<clusterfs device="/dev/foo1vg/foo1lvvol000" force_unmount="1" 
fsid="24235" fstype="gfs2" mountpoint="/foo0vol000" name="foo0vol000" 
options="data=ordered" self_fence="0"/>
			<clusterfs device="/dev/foo1vg/foo1lvvol001" force_unmount="1" 
fsid="34088" fstype="gfs2" mountpoint="/foo0vol001" name="foo0vol001" 
options="data=ordered" self_fence="0"/>
		</resources>
	</rm>
	<totem consensus="4800" join="60" token="10000" 
token_retransmits_before_loss_const="20"/>
	<dlm plock_ownership="1" plock_rate_limit="0"/>
	<gfs_controld plock_rate_limit="0"/>
</cluster>



From bturner at redhat.com  Wed Dec 22 17:58:18 2010
From: bturner at redhat.com (Ben Turner)
Date: Wed, 22 Dec 2010 12:58:18 -0500 (EST)
Subject: [Linux-cluster] How do I get reiserfs and xfs filesystem
 options in system-config-cluster ?
In-Reply-To: <AANLkTinqy3RWsDvG+406FQVq_XjafTW2oJy7YVtoe8Lp@mail.gmail.com>
Message-ID: <2040486775.37101.1293040698392.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

As far as I know RHEL 6 doesn't include system-config-cluster anymore.  I suggest you use the luci interface to configure this.  When creating a new service in luci you can choose the filesystem resource, this resource handles 9 different filesystems including reiser and XFS as well an an autodetect option.  You can also manually edit the cluster.conf file to make these changes.  

You can use XFS with cluster, the point that others were trying to make is that XFS is not a shared filesystem like GFS and can only be mounted on one node at a time.

-Ben

----- Original Message -----
> Hi :)
> 
> On Wed, Dec 22, 2010 at 8:27 AM, jayesh.shinde
> <jayesh.shinde at netcore.co.in> wrote:
> > Hi Rajagopal
> >
> > I am not clear fully. I will use RHEL 6 . I want some more
> > clarification on
> > below points
> >
> > 1) You mean to say I can't use XFS with cluster ? OR there is no
> > option for
> > XFS with system-config-cluster ?
> 
> 
> Depends on the type of cluster:
> - HA cluster: no problem as long as it's active/passive. That is:
> one server mounts the FS and the other is on standby. If server 1
> fails, it releases the FS and server2 mounts it.
> - shared/clustered filesystem: you'd have to go with CXFS (get in
> touch with SGI). That is: both servers mount the filesystem at the
> same time.
> 
> 
> > 2) If I edited the cluster.conf file manually for "xfs" will the
> > cluster
> > server work well ?
> >
> > 3) what is work around solution ?
> >
> > Regards
> > Jayesh Shinde
> >
> > On 12/22/2010 12:06 PM, Rajagopal Swaminathan wrote:
> >>
> >> Greetings,
> >>
> >> On Wed, Dec 22, 2010 at 4:53 AM, jayesh.shinde
> >> <jayesh.shinde at netcore.co.in> wrote:
> >>>
> >>> Hello ,
> >>>
> >>> I am configuring redhat cluster suite with RHEL 5.4 , 32 bit
> >>> architecture
> >>> I have my one SAN partition with reiserfs and xfs filesystem.
> >>
> >> To the best of my knowledge, XFS support has just started on RHEL6.
> >>
> >> I am not sure that ReiserFS was ever supported by Redhat.
> >>
> >> If you are trying to use those filesystems in the cluster, I don't
> >> think they are cluster aware.
> >>
> >> YMMV.
> >>
> >> Regards,
> >>
> >> Rajagopal
> 
> 
> HTH
> 
> Rafa
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From rpeterso at redhat.com  Wed Dec 22 18:10:24 2010
From: rpeterso at redhat.com (Bob Peterson)
Date: Wed, 22 Dec 2010 13:10:24 -0500 (EST)
Subject: [Linux-cluster] gfs2.fsck bug
In-Reply-To: <4D123507.70006@dbtgroup.com>
Message-ID: <2139528362.53169.1293041424887.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>

----- Original Message -----
| hi,
| 
| our gfs2 datasets are down; when i try to do a mount i get:
| 
| [root at DBT1 ~]# mount -a
| /sbin/mount.gfs2: node not a member of the default fence domain
| /sbin/mount.gfs2: error mounting lockproto lock_dlm
| /sbin/mount.gfs2: node not a member of the default fence domain
| /sbin/mount.gfs2: error mounting lockproto lock_dlm
| /sbin/mount.gfs2: node not a member of the default fence domain
| /sbin/mount.gfs2: error mounting lockproto lock_dlm
| /sbin/mount.gfs2: node not a member of the default fence domain
| /sbin/mount.gfs2: error mounting lockproto lock_dlm
| /sbin/mount.gfs2: node not a member of the default fence domain
| /sbin/mount.gfs2: error mounting lockproto lock_dlm
| /sbin/mount.gfs2: node not a member of the default fence domain
| /sbin/mount.gfs2: error mounting lockproto lock_dlm
| 
| our cluster.conf is consistent across all devices (listed below).
| 
| so i thought an fsck would fix this, then i get:
| 
| [root at DBT1 ~]# fsck.gfs2 -fnp /dev/NEWvg/NEWlvTemp
| (snippage)
| RG #4909212 (0x4ae89c) free count inconsistent: is 16846 should be
| 17157
| Resource group counts updated
| Unlinked block 8639983 (0x83d5ef) bitmap fixed.
| RG #8639976 (0x83d5e8) free count inconsistent: is 65411 should be
| 65412
| Inode count inconsistent: is 20 should be 19
| Resource group counts updated
| Pass5 complete
| The statfs file is wrong:
| 
| Current statfs values:
| blocks: 43324224 (0x2951340)
| free: 38433917 (0x24a747d)
| dinodes: 21085 (0x525d)
| 
| Calculated statfs values:
| blocks: 43324224 (0x2951340)
| free: 38466752 (0x24af4c0)
| dinodes: 21083 (0x525b)
| The statfs file was fixed.
| 
| gfs2_fsck: bad write: Bad file descriptor on line 44 of file buf.c
| 
| i read in https://bugzilla.redhat.com/show_bug.cgi?id=457557 that
| there
| is some way of fixing this with gfs2_edit - are there docs available?

Hi Yvette,

There is not enough information to know whether or not this may
be fixed easily with gfs2_edit since I don't know what block it's
failing on when you run fsck.gfs2.

What version of fsck.gfs2 are you running?  Are you running the version
from my people page?  If not, you could try it.

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/fsck.gfs2

Regards,

Bob Peterson
Red Hat File Systems



From bturner at redhat.com  Wed Dec 22 19:15:43 2010
From: bturner at redhat.com (Ben Turner)
Date: Wed, 22 Dec 2010 14:15:43 -0500 (EST)
Subject: [Linux-cluster] gfs2.fsck bug
In-Reply-To: <4D123507.70006@dbtgroup.com>
Message-ID: <1847865255.38090.1293045343379.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

My responses inline:

> hi,
> 
> our gfs2 datasets are down; when i try to do a mount i get:
> 
> [root at DBT1 ~]# mount -a
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm
> /sbin/mount.gfs2: node not a member of the default fence domain
> /sbin/mount.gfs2: error mounting lockproto lock_dlm

This makes me think the node trying to mount your GFS FS is not currently a member of the cluster.  Check cman_tool services on all nodes, everything should be in the state NONE.  If it is not then there is prolly a membership issue.

> 
> our cluster.conf is consistent across all devices (listed below).
> 
> so i thought an fsck would fix this, then i get:
> 
> [root at DBT1 ~]# fsck.gfs2 -fnp /dev/NEWvg/NEWlvTemp
> (snippage)
> RG #4909212 (0x4ae89c) free count inconsistent: is 16846 should be
> 17157
> Resource group counts updated
> Unlinked block 8639983 (0x83d5ef) bitmap fixed.
> RG #8639976 (0x83d5e8) free count inconsistent: is 65411 should be
> 65412
> Inode count inconsistent: is 20 should be 19
> Resource group counts updated
> Pass5 complete
> The statfs file is wrong:
> 
> Current statfs values:
> blocks: 43324224 (0x2951340)
> free: 38433917 (0x24a747d)
> dinodes: 21085 (0x525d)
> 
> Calculated statfs values:
> blocks: 43324224 (0x2951340)
> free: 38466752 (0x24af4c0)
> dinodes: 21083 (0x525b)
> The statfs file was fixed.
> 
> gfs2_fsck: bad write: Bad file descriptor on line 44 of file buf.c
> 
> i read in https://bugzilla.redhat.com/show_bug.cgi?id=457557 that
> there
> is some way of fixing this with gfs2_edit - are there docs available?

There is a development version of fsck that I have had success fixing several issue with.  It can be found at:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/

I can't comment on the gfs2_edit procedure, maybe someone else on the list can comment here if that is a better idea than the experimental gfs2 fsck.


> 
> as we've been having fencing issues, i removed two servers (DBT2/DBT3)
> from the cluster fencing, and they are not active at this time. would
> this cause the mount issues?

I see you removed the fence devices from:

 <clusternode name="DBT3" nodeid="5" votes="1">
 <fence>
 <method name="1"/>
 </fence>

If there was a fence event on this node I could see that as a cause for not being able to mount GFS.  Any time there is lost heartbeat all cluster resources will remain frozen until there is a successful fence, without a fence device you should see failed fence messages all through the logs.
 
> tia for any advice / guidance.
> 
> yvette
> 
> our cluster.conf:
> 
> <?xml version="1.0"?>
> <cluster alias="DBT0_DBT1_HA" config_version="85" name="DBT0_DBT1_HA">
> <fence_daemon clean_start="0" post_fail_delay="0"
> post_join_delay="1"/>
> <clusternodes>
> <clusternode name="DBT0" nodeid="1" votes="3">
> <fence>
> <method name="1">
> <device name="DBT0_ILO2"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="DBT1" nodeid="2" votes="3">
> <fence>
> <method name="1">
> <device name="DBT1_ILO2"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="DEV" nodeid="3" votes="3">
> <fence>
> <method name="1">
> <device name="DEV_ILO2"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="DBT2" nodeid="4" votes="1">
> <fence>
> <method name="1"/>
> </fence>
> </clusternode>
> <clusternode name="DBT3" nodeid="5" votes="1">
> <fence>
> <method name="1"/>
> </fence>
> </clusternode>
> </clusternodes>
> <cman/>
> <fencedevices>
> <fencedevice agent="fence_ilo" hostname="192.168.200.140" login="foo"
> name="DBT0_ILO2" passwd="foo"/>
> <fencedevice agent="fence_ilo" hostname="192.168.200.150" login="foo"
> name="DEV_ILO2" passwd="foo"/>
> <fencedevice agent="fence_ilo" hostname="192.168.200.141" login="foo"
> name="DBT1_ILO2" passwd="foo"/>
> </fencedevices>
> <rm>
> <failoverdomains/>
> <resources>
> <clusterfs device="/dev/foo0vg/foo0vol002" force_unmount="1"
> fsid="19150" fstype="gfs2" mountpoint="/foo0vol002" name="foo0vol002"
> options="data=writeback" self_fence="0"/>
> <clusterfs device="/dev/foo0vg/foo0lvvol003" force_unmount="1"
> fsid="51633" fstype="gfs2" mountpoint="/foo0vol003" name="foo0vol003"
> options="data=writeback" self_fence="0"/>
> <clusterfs device="/dev/foo0vg/foo0lvvol004" force_unmount="1"
> fsid="36294" fstype="gfs2" mountpoint="/foo0vol004" name="foo0vol004"
> options="data=writeback" self_fence="0"/>
> <clusterfs device="/dev/foo0vg/foo0vol005" force_unmount="1"
> fsid="48920" fstype="gfs2" mountpoint="/foo0vol005" name="foo0vol005"
> options="noatime,noquota,data=writeback" self_fence="0"/>
> <clusterfs device="/dev/foo1vg/foo1lvvol000" force_unmount="1"
> fsid="24235" fstype="gfs2" mountpoint="/foo0vol000" name="foo0vol000"
> options="data=ordered" self_fence="0"/>
> <clusterfs device="/dev/foo1vg/foo1lvvol001" force_unmount="1"
> fsid="34088" fstype="gfs2" mountpoint="/foo0vol001" name="foo0vol001"
> options="data=ordered" self_fence="0"/>
> </resources>
> </rm>
> <totem consensus="4800" join="60" token="10000"
> token_retransmits_before_loss_const="20"/>
> <dlm plock_ownership="1" plock_rate_limit="0"/>
> <gfs_controld plock_rate_limit="0"/>
> </cluster>
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From rossnick-lists at cybercat.ca  Wed Dec 22 19:38:08 2010
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Wed, 22 Dec 2010 14:38:08 -0500
Subject: [Linux-cluster] New cluster : installing...
Message-ID: <88BD849351CE45BBA495A52B387E6B73@versa>

Hi !

Over the last couple of weeks, I've been playing with the cluster suite and 
RHEL 6 beta 2, that was availaible.

Now, I got a 30 day demo of RHEL 6 to begin the re-installation from scratch 
for ou soon to be production cluster. With the beta, I had a deamon running, 
that was clvmd for the cluster logical volume manager daemon. This package 
doesn't seem to exist anymore.

The package lvm2-cluster is on the installation DVD, but I can't seem to 
install it via yum. I did enabled the High Availability channel to our 
servers, but it's not in there. I can't seem to find in wich software 
channel it's located.

Can anyone tell me ? 



From bturner at redhat.com  Wed Dec 22 21:14:23 2010
From: bturner at redhat.com (Ben Turner)
Date: Wed, 22 Dec 2010 16:14:23 -0500 (EST)
Subject: [Linux-cluster] New cluster : installing...
In-Reply-To: <88BD849351CE45BBA495A52B387E6B73@versa>
Message-ID: <1410843832.39645.1293052463218.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

Did you enable resilient storage?  On my system:

[root at cs-rh6-3 gfs-test-scripts]# yum info lvm2-cluster
Loaded plugins: refresh-packagekit, rhnplugin
Installed Packages
Name       : lvm2-cluster
Arch       : x86_64
Version    : 2.02.72
Release    : 8.el6_0.3
Size       : 581 k
Repo       : installed
>From repo  : rhel-x86_64-server-rs-6
Summary    : Cluster extensions for userland logical volume management tools
URL        : http://sources.redhat.com/lvm2
License    : GPLv2
Description: Extensions to LVM2 to support clusters.

Available Packages
Name       : lvm2-cluster
Arch       : x86_64
Version    : 2.02.72
Release    : 8.el6_0.4
Size       : 307 k
Repo       : rhel-x86_64-server-rs-6
Summary    : Cluster extensions for userland logical volume management tools
License    : GPLv2
Description: Extensions to LVM2 to support clusters.

-b

----- Original Message -----
> Hi !
> 
> Over the last couple of weeks, I've been playing with the cluster
> suite and
> RHEL 6 beta 2, that was availaible.
> 
> Now, I got a 30 day demo of RHEL 6 to begin the re-installation from
> scratch
> for ou soon to be production cluster. With the beta, I had a deamon
> running,
> that was clvmd for the cluster logical volume manager daemon. This
> package
> doesn't seem to exist anymore.
> 
> The package lvm2-cluster is on the installation DVD, but I can't seem
> to
> install it via yum. I did enabled the High Availability channel to our
> servers, but it's not in there. I can't seem to find in wich software
> channel it's located.
> 
> Can anyone tell me ?
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From pradhanparas at gmail.com  Wed Dec 22 21:21:46 2010
From: pradhanparas at gmail.com (Paras pradhan)
Date: Wed, 22 Dec 2010 15:21:46 -0600
Subject: [Linux-cluster] GFS problem
Message-ID: <AANLkTikA1ca+kDcZfeYByX5pdDUcPsq9Eo0s4-6jrxA2@mail.gmail.com>

Hi,

This morning when I rebooted one node out of the 3 nodes cluster, it
came back normally but saw repeated INFO of GFS :

--

INFO: task gfs2_quotad:7957 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
gfs2_quotad   D ffff8800ea0cfd30     0  7957     67          7965  7956 (L-TLB)
 ffff8800ea0cfcd0  0000000000000246  0000000000000000  ffff8800f996d800
 000000000000000a  ffff8800fb3a70c0  ffff8800ffceb7e0  000000000000a429
 ffff8800fb3a72a8  0000000000000000
Call Trace:
 [<ffffffff8886d7b8>] :dlm:dlm_put_lockspace+0x10/0x1f
 [<ffffffff8886be5f>] :dlm:dlm_lock+0x117/0x129
 [<ffffffff88910556>] :lock_dlm:gdlm_ast+0x0/0x311
 [<ffffffff889102c1>] :lock_dlm:gdlm_bast+0x0/0x8d
 [<ffffffff88894e8f>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff88894e98>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff80263825>] __wait_on_bit+0x40/0x6e
 [<ffffffff88894e8f>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff802638bf>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8029c45e>] wake_bit_function+0x0/0x23
 [<ffffffff88894e8a>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff888ab456>] :gfs2:gfs2_statfs_sync+0x3f/0x165
 [<ffffffff888ab44e>] :gfs2:gfs2_statfs_sync+0x37/0x165
 [<ffffffff8025dd7c>] del_timer_sync+0xc/0x16
 [<ffffffff888a5277>] :gfs2:quotad_check_timeo+0x20/0x60
 [<ffffffff888a6d46>] :gfs2:gfs2_quotad+0xde/0x214
 [<ffffffff8029c430>] autoremove_wake_function+0x0/0x2e
 [<ffffffff888a6c68>] :gfs2:gfs2_quotad+0x0/0x214
 [<ffffffff8029c218>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233be4>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8029c218>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233ae6>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12
--

clustat was not listing the services too saying Service temoprarily
unavailible. try again later...

Then I ran gfs2_list df. It printed out few lines then it stopped. I
could't do 'ls; on mounted GFS file-systems on all three nodes. Then I
rebooted this node once again. After that everything is normal.

Just wanted to know what might has caused the problem.

messaged logs says:


Dec 22 10:53:57 cvprd2 fenced[7379]: fence "xxxx.xxx.xxx" success
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms1.1: jid=2:
Trying to acquire journal lock...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms2.2: jid=1:
Trying to acquire journal lock...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms3.1: jid=0:
Trying to acquire journal lock...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms4.1: jid=0:
Trying to acquire journal lock...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms5.1: jid=0:
Trying to acquire journal lock...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms6.1: jid=0:
Trying to acquire journal lock...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms1.1: jid=2:
Looking at journal...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms2.2: jid=1:
Looking at journal...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms3.1: jid=0:
Looking at journal...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms5.1: jid=0:
Looking at journal...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms6.1: jid=0:
Looking at journal...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms4.1: jid=0:
Looking at journal...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms1.1: jid=2:
Acquiring the transaction lock...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms1.1: jid=2:
Replaying journal...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms1.1: jid=2:
Replayed 0 of 0 blocks
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms1.1: jid=2:
Found 0 revoke tags
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms1.1: jid=2:
Journal replayed in 0s
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms1.1: jid=2: Done
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms3.1: jid=0:
Acquiring the transaction lock...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms3.1: jid=0:
Replaying journal...
Dec 22 10:53:57 cvprd2 kernel: GFS2: fsid=vprd:guest_vms3.1: jid=0:
Replayed 1 of 1 blocks
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms3.1: jid=0:
Found 0 revoke tags
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms3.1: jid=0:
Journal replayed in 1s
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms3.1: jid=0: Done
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms2.2: jid=1:
Acquiring the transaction lock...
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms2.2: jid=1:
Replaying journal...
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms2.2: jid=1:
Replayed 5 of 5 blocks
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms2.2: jid=1:
Found 0 revoke tags
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms2.2: jid=1:
Journal replayed in 0s
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms2.2: jid=1: Done
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms5.1: jid=0: Done
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms6.1: jid=0: Done
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms4.1: jid=0:
Acquiring the transaction lock...
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms4.1: jid=0:
Replaying journal...
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms4.1: jid=0:
Replayed 0 of 0 blocks
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms4.1: jid=0:
Found 0 revoke tags
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms4.1: jid=0:
Journal replayed in 0s
Dec 22 10:53:58 cvprd2 kernel: GFS2: fsid=vprd:guest_vms4.1: jid=0: Done


OS: RHEL 5.5 64 bit (up to date)

Thanks!
Paras.



From ricks at nerd.com  Wed Dec 22 21:30:51 2010
From: ricks at nerd.com (Rick Stevens)
Date: Wed, 22 Dec 2010 13:30:51 -0800
Subject: [Linux-cluster] New cluster : installing...
In-Reply-To: <88BD849351CE45BBA495A52B387E6B73@versa>
References: <88BD849351CE45BBA495A52B387E6B73@versa>
Message-ID: <4D126E0B.8040703@nerd.com>

On 12/22/2010 11:38 AM, Nicolas Ross wrote:
> Hi !
>
> Over the last couple of weeks, I've been playing with the cluster suite
> and RHEL 6 beta 2, that was availaible.
>
> Now, I got a 30 day demo of RHEL 6 to begin the re-installation from
> scratch for ou soon to be production cluster. With the beta, I had a
> deamon running, that was clvmd for the cluster logical volume manager
> daemon. This package doesn't seem to exist anymore.
>
> The package lvm2-cluster is on the installation DVD, but I can't seem to
> install it via yum.

Mount your DVD and you should be able to install from it by specifying
the full path to the RPM file:

	# yum install /media/cdrom-mount-point/path/to/file.rpm
	# rpm -ivh /media/cdrom-mount-point/path/to/file.rpm

Double clicking on the RPM in the desktop file manager should also offer
you the ability to install it.

>                    I did enabled the High Availability channel to our
> servers, but it's not in there. I can't seem to find in wich software
> channel it's located.
>
> Can anyone tell me ?
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


-- 
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, C2 Hosting          ricks at nerd.com -
- AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
-                                                                    -
-                      LOOK OUT!!! BEHIND YOU!!!                     -
----------------------------------------------------------------------



From yvette at dbtgroup.com  Wed Dec 22 21:43:58 2010
From: yvette at dbtgroup.com (yvette hirth)
Date: Wed, 22 Dec 2010 21:43:58 +0000
Subject: [Linux-cluster] fixed
Message-ID: <4D12711E.9060407@dbtgroup.com>

hi,

first, a big thanks to Bob Peterson and Ben Turner:

the "devel" edition of fsck.gfs2 ended normally where the 5.5 current 
version didn't; and

after installing new WTI power switch and reconfiguring the cluster, all 
my gfs2 shares are now visible across the network.

and to all who responded, thank you as well.

seasons greetings!
yvette hirth



From rossnick-lists at cybercat.ca  Wed Dec 22 22:58:48 2010
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Wed, 22 Dec 2010 17:58:48 -0500
Subject: [Linux-cluster] New cluster : installing...
In-Reply-To: <4D126E0B.8040703@nerd.com>
References: <88BD849351CE45BBA495A52B387E6B73@versa>
	<4D126E0B.8040703@nerd.com>
Message-ID: <61598138B246424CBECF9AD400DDC2B7@Inspiron>

>> Now, I got a 30 day demo of RHEL 6 to begin the re-installation from
>> scratch for ou soon to be production cluster. With the beta, I had a
>> deamon running, that was clvmd for the cluster logical volume manager
>> daemon. This package doesn't seem to exist anymore.
>>
>> The package lvm2-cluster is on the installation DVD, but I can't seem to
>> install it via yum.
>
> Mount your DVD and you should be able to install from it by specifying
> the full path to the RPM file:
>
> # yum install /media/cdrom-mount-point/path/to/file.rpm
> # rpm -ivh /media/cdrom-mount-point/path/to/file.rpm
>

That's what I did in the mean time. It appears that I didn't receive a 
resiliant storgae demi, but a cluster suite demo, which doesn't seem to 
include resiliant.

I sent an email to my account manager on that maner.

Regards, 



From jayesh.shinde at netcore.co.in  Thu Dec 23 04:25:32 2010
From: jayesh.shinde at netcore.co.in (jayesh.shinde)
Date: Thu, 23 Dec 2010 09:55:32 +0530
Subject: [Linux-cluster] How do I get reiserfs and xfs filesystem
 options in system-config-cluster ?
In-Reply-To: <2040486775.37101.1293040698392.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
References: <2040486775.37101.1293040698392.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
Message-ID: <4D12CF3C.4000506@netcore.co.in>

Thanks Ben, Rajagopal & Rafa  for your guidance.

Regards
Jayesh Shinde


On 12/22/2010 11:28 PM, Ben Turner wrote:
> As far as I know RHEL 6 doesn't include system-config-cluster anymore.  I suggest you use the luci interface to configure this.  When creating a new service in luci you can choose the filesystem resource, this resource handles 9 different filesystems including reiser and XFS as well an an autodetect option.  You can also manually edit the cluster.conf file to make these changes.
>
> You can use XFS with cluster, the point that others were trying to make is that XFS is not a shared filesystem like GFS and can only be mounted on one node at a time.
>
> -Ben
>
> ----- Original Message -----
>> Hi :)
>>
>> On Wed, Dec 22, 2010 at 8:27 AM, jayesh.shinde
>> <jayesh.shinde at netcore.co.in>  wrote:
>>> Hi Rajagopal
>>>
>>> I am not clear fully. I will use RHEL 6 . I want some more
>>> clarification on
>>> below points
>>>
>>> 1) You mean to say I can't use XFS with cluster ? OR there is no
>>> option for
>>> XFS with system-config-cluster ?
>>
>> Depends on the type of cluster:
>> - HA cluster: no problem as long as it's active/passive. That is:
>> one server mounts the FS and the other is on standby. If server 1
>> fails, it releases the FS and server2 mounts it.
>> - shared/clustered filesystem: you'd have to go with CXFS (get in
>> touch with SGI). That is: both servers mount the filesystem at the
>> same time.
>>
>>
>>> 2) If I edited the cluster.conf file manually for "xfs" will the
>>> cluster
>>> server work well ?
>>>
>>> 3) what is work around solution ?
>>>
>>> Regards
>>> Jayesh Shinde
>>>
>>> On 12/22/2010 12:06 PM, Rajagopal Swaminathan wrote:
>>>> Greetings,
>>>>
>>>> On Wed, Dec 22, 2010 at 4:53 AM, jayesh.shinde
>>>> <jayesh.shinde at netcore.co.in>  wrote:
>>>>> Hello ,
>>>>>
>>>>> I am configuring redhat cluster suite with RHEL 5.4 , 32 bit
>>>>> architecture
>>>>> I have my one SAN partition with reiserfs and xfs filesystem.
>>>> To the best of my knowledge, XFS support has just started on RHEL6.
>>>>
>>>> I am not sure that ReiserFS was ever supported by Redhat.
>>>>
>>>> If you are trying to use those filesystems in the cluster, I don't
>>>> think they are cluster aware.
>>>>
>>>> YMMV.
>>>>
>>>> Regards,
>>>>
>>>> Rajagopal
>>
>> HTH
>>
>> Rafa
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From parvez.h.shaikh at gmail.com  Fri Dec 24 05:33:12 2010
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Fri, 24 Dec 2010 11:03:12 +0530
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
Message-ID: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>

Hi all,

I am using Red Hat cluster 6.2.0 (version shown with cman_tool
version) on Red Hat 5.5

I am on host that has multiple network interfaces and all(or some) of
which may be active while I tried to bring up my IP resource up.

My cluster is of simple configuration -
It has only 2 nodes, and service basically consist of only IP
resource, I had to chose random private IP address for test/debugging
purpose (192.168....)

When I tried to start service it failed with message -

clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured

I manually made this virtual IP available on host and then started
service it worked -

clurgmgrd: [31853]: <debug> 192.168.25.135 already configured


My question is - Is it prerequisite for IP resource to be manually
added before it can be protected via cluster?

Thanks
Parvez



From raju.rajsand at gmail.com  Fri Dec 24 06:00:01 2010
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Fri, 24 Dec 2010 06:00:01 +0000
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>
Message-ID: <AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>

Greetings,

On Fri, Dec 24, 2010 at 5:33 AM, Parvez Shaikh
<parvez.h.shaikh at gmail.com> wrote:
> Hi all,
>
> I manually made this virtual IP available on host and then started
> service it worked -
>

Can you please elaborate? did you try to assign IP to the ethx devices
and then ping?

> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>
>
> My question is - Is it prerequisite for IP resource to be manually
> added before it can be protected via cluster?
>

Every resource/service has to be added to the cluster.

And they cannot be used by anything else.

Regards,

Rajagopal



From parvez.h.shaikh at gmail.com  Fri Dec 24 08:41:53 2010
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Fri, 24 Dec 2010 14:11:53 +0530
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>
	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>
Message-ID: <AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>

Hi Rajagopal,

Thank you for your response

I have created a cluster configuration by adding IP resource with
value 192.168.25.153 (some value) and created a service which just has
IP resource part of it. I have set all requisite configuration such
two node, node names, failover,fencing etc.

Upon trying to start then service(enable service),it failed -

clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured

Then manually added this IP to host

ifconfig eth0:1 192.168.25.135

Then service could start but it gave message -

clurgmgrd: [31853]: <debug> 192.168.25.135 already configured

So do I have to add virtual interface manually (as above or any other
method?) before I could start service with IP resource under it?

Thanks
Parvez

On Fri, Dec 24, 2010 at 11:30 AM, Rajagopal Swaminathan
<raju.rajsand at gmail.com> wrote:
> Greetings,
>
> On Fri, Dec 24, 2010 at 5:33 AM, Parvez Shaikh
> <parvez.h.shaikh at gmail.com> wrote:
>> Hi all,
>>
>> I manually made this virtual IP available on host and then started
>> service it worked -
>>
>
> Can you please elaborate? did you try to assign IP to the ethx devices
> and then ping?
>
>> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>>
>>
>> My question is - Is it prerequisite for IP resource to be manually
>> added before it can be protected via cluster?
>>
>
> Every resource/service has to be added to the cluster.
>
> And they cannot be used by anything else.
>
> Regards,
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From jakov.sosic at srce.hr  Fri Dec 24 12:46:05 2010
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Fri, 24 Dec 2010 13:46:05 +0100
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>
	<AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>
Message-ID: <4D14960D.50706@srce.hr>

On 12/24/2010 09:41 AM, Parvez Shaikh wrote:
> Hi Rajagopal,
> 
> Thank you for your response
> 
> I have created a cluster configuration by adding IP resource with
> value 192.168.25.153 (some value) and created a service which just has
> IP resource part of it. I have set all requisite configuration such
> two node, node names, failover,fencing etc.
> 
> Upon trying to start then service(enable service),it failed -
> 
> clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
> 
> Then manually added this IP to host
> 
> ifconfig eth0:1 192.168.25.135
> 
> Then service could start but it gave message -
> 
> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
> 
> So do I have to add virtual interface manually (as above or any other
> method?) before I could start service with IP resource under it?

How is your network configured? For an IP address to work in a cluster,
you have to have interfaces on both machines set up, which are in the
same subnet. For example:

node1 # ifconfig eth0 192.168.25.11 netmask 255.255.255.0
node2 # ifconfig eth0 192.168.25.12 netmask 255.255.255.0

Then and only then will cluster be able to bring up virtual ip address
and bind it as secondary on this interface. You can then see it with:

# ip addr show


I guess you're trying to bring up IP address from network subnet that is
not in any way set up on your host. And that is a prerequisite with
classic IP resource.




-- 
Jakov Sosic
www.srce.hr



From parvez.h.shaikh at gmail.com  Fri Dec 24 16:46:39 2010
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Fri, 24 Dec 2010 22:16:39 +0530
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <4D14960D.50706@srce.hr>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>
	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>
	<AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>
	<4D14960D.50706@srce.hr>
Message-ID: <AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc@mail.gmail.com>

Hi Jakov

Thank you for your response. My two hosts have multiple network
interfaces or ethernet cards. I understood from your email, that the
IP corresponding to "cluster node name" for both hosts, should be in
the same subnet before a cluster could bring virtual IP up. I will
reconfirm if these are in same subnets.

Gratefully yours,
Parvez

On Fri, Dec 24, 2010 at 6:16 PM, Jakov Sosic <jakov.sosic at srce.hr> wrote:
> On 12/24/2010 09:41 AM, Parvez Shaikh wrote:
>> Hi Rajagopal,
>>
>> Thank you for your response
>>
>> I have created a cluster configuration by adding IP resource with
>> value 192.168.25.153 (some value) and created a service which just has
>> IP resource part of it. I have set all requisite configuration such
>> two node, node names, failover,fencing etc.
>>
>> Upon trying to start then service(enable service),it failed -
>>
>> clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
>>
>> Then manually added this IP to host
>>
>> ifconfig eth0:1 192.168.25.135
>>
>> Then service could start but it gave message -
>>
>> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>>
>> So do I have to add virtual interface manually (as above or any other
>> method?) before I could start service with IP resource under it?
>
> How is your network configured? For an IP address to work in a cluster,
> you have to have interfaces on both machines set up, which are in the
> same subnet. For example:
>
> node1 # ifconfig eth0 192.168.25.11 netmask 255.255.255.0
> node2 # ifconfig eth0 192.168.25.12 netmask 255.255.255.0
>
> Then and only then will cluster be able to bring up virtual ip address
> and bind it as secondary on this interface. You can then see it with:
>
> # ip addr show
>
>
> I guess you're trying to bring up IP address from network subnet that is
> not in any way set up on your host. And that is a prerequisite with
> classic IP resource.
>
>
>
>
> --
> Jakov Sosic
> www.srce.hr
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From jakov.sosic at srce.hr  Sat Dec 25 01:04:16 2010
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Sat, 25 Dec 2010 02:04:16 +0100
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc@mail.gmail.com>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>	<AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>	<4D14960D.50706@srce.hr>
	<AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc@mail.gmail.com>
Message-ID: <4D154310.8050804@srce.hr>

On 12/24/2010 05:46 PM, Parvez Shaikh wrote:
> Hi Jakov
> 
> Thank you for your response. My two hosts have multiple network
> interfaces or ethernet cards. I understood from your email, that the
> IP corresponding to "cluster node name" for both hosts, should be in
> the same subnet before a cluster could bring virtual IP up.

No... you misunderstood me. I meant that if the virtual address is
192.168.25.X, than you have to have interface on each node that is set
up with the ip address from the same subnet. That interface does not
need to correspond to the cluster node name. For example:

node1 - eth0 - 192.168.1.11 (netmask 255.255.255.0)
node2 - eth0 - 192.168.1.12 (netmask 255.255.255.0)

IP resource - 192.168.25.100


Now, how do you expect the cluster to know what to do with IP resource?
On which interface can cluster glue 192.168.25.100? eth0? But why eth0?
And what is the netmask? What about routes?

So, you need to have for example eth1 on both machines set up in the
same subnet, so that cluster can glue IP address from IP resource to
that exact interface (which is set up statically). So you also have to
have for example:

node1 - eth1 - 192.168.25.47 (netmask 255.255.255.0)
node2 - eth1 - 192.168.25.48 (netmask 255.255.255.0)

Now, rgmanager will know where to activate IP resource, because
192.168.25.100 belongs to 192.168.25.0/24 subnet, which is active on
node1/eth1 and node2/eth2.

If you were to have another IP resource, for example 192.168.240.44, you
would need another interface with another set of static ip addresses on
every host you intend to run IP resource on...


I hope you get it correctly now.





-- 
Jakov Sosic
www.srce.hr



From parvez.h.shaikh at gmail.com  Sat Dec 25 03:26:49 2010
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Sat, 25 Dec 2010 08:56:49 +0530
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <4D154310.8050804@srce.hr>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>
	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>
	<AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>
	<4D14960D.50706@srce.hr>
	<AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc@mail.gmail.com>
	<4D154310.8050804@srce.hr>
Message-ID: <AANLkTinCBfu5DAsTUSAzcTY638UhmV54ttV1TRGjbMHY@mail.gmail.com>

Thanks a ton Jakov. It has clarified my doubts.

Yours gratefully,
Parvez

On Sat, Dec 25, 2010 at 6:34 AM, Jakov Sosic <jakov.sosic at srce.hr> wrote:
> On 12/24/2010 05:46 PM, Parvez Shaikh wrote:
>> Hi Jakov
>>
>> Thank you for your response. My two hosts have multiple network
>> interfaces or ethernet cards. I understood from your email, that the
>> IP corresponding to "cluster node name" for both hosts, should be in
>> the same subnet before a cluster could bring virtual IP up.
>
> No... you misunderstood me. I meant that if the virtual address is
> 192.168.25.X, than you have to have interface on each node that is set
> up with the ip address from the same subnet. That interface does not
> need to correspond to the cluster node name. For example:
>
> node1 - eth0 - 192.168.1.11 (netmask 255.255.255.0)
> node2 - eth0 - 192.168.1.12 (netmask 255.255.255.0)
>
> IP resource - 192.168.25.100
>
>
> Now, how do you expect the cluster to know what to do with IP resource?
> On which interface can cluster glue 192.168.25.100? eth0? But why eth0?
> And what is the netmask? What about routes?
>
> So, you need to have for example eth1 on both machines set up in the
> same subnet, so that cluster can glue IP address from IP resource to
> that exact interface (which is set up statically). So you also have to
> have for example:
>
> node1 - eth1 - 192.168.25.47 (netmask 255.255.255.0)
> node2 - eth1 - 192.168.25.48 (netmask 255.255.255.0)
>
> Now, rgmanager will know where to activate IP resource, because
> 192.168.25.100 belongs to 192.168.25.0/24 subnet, which is active on
> node1/eth1 and node2/eth2.
>
> If you were to have another IP resource, for example 192.168.240.44, you
> would need another interface with another set of static ip addresses on
> every host you intend to run IP resource on...
>
>
> I hope you get it correctly now.
>
>
>
>
>
> --
> Jakov Sosic
> www.srce.hr
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From parvez.h.shaikh at gmail.com  Mon Dec 27 04:21:42 2010
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Mon, 27 Dec 2010 09:51:42 +0530
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <4D154310.8050804@srce.hr>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>
	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>
	<AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>
	<4D14960D.50706@srce.hr>
	<AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc@mail.gmail.com>
	<4D154310.8050804@srce.hr>
Message-ID: <AANLkTi=2K4mD1ZFVG0CAjh6tDm2jwCHnvhe4nzT19bhX@mail.gmail.com>

Hi

I chose my IP resource as 192.168.13.15, I had eth3 configured on
192.168.13.1 but it still failed with error -

Dec 27 17:35:32 datablade1 clurgmgrd[31853]: <err> Error storing ip: Duplicate
Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <notice> Starting
disabled service service:service1
Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <notice> start on ip
"192.168.13.15/24" returned 1 (generic error)
Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <warning> #68: Failed to
start service:service1; return value: 1

Below is set of interfaces -

eth0      Link encap:Ethernet  HWaddr 00:10:18:66:15:70
          inet addr:192.168.10.1  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::210:18ff:fe66:1570/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:125 errors:0 dropped:0 overruns:0 frame:0
          TX packets:305 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:32679 (31.9 KiB)  TX bytes:42477 (41.4 KiB)
          Interrupt:177 Memory:98000000-98012800

eth1      Link encap:Ethernet  HWaddr 00:10:18:66:15:72
          inet addr:192.168.11.1  Bcast:192.168.11.255  Mask:255.255.255.0
          inet6 addr: fe80::210:18ff:fe66:1572/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1237019 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1919245 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:183885611 (175.3 MiB)  TX bytes:337885336 (322.2 MiB)
          Interrupt:154 Memory:9a000000-9a012800

eth2      Link encap:Ethernet  HWaddr 00:10:18:66:15:74
          inet addr:192.168.12.1  Bcast:192.168.12.255  Mask:255.255.255.0
          inet6 addr: fe80::210:18ff:fe66:1574/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:419008 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:26822898 (25.5 MiB)  TX bytes:5992 (5.8 KiB)
          Interrupt:185 Memory:94000000-94012800

eth3      Link encap:Ethernet  HWaddr 00:10:18:66:15:76
          inet addr:192.168.13.1  Bcast:192.168.13.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:162 Memory:96000000-96012800


On Sat, Dec 25, 2010 at 6:34 AM, Jakov Sosic <jakov.sosic at srce.hr> wrote:
> On 12/24/2010 05:46 PM, Parvez Shaikh wrote:
>> Hi Jakov
>>
>> Thank you for your response. My two hosts have multiple network
>> interfaces or ethernet cards. I understood from your email, that the
>> IP corresponding to "cluster node name" for both hosts, should be in
>> the same subnet before a cluster could bring virtual IP up.
>
> No... you misunderstood me. I meant that if the virtual address is
> 192.168.25.X, than you have to have interface on each node that is set
> up with the ip address from the same subnet. That interface does not
> need to correspond to the cluster node name. For example:
>
> node1 - eth0 - 192.168.1.11 (netmask 255.255.255.0)
> node2 - eth0 - 192.168.1.12 (netmask 255.255.255.0)
>
> IP resource - 192.168.25.100
>
>
> Now, how do you expect the cluster to know what to do with IP resource?
> On which interface can cluster glue 192.168.25.100? eth0? But why eth0?
> And what is the netmask? What about routes?
>
> So, you need to have for example eth1 on both machines set up in the
> same subnet, so that cluster can glue IP address from IP resource to
> that exact interface (which is set up statically). So you also have to
> have for example:
>
> node1 - eth1 - 192.168.25.47 (netmask 255.255.255.0)
> node2 - eth1 - 192.168.25.48 (netmask 255.255.255.0)
>
> Now, rgmanager will know where to activate IP resource, because
> 192.168.25.100 belongs to 192.168.25.0/24 subnet, which is active on
> node1/eth1 and node2/eth2.
>
> If you were to have another IP resource, for example 192.168.240.44, you
> would need another interface with another set of static ip addresses on
> every host you intend to run IP resource on...
>
>
> I hope you get it correctly now.
>
>
>
>
>
> --
> Jakov Sosic
> www.srce.hr
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From raju.rajsand at gmail.com  Mon Dec 27 06:48:18 2010
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Mon, 27 Dec 2010 12:18:18 +0530
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <AANLkTi=2K4mD1ZFVG0CAjh6tDm2jwCHnvhe4nzT19bhX@mail.gmail.com>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>
	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>
	<AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>
	<4D14960D.50706@srce.hr>
	<AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc@mail.gmail.com>
	<4D154310.8050804@srce.hr>
	<AANLkTi=2K4mD1ZFVG0CAjh6tDm2jwCHnvhe4nzT19bhX@mail.gmail.com>
Message-ID: <AANLkTi=KiG-aHbo35n8Qir6mfm2W12OpW3RwHppDA72d@mail.gmail.com>

Greetinds,

On Mon, Dec 27, 2010 at 9:51 AM, Parvez Shaikh
<parvez.h.shaikh at gmail.com> wrote:
> Hi
>
>
> Dec 27 17:35:32 datablade1 clurgmgrd[31853]: <err> Error storing ip: Duplicate
> Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <notice> Starting
> disabled service service:service1
> Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <notice> start on ip
> "192.168.13.15/24" returned 1 (generic error)
> Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <warning> #68: Failed to
> start service:service1; return value: 1
>
> Below is set of interfaces -
>

What does the ip addr show command say?

Regards,

Rajagopal



From parvez.h.shaikh at gmail.com  Mon Dec 27 07:05:27 2010
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Mon, 27 Dec 2010 12:35:27 +0530
Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
In-Reply-To: <AANLkTi=KiG-aHbo35n8Qir6mfm2W12OpW3RwHppDA72d@mail.gmail.com>
References: <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t@mail.gmail.com>
	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu@mail.gmail.com>
	<AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W@mail.gmail.com>
	<4D14960D.50706@srce.hr>
	<AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc@mail.gmail.com>
	<4D154310.8050804@srce.hr>
	<AANLkTi=2K4mD1ZFVG0CAjh6tDm2jwCHnvhe4nzT19bhX@mail.gmail.com>
	<AANLkTi=KiG-aHbo35n8Qir6mfm2W12OpW3RwHppDA72d@mail.gmail.com>
Message-ID: <AANLkTi=sX2_E5anGR_41TqoV2Bw6A=snzzY1ePLqkoPP@mail.gmail.com>

Hi all

Issue has been resolved. After debugging a bit I found that link to
eth was not detected -

"ethtool ethX | grep "Link detected:" |  awk '{print $3}'"
Output - no

After resolving around this, I could get my IP resource up.

Thank you for your kind suggestions and interest in this problem

Gratefully yours

On Mon, Dec 27, 2010 at 12:18 PM, Rajagopal Swaminathan
<raju.rajsand at gmail.com> wrote:
> Greetinds,
>
> On Mon, Dec 27, 2010 at 9:51 AM, Parvez Shaikh
> <parvez.h.shaikh at gmail.com> wrote:
>> Hi
>>
>>
>> Dec 27 17:35:32 datablade1 clurgmgrd[31853]: <err> Error storing ip: Duplicate
>> Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <notice> Starting
>> disabled service service:service1
>> Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <notice> start on ip
>> "192.168.13.15/24" returned 1 (generic error)
>> Dec 27 17:36:55 datablade1 clurgmgrd[31853]: <warning> #68: Failed to
>> start service:service1; return value: 1
>>
>> Below is set of interfaces -
>>
>
> What does the ip addr show command say?
>
> Regards,
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From sandy.rhce at gmail.com  Mon Dec 27 07:08:39 2010
From: sandy.rhce at gmail.com (sandeeep)
Date: Mon, 27 Dec 2010 12:38:39 +0530
Subject: [Linux-cluster] Linux-cluster Digest, Vol 80, Issue 23
In-Reply-To: <mailman.25.1293210005.4342.linux-cluster@redhat.com>
References: <mailman.25.1293210005.4342.linux-cluster@redhat.com>
Message-ID: <AANLkTi=BA5Q+t=X3Z2zuaHJvaDtOW6285rT+8qh0o6Fb@mail.gmail.com>

Hi,
 I am using RHEL5.4  and trying to make cluster using conga, i did
every  thing,  like in main server i installed luci* cluster* related
package using :"yum groupinstall luci* cluster*,
and same time installed cman*
and in other two nodes i have installed ricci* packege using yum.
now every thing is done, but in server when i am running "service cman
restart" its giving an error like " local  node name is not found in
main conficuration file and /usr/sbin/cman_tool: aisex daemon not
started ."   please have a look into my querry, as i am facing this
problem since many days.


THanks
Sandeep

On 12/24/10, linux-cluster-request at redhat.com
<linux-cluster-request at redhat.com> wrote:
> Send Linux-cluster mailing list submissions to
> 	linux-cluster at redhat.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://www.redhat.com/mailman/listinfo/linux-cluster
> or, via email, send a message with subject or body 'help' to
> 	linux-cluster-request at redhat.com
>
> You can reach the person managing the list at
> 	linux-cluster-owner at redhat.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Linux-cluster digest..."
>
>
> Today's Topics:
>
>    1. IP Resource behavior with Red Hat Cluster (Parvez Shaikh)
>    2. Re: IP Resource behavior with Red Hat Cluster
>       (Rajagopal Swaminathan)
>    3. Re: IP Resource behavior with Red Hat Cluster (Parvez Shaikh)
>    4. Re: IP Resource behavior with Red Hat Cluster (Jakov Sosic)
>    5. Re: IP Resource behavior with Red Hat Cluster (Parvez Shaikh)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 24 Dec 2010 11:03:12 +0530
> From: Parvez Shaikh <parvez.h.shaikh at gmail.com>
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> Message-ID:
> 	<AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
>
> I am using Red Hat cluster 6.2.0 (version shown with cman_tool
> version) on Red Hat 5.5
>
> I am on host that has multiple network interfaces and all(or some) of
> which may be active while I tried to bring up my IP resource up.
>
> My cluster is of simple configuration -
> It has only 2 nodes, and service basically consist of only IP
> resource, I had to chose random private IP address for test/debugging
> purpose (192.168....)
>
> When I tried to start service it failed with message -
>
> clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
>
> I manually made this virtual IP available on host and then started
> service it worked -
>
> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>
>
> My question is - Is it prerequisite for IP resource to be manually
> added before it can be protected via cluster?
>
> Thanks
> Parvez
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 24 Dec 2010 06:00:01 +0000
> From: Rajagopal Swaminathan <raju.rajsand at gmail.com>
> To: linux clustering <linux-cluster at redhat.com>
> Subject: Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> Message-ID:
> 	<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Greetings,
>
> On Fri, Dec 24, 2010 at 5:33 AM, Parvez Shaikh
> <parvez.h.shaikh at gmail.com> wrote:
>> Hi all,
>>
>> I manually made this virtual IP available on host and then started
>> service it worked -
>>
>
> Can you please elaborate? did you try to assign IP to the ethx devices
> and then ping?
>
>> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>>
>>
>> My question is - Is it prerequisite for IP resource to be manually
>> added before it can be protected via cluster?
>>
>
> Every resource/service has to be added to the cluster.
>
> And they cannot be used by anything else.
>
> Regards,
>
> Rajagopal
>
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 24 Dec 2010 14:11:53 +0530
> From: Parvez Shaikh <parvez.h.shaikh at gmail.com>
> To: linux clustering <linux-cluster at redhat.com>
> Subject: Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> Message-ID:
> 	<AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Rajagopal,
>
> Thank you for your response
>
> I have created a cluster configuration by adding IP resource with
> value 192.168.25.153 (some value) and created a service which just has
> IP resource part of it. I have set all requisite configuration such
> two node, node names, failover,fencing etc.
>
> Upon trying to start then service(enable service),it failed -
>
> clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
>
> Then manually added this IP to host
>
> ifconfig eth0:1 192.168.25.135
>
> Then service could start but it gave message -
>
> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>
> So do I have to add virtual interface manually (as above or any other
> method?) before I could start service with IP resource under it?
>
> Thanks
> Parvez
>
> On Fri, Dec 24, 2010 at 11:30 AM, Rajagopal Swaminathan
> <raju.rajsand at gmail.com> wrote:
>> Greetings,
>>
>> On Fri, Dec 24, 2010 at 5:33 AM, Parvez Shaikh
>> <parvez.h.shaikh at gmail.com> wrote:
>>> Hi all,
>>>
>>> I manually made this virtual IP available on host and then started
>>> service it worked -
>>>
>>
>> Can you please elaborate? did you try to assign IP to the ethx devices
>> and then ping?
>>
>>> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>>>
>>>
>>> My question is - Is it prerequisite for IP resource to be manually
>>> added before it can be protected via cluster?
>>>
>>
>> Every resource/service has to be added to the cluster.
>>
>> And they cannot be used by anything else.
>>
>> Regards,
>>
>> Rajagopal
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 24 Dec 2010 13:46:05 +0100
> From: Jakov Sosic <jakov.sosic at srce.hr>
> To: linux clustering <linux-cluster at redhat.com>
> Subject: Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> Message-ID: <4D14960D.50706 at srce.hr>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 12/24/2010 09:41 AM, Parvez Shaikh wrote:
>> Hi Rajagopal,
>>
>> Thank you for your response
>>
>> I have created a cluster configuration by adding IP resource with
>> value 192.168.25.153 (some value) and created a service which just has
>> IP resource part of it. I have set all requisite configuration such
>> two node, node names, failover,fencing etc.
>>
>> Upon trying to start then service(enable service),it failed -
>>
>> clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
>>
>> Then manually added this IP to host
>>
>> ifconfig eth0:1 192.168.25.135
>>
>> Then service could start but it gave message -
>>
>> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>>
>> So do I have to add virtual interface manually (as above or any other
>> method?) before I could start service with IP resource under it?
>
> How is your network configured? For an IP address to work in a cluster,
> you have to have interfaces on both machines set up, which are in the
> same subnet. For example:
>
> node1 # ifconfig eth0 192.168.25.11 netmask 255.255.255.0
> node2 # ifconfig eth0 192.168.25.12 netmask 255.255.255.0
>
> Then and only then will cluster be able to bring up virtual ip address
> and bind it as secondary on this interface. You can then see it with:
>
> # ip addr show
>
>
> I guess you're trying to bring up IP address from network subnet that is
> not in any way set up on your host. And that is a prerequisite with
> classic IP resource.
>
>
>
>
> --
> Jakov Sosic
> www.srce.hr
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 24 Dec 2010 22:16:39 +0530
> From: Parvez Shaikh <parvez.h.shaikh at gmail.com>
> To: linux clustering <linux-cluster at redhat.com>
> Subject: Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> Message-ID:
> 	<AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Jakov
>
> Thank you for your response. My two hosts have multiple network
> interfaces or ethernet cards. I understood from your email, that the
> IP corresponding to "cluster node name" for both hosts, should be in
> the same subnet before a cluster could bring virtual IP up. I will
> reconfirm if these are in same subnets.
>
> Gratefully yours,
> Parvez
>
> On Fri, Dec 24, 2010 at 6:16 PM, Jakov Sosic <jakov.sosic at srce.hr> wrote:
>> On 12/24/2010 09:41 AM, Parvez Shaikh wrote:
>>> Hi Rajagopal,
>>>
>>> Thank you for your response
>>>
>>> I have created a cluster configuration by adding IP resource with
>>> value 192.168.25.153 (some value) and created a service which just has
>>> IP resource part of it. I have set all requisite configuration such
>>> two node, node names, failover,fencing etc.
>>>
>>> Upon trying to start then service(enable service),it failed -
>>>
>>> clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
>>>
>>> Then manually added this IP to host
>>>
>>> ifconfig eth0:1 192.168.25.135
>>>
>>> Then service could start but it gave message -
>>>
>>> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
>>>
>>> So do I have to add virtual interface manually (as above or any other
>>> method?) before I could start service with IP resource under it?
>>
>> How is your network configured? For an IP address to work in a cluster,
>> you have to have interfaces on both machines set up, which are in the
>> same subnet. For example:
>>
>> node1 # ifconfig eth0 192.168.25.11 netmask 255.255.255.0
>> node2 # ifconfig eth0 192.168.25.12 netmask 255.255.255.0
>>
>> Then and only then will cluster be able to bring up virtual ip address
>> and bind it as secondary on this interface. You can then see it with:
>>
>> # ip addr show
>>
>>
>> I guess you're trying to bring up IP address from network subnet that is
>> not in any way set up on your host. And that is a prerequisite with
>> classic IP resource.
>>
>>
>>
>>
>> --
>> Jakov Sosic
>> www.srce.hr
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 80, Issue 23
> *********************************************
>



From susvirkar.3616 at gmail.com  Mon Dec 27 09:02:45 2010
From: susvirkar.3616 at gmail.com (umesh susvirkar)
Date: Mon, 27 Dec 2010 14:32:45 +0530
Subject: [Linux-cluster] Linux-cluster Digest, Vol 80, Issue 23
In-Reply-To: <AANLkTi=BA5Q+t=X3Z2zuaHJvaDtOW6285rT+8qh0o6Fb@mail.gmail.com>
References: <mailman.25.1293210005.4342.linux-cluster@redhat.com>
	<AANLkTi=BA5Q+t=X3Z2zuaHJvaDtOW6285rT+8qh0o6Fb@mail.gmail.com>
Message-ID: <AANLkTikfpdBHKdDv_MY2b1jOvNyw7kTVd4GCipJtzJjG@mail.gmail.com>

Hi

Your server hostname & name you specify in cluster.conf file for cluster
node name should be same.is this 2 value are different.
if values are different make it similar and check.




Regards
Umesh Susvirkar


On Mon, Dec 27, 2010 at 12:38 PM, sandeeep <sandy.rhce at gmail.com> wrote:

> Hi,
>  I am using RHEL5.4  and trying to make cluster using conga, i did
> every  thing,  like in main server i installed luci* cluster* related
> package using :"yum groupinstall luci* cluster*,
> and same time installed cman*
> and in other two nodes i have installed ricci* packege using yum.
> now every thing is done, but in server when i am running "service cman
> restart" its giving an error like " local  node name is not found in
> main conficuration file and /usr/sbin/cman_tool: aisex daemon not
> started ."   please have a look into my querry, as i am facing this
> problem since many days.
>
>
> THanks
> Sandeep
>
> On 12/24/10, linux-cluster-request at redhat.com
> <linux-cluster-request at redhat.com> wrote:
> > Send Linux-cluster mailing list submissions to
> >       linux-cluster at redhat.com
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >       https://www.redhat.com/mailman/listinfo/linux-cluster
> > or, via email, send a message with subject or body 'help' to
> >       linux-cluster-request at redhat.com
> >
> > You can reach the person managing the list at
> >       linux-cluster-owner at redhat.com
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of Linux-cluster digest..."
> >
> >
> > Today's Topics:
> >
> >    1. IP Resource behavior with Red Hat Cluster (Parvez Shaikh)
> >    2. Re: IP Resource behavior with Red Hat Cluster
> >       (Rajagopal Swaminathan)
> >    3. Re: IP Resource behavior with Red Hat Cluster (Parvez Shaikh)
> >    4. Re: IP Resource behavior with Red Hat Cluster (Jakov Sosic)
> >    5. Re: IP Resource behavior with Red Hat Cluster (Parvez Shaikh)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 24 Dec 2010 11:03:12 +0530
> > From: Parvez Shaikh <parvez.h.shaikh at gmail.com>
> > To: linux-cluster at redhat.com
> > Subject: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> > Message-ID:
> >       <AANLkTin5PGXoHiLq_5wTfaA-HRcUxcF1hGzGd-1WQP3t at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Hi all,
> >
> > I am using Red Hat cluster 6.2.0 (version shown with cman_tool
> > version) on Red Hat 5.5
> >
> > I am on host that has multiple network interfaces and all(or some) of
> > which may be active while I tried to bring up my IP resource up.
> >
> > My cluster is of simple configuration -
> > It has only 2 nodes, and service basically consist of only IP
> > resource, I had to chose random private IP address for test/debugging
> > purpose (192.168....)
> >
> > When I tried to start service it failed with message -
> >
> > clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
> >
> > I manually made this virtual IP available on host and then started
> > service it worked -
> >
> > clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
> >
> >
> > My question is - Is it prerequisite for IP resource to be manually
> > added before it can be protected via cluster?
> >
> > Thanks
> > Parvez
> >
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Fri, 24 Dec 2010 06:00:01 +0000
> > From: Rajagopal Swaminathan <raju.rajsand at gmail.com>
> > To: linux clustering <linux-cluster at redhat.com>
> > Subject: Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> > Message-ID:
> >       <AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa+Jk-0JBu at mail.gmail.com<AANLkTik08h5urUqPYZ6azUWW2YGnsPdBbnOa%2BJk-0JBu at mail.gmail.com>
> >
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Greetings,
> >
> > On Fri, Dec 24, 2010 at 5:33 AM, Parvez Shaikh
> > <parvez.h.shaikh at gmail.com> wrote:
> >> Hi all,
> >>
> >> I manually made this virtual IP available on host and then started
> >> service it worked -
> >>
> >
> > Can you please elaborate? did you try to assign IP to the ethx devices
> > and then ping?
> >
> >> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
> >>
> >>
> >> My question is - Is it prerequisite for IP resource to be manually
> >> added before it can be protected via cluster?
> >>
> >
> > Every resource/service has to be added to the cluster.
> >
> > And they cannot be used by anything else.
> >
> > Regards,
> >
> > Rajagopal
> >
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Fri, 24 Dec 2010 14:11:53 +0530
> > From: Parvez Shaikh <parvez.h.shaikh at gmail.com>
> > To: linux clustering <linux-cluster at redhat.com>
> > Subject: Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> > Message-ID:
> >       <AANLkTimTfMFE7MvrQRLMOhpfKBRS+EDSF4OGwBgJu9-W at mail.gmail.com<AANLkTimTfMFE7MvrQRLMOhpfKBRS%2BEDSF4OGwBgJu9-W at mail.gmail.com>
> >
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Hi Rajagopal,
> >
> > Thank you for your response
> >
> > I have created a cluster configuration by adding IP resource with
> > value 192.168.25.153 (some value) and created a service which just has
> > IP resource part of it. I have set all requisite configuration such
> > two node, node names, failover,fencing etc.
> >
> > Upon trying to start then service(enable service),it failed -
> >
> > clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
> >
> > Then manually added this IP to host
> >
> > ifconfig eth0:1 192.168.25.135
> >
> > Then service could start but it gave message -
> >
> > clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
> >
> > So do I have to add virtual interface manually (as above or any other
> > method?) before I could start service with IP resource under it?
> >
> > Thanks
> > Parvez
> >
> > On Fri, Dec 24, 2010 at 11:30 AM, Rajagopal Swaminathan
> > <raju.rajsand at gmail.com> wrote:
> >> Greetings,
> >>
> >> On Fri, Dec 24, 2010 at 5:33 AM, Parvez Shaikh
> >> <parvez.h.shaikh at gmail.com> wrote:
> >>> Hi all,
> >>>
> >>> I manually made this virtual IP available on host and then started
> >>> service it worked -
> >>>
> >>
> >> Can you please elaborate? did you try to assign IP to the ethx devices
> >> and then ping?
> >>
> >>> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
> >>>
> >>>
> >>> My question is - Is it prerequisite for IP resource to be manually
> >>> added before it can be protected via cluster?
> >>>
> >>
> >> Every resource/service has to be added to the cluster.
> >>
> >> And they cannot be used by anything else.
> >>
> >> Regards,
> >>
> >> Rajagopal
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >
> >
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Fri, 24 Dec 2010 13:46:05 +0100
> > From: Jakov Sosic <jakov.sosic at srce.hr>
> > To: linux clustering <linux-cluster at redhat.com>
> > Subject: Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> > Message-ID: <4D14960D.50706 at srce.hr>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > On 12/24/2010 09:41 AM, Parvez Shaikh wrote:
> >> Hi Rajagopal,
> >>
> >> Thank you for your response
> >>
> >> I have created a cluster configuration by adding IP resource with
> >> value 192.168.25.153 (some value) and created a service which just has
> >> IP resource part of it. I have set all requisite configuration such
> >> two node, node names, failover,fencing etc.
> >>
> >> Upon trying to start then service(enable service),it failed -
> >>
> >> clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
> >>
> >> Then manually added this IP to host
> >>
> >> ifconfig eth0:1 192.168.25.135
> >>
> >> Then service could start but it gave message -
> >>
> >> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
> >>
> >> So do I have to add virtual interface manually (as above or any other
> >> method?) before I could start service with IP resource under it?
> >
> > How is your network configured? For an IP address to work in a cluster,
> > you have to have interfaces on both machines set up, which are in the
> > same subnet. For example:
> >
> > node1 # ifconfig eth0 192.168.25.11 netmask 255.255.255.0
> > node2 # ifconfig eth0 192.168.25.12 netmask 255.255.255.0
> >
> > Then and only then will cluster be able to bring up virtual ip address
> > and bind it as secondary on this interface. You can then see it with:
> >
> > # ip addr show
> >
> >
> > I guess you're trying to bring up IP address from network subnet that is
> > not in any way set up on your host. And that is a prerequisite with
> > classic IP resource.
> >
> >
> >
> >
> > --
> > Jakov Sosic
> > www.srce.hr
> >
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Fri, 24 Dec 2010 22:16:39 +0530
> > From: Parvez Shaikh <parvez.h.shaikh at gmail.com>
> > To: linux clustering <linux-cluster at redhat.com>
> > Subject: Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster
> > Message-ID:
> >       <AANLkTinp=BFk0ssVkDieOvXVaQqeRfMDQgPbbessA-Qc at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Hi Jakov
> >
> > Thank you for your response. My two hosts have multiple network
> > interfaces or ethernet cards. I understood from your email, that the
> > IP corresponding to "cluster node name" for both hosts, should be in
> > the same subnet before a cluster could bring virtual IP up. I will
> > reconfirm if these are in same subnets.
> >
> > Gratefully yours,
> > Parvez
> >
> > On Fri, Dec 24, 2010 at 6:16 PM, Jakov Sosic <jakov.sosic at srce.hr>
> wrote:
> >> On 12/24/2010 09:41 AM, Parvez Shaikh wrote:
> >>> Hi Rajagopal,
> >>>
> >>> Thank you for your response
> >>>
> >>> I have created a cluster configuration by adding IP resource with
> >>> value 192.168.25.153 (some value) and created a service which just has
> >>> IP resource part of it. I have set all requisite configuration such
> >>> two node, node names, failover,fencing etc.
> >>>
> >>> Upon trying to start then service(enable service),it failed -
> >>>
> >>> clurgmgrd: [31853]: <debug> 192.168.25.135 is not configured
> >>>
> >>> Then manually added this IP to host
> >>>
> >>> ifconfig eth0:1 192.168.25.135
> >>>
> >>> Then service could start but it gave message -
> >>>
> >>> clurgmgrd: [31853]: <debug> 192.168.25.135 already configured
> >>>
> >>> So do I have to add virtual interface manually (as above or any other
> >>> method?) before I could start service with IP resource under it?
> >>
> >> How is your network configured? For an IP address to work in a cluster,
> >> you have to have interfaces on both machines set up, which are in the
> >> same subnet. For example:
> >>
> >> node1 # ifconfig eth0 192.168.25.11 netmask 255.255.255.0
> >> node2 # ifconfig eth0 192.168.25.12 netmask 255.255.255.0
> >>
> >> Then and only then will cluster be able to bring up virtual ip address
> >> and bind it as secondary on this interface. You can then see it with:
> >>
> >> # ip addr show
> >>
> >>
> >> I guess you're trying to bring up IP address from network subnet that is
> >> not in any way set up on your host. And that is a prerequisite with
> >> classic IP resource.
> >>
> >>
> >>
> >>
> >> --
> >> Jakov Sosic
> >> www.srce.hr
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >
> >
> >
> > ------------------------------
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> > End of Linux-cluster Digest, Vol 80, Issue 23
> > *********************************************
> >
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101227/4c6e3ee9/attachment.htm>

From linux-cluster at redhat.com  Tue Dec 28 06:36:47 2010
From: linux-cluster at redhat.com (Mailbot for etexusa.com)
Date: Mon, 27 Dec 2010 22:36:47 -0800
Subject: [Linux-cluster] DSN: failed (Hi)
Message-ID: <mAWtxIeCDcrRuZNrj02@etexusa.com>


This is a Delivery Status Notification (DSN).

I was unable to deliver your message to
hr at holista.in.

I said 
  (end of message)

And they gave me the error;
  552-5.7.0 Our system detected an illegal attachment on your message. Please
  552-5.7.0 visit http://mail.google.com/support/bin/answer.py?answer=6590 to
  552 5.7.0 review our attachment guidelines. w27si27262803wfh.2

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/rfc822-headers
Size: 465 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101227/cdfa97ab/attachment.bin>

From jayesh.shinde at netcore.co.in  Wed Dec 29 05:17:04 2010
From: jayesh.shinde at netcore.co.in (jayesh.shinde)
Date: Wed, 29 Dec 2010 10:47:04 +0530
Subject: [Linux-cluster] suggestion require about fence_vmware
Message-ID: <4D1AC450.1080801@netcore.co.in>

Hi all ,

I am testing RHCS "Active - Passive " on RHEL 5.5 64 bit OS along with 
VMWARE ESX 4.0 for *mailing* application . I want suggestion and 
guidance that , what should be the best architecture and best practice 
for the below mention setup.

My architecture details are as follows  :--
------------------------------------
1) I have one VM called "node1"  under one physical VMWARE server.
2) second VM  called "node2" under second  physical VMWARE server.
3) Both the physical servers are connected to each other with switch 
along with NIC teaming  and fail over switch is also available.
4) For fencing I am using fencedevice agent as "*fence_vmware*"
5) one SAN partition with LVM configure.  User's mailbox data is inside 
this SAN partition.
6) File system is "EXT3"

My queries :---
=========
1) Is above architecture's  points from  1-6 are correct for 
Active-passive configuration with VMWARE. ?
2) While testing I observer that when I purposely stop the network 
service on active "node1" then "node2" fence the "node1" properly.
      But  "node1" getting fence    by "*poweroff*" .
    Since all service was running on "node1" and SAN partition was also 
mounted and suddenly "node1" get fence.

*So will this immediate poweroff cause the corruption of SAN ext3 file 
system and local HDD too* ?

3) If yes how to avoid this ?
4) Is it a correct way of fencing ?
5) Is this correct setup  for  production environment ?

I purposely stop network service on "node1"  because I want to test and 
know that , what will happen when network goes down on "node1" .
I also observe that the files which was open in VIM editor also got 
recover properly because of the Journaling feature of EXT3 FS.

Happy christmas.

Thanks & Regards
Jayesh Shinde


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101229/fd93af85/attachment.htm>

From stefan at lsd.co.za  Wed Dec 29 19:02:53 2010
From: stefan at lsd.co.za (Stefan Lesicnik)
Date: Wed, 29 Dec 2010 21:02:53 +0200 (SAST)
Subject: [Linux-cluster] Multiple communication channels
In-Reply-To: <2119444257.0.1293649365513.JavaMail.root@zimbra>
Message-ID: <1218306684.1.1293649373746.JavaMail.root@zimbra>

Hi all,

I am running RHCS 5 and have a two node cluster with a shared qdisk. I have a bonded network bond0 and a back to back crossover eth1.

Currently I have multicast cluster communication over the crossover, but was wondering if it was possible to use bond0 as an alternative / failover. So if eth1 was down, it could still communicate?

I havent been able to find anything in the FAQ / documentation that would suggest this, so I thought I would ask.

Thanks alot and I hope everyone has a great new year :)

Stefan 



From linux at alteeve.com  Wed Dec 29 19:33:45 2010
From: linux at alteeve.com (Digimer)
Date: Wed, 29 Dec 2010 14:33:45 -0500
Subject: [Linux-cluster] Multiple communication channels
In-Reply-To: <1218306684.1.1293649373746.JavaMail.root@zimbra>
References: <1218306684.1.1293649373746.JavaMail.root@zimbra>
Message-ID: <4D1B8D19.1040805@alteeve.com>

On 12/29/2010 02:02 PM, Stefan Lesicnik wrote:
> Hi all,
> 
> I am running RHCS 5 and have a two node cluster with a shared qdisk. I have a bonded network bond0 and a back to back crossover eth1.
> 
> Currently I have multicast cluster communication over the crossover, but was wondering if it was possible to use bond0 as an alternative / failover. So if eth1 was down, it could still communicate?
> 
> I havent been able to find anything in the FAQ / documentation that would suggest this, so I thought I would ask.
> 
> Thanks alot and I hope everyone has a great new year :)
> 
> Stefan 

From: http://wiki.alteeve.com/index.php/Openais.conf

------------------------------------
	### Below here are the 'interface' directive(s).

	# At least one 'interface' directive is required within the 'totem'
	# directive. When two are specified, the one with 'ringnumber' of '0'
	# is the primary ring and the second with 'ringnumber' of '1' is the
	# backup ring.
	interface {
		# Increment the ring number for each 'interface' directive.
		ringnumber:  0

		# This must match the subnet of this interface. The final octal
		# must be '0'. In this case, this directive will bind to the
		# interface on the 192.168.1.0/24 subnet, so this should be set
		# to '192.168.1.0'. This can be an IPv6 address, however, you
		# will be required to set the 'nodeid' in the 'totem' directive
		# above. Further, there will be no automatic interface
		# selection within a specified subnet as there is with IPv4.
		# In this case, the primary ring will be on the interface with
		# IPs on the 10.0.0.0/24 network (ie: eth1).
		bindnetaddr: 10.0.0.0

		# This is the multicast address used by OpenAIS. Avoid the
		# '224.0.0.0/8' range as that is used for configuration. If you
		# use an IPv6 address, be sure to specify a 'nodeid' in the
		# 'totem' directive above.
		mcastaddr:   226.94.1.1

		# This is the UDP port used with the multicast address above.
		mcastport:   5405
	}

	# This is a second optional, redundant interface directive. If you use
	# two 'interface' directives, be sure to review the four 'rrp_*'
	# variables.
	# Note that two is the maximum number of interface directives.
	interface {
		# Increment the ring number for each 'interface' directive.
		ringnumber:  1
		# In this case, the backup ring will be on the interface with
		# IPs on the 192.168.1.0/24 network (ie: eth0).
		bindnetaddr: 192.168.1.0
		# MADI: Does this have to be different? How much different?
		#       Can I just use a different port?
		mcastaddr:   227.94.1.1
		# MADI: If this is different, can 'mcastaddr' be the same?
		mcastport:   5406
	}
------------------------------------

Hope this helps. :)

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org



From linux at alteeve.com  Wed Dec 29 19:46:52 2010
From: linux at alteeve.com (Digimer)
Date: Wed, 29 Dec 2010 14:46:52 -0500
Subject: [Linux-cluster] Multiple communication channels
In-Reply-To: <4D1B8D19.1040805@alteeve.com>
References: <1218306684.1.1293649373746.JavaMail.root@zimbra>
	<4D1B8D19.1040805@alteeve.com>
Message-ID: <4D1B902C.8080009@alteeve.com>

On 12/29/2010 02:33 PM, Digimer wrote:
> On 12/29/2010 02:02 PM, Stefan Lesicnik wrote:
>> Hi all,
>>
>> I am running RHCS 5 and have a two node cluster with a shared qdisk. I have a bonded network bond0 and a back to back crossover eth1.
>>
>> Currently I have multicast cluster communication over the crossover, but was wondering if it was possible to use bond0 as an alternative / failover. So if eth1 was down, it could still communicate?
>>
>> I havent been able to find anything in the FAQ / documentation that would suggest this, so I thought I would ask.
>>
>> Thanks alot and I hope everyone has a great new year :)
>>
>> Stefan 
> 
> From: http://wiki.alteeve.com/index.php/Openais.conf
> 

I forgot to mention that there are the redundant ring options as well.

------------------------------------
	### Redundant Ring Protocol options are below. These are ignored if
	### only one 'interface' directive is defined.

	# This is used to control how the Redundant Ring Protocol is used. If
	# you only have one 'interface' directive, the default is 'none'. If
	# you have two, then please set 'active' or 'passive'. The trade off
	# is that, when the network is degraded, 'active' provides lower
	# latency from transmit to delivery and 'passive' may nearly double the
	# speed of the totem protocol when not CPU bound.
	# Valid options: none, active, passive.
	rrp_mode: passive

	# The next three variables are relevant depending on which mode
	# 'rrp_mode' is set to. Both modes use 'rrp_problem_count_threshold'
	# but only 'active' uses 'rrp_problem_count_timeout' and
	# 'rrp_token_expired_timeout'.
	#
	# - In 'active' mode:
	# If a token doesn't arrive in 'rrp_token_expired_timeout' milliseconds
	# an internal counter called 'problem_count' is incremented by 1. If a
	# token arrives within 'rrp_problem_count_timeout' however, the
	# internal decreases by '1'. If the internal counter equals or exceeds
	# the 'rrp_problem_count_threshold' at any time, the effected interface
	# will be flagged as faulty and it will no longer be used.
	#
	# - In 'passive' mode:
	# The two interfaces have internal counters called 'token_recv_count'
	# and 'mcast_recv_count' that are incremented by 1 each time a token
	# or multicast message is received, respectively. These counts for each
	# interface is counted and if the counts should differ by more than
	# 'rrp_problem_count_threshold', then the interface with the lower
	# count is flagged as faulty and it will no longer be used.
	#
	# If an interface is flagged as faulty, an administrator will need to
	# manually re-enable it.

	# The default problem count timeout is '1000' milliseconds.
	rrp_problem_count_timeout: 1000

	# The default problem count threshold is '20'.
	rrp_problem_count_threshold: 20

	# This is the time in milliseconds to wait before incrementing the
	# internal problem counter. Normally, this variable is automatically
	# calculated by openais and, thus, should not be defined here without
	# fully understanding the effects of doing so.
	#
	# In short; The should always be at least 'rrp_problem_count_timeout'
	# minus 50 milliseconds with the result being divided by
	# 'rrp_problem_count_threshold' or else a reconfiguration can occur.
	# Using the default values then, the default is (1000 - 50)/20=47.5,
	# rounded down to '47'.
	#rrp_token_expired_timeout: 47
------------------------------------

Cheers

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org



From kitgerrits at gmail.com  Wed Dec 29 19:49:26 2010
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Wed, 29 Dec 2010 20:49:26 +0100
Subject: [Linux-cluster] Multiple communication channels
In-Reply-To: <1218306684.1.1293649373746.JavaMail.root@zimbra>
Message-ID: <4d1b90ff.857a0e0a.45e5.ffff8b97@mx.google.com>

Hello,
 
AFAIK, Multi-interface heartbeat is something that was only recently added
to RHCS (earlier this year, if I recall correctly).
 
Until then, the failover part was usually achieved by using a bonded
interface as heartbeat interface.
If possible, I would suggest using 2 (connected) Multicast switches and
running a bond from each server to each switch.
Or 2 regular switches and broadcast heartbeat (switches only connected to
eachother)
Otherwise, using an active-active bond (channel?) with 2 crossover cables
might also work, but offers less protection against interface failures.
 
 
Regards,
 
Kit


  _____  

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Stefan Lesicnik
Sent: woensdag 29 december 2010 20:03
To: linux-cluster at redhat.com
Subject: [Linux-cluster] Multiple communication channels



Hi all,

I am running RHCS 5 and have a two node cluster with a shared qdisk. I have
a bonded network bond0 and a back to back crossover eth1.

Currently I have multicast cluster communication over the crossover, but was
wondering if it was possible to use bond0 as an alternative / failover. So
if eth1 was down, it could still communicate?

I havent been able to find anything in the FAQ / documentation that would
suggest this, so I thought I would ask.

Thanks alot and I hope everyone has a great new year :)

Stefan

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster 

  _____  

No virus found in this message.
Checked by AVG - www.avg.com
Version: 10.0.1191 / Virus Database: 1435/3346 - Release Date: 12/29/10

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20101229/ab916e94/attachment.htm>

From linux at alteeve.com  Wed Dec 29 19:57:53 2010
From: linux at alteeve.com (Digimer)
Date: Wed, 29 Dec 2010 14:57:53 -0500
Subject: [Linux-cluster] Multiple communication channels
In-Reply-To: <4d1b90ff.857a0e0a.45e5.ffff8b97@mx.google.com>
References: <4d1b90ff.857a0e0a.45e5.ffff8b97@mx.google.com>
Message-ID: <4D1B92C1.2030900@alteeve.com>

On 12/29/2010 02:49 PM, Kit Gerrits wrote:
> Hello,
>  
> AFAIK, Multi-interface heartbeat is something that was only recently
> added to RHCS (earlier this year, if I recall correctly).
>  
> Until then, the failover part was usually achieved by using a bonded
> interface as heartbeat interface.
> If possible, I would suggest using 2 (connected) Multicast switches and
> running a bond from each server to each switch.
> Or 2 regular switches and broadcast heartbeat (switches only connected
> to eachother)
> Otherwise, using an active-active bond (channel?) with 2 crossover
> cables might also work, but offers less protection against interface
> failures.
>  
>  
> Regards,
>  
> Kit

Hi,

  It was around in el5. Perhaps not in the early versions, I am not sure
exactly when it was added, but certainly by 5.4.

  In the recent 3.x branch, openais was replaced by corosync (for core
cluster communications), which is where rrp is controlled.

  Of course, I could always be wrong. :)

Cheers.

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org



From cos at aaaaa.org  Wed Dec 29 20:28:36 2010
From: cos at aaaaa.org (Ofer Inbar)
Date: Wed, 29 Dec 2010 15:28:36 -0500
Subject: [Linux-cluster] question ccsd for multiple clusters on same subnet
Message-ID: <20101229202836.GX934@mip.aaaaa.org>

CentOS 5.3, cman-2.0.115-34.el5_5.3.

Working in a test environment with VMware, setting up test clusters,
so we're not setting up separate VLANs for each cluster (though we
will do that outside this initial test environment).

I accidentally started cman on a new node for a new cluster before I
copied the cluster.conf file I wanted to use into /etc/cluster, and
to my surprise, it picked up a cluster.conf from an older cluster
that's already up and running.  In /var/log/messages, I see:

ccsd[4475]: Unable to parse /etc/cluster/cluster.conf 
ccsd[4475]: Searching cluster for valid copy. 
ccsd[4475]: Remote copy of cluster.conf (version = 15) found. 
ccsd[4475]: Remote copy of cluster.conf is from quorate node. 

This let me to notice the documention of the -P option in ccsd's
man page.  It seems that each ccsd listens on three default ports,
and uses broadcast for some things, such that ccsd's from separate
clusters might potentially see and talk to each other if they don't
have separate VLANs.  I assume this is the reason my new host picked
up a cluster.conf from an unrelated cluster.

However, the documentation is vague and uninformative.  I don't really
know what ccsd uses each port for.  It doesn't even say what the
defaults are (though I can see from lsof that they're 50006,7,8, and
I could experiment to figure out which port was which).  I also don't
know whether there's any standard practice about how to run with
non-default ports (it does look like /etc/init.d/cman looks for an
environment variable named CCSD_OPTS).

If I properly initialize my clusters by copying the right cluster.conf
into place before I first start cman, I won't encounter the specific
problem I had here.  However, does that make it okay, or will I run
into other problems running multiple clusters with ccsd's using the
same ports on the same subnet?

Where can I find documentation about this, aimed at sysadmins?
  -- Cos