From jayesh.shinde at netcore.co.in  Mon Apr  2 15:01:54 2012
From: jayesh.shinde at netcore.co.in (jayesh.shinde)
Date: Mon, 02 Apr 2012 20:31:54 +0530
Subject: [Linux-cluster] 2 node cluster network query
Message-ID: <4F79BF62.2000505@netcore.co.in>

Hi all,

I am using a 2 node cluster with drbd. To carry the cluster packets and the drbd packets I have connected 3 cross cables between the 2 servers. The IPs and config are as follows:

service             mailbox1          mailbox2          Interface
------------------------------------------------------------------
1) drbd res0        10.10.10.10/16    10.10.10.20/16    eth1
2) drbd res1        10.10.10.30/16    10.10.10.40/16    eth2
3) Cluster packets  10.10.20.1/16     10.10.20.2/16     eth5

cat /etc/hosts
10.10.20.1   mailbox1
10.10.20.2   mailbox2

################# part of cluster.conf ########################

################# part of cluster.conf ########################

Cluster log:
----------------
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] CLM CONFIGURATION CHANGE
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] New Configuration:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ]     r(0) ip(10.10.10.20)
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] Members Left:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] Members Joined:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] CLM CONFIGURATION CHANGE
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] New Configuration:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ]     r(0) ip(10.10.10.10)
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ]     r(0) ip(10.10.10.20)
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] Members Left:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] Members Joined:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ]     r(0) ip(10.10.10.10)
Apr  1 00:45:39 mailbox2 openais[4228]: [SYNC ] This node is within the primary component and will provide service.
Apr  1 00:45:39 mailbox2 openais[4228]: [TOTEM] entering OPERATIONAL state.
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] got nodejoin message 10.10.10.10
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] got nodejoin message 10.10.10.20

route -n:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 bond0
10.10.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth1
10.10.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth2
10.10.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth5
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth5
0.0.0.0         192.168.1.13    0.0.0.0         UG    0      0        0 bond0

Question:

While starting the cluster, the logs say "got nodejoin message 10.10.10.10 & 10.10.10.20", whereas /etc/hosts defines 10.10.20.1 <--> mailbox1 and 10.10.20.2 <--> mailbox2.

1) Why is the cluster not using the IPs 10.10.20.1 & 10.10.20.2 (i.e. the cross-cable IPs meant for cluster traffic)?
2) Is it because all my cross-cable IPs are in the 10.10.x.x range, and the cluster is picking the nearest matching IPs (10.10.10.10 & 10.10.10.20) for communication?
3) Do I need to use 172.16.x.x or some other IP range?

Regards
Jayesh Shinde
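This is the classic symptom of overlapping subnets: the node names resolve to 10.10.20.x, but with a /16 mask the derived network is 10.10.0.0, and totem binds to the first interface that falls inside that network (eth1, hence the 10.10.10.x addresses in the log). Keeping each link in its own subnet removes the ambiguity. A minimal sketch, assuming the drbd links can be renumbered and /24 masks are acceptable (persistent settings would go into the ifcfg-eth* files rather than ad-hoc ip commands):

# mailbox1 (mailbox2 analogous with .20 / .40 / .2)
ip addr add 10.10.10.10/24 dev eth1   # drbd res0
ip addr add 10.10.11.30/24 dev eth2   # drbd res1, renumbered out of 10.10.10.0/24
ip addr add 10.10.20.1/24  dev eth5   # cluster traffic, matches /etc/hosts

# after restarting cman, confirm which address totem actually bound to
cman_tool status | grep -i "node addresses"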
From jonathan.barber at gmail.com  Mon Apr  2 17:00:47 2012
From: jonathan.barber at gmail.com (Jonathan Barber)
Date: Mon, 2 Apr 2012 18:00:47 +0100
Subject: [Linux-cluster] Where to find information on HA-LVM
In-Reply-To: <036B68E61A28CA49AC2767596576CD597578CF33D0@GVW1113EXC.americas.hpqcorp.net>
References: <1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
	<036B68E61A28CA49AC2767596576CD597578CF33D0@GVW1113EXC.americas.hpqcorp.net>
Message-ID:

On 28 March 2012 06:51, Jankowski, Chris wrote:
> Ming,
>
> Could I ask you to publish the list of the most relevant information on HA-LVM
> that you'd find on this list, please? We'll all benefit.

I found the following to be informative when I first configured HA-LVM:

https://fedorahosted.org/cluster/wiki/LVMFailover#HALVM

Reading the LVM resource scripts was also useful:

/usr/share/cluster/lvm.sh
/usr/share/cluster/lvm_by_lv.sh
/usr/share/cluster/lvm_by_vg.sh

Cheers

> Chris Jankowski

-- 
Jonathan Barber
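As a concrete starting point, a rough sketch of the rgmanager side of an HA-LVM failover service; the volume group, logical volume, mount point and service names below are invented for illustration, and the matching /etc/lvm/lvm.conf volume_list (or clvmd) setup described on the LVMFailover page above is still required:

<!-- excerpt for /etc/cluster/cluster.conf, inside <rm>; all names are hypothetical -->
<resources>
    <lvm name="halvm_res" vg_name="havg" lv_name="halv"/>
    <fs name="hafs_res" device="/dev/havg/halv" mountpoint="/data" fstype="ext4" force_unmount="1"/>
</resources>
<service name="halvm_svc" autostart="1" recovery="relocate">
    <lvm ref="halvm_res"/>
    <fs ref="hafs_res"/>
</service>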
From ajb2 at mssl.ucl.ac.uk  Tue Apr  3 13:14:09 2012
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Tue, 03 Apr 2012 14:14:09 +0100
Subject: [Linux-cluster] caching of san devices....
Message-ID: <4F7AF7A1.3000701@mssl.ucl.ac.uk>

Real Dumb Question[tm] time....

Has anyone tried putting bcache/flashcache in front of shared storage in a GFS2 cluster (on each node, of course)?

Did it work?
Should it work?
Is it safe?
Are there ways of making it safe?
Am I mad for thinking about it?

Rationale:

Spinning disks are slow to seek, large arrays even more so.

As soon as there's a significant load on our GFS2 cluster, the random IO limitations of the SAN hardware become the single most important factor limiting performance.

Only "so much" RAM can be installed in any hardware to increase page and dentry caching before physical limits are hit.

SSD SAN arrays are hideously expensive and can't always be justified to "the powers that be". Universities are always tightly funded, but there are many other entities facing similar problems.

From swhiteho at redhat.com  Tue Apr  3 13:28:47 2012
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Tue, 03 Apr 2012 14:28:47 +0100
Subject: [Linux-cluster] caching of san devices....
In-Reply-To: <4F7AF7A1.3000701@mssl.ucl.ac.uk>
References: <4F7AF7A1.3000701@mssl.ucl.ac.uk>
Message-ID: <1333459727.2702.21.camel@menhir>

Hi,

On Tue, 2012-04-03 at 14:14 +0100, Alan Brown wrote:
> Has anyone tried putting bcache/flashcache in front of shared storage in
> a GFS2 cluster (on each node, of course)?
>
> Spinning disks are slow to seek, large arrays even more so.

Large arrays should be much faster, provided the data is in cache.

I can't see any mention that bcache supports clusters at all. I don't think that it is likely to work. Certainly the web page I found suggests that it doesn't support barriers (silently dropped), but I'm not sure whether that refers to "real" barriers or the flush-based system that we use now. I'd be very surprised if that would work.

What do you mean by flashcache? This perhaps:
http://www.netapp.com/uk/products/storage-systems/flash-cache/

It looks like a hardware implementation of the same thing, and I can't see anything to suggest that it is cluster aware on a first reading of the docs,

Steve.

From ajb2 at mssl.ucl.ac.uk  Tue Apr  3 14:55:15 2012
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Tue, 03 Apr 2012 15:55:15 +0100
Subject: [Linux-cluster] caching of san devices....
In-Reply-To: <1333459727.2702.21.camel@menhir>
References: <4F7AF7A1.3000701@mssl.ucl.ac.uk> <1333459727.2702.21.camel@menhir>
Message-ID: <4F7B0F53.30402@mssl.ucl.ac.uk>

On 03/04/12 14:28, Steven Whitehouse wrote:
>> Spinning disks are slow to seek, large arrays even more so.
>
> Large arrays should be much faster, provided the data is in cache.

Or not, when there's a lot of random IO involved and it's not in cache. I'm talking about arrays such as Nexsan ATAbeasts (a drawer full of SATA drives).

> I can't see any mention that bcache supports clusters at all. I don't
> think that it is likely to work. Certainly the web page I found suggests
> that it doesn't support barriers (silently dropped)

It doesn't, and there are specific warnings to disable barriers on ext4 and friends when using it.

Bcache is writethrough by default. Writeback can be enabled but is beta quality, and I think it would conflict badly with clustered filesystems.

> What do you mean by flashcache? This perhaps:

Facebook's caching implementation, which is almost like bcache but much simpler in its implementation.

> http://www.netapp.com/uk/products/storage-systems/flash-cache/
>
> It looks like a hardware implementation of the same thing, and I can't
> see anything to suggest that it is cluster aware on a first reading of
> the docs,

There are a few SAN-level accelerators, but the cost of those things starts around $20,000 and climbs from there.

From florian at hastexo.com  Tue Apr  3 18:15:15 2012
From: florian at hastexo.com (Florian Haas)
Date: Tue, 3 Apr 2012 20:15:15 +0200
Subject: [Linux-cluster] caching of san devices....
In-Reply-To: <4F7AF7A1.3000701@mssl.ucl.ac.uk>
References: <4F7AF7A1.3000701@mssl.ucl.ac.uk>
Message-ID:

On Tue, Apr 3, 2012 at 3:14 PM, Alan Brown wrote:
> Has anyone tried putting bcache/flashcache in front of shared storage in a
> GFS2 cluster (on each node, of course)?

I can't talk about bcache, but I have worked with flashcache a bit, and there's a presentation of mine on how to use it in clustering at
http://www.hastexo.com/resources/presentations/storage-replication-high-performance-high-availability-environments
(which is all about Pacemaker, though). But for GFS2 specifically:

> Did it work?

It won't.

> Should it work?

No.

> Is it safe?

No. There's no cluster awareness the way you envision it, and there's no way to do multi-master replication of the flashcache cache device, which you would need.

> Are there ways of making it safe?

Implement the above, and it might be. (You don't want to.)

> Am I mad for thinking about it?

Ahum, well, now that you mention it... ;)

> Rationale:
>
> Spinning disks are slow to seek, large arrays even more so.
> SSD SAN arrays are hideously expensive and can't always be justified to "the
> powers that be".

I think you've got two possibilities:

1. Stick SSD based caching into your SAN. Google for CacheCade or MaxCache for some vendor implementations.

2. Consider ditching your GFS2 for SSD based GlusterFS replication.

I realize option 2 may get me booed off the list, and I know nothing about your requirements other than what you posted here, but if you just want something that is writable from all nodes and frees you from your SAN, then that might be a possibility.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now
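To make option 2 slightly more concrete, a minimal sketch of a two-node replicated GlusterFS volume on SSD-backed bricks; hostnames, device names, brick paths and the mount point are all invented, and this only illustrates the idea rather than being a drop-in replacement for the GFS2 setup discussed above:

# on both nodes: prepare an SSD-backed brick (device name assumed)
mkfs.xfs /dev/sdb1
mkdir -p /bricks/gv0
mount /dev/sdb1 /bricks/gv0

# on node1: form the trusted pool and create a 2-way replicated volume
gluster peer probe node2
gluster volume create gv0 replica 2 transport tcp node1:/bricks/gv0 node2:/bricks/gv0
gluster volume start gv0

# on every node that needs the data: mount it via the FUSE client
mount -t glusterfs node1:/gv0 /mnt/gv0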
From jstoner at opsource.net  Tue Apr  3 19:58:05 2012
From: jstoner at opsource.net (Jeff Stoner)
Date: Tue, 3 Apr 2012 15:58:05 -0400
Subject: [Linux-cluster] Documentation about the fence agent interface to RHCS?
In-Reply-To: <4F75F7A9.9030608@alteeve.ca>
References: <4F75F7A9.9030608@alteeve.ca>
Message-ID:

Any luci devs on the list? I'm looking for info on integrating fencing agents into luci. Once the Powers That Be allow me to release our fencing agent, I'd like to take a stab at making it easier to use via luci.

On Fri, Mar 30, 2012 at 2:12 PM, Digimer wrote:
> On 03/30/2012 10:00 AM, Jonathan Barber wrote:
> > I'm writing a fencing agent and would like to know whether there is a document
> > describing the interface that fencing agents should support, i.e. how
> > arguments are passed to the fence agent, what the exit codes represent, and
> > whether anything is done with the agent's standard out/error.
> >
> > I've looked at the agents that ship with RHCS and have some idea of
> > what's going on, but it'd be nice to have the documentation to confirm
> > my suspicions.
>
> Check out this: https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>
> If you need help/clarity, let me know.
>
> -- 
> Digimer
> Papers and Projects: https://alteeve.com

-- 
Jeff Stoner | Cloud Evangelist
O +1-703-668-1920 | M +1-703-475-7720 | E jstoner at opsource.net
OpSource, Inc. | www.opsource.net | Twitter @opsource.net
Red Hat Certified Engineer (cert number 805009770342158)

From lists at alteeve.ca  Tue Apr  3 23:00:27 2012
From: lists at alteeve.ca (Digimer)
Date: Tue, 03 Apr 2012 16:00:27 -0700
Subject: [Linux-cluster] Documentation about the fence agent interface to RHCS?
In-Reply-To:
References: <4F75F7A9.9030608@alteeve.ca>
Message-ID: <4F7B810B.7070608@alteeve.ca>

On 04/03/2012 12:58 PM, Jeff Stoner wrote:
> Any luci devs on the list? I'm looking for info on integrating fencing
> agents into luci. Once the Powers That Be allow me to release our
> fencing agent, I'd like to take a stab at making it easier to use via luci.

Barring a correction from someone more in the know... I believe that, so long as your fence agent outputs its metadata properly, luci should use it. The trick is getting it added to the fence-agents RPM, which I can help with.

-- 
Digimer
Papers and Projects: https://alteeve.com
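For anyone writing an agent against the FenceAgentAPI page linked above: fenced passes the options as name=value lines on the agent's standard input, and exit status 0 signals success. A bare-bones sketch of that convention; the agent name is made up and the device-specific commands are placeholders, and real agents additionally accept the same options as command-line flags and print full metadata:

#!/bin/bash
# fence_example -- skeleton of the stdin-based calling convention, not a real agent
action="reboot"                        # default action described by the API page
while read -r line; do                 # fenced feeds name=value pairs on stdin
    case "$line" in
        action=*|option=*) action="${line#*=}" ;;   # "option" is the older key name
        port=*)            port="${line#*=}" ;;
        ipaddr=*)          ipaddr="${line#*=}" ;;
    esac
done

case "$action" in
    metadata)
        echo '<?xml version="1.0" ?>'  # a real agent prints its full <resource-agent> XML here
        exit 0 ;;
    status)
        # query the device for $port on $ipaddr here; the wiki documents the on/off exit codes
        exit 0 ;;
    on|off|reboot)
        # call the device-specific power command for $port on $ipaddr here
        exit 0 ;;
    *)  exit 1 ;;
esac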
From fdinitto at redhat.com  Wed Apr  4 04:46:16 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 04 Apr 2012 06:46:16 +0200
Subject: [Linux-cluster] Documentation about the fence agent interface to RHCS?
In-Reply-To:
References: <4F75F7A9.9030608@alteeve.ca>
Message-ID: <4F7BD218.60109@redhat.com>

On 04/03/2012 09:58 PM, Jeff Stoner wrote:
> Any luci devs on the list? I'm looking for info on integrating fencing
> agents into luci. Once the Powers That Be allow me to release our
> fencing agent, I'd like to take a stab at making it easier to use via luci.

Let's get the agent upstream first and in good shape (license, metadata output, man pages and all of that), then adding it to luci is "simple".

Fabio

From lists at alteeve.ca  Wed Apr  4 04:58:04 2012
From: lists at alteeve.ca (Digimer)
Date: Tue, 03 Apr 2012 21:58:04 -0700
Subject: [Linux-cluster] Documentation about the fence agent interface to RHCS?
In-Reply-To: <4F7BD218.60109@redhat.com>
References: <4F75F7A9.9030608@alteeve.ca> <4F7BD218.60109@redhat.com>
Message-ID: <4F7BD4DC.2090508@alteeve.ca>

Exactly the person I was hoping would chime in. :)

On 04/03/2012 09:46 PM, Fabio M. Di Nitto wrote:
> Let's get the agent upstream first and in good shape (license, metadata
> output, man pages and all of that), then adding it to luci is "simple".
>
> Fabio

-- 
Digimer
Papers and Projects: https://alteeve.com
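Since "metadata output" is one of the requirements mentioned above: the metadata action is expected to print an XML description of the agent and its parameters, which is what tools such as luci use to build their configuration forms. A rough sketch of the shape, with an invented agent name and a single parameter (compare the output of any shipped agent, e.g. fence_ipmilan -o metadata, for the full format):

<?xml version="1.0" ?>
<resource-agent name="fence_example" shortdesc="Fence agent for an example power switch">
    <longdesc>fence_example powers cluster nodes off and on through a hypothetical device.</longdesc>
    <parameters>
        <parameter name="ipaddr" unique="1" required="1">
            <getopt mixed="-a, --ip=[address]"/>
            <content type="string"/>
            <shortdesc lang="en">IP address or hostname of the device</shortdesc>
        </parameter>
    </parameters>
    <actions>
        <action name="on"/>
        <action name="off"/>
        <action name="reboot"/>
        <action name="status"/>
        <action name="metadata"/>
    </actions>
</resource-agent>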
From parvez.h.shaikh at gmail.com  Wed Apr  4 05:41:37 2012
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Wed, 4 Apr 2012 11:11:37 +0530
Subject: [Linux-cluster] Multicast address by CMAN
Message-ID:

Hi all,

As per my understanding, CMAN uses the cluster name to internally generate a multicast address. In my cluster.conf:

Having a cluster with the same name on a given network leads to issues and is undesirable.

I want to know whether there is any way to find out if a multicast address is already in use by some other cluster, so as to avoid using a name that generates the same multicast IP, or for that matter configuring the same multicast IP in cluster.conf.

Thanks,
Parvez

From fdinitto at redhat.com  Wed Apr  4 07:02:23 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 04 Apr 2012 09:02:23 +0200
Subject: [Linux-cluster] Multicast address by CMAN
In-Reply-To:
References:
Message-ID: <4F7BF1FF.4040203@redhat.com>

On 4/4/2012 7:41 AM, Parvez Shaikh wrote:
> I want to know whether there is any way to find out if a multicast address
> is already in use by some other cluster, so as to avoid using a name that
> generates the same multicast IP, or for that matter configuring the same
> multicast IP in cluster.conf.

cman_tool status will show the multicast address in use by a given cluster.

Fabio

From emi2fast at gmail.com  Wed Apr  4 07:11:58 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Wed, 4 Apr 2012 09:11:58 +0200
Subject: [Linux-cluster] Multicast address by CMAN
In-Reply-To:
References:
Message-ID:

One simple way is netstat -gn on a different cluster.

On 4 April 2012 07:41, Parvez Shaikh wrote:
> I want to know whether there is any way to find out if a multicast address
> is already in use by some other cluster...

-- 
this is my life and I live it for as long as God wills
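Putting the two suggestions together, and adding the option of pinning the address explicitly per cluster (the 239.192.x.x value below is only an illustrative pick from the administratively scoped range):

# on a node of the existing cluster: which address did cman derive from the cluster name?
cman_tool status | grep -i multicast

# which multicast groups are the local interfaces already joined to?
netstat -gn

# or take the guesswork away and pin one address per cluster in /etc/cluster/cluster.conf:
#   <cman>
#       <multicast addr="239.192.100.1"/>
#   </cman>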
From schlegel at riege.com  Wed Apr  4 12:25:53 2012
From: schlegel at riege.com (Gunther Schlegel)
Date: Wed, 04 Apr 2012 14:25:53 +0200
Subject: [Linux-cluster] qdiskd in heuristics mode only?
Message-ID: <4F7C3DD1.5060208@riege.com>

Hi,

is there any way to prevent fencing if the qdisk quorum partition can't be accessed? (Yes, that does make sense!)

Scenario is like this:

- 2 node cluster, RHEL6.2, internal data storage (mysql multi-master replication, no GFS involved)
- qdiskd is in place for two reasons: 1) I need to run some heuristics, 2) to gather quorum if only one node starts up
- quorum partition is on an iSCSI SAN
- SAN storage is not required for the cluster services to operate at all (leaving aside that it should work at node startup; but if the iSCSI link goes down later on, there is no need to actually fence a node as long as the network cluster communication between the two nodes is fine)

SAN firmware upgrades interrupt the iSCSI storage for about 40 seconds (multipathing et al. is properly set up and working fine, the SAN controller failover just takes that long). To mitigate that I need to set quite big totem consensus timeouts. I do not like that, but OK. But qdiskd keeps on fencing the nodes as soon as quorum partition access is restored.

Is there any hidden setting to prevent that?

best regards,
Gunther

-- 
Gunther Schlegel
Head of IT Infrastructure
.............................................................
Riege Software International GmbH    Phone: +49 2159 91480
Mollsfeld 10                         Fax:   +49 2159 914811
40670 Meerbusch                      Web:   www.riege.com
Germany                              E-Mail: schlegel at riege.com

Commercial Register: Amtsgericht Neuss HRB-NR 4207
VAT Reg No.: DE120585842
Managing Directors: Christian Riege, Gabriele Riege, Johannes Riege, Tobias Riege
.............................................................
YOU CARE FOR FREIGHT, WE CARE FOR YOU
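There does not appear to be a qdiskd switch that simply skips fencing when the quorum device vanishes, so the usual approach is to make qdiskd itself tolerate an outage longer than the firmware-upgrade window. A hedged sketch for cluster.conf follows; the numbers only illustrate the knobs involved (interval*tko here gives roughly 60 s of tolerance for the ~40 s iSCSI blackout, and the heuristic target address is hypothetical), and the exact relationship between interval*tko, quorum_dev_poll and the totem token timeout is spelled out in qdisk(5) and the Red Hat cluster documentation:

<!-- illustrative values only -->
<quorumd interval="3" tko="20" votes="1" label="qdisk" min_score="1">
    <heuristic program="ping -c1 -w1 10.0.0.254" score="1" interval="2" tko="10"/>
</quorumd>
<cman quorum_dev_poll="80000" expected_votes="3"/>
<totem token="80000"/>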
From wsfax.alu.es at gmail.com  Thu Apr  5 22:19:15 2012
From: wsfax.alu.es at gmail.com (wsfax alu.es)
Date: Fri, 6 Apr 2012 00:19:15 +0200
Subject: [Linux-cluster] Cluster failure, dlm overload
Message-ID:

Hi,

First of all, thanks for your time.

A five node cluster sharing several GFS filesystems is suffering total blocks of filesystem activity, around one block each week. These blocks first appeared several weeks ago, after more than three years in service. The cluster is restored after a restart of all cluster nodes ;-)

When these blocks appear, we can see the dlm send and receive processes with a high level of CPU consumption, and network traffic is also ten times the normal level. A capture (wireshark) of network traffic on the DLM port shows thousands of messages per second. In particular, all "request message" packets are answered with a "request reply" where errno=EBADR; lookup messages seem OK.

The cluster is running a somewhat outdated software version, the Red Hat 2.6.18 one, and it is not possible to upgrade easily.

Any suggestion is welcome.

Kind regards,
ALU

From dbourque at accuweather.com  Fri Apr  6 15:17:11 2012
From: dbourque at accuweather.com (Daniel Bourque)
Date: Fri, 6 Apr 2012 15:17:11 +0000
Subject: [Linux-cluster] checking syntax errors default_event_script.sl
Message-ID: <9B922D75F5EA6A43AB899119482AB5E821072A88@exch-db02.accu.accuwx.com>

Hi,

Background: I'm working on adding load balancing via RIND. I discovered that not every event is passed to the event scripts defined in cluster.conf, so I have to modify /usr/share/cluster/default_event_script.sl.

In order not to have to restart rgmanager all the time, I changed /usr/share/cluster/default_event_script.sl so that it contains only this:

evalfile("//default_event_script.sl");

This allows me to change the main RIND script live.

The problem: I would like to be able to work on a copy and do syntax error checks via "slsh -t" before overwriting the live one. I can't simply do that, because slsh doesn't find the definitions for all the functions and variables used in default_event_script.sl.

Where are the libraries I need to include in my SLSH_PATH?

Thanks!

-- 
Daniel Bourque
Sr. Systems Engineer
AccuWeather
Office (316) 266-8013
Office (316) 266-8000 ext. 8013
Mobile (316) 640-1024

From ming-ming.chen at hp.com  Fri Apr  6 17:32:32 2012
From: ming-ming.chen at hp.com (Chen, Ming Ming)
Date: Fri, 6 Apr 2012 17:32:32 +0000
Subject: [Linux-cluster] fail to enable the vm in a cluster with vm service
In-Reply-To: <1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
References: <1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
Message-ID: <1D241511770E2F4BA89AFD224EDD527117B90213@G9W0737.americas.hpqcorp.net>

Hi,

I have a two node cluster with a vm service. I can migrate the vm using clusvcadm (clusvcadm -M vm:vm297 -m node2). However, if I use "clusvcadm -d vm:vm297" to disable/stop the vm and then try to use "clusvcadm -e vm:vm297" to start it again, it fails. I can, however, manually create vm297 using virsh (virsh create /abc/config/vm297.xml).

Any help and comments will be appreciated. Thanks in advance.

Ming

The following messages are from the rgmanager.log file:

**** Stop the vm vm297 successfully *****
Apr 06 09:35:57 rgmanager 1 events processed
Apr 06 09:36:05 rgmanager Stopping service vm:vm297
Apr 06 09:36:05 rgmanager [vm] Using /abc/config//vm297.xml
Apr 06 09:36:26 rgmanager 1 events processed
Apr 06 09:36:39 rgmanager [script] Executing /etc/init.d/libvirtd status

****** start the vm297 failed ********
Apr 06 09:36:49 rgmanager No other nodes have seen vm:vm297
Apr 06 09:36:49 rgmanager Starting disabled service vm:vm297
Apr 06 09:36:49 rgmanager [vm] Using /abc/config//vm297.xml
Apr 06 09:36:49 rgmanager [vm] /abc/config//vm297.xml is XML; using virsh
Apr 06 09:36:49 rgmanager [vm] virsh create /abc/config//vm297.xml
Apr 06 09:36:49 rgmanager start on vm "vm297" returned 1 (generic error)
Apr 06 09:36:49 rgmanager #68: Failed to start vm:vm297; return value: 1
Apr 06 09:36:49 rgmanager Stopping failed service vm:vm297
Apr 06 09:36:49 rgmanager Stopping service vm:vm297
Apr 06 09:36:49 rgmanager [vm] Virtual machine vm297 is
Apr 06 09:36:49 rgmanager Service vm:vm297 is recovering
Apr 06 09:36:49 rgmanager #71: Relocating failed service vm:vm297
Apr 06 09:36:49 rgmanager Service vm:vm297 is stopped
Apr 06 09:36:55 rgmanager 2 events processed

My cluster.conf file is: